Sandia LabNews

Sandians can't let Y2K stop nuclear detonation detection



If anybody is motivated to deal with Year 2000 problems, it’s the Sandians working on the US Nuclear Detonation Detection System. They have to be. If the system stops working, the US loses its ability to use satellites to monitor for nuclear test ban treaty violations.

"ICADS is a good example of what we do," says John Williams, Manager of Satellite Data Processing Dept. 5707. He explains that ICADS (Integrated Correlation and Display System) takes data from satellite-based sensors and, by way of a ground-based data-processing system, turns it into information for the Air Force. "Along with our other Sandia-supplied systems like the Advanced Radiation Detection Data Unit (ARDU) and the Ground Nuclear Detonation Detection System Terminal (GNT), it’s part of the nation’s Integrated Tactical Warning and Attack Assessment (ITWAA) network."

In addition to developing the next-generation ICADS, GNT, and ARDU systems, Sandia maintains the current hardware and software – about 1.5 million lines of code that must continue working problem-free through the year 2000 rollover. "We’ve put significant effort into making sure that our current systems are Y2K-compliant," says John. The Y2K part of the work is costing about $450,000 this year. He says the Air Force customer directed Y2K costs to be tracked even though there’s no additional funding for Y2K problems.

The year 2000 is a problem when a two-digit field has been used in a computer to represent a year, such as 4/10/98. Does 4/10/00 then represent 1900 or 2000? Such a simple uncertainty, and the possibly erroneous calculations that result, can cause a computer or its software to stop working or, worse, give wrong answers without warning.
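The ambiguity can be made concrete with a short Python sketch. It is illustrative only: the function names and the pivot value of 50 are assumptions made for this example, and the "windowing" rule shown is a common general remediation technique, not something the article attributes to the Sandia systems.

```python
def naive_year(yy: int) -> int:
    """Interpret a two-digit year the way much legacy code did: always 19xx."""
    return 1900 + yy


def windowed_year(yy: int, pivot: int = 50) -> int:
    """Interpret a two-digit year with a fixed window (hypothetical pivot).

    Values at or above the pivot are read as 19xx, values below it as 20xx.
    """
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    return 1900 + yy if yy >= pivot else 2000 + yy


# Years elapsed between 4/10/98 and 4/10/00 under each interpretation.
print(naive_year(0) - naive_year(98))        # -98: a wrong answer, given without warning
print(windowed_year(0) - windowed_year(98))  # 2
```

The naive reading does not crash; it quietly produces a negative interval, which is the "wrong answers without warning" case.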

Because Sandia has regularly been doing system updates, the project team (consisting of staff in 5700, 6500, 2600, and at local contractors Tech Reps and Applied Physics) is in a much better position than someone trying to repair an old system that hasn’t been maintained for years. "We didn’t need to do a lot of code overhaul," says Walt Huebner (5707). "We looked first at interfaces to see if there were date-related problems. You might call it going outside in, from the exterior to the interior of the software."

As the original designers of the software, the Sandians knew where to look for problems. But testing was a necessity, says Walt: "Even if we felt sure something would work, we tested it anyway."

Besides being good Y2K remediation practice, testing is an Air Force requirement. The Sandia team has to prepare a certification package. Once the Air Force has certified the system, the team will receive a certification number to attest to success in Y2K testing. The Air Force will also attempt to test all ITWAA network systems together at a mission level, says Walt, but doing it that way is proving difficult and may not be possible.

In the case of ICADS, GNT, and ARDU, testing is done with a simulator that generates data to mimic the downlinks from all satellites in the constellation. The testers can roll the system date forward and set an event for, say, a minute before the critical time, at the time, and a minute afterward. Besides the transition from 1999 to 2000, the testers looked at cases such as 9/9/99 (sometimes used by software designers to indicate "never expires"), Feb. 29, 2000 (2000 is one of the "exception" century years that is a leap year), and Oct. 10, 2000 (the first post-1999 date that needs a full 10 characters when month and day are written without leading zeros, as in 10/10/2000).
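A minimal Python sketch of how such a test schedule might be laid out follows. The names `CRITICAL_TIMES`, `event_times`, and `unpadded` are hypothetical, the one-minute margin comes from the description above, and nothing here reflects the actual Sandia simulator. The last two checks confirm the leap-year exception and, assuming month and day are written without leading zeros, the character count of 10/10/2000.

```python
import calendar
from datetime import datetime, timedelta

# Critical dates named in the article; midnight is assumed as the critical time.
CRITICAL_TIMES = {
    "9/9/99 'never expires' sentinel": datetime(1999, 9, 9),
    "1999-2000 rollover": datetime(2000, 1, 1),
    "leap day 2000": datetime(2000, 2, 29),
    "first 10-character date": datetime(2000, 10, 10),
}


def event_times(critical: datetime, margin: timedelta = timedelta(minutes=1)):
    """Return event times just before, at, and just after a critical time."""
    return [critical - margin, critical, critical + margin]


def unpadded(d: datetime) -> str:
    """Format a date as M/D/YYYY without leading zeros."""
    return f"{d.month}/{d.day}/{d.year}"


for label, critical in CRITICAL_TIMES.items():
    stamps = ", ".join(t.isoformat() for t in event_times(critical))
    print(f"{label}: {stamps}")

# Century years are leap years only when divisible by 400.
print(calendar.isleap(1900), calendar.isleap(2000))  # False True

# 1/1/2000 needs 8 characters; 10/10/2000 is the first date to need 10.
print(len(unpadded(datetime(2000, 1, 1))), len(unpadded(datetime(2000, 10, 10))))  # 8 10
```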

Commercial software tested, too

Beyond the Sandia-written code, the Sandia team has to consider commercial software that makes up part of the overall system. For example, commercial products are being used for the computer operating system, graphics, database, problem-report tracking, configuration management, and several other purposes. "It would be a big problem if we needed to make an emergency change in 2000 and configuration management wasn’t working," says Walt. Even if a software vendor assured its customers of Y2K compliance, the Sandians checked it themselves.

For ICADS, GNT, and ARDU, analysis and testing have revealed four problem areas in the Sandia code and two in commercial software. The biggest problem, says Walt, was the commercially supplied operating system used in these systems. Some operating system functions did not handle the year 2000 well. The vendor’s initial answer was that the operating system simply would not be supported past Dec. 31, 1999. However, the Sandians continued negotiating and were able to get an agreement that the vendor would fix the problems and continue supporting the operating system.

Most of the problems discovered in the Sandia systems involved printouts and monitor displays, says Walt. That seems typical, judging from briefings he has seen. Even though a display problem would not halt the system, the Air Force considers a fix to be important. The system is operated by enlisted personnel, and there is high turnover among them. Although the personnel could be trained to properly interpret or convert the displays, each new person would require a new round of training. A system fix is the preferable solution.

Repair or terminate?

Again and again, says Walt, commercial software has caused problems for other groups doing Air Force work. "At one briefing, a program reported $25 million in Y2K costs. They had hundreds of commercial off-the-shelf products to deal with." In some cases, the Air Force has decided that its best course is simply to terminate a program because the fix would be too expensive. Such decisions are taken seriously, says John, explaining that a termination usually has to be directed by commanders at the general-officer level in the Pentagon.

Walt and John’s experience convinces them that something, somewhere, probably something out of their control, will go wrong. Will it be a telephone line? A communication satellite carrying data from Australia to the US? Something else? By making sure the Sandia part of the overall system is working, says Walt, "we’re trying to minimize what could happen."

The scale of the Y2K problem is partly an effect of an immovable deadline. At any given "normal" time, says Walt, five to eight percent of the software in the US is undergoing change. For a complete Y2K fix, all the software is involved.

The Air Force also recognizes that things can go wrong despite efforts at fixes. There must be contingency plans. The Air Force requires that all Y2K certification testing be completed by Oct. 1, 1998. What if, during the testing, Sandia’s software proves to have a problem, or a problem is exposed when Jan. 1, 2000, rolls around?

"Our contingency plan is basically that our group stops work on everything else and fixes the problem," says Walt. That’s the price of being sure the US can detect a nuclear detonation.