Surety Solutions for the 21st Century

Surety Science and Engineering Workshop
Proceedings

Surety Science and Engineering Methodology for Developing New Surety Programs
Dr. Pace VanDevender
Sandia Surety Leadership Team, and
Chief Information Officer
Sandia National Laboratories

(Slide 1)

Thank you and good morning. This is a workshop in the true sense of the word. It’s also an experiment in the true sense of the word…

(Slide 2)

I’m delighted to take you through some of the methodology of Surety Science and Engineering. I’d like to remind you that surety has three components: reliability in normal circumstances, safety in abnormal circumstances, and security and use control in malevolent circumstances. All of these need to be treated as a system.

(Slide 3)

As we talked to people who are experts in many disciplines, we found that we were talking about the same thing, and in fact surety could have a common methodology if we were smart enough to deduce it. In that deduction we realized that what we needed in surety was the equivalent of the simple machines in mechanical design. If you have the concepts of the screw, the inclined plane, the lever, the pulley, etc., firmly in your mind, it lets you invent more complex machines. The same discoveries occur in physics. Once you have mastered the concepts of mass, distance, velocity, momentum, and energy, then you have the conceptual framework for solving much more complex kinetics problems.

Our goal in this workshop is to define the equivalent of those concepts for the complexities of Surety Science and Engineering. Those equivalent concepts are the four levels and eight approaches. My purpose is to discuss those eight approaches and how they help define and illuminate the four levels for application later today.

(Slide 4)

This discipline is applicable at various levels of aggregation of complexity. For instance, in the defense establishment, there’s a hierarchy starting with the Armed Services. Below that are weapon platforms, weapon systems, subsystems, and components. Surety can be applied at each of those levels. Determining where the opportunity for change and improvement lies is the first real step. That requires judgement about the next best target for improving overall surety.

It’s also generally true that the higher the degree of aggregation, the more things have to come together to make a change. If you want to make a fundamental change to the defense establishment, take a deep breath. If you want to make a fundamental change to a component, you have a much better chance. Determining the level of the opportunity is the first task for discretion and judgement.

(Slide 5)

Joan has already mentioned that surety comes in four levels:

  1. working sufficiently as expected and then buying insurance to cover the upsets,

  2. surety by proactive human intervention,

  3. surety by positive measures from science and engineering,

  4. and last, surety from laws of nature and mathematics, relying on the laws to the extent possible.

Now, in fact, everything relies on the laws of nature and mathematics. Everything also has some human element in it. The value added by this four-level view is to ask: Where is the centroid of concern, or the centroid of vulnerability or opportunity? Does it lie mostly with the humans controlling the system? Or, when it really matters, does it rely on built-in science and engineering, with perhaps no humans? Have we eliminated those ambiguities of engineered positive measures and instead relied in our design principles on the fundamental laws of nature?

(Slide 6)

I would like you to take out this sheet to make some notes on it. This is an interactive workshop. As I go through the levels and the approaches to them, think of your own work area. If the level or approach stimulates a new idea or an option, write it down. These notes will be part of the substance that you’ll be contributing later on this afternoon. The worksheet has the levels of surety on the left hand side, the eight approaches that are matched to those levels and help define them in the middle column, and then your own example to be noted on the right hand side.

(Slide 7)

Level 1. Working sufficiently as expected and buying insurance to cover the upsets. These are systems designed for reliable operation. There are no special considerations for off-normal conditions. You insure as a normal business cost against most of the upsets. It’s a fairly reactive response to mitigate the consequences that may occur, and you rely on foresight to ensure safety. The usual examples are most industrial applications. Now that doesn’t demean them. Level 1 surety is not Level 0 surety, as it takes a lot of effort to do this right. Highway safety is an example, in the sense that we license teenagers–and I used to be one–essentially for life, with just an intermediate eye check at later times in life. Fix that level of surety in your mind.

(Slide 8)

There are two approaches within Level 1. The first is reliance on foresight of designers and the good practices of people. This is the world of warranties and insurance. The design and manufacturing of consumer products is appropriate for this regime. It was somewhat to our surprise to find that nuclear nonproliferation in India and Pakistan was de facto Level 1 surety when we thought it was much higher–and inappropriately so. That illustrates mistakes at Level 1.

(Slide 9)

Level 1 has another approach and now I want you to start registering some views. Please reach under your chair and pull out a Newton [computer]. This is a dialog, you see. I’m going to say something, you’re going to say something through your Newton, and we’ll all see how it came out. This way I can see if we’re on track… In this part I’ll be showing you two recommendations and, for recommendations 1 and 2, you will tell me the degree to which that recommendation illustrates the particular approach in the title.

[Discussion of recommendations for Level 1, Approach 2.]

(Slide 10)

Approach 2 is mitigation after the fact by coordinated emergency response and correcting what went wrong. This is the world of response teams, of investigations, of lessons learned and of corrective action. Within that framework there are two recommendations. The first pertains to investigations of airline and nuclear reactor incidents and accidents, and the retrofit of units or systems to correct failures. The second is school security in the sense of kids killing kids and the reaction to that in this last year. [(Slide 11) Interactive voting, see Results]

(Slide 12)

Level 2 surety is fundamentally different. At Level 2 surety we’re talking about surety by proactive human intervention. That means a system is designed with continual human actions to help ensure safety. It requires people who are cognizant and specially adapted for safety purposes. A plan is in place relying upon human actions to control the environment for operations. The plan is to perform operations reliably, and to respond in the case of emergency. Most aircraft safety and most military operations fall in this regime. The consequences are too high for Level 1 surety. The attributes of Level 2 surety have been differentiated in the same sense as, say, the Malcolm Baldrige Quality Award attributes. There are defined (Slide 13) attributes to make progress. There are (Slide 14) ways to improve with rigor within Level 2 and increase the level of surety without changing levels…

(Slide 15)

Approach No. 3. Surety is maintained by proper operation with thorough science-based understanding, independent assessment, and continuous improvement. It is a much more focused, dedicated effort than Level 1. This is the world of validated databases of computer simulations, of extensive, continual training in simulations, of systemic analysis and predictive understanding. [Discussion of recommendations for Level 2, Approach 3.]

Recommendation 1. The design-deploy-fix style of debugging of software.
Recommendation 2. Continual simulator training and flight requalification of airline pilots.

[(Slide 16) Interactive voting, see Results] The vote on recommendation 2 is much more positive than the vote on recommendation 1 for, indeed, the continual simulator training and flight requalification of airline pilots is the correct one.

(Slide 17)

Approach 4. [Discussion of recommendations for Level 2, Approach 4.] Here administrative controls reduce the probability of deleterious environments occurring. This is the world of the person-in-the-loop, of preventive action, control systems, and diagnostics, all aimed at prevention of the occurrence. To what degree are these based on Approach 4? [(Slide 18) Interactive voting, see Results]

I see we’ve got quite a few airline travelers who recognize that recommendation No. 2, x-ray screening and metal screening at the airports, does rely on administrative control of personnel to ensure that the system works…

(Slide 19)

Approach 4, that is, preventing the occurrence from happening, actually spans two levels because Level 2 often has problems: a process can sometimes take too long, response times are too long, and people make errors.

Now I need your help here, I need some kind of fast response, and this is a verbal response, so will you give me a fast response? [Audience: Yes.] There are three questions in this. [Pure white slide.] What color do you see? [Audience: White.] What do cows drink? [Audience: Milk.] Is that right? [Audience: No.]

Why in the world would we say cows drink milk? It’s because the brain takes about 3 seconds to go from being miscued into giving the right answer. Smart people who are quick don’t give the brain those 3 seconds to find that they’re being miscued, and that’s part of the problem with Level 2: when it counts, people make errors.

In that sense, then, Level 3 is surety by positive measures from science and engineering, which is partly why we’re at the National Academy of Engineering.

(Slide 20)

At Level 3, engineering and scientific measures are in place to control the environment for the operation, to ensure reliable performance, and to respond in case of emergency. We’re moving into higher consequence endeavors. That is not to say that airline crashes are low consequence events but, in terms of social impact, nuclear reactors, our ballistic missile defense, self-healing communication routers to ensure reliability of our telecommunication infrastructure, and nuclear weapons without modern safety features have huge consequences. Level 3 surety, science and positive measures, characteristically handles them.

(Slide 21)

As described in the White Paper, there are, as in Level 2, these attributes in Level 3:

  • predictability,

  • range of effectiveness,

  • a theme and reliance on principles,

  • design implementation,

  • environmental controls,

  • and emergency response.

Here an analogy with the quality award is important. Surety is in its infancy compared to quality in its deployment throughout industry. You may remember when we talked about "quality costs." After a while, "quality is free." And then finally, "quality pays." We hope, in ten years, we will see that there was a time when "surety costs." And then, "surety is free." And then finally, "surety pays." A Surety Award is something that we might well consider.

(Slide 22: Varying manifestations of Level III attributes define three sublevels)

The approach at 4.5 is aimed at reducing the probability of a deleterious environment occurring. This includes engineered controls similar to lower levels, but these are automated, autonomous, preventive action control systems, and diagnostics aimed at prevention. [(Slide 23) Discussion of recommendations for Level 3, Approach 4.5.]

[(Slide 24) Interactive voting on Level 3, Approach 5.]

[(Slide 25) Discussion of recommendations for Level 3, Approach 5.]

[(Slide 26) Discussion of recommendations for Level 3, Approach 5.]

As you see, we have a great diversity of opinion on this because the measures do have something to do with both. On the one hand, in the automated breathalyzer and blood alcohol monitors that enable someone to start a car, it is engineering controls that reduce the probability of a deleterious environment occurring–that is, a drunk on the highway. On the other hand, you can also see that, in order to keep a drunk off the highway, all relevant positive measures must work–the blood alcohol monitor has to work. In the case of one intercept in a ballistic missile defense system, it is an engineered approach to reduce the probability of a deleterious environment occurring. Since you have one shot at it, everything must work. It is a one-layer defense.

You’ll find that many of the surety approaches can satisfy both. In this case we see the better example is "one intercept ballistic missile defense," as opposed to the subsequent approach, where only one of many positive measures is necessary for success.

(Slide 27)

Approach 6. [Discussion of voting for "Only one of many positive measures is necessary for success. Gas, air, compression, and spark in internal combustion engine versus Rec. 2. Multi-tier ballistic missile defense"]

Of course, Approach 5 and Approach 6 represent the additional surety acquired by having a multiple capability so that a breach of one does not constitute a move in the direction of deleterious consequences. Not in all cases can you afford the multiple independent actions.
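
[Editorial sketch, not part of the talk.] The difference between "all relevant positive measures must work" (Approach 5) and "only one of many positive measures is necessary" (Approach 6) can be put in rough numbers. The sketch below assumes each measure works independently with a known probability; both the independence and the 0.9 figures are illustrative assumptions, since the talk gives no numbers.

```python
from math import prod


def all_must_work(probs):
    """Approach 5: success requires every positive measure to work."""
    return prod(probs)


def any_one_suffices(probs):
    """Approach 6: success requires at least one positive measure to work."""
    return 1.0 - prod(1.0 - p for p in probs)


# Hypothetical probabilities for three independent measures.
measures = [0.9, 0.9, 0.9]
print(f"all must work:    {all_must_work(measures):.3f}")
print(f"any one suffices: {any_one_suffices(measures):.3f}")
```

Under these assumptions, adding measures lowers the chance of success when all must work and raises it when any one suffices, which is the extra surety the multiple capability buys.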

(Slide 27)

[Discussion of voting results]

(Slide 28)

(Slide 29)

Approach No. 6. Let’s do it graphically. This represents Internet connectivity or coolant and loss-of-coolant systems for nuclear reactors. There are multiple paths from start to Mission Success, and only one has to work at a time. Thus, in addition to the barrier kind of model that we had before, the multiple parallel paths to independent success are conceptually the same model.
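
[Editorial sketch, not part of the talk.] The picture of multiple parallel paths from start to Mission Success can be modeled as a small graph in which the mission succeeds if at least one path avoids every failed element. The node names and the three-route network below are hypothetical, chosen only to illustrate the idea.

```python
from collections import deque


def mission_can_succeed(links, start, goal, failed=frozenset()):
    """Return True if some path from start to goal avoids every failed node."""
    if start in failed or goal in failed:
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in links.get(node, ()):
            if nxt not in failed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical network: three independent routes from start to success.
links = {
    "start": ["A", "B", "C"],
    "A": ["success"],
    "B": ["success"],
    "C": ["success"],
}
print(mission_can_succeed(links, "start", "success", failed={"A", "B"}))
print(mission_can_succeed(links, "start", "success", failed={"A", "B", "C"}))
```

With two of the three routes failed the mission still succeeds; only when every route is breached does it fail.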

(Slide 30)

Approach 7 is substantively different from previous approaches in that you have cumulative comparative adaptive positive measures. It is cognitively different. There is an event or process that has an input and output, with a comparator that monitors what happens and then intervenes to ensure that the output is what should happen given the input, to some extent regardless of what happens internally to the process.

[Discussion of voting for "Space shuttle computers voting to assure 2 of the 3 give same answer before acting" versus "Redundant components for reliability"]
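
[Editorial sketch, not part of the talk.] A comparator of the 2-of-3 kind named in the recommendation can be sketched as a simple majority voter. This is a minimal illustration of the idea only; it is not a description of how the space shuttle computers were actually implemented, and the function name and `required` parameter are assumptions.

```python
from collections import Counter


def vote(answers, required=2):
    """Return the answer that at least `required` channels agree on, else None."""
    if not answers:
        return None
    value, count = Counter(answers).most_common(1)[0]
    return value if count >= required else None


# Two of three channels agree, so the comparator lets the action proceed.
print(vote([42, 42, 41]))
# No two channels agree, so the comparator withholds the action.
print(vote([1, 2, 3]))
```

The comparator looks only at the outputs, which matches the point that it acts to some extent regardless of what happens inside each channel.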

[(Slide 31) Discussion of Results]

(Slide 32)

Level 3 can also have problems. Designs age or are flawed; software has bugs; hardware fails; sequences unfold in unexpected and escalating ways. Therefore it’s a great advantage if you can minimize to the extent possible your reliance on these details of the science and engineering.

(Slide 33)

Level 4 surety is reliance to the extent possible on the laws of nature and mathematics in your design. You tailor the parameter space to minimize any ambiguities. The long-term goal of Surety Science and Engineering is indeed to rely to the extent possible only on the laws of nature and mathematics. But it’s hard. Flawless foresight is difficult to achieve. There are sublevels from a first deployment in which the intent is to shape the allowed parameter space bounded by laws of nature and mathematics to an ideal of absolute surety. Level 4 is not absolute surety: it is reliance to the extent possible on nature and mathematics, and, because times change, a periodic surety assessment is performed to uncover any new vulnerabilities.

(Slide 34)

The attributes of Level 4 surety are reliance upon the laws of nature and mathematics as your centroid. It’s a principles-based design to approach the physical impossibility of undesired consequences. Then continuous assessment strives for absolute surety–continuous assessment must be in place. Approach 8 relies as much as possible on the laws of nature to approach physical impossibility in high consequence systems.

(Slide 35)

The parameter space of Approach 8 can be diagrammed by using physics, chemistry, and material science to bound the permitted operation so that the untoward or high consequences are precluded because they are outside of that bound.
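
[Editorial sketch, not part of the talk.] One way to read the diagram is as a containment check: if physics bounds the values a parameter can reach, a consequence that requires values outside that bound is precluded. The sketch below models a single parameter as an interval. The `Interval` class, the function name, and the numbers are all hypothetical, and a real design would involve many coupled parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    low: float
    high: float

    def overlaps(self, other: "Interval") -> bool:
        """True when the two closed intervals share at least one value."""
        return self.low <= other.high and other.low <= self.high


def hazard_precluded(reachable: Interval, hazardous: Interval) -> bool:
    """True when no physically reachable value lies in the hazardous range."""
    return not reachable.overlaps(hazardous)


# Hypothetical numbers: the design can only reach 0 to 80 units,
# while the high consequence needs 100 units or more.
reachable = Interval(0.0, 80.0)
hazardous = Interval(100.0, float("inf"))
print(hazard_precluded(reachable, hazardous))
```

The check passes here because the hazardous range sits wholly outside what the design can reach, rather than because some engineered measure intervenes in time.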

[Voting on "Anti-lock brakes" versus "Hang glider air foil that becomes a parachute instead of stalling"]

(Slide 36)

In this case the overwhelming majority, but not everyone, saw that recommendation 2 (the hang glider air foil becoming a parachute instead of stalling, a design change that precludes the stall from the area of operation) would be an example of Approach 8. The antilock brakes are intended to be predictable: a cumulative comparative adaptive positive measure in which the braking system senses that it’s about to fail and lock and then pumps the brake automatically.

(Slide 37)

We’ve talked about hit-and-miss all the way through the presentation. Now, without trying to bias the work of reactor operations, I and a few of my friends–none of whom are reactor operators–took a look at what the public would see as reactor safety. Our intent was to show how these approaches and the corresponding level of surety would play against reactor safety. For instance:

  • under good practices, at Level 1–a standard operating procedure

  • under Approach 2 a conduct of operations–to mitigate and to fold back into the operation for that event the lessons learned

  • at Level 2, Approach 3, proactive understanding–brings in the eyes of the outsider, assessments and emergency operational exercises with proactive human intervention tied to continuous operation

  • the prevention by administrative controls–watchers watching the watchers

  • the prevention by engineering controls–active independent parallel coolant system for loss of coolant accident, automated in this case and autonomous

  • all positive measures must succeed–the passive independent cooling system for a loss of coolant accident

  • one of many positive measures must succeed–cost effective IMEMS (integrated microelectromechanical machine system) based on a strong link/weak link system to assure predictable operation in such a fashion that Chernobyl could not occur because those systems could not have been bypassed

  • cumulative comparative adaptive–using new technology like the integrated microelectromechanical machines for sensor actuator systems to monitor the state of health and automatically adapt the system to maintain a high surety condition, a technology that is not being deployed today

  • the laws of nature and mathematics–passively self-safing reactor dynamics of any kind so that if temperature excursions occurred then the neutron dynamics would passively cool this system

(Slide 38)

We are in fact going beyond our best practices to create Surety Science and Engineering through this workshop.

(Slide 39)

This slide summarizes the levels and approaches that we have been discussing. When I talked to people from reliability, from security, and from safety, I found we were all talking about the same things, and that there could be a common set of approaches for a single strategy to address them all as a system. That is the challenge of this workshop.

 

Let me introduce Jim Rice. I’ve had the pleasure of working with the 60 or so people on the Sandia Surety Leadership Team, whose work I’ve had the pleasure to present to you today. I have a new assignment as the Chief Information Officer of Sandia and therefore I’m pleased to introduce my successor. We both have the same hairline. This is Jim Rice, who will be taking over. I hope you enjoy working with him as much as I have. Thank you.


