Spacecraft reliability modeling is plagued by data scarcity and lack of data applicability. Systems tend to be one-of-a-kind, and observed failures tend to be the result of systemic defects or human errors, instead of component failures. The result is too often a gap between two extreme estimating approaches: probabilistic risk assessments (PRA) that are component-based lead to optimistic estimates by ignoring system-level failure modes; while history-based failure frequencies can lead to pessimistic estimates by neglecting non-homogeneity (between vehicles and vehicle configurations), reliability growth, and improvements in design. The problem of non-homogeneity is often considered solved once a system has a sufficiently long history. But in reality, rarely can tens of launches be considered samples of the same probability distribution. Launch vehicles undergo design changes in their history; more accurate estimates of reliability need to account for the risk introduced by design changes and for two types of reliability growth: growth of a given system via systematic tracking, assessing, and correction of the causes of failure uncovered in flights; and general technological or knowledge growth over subsequent generations of the system. Using the interesting history of the Centaur upper stage as an example, this paper proposes a pragmatic approach for the estimation of reliability growth over successive flights and configurations, which is applicable to any system with a history of several tens of flights. First considering the Centaur history as a single family, the paper compares the total success frequency to the 'instantaneous' success frequency over intervals of increasing flight number.
This analysis shows that as a result of the reliability growth experienced by Centaur, the total success frequency underestimates the risk of the first Centaur launches by a factor of almost 10, and overestimates the risk of the last Centaur launches by a factor of more than 3. But a closer analysis of Centaur history reveals that a number of failures were the results of design changes, as the stage design was improved or adapted for flight on new launch vehicle models. Understanding the risk introduced by design changes is important in the use of historical failure data as a surrogate for new systems. The second part of the paper shows that the 'interval' growth curve of the Centaur family is the average of distinct growth curves for each configuration. Over a given flight interval, the average success frequency can underestimate the risk of the newest generation of Centaur, and overestimate that of the older operating Centaur, by a factor of 2 to 5. The net result is that after almost 200 flights, the most reliable Centaur presented 10 times less risk than suggested by the total failure frequency, and 100 times less risk than the initial launches. Thus the 'mature' reliability was close to typical values generated by some bottom-up PRAs; but it was reached only after a long flight experience and the character of the residual failures is different. The authors hope that the practical approach presented in this paper can be of use to the industry in bridging the gap between forecasts based solely on historical failure frequencies and the results of component-based PRAs; and that it can foster a better understanding of the uncertainty bounds associated with various estimation methods, generally improving the relevance of reliability estimates to the problems faced by launch program decision makers.
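As a minimal illustration of the comparison described above, the following sketch contrasts the total failure frequency of a flight record with the failure frequency over successive flight intervals. The outcome record, the interval edges, and the function names are hypothetical and chosen only for illustration; they are not the Centaur flight record, and this is not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def total_failure_frequency(outcomes: Sequence[bool]) -> float:
    """Failures divided by flights over the whole record (True = success)."""
    failures = sum(1 for ok in outcomes if not ok)
    return failures / len(outcomes)


def interval_failure_frequencies(
    outcomes: Sequence[bool], edges: Sequence[int]
) -> List[Tuple[int, int, float]]:
    """Failure frequency over consecutive flight intervals.

    `edges` are increasing 0-based cut points, e.g. [0, 10, 30, 60] gives
    flights 1-10, 11-30 and 31-60. Returns (first_flight, last_flight, freq).
    """
    result = []
    for start, end in zip(edges, edges[1:]):
        window = outcomes[start:end]
        failures = sum(1 for ok in window if not ok)
        result.append((start + 1, end, failures / len(window)))
    return result


# Synthetic record (assumption, for illustration only): 60 flights in order.
early = [False, True, False, True, True, False, True, True, True, True]
middle = [True] * 12 + [False] + [True] * 7
late = [True] * 30
record = early + middle + late

total = total_failure_frequency(record)
by_interval = interval_failure_frequencies(record, [0, 10, 30, 60])

if __name__ == "__main__":
    print(f"total failure frequency: {total:.3f}")
    for first, last, freq in by_interval:
        print(f"flights {first}-{last}: {freq:.3f}")
```

In this synthetic record the total frequency (4 failures in 60 flights) understates the risk of the first ten flights and overstates the risk of the last thirty, which is the qualitative effect the abstract describes.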
The efficient development of a highly reliable system, such as a new crew launch vehicle, cannot afford to ignore the lessons of history. A number of interesting studies of launch vehicle failures provide very valuable, albeit qualitative, "lessons learned" on measures that a risk-informed program should take. If schedule and funds were unlimited, a very intensive and exhaustive test program would be the course to follow before the first flight of a new launcher. But when a program is faced with stringent schedule and cost constraints, it needs to optimize its test planning so as to meet constraints without sacrificing safety. Making such trade-offs intelligently requires having a way to quantify the relationship between the initial unreliability of a system and the array of risk-mitigating measures on hand. This paper proposes several analysis steps beyond the existing studies of historical launch vehicle failures, which can form the basis for quantifying the lessons of history. Firstly, risk cannot be quantified accurately by summing all failures across history, because systems were not exposed to the same design deficiencies at each flight. Early failures typically represent sources of high risk, which are eliminated by corrective actions after the early flights, while late failures are often indicative of low-risk design deficiencies that remain present for many flights. Thus failures occurring in the early launches of a system actually represent more risk than failures occurring later in history. Quantifying historical risk properly requires taking into account the reality of reliability growth. Secondly, knowing what failed in the past does not provide direct guidance as to how to reduce the risk of a new design. Of utmost relevance are the kinds of measures that could have prevented the failures in the first place.
Simplistically put, knowing that the majority of launch vehicle failures originated in propulsion systems is of limited use to designers and managers, who already pay tremendous attention to that central subsystem. By contrast, a quantification of the potential risk reduction possible by submitting an engine to stress testing, for example, could be valuable in supporting the cost and schedule trade-offs that decision makers are unavoidably faced with. This paper proposes a method for re-considering the failures of historical launchers in that new light and illustrates its application to two historical examples, the Ariane and Centaur systems. The results provide an approximate quantification of the risk reduction potentially offered by improvements in areas such as: sufficient flight-like testing at the system level; definition of, and testing for, margins that consider all phases of flight, including not only steady-state but also transient conditions; stress testing and testing for variability at the component and engine levels; analysis of the results of every single flight with an eye towards uncovering design defects ("post-success investigations"); re-examination of the margins of all components and systems (including software) and re-qualification after every single change in design, configuration, or mission profile; and maintenance of very rigorous levels of electrical and cabling parts control, quality assurance and contamination control in all phases of manufacturing, assembly and launch operations. The authors hope that the techniques and insights presented in this paper can be of use to the aerospace industry as it embarks on the flight certification program for the next-generation crewed launcher.
The historical success and failure record of launch vehicles clearly demonstrates the presence of reliability growth over successive launches. The reality of reliability growth is critical to decisions on ground and flight testing programs, and is a much greater driver of the expected number of failures over a campaign than the best analysis of mature reliability can ever be. While mathematical models exist that match the reliability growth demonstrated by historical systems, the space industry still lacks a practical method to develop forecasts of reliability growth for new systems, update those forecasts on the basis of early tests and flight results, and accurately estimate integrated campaign metrics over several launches. Modeling the failure probability as originating from potential “defects” in the system, each with a probability of trigger, a conditional probability of causing loss of mission, and a probability of detection and correction, provides a starting place to address this need. The method provides a model of reliability growth that is mathematically sound, matches historical results, is directly amenable to system engineering inputs, clearly identifies and quantifies the drivers of reliability growth, and provides a clear basis for uncertainty analysis and Bayesian updating.
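One possible reading of the defect-based formulation above can be sketched as follows. The class and function names, the numerical values, and the specific assumptions are hypothetical and are not taken from the paper: defects are assumed independent, a defect can only be detected and corrected on a flight where it triggers, and a correction is assumed perfect and to introduce no new defect.

```python
from dataclasses import dataclass
from math import prod
from typing import Sequence


@dataclass(frozen=True)
class Defect:
    p_trigger: float            # probability the defect triggers on a given flight
    p_lom_given_trigger: float  # conditional probability of loss of mission
    p_fix_given_trigger: float  # probability it is detected and corrected once triggered


def p_present(defect: Defect, flight: int) -> float:
    """Probability the defect is still in the design at flight `flight` (1-based).

    Under the stated assumptions it survives each earlier flight unless it
    both triggered and was corrected on that flight.
    """
    survive = 1.0 - defect.p_trigger * defect.p_fix_given_trigger
    return survive ** (flight - 1)


def p_failure(defects: Sequence[Defect], flight: int) -> float:
    """Loss-of-mission probability at a given flight, assuming independent defects."""
    return 1.0 - prod(
        1.0 - p_present(d, flight) * d.p_trigger * d.p_lom_given_trigger
        for d in defects
    )


def expected_failures(defects: Sequence[Defect], n_flights: int) -> float:
    """Expected number of failed flights over a campaign of n_flights."""
    return sum(p_failure(defects, n) for n in range(1, n_flights + 1))


# Illustrative values only (assumption): one high-risk defect that is quickly
# corrected and one low-risk defect that tends to persist.
defects = [
    Defect(p_trigger=0.20, p_lom_given_trigger=0.8, p_fix_given_trigger=0.9),
    Defect(p_trigger=0.01, p_lom_given_trigger=0.5, p_fix_given_trigger=0.5),
]

if __name__ == "__main__":
    for n in (1, 5, 20):
        print(f"flight {n}: P(failure) = {p_failure(defects, n):.4f}")
    print(f"expected failures in 20 flights: {expected_failures(defects, 20):.3f}")
```

In this sketch the failure probability falls with flight number because frequently triggering defects are the ones most likely to be found and removed early, while rarely triggering defects dominate the residual risk. Each parameter is a natural place to attach a prior distribution for uncertainty analysis or updating.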
Under its Constellation program, NASA is preparing to send humans back to the Moon with a flight system architecture very similar to Apollo. However, changes since the Apollo era have taken place in the areas of risk and safety, crew comfort and composition, long-term program goals, programmatic constraints, and state of technology. Considering these changes together, a new design approach is warranted to put early emphasis on operational concept development and the design implications of human factors. Given the anticipated complexity of the Constellation system of systems, the allocation and transfer of control authority among multiple Constellation systems is of particular importance. This paper proposes a modeling approach that integrates various fields of human- and system-centered expertise and analyzes control authority from the viewpoint of operations and human factors, in order to demonstrate the need for and benefits of this modeling approach.
In order to return humans to the Moon, the Constellation program will be required to operate a complex network of humans and spacecraft in several locations. This requires an early look at how decision-making authority will be allocated and transferred between humans and computers, for each of the many decision steps required for the various mission phases. This paper presents an overview of such a control authority analysis, along with an example based upon a lunar outpost deployment scenario. The results illustrate how choosing an optimal control authority architecture can significantly reduce overall mission risk when applied early in the design process.
This paper addresses the importance of considering the initial reliability and reliability growth, as opposed to only the mature risk estimate, when making relative comparisons among developmental launch vehicle (LV) alternatives, and introduces the current model used to perform this type of analysis. Probabilistic risk assessments (PRA) often focus on modeling the mature state of a system under consideration; however, in the aerospace field of LV design such an assessment can be dangerously misleading. Due to the low flight rate, a given LV may never reach maturity prior to retirement and will fly mostly in an immature state. The historical record of early LV flights suggests a risk posture well above the mature estimate predicted through the standard PRA approach. Thus, a decision based upon the mature estimates may differ significantly from a decision based upon the predicted risk during the bulk of the vehicle's useful life, while it is still maturing. In order to make an informed decision about the relative merits of competing LV architectures, decision makers must consider not only the mature system risk, but also the reliability growth for the system along the path to maturity. The current model described in this paper uses a reliability growth methodology, which has expanded the scope of risk-influencing factors and has been able to provide Loss of Mission (LOM) and Loss of Crew (LOC) risk estimates for over 20 LVs in a period of less than two months. The advantage of applying such a methodology to conceptual LV designs is that it enables a more realistic estimate of campaign success during early flights without the need for detailed design information. This model captures the reality that element heritage and maturity are more important to early flight success than first-order component reliability calculations, while yielding valuable insights for designers of future vehicles.
Orbiting Astronomical Satellite for Investigating Stellar Systems (OASIS) is a mission concept being developed in preparation for the 2021 MidEX Announcement of Opportunity. This paper describes the key features of the OASIS architecture as they are currently understood. OASIS's choice of a large inflatable primary reflector results in large collection areas at very high mass efficiency enabling the science mission. We describe the spacecraft bus, based on Northrop Grumman's LEOstar-2, and the receiver, a heritage design based on the GUSTO balloon heterodyne system. We also discuss the observing strategy and pointing requirements from its planned L1 location. Particular emphasis is placed on challenges to the design, such as momentum management, balancing consumable mass allocations, thermal management, and testing.