Plug & Play?

The challenge of deploying commercial off the shelf (COTS) products into Australian railways.

Business transformation in response to changing customer demands and technological improvement is often deeply challenging. Organisations like rail operators will frequently leverage the use of Commercial Off The Shelf (COTS) products as integral components of transformational change.

It is commonly thought that a proven design will help shorten the transitional period and simplify the process. However, complex systems like railways do not lend themselves well to the introduction of COTS products without significant effort in integration.

A “simple” COTS implementation is often made difficult in rail organisations as the interfaces and limitations within the organisation’s systems are not well understood. In response to complexity, it becomes necessary to provide easily digestible and representative visualisations of the system. Put simply – complex systems must be understood and represented simply before any change can be successfully deployed.

We propose the use of the SHELL model (see section 5) to increase the ability of an organisation to more completely and holistically understand their current state, and to model the impact of a proposed change in operations.

Introduction

COTS products are regularly lauded as a panacea for organisations. Lower CAPEX and OPEX costs, increased support, reliability, and maintainability [1] [2] are among the common justifications given by proponents and supporters alike. The potential benefits outlined by manufacturers will regularly venture into the almost mythical ‘cheaper, faster, better’ territory [3]. However, the introduction of COTS products into a complex and oftentimes complicated system such as those found in railways across the globe is rarely a low-cost, low-effort process.

There are myriad reasons behind this observation. Railways are not sufficiently like one another that a COTS product will be ‘plug and play’. Network designs are incredibly diverse across the globe, and internal governance and local processes are sufficiently variable that assumptions made by COTS designers may be inappropriate or would force a change to core business design. Industrial relations issues may also be such that the COTS product cannot function as intended or may even introduce unintended complexity or error-producing conditions.

Perhaps primary above all others is the reality that railways have evolved (and continue to evolve) to meet local and time-critical requirements. These requirements oftentimes stem from technical problems, funding issues, the prevailing political environment, or customer demand.

Solutions are therefore centred around solving the immediate problem quickly, and the drive for whole-of-system thinking is low or, for practical purposes, non-existent. In simple terms, business development is regularly driven by externality, and planning is limited to evolutionary change to existing systems. Exacerbating the outcomes of this drive towards evolutionary change is the reality that disciplines within railways have been traditionally siloed, with expertise, knowledge, and authority de-centralised and dispersed.

To Purchase, Develop or Design?

Railways have, until recently, tended to develop and design their solutions in house. The major railways across Australia have many examples of custom solutions that are in service to this day. Indeed, the various signalling and gauge standards across the country are simply the most visible indicator that solutions are customised between states and railways.

The drive to improve safety and increase capacity within Australia’s existing networks is creating pressure to move to state-of-the-art systems like the European Train Control System (ETCS) [4]. A significant feature of these systems is that, according to manufacturers, they are practically turnkey, and hardware is interoperable amongst the major suppliers.

They are also extremely complex, requiring significant design and development investment. This means that the only viable option for ETCS in Australia is a COTS product. The rail industry is therefore at a generational crossroads. For the first time in recent memory, revolutionary change is set to occur, and the only viable option for railways is to rely on COTS installations.

COTS projects in Australia face a fundamental challenge. The drive to utilise solutions that promise to reduce upfront costs and to import expertise is understandable. However, COTS products often deliver inferior performance compared to a bespoke design [4]. This will significantly impact the cost-benefit equation, and will ultimately increase the CAPEX for the organisation, assuming that further investment is required to realise the promised outcomes.

In addition, the acceptance of the COTS product by the operator or maintainer represents a significant OPEX risk, as costs related to operational change and ‘knock-on’ effects are rarely attributed to the central causal factor (major changes caused by a new system) and are absorbed as much as is practical. Plainly put, given the risk involved with changes that involve COTS products, it is wise to be wary of claims and to prepare an organisation as far as is practical.

Doing COTS Correctly – Knowledge From Other Contexts

The literature available regarding COTS installations is relatively substantial. Government acquisition and purchasing policy changes across the globe have driven the move to preferential treatment of COTS products in several jurisdictions [5] [6]. This has, in turn, driven the increase in material produced and available regarding best-practice for COTS product usage, the pros and cons of COTS, and of specific experiences within industries [5] [2] [6]. Given the relative ease with which software solutions can be integrated, the experience with COTS software is significantly more mature than with hardware or mixed-mode integrations.

A large segment of the analysis of COTS software utilisation has focused on the technical issues and conflicts that necessarily arise when introducing a “foreign” sub-system. Issues like technical (computer) interfaces, software reliability, and the ability to limit the control and influence of software are covered in detail. Findings gravitate towards a similar conclusion – that an organisation should use previously proven COTS products, that they should not use COTS for safety critical issues, that COTS products should be treated with a degree of suspicion and be actively secured, and that COTS installations are not as simple as they appear at first glance [5] [6].

Within Australia, COTS products have somewhat of a chequered history. Government agencies are increasingly depending on COTS products in the realm of Information Communications Technology (ICT), with these products ranging in specialisation from HR systems to Project and Program Management applications.

There is little doubt that most rail workers in Australia currently work with COTS products like iOS or Android, SAP or the ubiquitous Windows. These are relatively successful implementations of COTS products into a business, though the work and cost experienced by IT departments to ensure security and usability is non-trivial [6]. Those that worked through the transition to computerisation would also note that business norms were radically changed in response to the technology. This was, again, a non-trivial cost to organisations across the globe.

There are numerous examples of COTS product implementations in the ICT field that have run over-budget and/or not delivered on the promised functionality [7] [8] [9]. Unfortunately, these failures are rarely publicised, and lessons are therefore learned repeatedly, often at public expense.

However, there are notable exceptions to this rule. The Victorian Department of Justice and Community Safety (DJCS) rolled out the Victorian Infringements Enforcement Warrant (VIEW) system in 2017 [10]. This was in direct response to the need to change the agency’s technological capability to satisfy regulatory change made by the Victorian government in 2014. This was a notably short lead time for major change. It was also notable in that business development was being driven by external stimuli, rather than a planned internal change process.

At launch, the system had 5% of its expected functionality, had overrun budget by nearly $80 million, and was not expected to provide full functionality into the future. The Victorian Auditor-General’s Office investigated the VIEW project and found that there were multiple failures in the project that ultimately led to its failure. These included [10]:

Failures of internal governance
Lack of adequate project and technical expertise
Ineffective oversight
Operating as a functionally uninformed buyer
Bias based on experience with poorly executed custom software projects
Inadequate understanding of the organisation itself and of the needs for the system
Poor vendor evaluation
Settling for a solution that was not fit for purpose
Lack of due diligence over vendors
Procuring an outdated solution
Failure to manage risks
Project management issues
Loss of focus on intended benefits
Contractor conflicts of interest
Failure to manage increasing customisation
Misleading and overly optimistic reporting.

This list is reminiscent of the causes noted in most accident investigations across industries. Indeed, the list of issues could be lifted from many famous fatal accidents in history.

Of note in the case of VIEW is that the system did not need to meaningfully integrate and interface with ‘real world’ equipment. It did not control (or monitor) safety-critical hardware, nor was there significant need for the interface with end-users to be intrinsically safe and intuitive, and environmental challenges were non-existent. It was, in effect, an automated replacement for what would have been an extremely large workforce undertaking administrative tasks. This is in stark contrast to the rail environment, in which all these challenges are present and must be controlled for.

COTS implementation in the rail industry is a significantly greater challenge than found in the ICT realm [11]. Railways are dependent on the effective interface between humans and technology to achieve their objective of reliable and safe operation. Railways, clearly, face significant challenges when implementing COTS systems – greater than those in most other contexts.

Simply Complicated

The drivers behind the generic push to utilise COTS products, as previously covered, are the potential for reduced costs, need for specialised expertise, reduced resource usage, and increased support, reliability and maintainability [12]. In short, the COTS choice would appear to make the complicated and resource intensive process of design and delivery become simple – spend money to import proven technology and expertise, rather than investing heavily to (hopefully) complete the work in-house.

However, creating simplicity is rarely an inexpensive or painless proposition.

Railways are necessarily complex systems [13]. They are characterised by the intricate collaboration of many highly trained specialists, the utilisation of complex and sophisticated equipment, and the collective design and technology knowledge of several hundred years of engineering technology. That rail works to the degree of precision that is experienced in Australia is no mean feat.

That said, the railways of the nation are complicated by several historical factors [14] that have impacted design and interoperability across the country. Not least of these is the governance of the nation before federation, which led to the infamous gauge differences across the continent.

Even within railways, design choices made over a hundred years ago will often determine the limits for design choices in the current context. Sydney, as an example, is still electrified using 1500 V DC [15]. This was a decision made in the early 20th century, ostensibly because of contemporary popularity around the world. In a similar vein, any equipment and systems that are installed are likely to be in active usage for at least 20 years, with design impacts likely to reverberate throughout the network for many years beyond that. The ATRICS [16] system in use in Sydney is a prime example of this, having been in place for over 20 years at the time of writing, with an expected lifespan well into the middle of the 21st century.

The understanding that design decisions made in the current context will have lasting impacts, together with an exponential increase in the complexity of system designs, has changed the norms through which rail projects are managed and controlled. In response to the increasing complexity of the challenges faced, the Systems Engineering (SE), System Assurance (SA) and Human Factors (HF) disciplines have increasingly come to the fore in project work. This has increased the quality and safety of the industry, but there are trade-offs that warrant acknowledgment.

Safety-criticality causes the rail industry to focus on assessment and assurance of products to a significantly higher level than for consumer or normal business requirements. Cost is therefore somewhat ‘hidden’ in rail-specific COTS integration projects [11] [12], as assurance is seen as a “cost of business” in rail.

Assurance is an excellent method to determine that a product can and does satisfy a requirement. However, it cannot readily determine whether the proposed method is the best possible option, nor does it look at how the product will fit into the wider ‘eco-system’ that the product inhabits. This is where SE and HF are necessary: SE to ensure that the technical design is maximally efficient and capable, and HF to design the interfaces with the system such that the system is easy to use correctly, and difficult to use erroneously.

HF has been in use in complex operating environments like rail, aviation and medicine for many decades. There are consequently many extremely useful models that have been devised to help to describe and visualise systems. One such model is the “SHELL” model. First conceived in the 1970s, it was further developed in the 1980s into the representation below [13].

The SHELL Model

The SHELL model was originally designed and utilised to aid in the understanding of complex systems, like aviation. Though it was not an accident investigation tool, per se, the understanding of complex systems that the model provides was useful in providing accurate and simple causative statements.

It also allowed practitioners of the relatively new discipline of HF to portray complex relationships within a system in a visual manner. In effect, the model allowed HF practitioners to share a language with other technical disciplines, and to bring the theoretical ideas of HF into the practical realm. The model has been used in research within the rail industry, though it is not in widespread use [19].

The SHELL model (see Figure 1) shows that a system is composed of five necessary component parts. These are the “Software”, “Hardware”, “Environment”, and two separate but equally important “Liveware”. Each of these components exists both as an independent entity and as an element that is necessarily dependent upon the other parts of the system. That is, a system’s hardware will stand alone at any point in time, but for hardware to be of use to the greater system, the other component parts are necessary, and will impact on the hardware’s design and operation over time.

It is useful to define the components of the SHELL model, as the understanding of the component pieces is integral to the utility of the model.

“Software” is the part of the system that is generically interacted with cognitively. This can range from computer software, to internal processes and procedure, to the unwritten norms ingrained in a system (“how we do things around here”).

“Hardware” refers to the components of the system that are interacted with physically. This can range from the touch points within a train cab, to high voltage switching equipment, to the civil engineering plant used to maintain the railroad. In simple terms, Hardware comprises the things that users must touch and physically use for the system to work.

“Environment” is the element of the system that is partly self-explanatory – the local set of climatic, geographic, and built environment conditions that exist despite the system’s existence. However, the system will often also have created environments, like Rail Operations Centres, in which there is almost total environmental control. There is also a third component of environment, the political and societal environment in which the system must operate. Environment therefore consists of a wide variety of inputs both in and out of direct control of the operator.

“Liveware” is (although admittedly awkwardly named) the people that comprise and operate the system. The model shows that there are at least two groups of liveware in any system: those at the centre of the system that are expected to exert control in a given circumstance, and those that are peripheral to the situation that are expected to support and provide needed information, guidance and input for the system to function correctly. An apt example of the two would be a driver being central to the driving task, with signallers providing essential information and guidance.
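
The component definitions above can be captured in a small data structure. The following is only a minimal sketch: the class name `ShellModel`, its field names, and the example entries are assumptions made for illustration and are not part of the SHELL model itself.

```python
from dataclasses import dataclass, field


@dataclass
class ShellModel:
    """One SHELL analysis, built around a single central user (illustrative only)."""
    central_liveware: str                                          # the user expected to exert control
    software: list[str] = field(default_factory=list)              # interacted with cognitively
    hardware: list[str] = field(default_factory=list)              # interacted with physically
    environment: list[str] = field(default_factory=list)           # natural, created, political/societal
    peripheral_liveware: list[str] = field(default_factory=list)   # people who support the central user

    def components(self) -> dict[str, list[str]]:
        """Return the four component groups that surround the central user."""
        return {
            "Software": self.software,
            "Hardware": self.hardware,
            "Environment": self.environment,
            "Liveware": self.peripheral_liveware,
        }


# Hypothetical entries, loosely based on the driver/signaller example in the text.
driver_model = ShellModel(
    central_liveware="Train driver",
    software=["Safe working procedures", "Unwritten local norms"],
    hardware=["Train cab controls"],
    environment=["Weather", "Political and societal setting"],
    peripheral_liveware=["Signaller"],
)

for name, items in driver_model.components().items():
    print(f"{name}: {', '.join(items)}")
```

Keeping the central user as a separate, mandatory field reflects the two distinct Liveware groups: one at the centre of the model and one on its periphery.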

It is perhaps more subtle, though of extreme importance, that each component of the system is designed to ‘fit’ the central user. That is, the design of the system is such that the user’s requirements are central, and other components of the system should be modified to fit their needs, rather than the user being expected to modify their behaviour to suit system irregularity or oddity.

In other words, hardware should be designed such that it suits user needs and expectations, software should be intuitive and error tolerant, the environment should be conducive to sustaining the performance of the people using the system, and the people using the system should be fit and capable to perform their duties, and they should be able to effectively communicate to ensure performance.

There are everyday examples of the failure to design using the SHELL model, with car manufacturers being amongst the more visible offenders. Most drivers that use multiple vehicles have experienced the windscreen wiper controls being placed on the left-hand side of a car’s steering wheel in Asian vehicles, and on the right (incorrect) side of the wheel on European cars.

The reasons for this choice of design are simple – European manufacturers are catering for their much larger left-hand drive markets. They simply utilise the left-hand drive steering assembly on their right-hand drive vehicles to save cost in design and development. They can use existing components to reduce cost and time to market, and any development in the larger market items will be backwards compatible. It is, in essence, an in-house COTS choice. The consequence of this design choice is usually limited to an embarrassing ‘wave’ of the wipers at a roundabout, which is likely to cause no damage beyond a hit to the driver’s ego. However, the consequences in a more safety critical environment are open to the imagination.

How to Find & Fix the Railway’s ‘Wipers’

It would be folly to assume that railways did not have their own version of the car wiper problem. Indeed, in a system with so many moving and interrelated parts, it is likely that there are a great number of them latent and waiting for the correct set of circumstances to cause an issue.

In order to find the latent issues within a system it is necessary to have a full and accurate understanding of the current state of play. It is necessary to both ‘turn over the rocks’ and understand why they were placed where they were. The euphemistic ‘kicking of the tyres’, just as for buying a used car, is not enough to meaningfully comprehend the number and magnitude of the problems present.

One of the findings regarding the failure of the VIEW project was the lack of understanding of the organisation and the wider system in which they proposed to implement a COTS product. The DJCS plainly did not understand the current state of their organisation and those they wished to interface with. It follows that they could not predict the impact of the COTS product implementation in any meaningful way.

The lessons learned from this example serve as a prescient and free warning for the railways around Australia as they begin to bring COTS products into their systems at an increasing pace. A lack of current system knowledge and organisational preparation before the attempt to introduce a COTS product increases the likelihood of cost overrun, negative system impacts, and ultimately project failure.

In plain terms, it is becoming absolutely necessary for a railway to “know thyself”. Although the ancient Greek aphorism was focussed on the individual, it is no less useful and true for a complex system.

The ability to clearly “know” a complex railway will circumvent a vast number of the issues commonly found with COTS product integration. Indeed, a clear and correct understanding of the railway and the underlying systems will be of clear and material benefit to the railway regardless of product integration. Decisions and dependencies will be made more obvious to management and technical staff. Similarly, issues that are latent within the system will be increasingly visible to the organisation, allowing them to be treated before they have a negative impact.

To be able to direct the necessary resources to undertake what is ultimately a relatively low-cost but high-benefit exercise is a luxurious position to occupy in the current environment of generational change, though a cogent argument could be made that it is a vital one before any attempt to integrate complex COTS systems.

We propose that one of the most effective ways for a railway to come to know itself is to utilise the SHELL model.

Coming Out of the SHELL – Modelling the Railway

The SHELL model, as discussed above, provides a useful and simple basis for the shared understanding of complex systems. The use of a common language for the analysis – Software, Hardware, Environment, Liveware (this is commonly modified to “People” in practice) – allows for both technical and non-technical staff to be meaningfully involved in analysis.

At its most simplistic, a SHELL analysis can be undertaken using basic materials – a whiteboard or butcher’s paper.

Undertaken in much the same way as a directed brainstorm, the analysis has participants nominate the components of the system that the central user/s require to undertake their task. Once the lists are complete, the interfaces and relationships that are critical can be nominated, as can those that are known to be deficient or underperforming.
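
For teams that prefer to capture the whiteboard output electronically, the nomination step might be recorded along the following lines. This is a sketch under stated assumptions: the `Interface` record, the `priority_interfaces` helper, and the workshop entries are hypothetical, and treating "critical and deficient" as the first candidates for treatment is an illustrative choice rather than a rule of the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interface:
    """A relationship between the central user and one nominated component."""
    component: str           # e.g. "In-cab display"
    category: str            # "Software", "Hardware", "Environment" or "Liveware"
    critical: bool = False   # nominated by participants as critical
    deficient: bool = False  # known to be deficient or underperforming


def priority_interfaces(interfaces: list[Interface]) -> list[Interface]:
    """Return interfaces flagged as both critical and deficient (assumed priority rule)."""
    return [i for i in interfaces if i.critical and i.deficient]


# Hypothetical workshop output with "Train driver" as the central user.
workshop = [
    Interface("In-cab display", "Hardware", critical=True, deficient=True),
    Interface("Safe working procedures", "Software", critical=True),
    Interface("Crew room", "Environment", deficient=True),
    Interface("Signaller", "Liveware", critical=True),
]

for item in priority_interfaces(workshop):
    print(f"{item.category}: {item.component}")
```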

If the analysis is designed to generate a picture of the current state of the system, then the interfaces, systems, and relationships identified as requiring remediation can be treated and improved using the organisation’s processes. Periodic re-runs of the SHELL analysis will provide feedback as to the success of the work to rectify or modify the issues originally identified.

Should the analysis be designed to understand the impact of a change, then a supplementary SHELL analysis must take place. In this analysis, the future and intermediate (should there be one) states need to be modelled and understood by the people impacted. Outputs of the analysis that predict issues are then placed into the appropriate register/s for the change or project teams to rectify or address. Again, supplementary analyses will provide assurance that the project has addressed the issues raised/predicted in the initial SHELL work.
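
One simple way to prepare for such a supplementary analysis is to compare the component lists of the current and future states, so that every removal or introduction becomes a prompt for discussion. The sketch below assumes each state is recorded as a mapping from SHELL category to a set of component names; the function name `changed_components` and the example data are hypothetical. A listed change is only a candidate for the project register, as deciding whether it is an issue remains a matter for the people impacted.

```python
def changed_components(current: dict[str, set[str]], future: dict[str, set[str]]) -> list[str]:
    """List components removed or introduced between two SHELL snapshots."""
    changes = []
    for category in sorted(set(current) | set(future)):
        before = current.get(category, set())
        after = future.get(category, set())
        for component in sorted(before - after):
            changes.append(f"{category}: '{component}' removed")
        for component in sorted(after - before):
            changes.append(f"{category}: '{component}' introduced")
    return changes


# Hypothetical snapshots for a train driver before and after a signalling change.
current_state = {
    "Hardware": {"Lineside signals", "Cab controls"},
    "Software": {"Existing safe working rules"},
}
future_state = {
    "Hardware": {"Cab controls", "In-cab signalling display"},
    "Software": {"Existing safe working rules", "Revised safe working rules"},
}

for change in changed_components(current_state, future_state):
    print(change)
```

An intermediate state, where one exists, can be compared in the same way against both the current and the future snapshot.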

It is important to have representatives from a wide range of the sections of the railway, as each will have a unique and insightful perspective on the issues at hand. This practice also promotes knowledge sharing and management. Customer experience staff may become aware of interactions or impacts between their department and the drivers and signallers, because they are responsible for the organisational focus of on time performance. Technical staff might come to better understand the impact hardware choices have on the maintenance workload for the railway. The outcomes are as variable as the problem that is being analysed.

As with all models, the fidelity with which an organisation or individual chooses to map out their system is variable. A railway could choose to understand their system holistically, remaining at a strategic level, or they could delve as far as mapping out the individual bit interfaces between computer systems. Neither is wrong, and each has its purpose. In fact, it is necessary to have many layers of understanding of the system so that decisions can be made with appropriate governance and strategic context.

However, there is a constant within the SHELL analysis. The user in the centre of the model must be very well defined, unchanging within an analysis, and understood by all people that are part of the analysis team.

For strategic analysis, it may be appropriate to include whole sections of a railway. For example, treating guards as though they were a uniform group could facilitate more effective and efficient analysis. Strategic analysis is usually hampered by focus on detail, making the wider focus appropriate. However, treating whole groups as one is constrained by the need to assume uniform behaviour – this must be known and controlled for in the analysis.

For tactical analyses, those characterised by the need to understand the current state or potential impact of a change to one or more subsystems, a single user (or defined role) must be in the centre of the model. There may be a temptation to include a group, e.g., all ‘signallers’, regardless of role description in an analysis, but this would be a mistake. Each role performs different tasks and has different inputs and expectations that are unique to their experience.

For detailed analysis, it is advantageous that the analysis focusses on a single user and a single task or action. This will help to ensure that the analysis remains focussed at the correct level of detail. If the analysis is not focussed on the singular, the analysis is unlikely to gather useful information, as the interaction between subsystems will remain undefined and changeable.
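
The scoping guidance for the three levels can be expressed as a simple pre-workshop check. This is a sketch only: `AnalysisScope`, `scope_problems`, and the three fixed level labels are assumptions for illustration, and in practice the levels sit on a spectrum rather than in discrete categories.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnalysisScope:
    level: str                  # "strategic", "tactical" or "detailed"
    central_user: str           # a group, a defined role, or a single user
    is_group: bool = False      # True when a whole group is treated as one user
    task: Optional[str] = None  # the single task or action, for detailed analyses


def scope_problems(scope: AnalysisScope) -> list[str]:
    """Return the reasons a proposed scope departs from the guidance above."""
    problems = []
    if scope.level not in ("strategic", "tactical", "detailed"):
        problems.append(f"unknown level: {scope.level}")
    if not scope.central_user.strip():
        problems.append("central user is not defined")
    if scope.level in ("tactical", "detailed") and scope.is_group:
        problems.append(f"{scope.level} analysis needs a single user or defined role, not a group")
    if scope.level == "detailed" and not scope.task:
        problems.append("detailed analysis needs a single task or action")
    return problems


# Hypothetical scopes: a strategic analysis of guards as a group is acceptable,
# while a tactical analysis of all signallers is flagged.
print(scope_problems(AnalysisScope("strategic", "Guards", is_group=True)))
print(scope_problems(AnalysisScope("tactical", "All signallers", is_group=True)))
```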

It bears noting that the definitions of strategic, tactical and detailed are fluid. They exist on a spectrum and are invariably linked to the style and needs of the organisation in which they are conducted. In a similar vein, it is almost always necessary to run multiple SHELL analyses with different users in the centre box.

Indeed, the likelihood is that the majority of the groups/roles/individuals that make up the left-hand side Liveware box will benefit from being the subject of a SHELL analysis in the centre. The interfaces between each person and the system are different, and the best way to determine the difference is to analyse independently. In complex systems, it may also become useful to display the interactions between the various SHELL analyses, as there are predictable contemporaneous interactions.

For the installation of COTS products, the fidelity to which a model must be produced will change depending on the position in the life-cycle of the project.

During the project ideation phase, an understanding of the ‘bones’ of the system is necessary. That is, major components of the system, such as the hardware used for signalling, rolling stock, power supply, rail maintenance etc., must be modelled. The major interactions between each subsystem should also be understood. Depending on the complexity of the incumbent system, this would be undertaken as a strategic analysis, with the option to investigate the ‘complicated’ sections in a more focussed way.

For example, a change to a signalling system such as that seen in the current move to ETCS systems is likely to have wide ranging impacts. This is despite the signalling system being a relatively small component of the vast number of systems that are necessary for a modern railway to function.

Utilising the SHELL model, it will become immediately apparent that there is more than one direct user impacted by a change to this standard. Each of these users would therefore be modelled individually to best understand the impacts of the proposed change. Indeed, the technical issues can and do become vast and complex with signalling upgrades.

An analysis focussed on train drivers may uncover issues related to in-cab systems, operational procedures for safe working, emergency procedures, or efficiency concerns. An analysis focussed on maintenance staff might instead focus on increased or changed workload, WHS-related concerns, changed communication protocols, or concern regarding the need for greater skill, knowledge and training of existing and new maintenance staff. Signallers are likely to focus on the changes to the mental model required for their work, the loss of experience and changes to their procedures, and potential issues related to interfaces with a new signalling system.

Even at the ideation stage there clearly exists a wide range of impacts that must be known, understood, and catered for. Analysis of the situation will both uncover these issues and allow for a shared language between strategic management and the technical staff that will be directly affected.

When the project is in the requirement definition stage, a more detailed analysis must take place. The SHELL model will still provide the structure and language, in that each of the five components is included and understood. However, each component piece must be analysed in greater detail, and interactions must be described, verified with SMEs, and knock-on effects noted.

Returning to the signalling example, maintenance staff might note that the current signalling hardware requires redesign or replacement as there are awkward lifts involved and that work at height is required regularly to replace signal aspects damaged by vandalism. They might also note that the interlocking hardware is situated in a poor working environment that is awkward and difficult to work in, causing errors in their work. The analysis may also reveal that there is a strained working relationship between signallers and maintenance due to the frequency with which maintenance-related issues impact the smooth running of the railway.

This information is invaluable, as requirements can be devised to correct these issues, and organisational process and design can be modified to improve understanding and communication between the teams responsible. It is much simpler to treat the core issues that one is aware of, as opposed to treating symptoms blindly.

The integration and testing phase of the project is likely the final opportunity for meaningful changes before project implementation. A task-level analysis can be undertaken at this point to ensure that system components and interfaces are well understood, and that impact at the individual level is modelled and controls are in place.

There are clearly opportunities at each phase of a project, and outside of active change, for a railway to better “know” itself and to embrace this knowledge as a core competence for the organisation.

Into the Future

COTS is, as has been explored and experienced by other industries in the country, a double-edged sword. With it comes the opportunity for simplicity and rapid change, financial savings, and greater overall capacity for the system. However, there exists the real risk that “cheaper, faster, better” will devolve into the precise opposite. The control available to organisations to counter this risk is simple – they must know themselves. We propose that the most effective way to achieve this is to utilise the SHELL model and the associated shared language of “Software, Hardware, Environment and Liveware”, which allows both technical and non-technical staff to visualise, understand, and contribute to analysis, decision making, and organisational knowledge.

Justin Drinkwater | Senior Consultant

References:

[1] D. J. Slford, “The Problem with Aviation COTS,” Aviation Review Quarterly, pp. 297-304, Summer 1999.

[2] Data & Analysis Center for Software, “Commercial-Off-The-Shelf (COTS): A Survey,” Data & Analysis Center for Software, New York, 2000.

[3] R. L. Dillon and P. M. Madsen, “Faster-Better-Cheaper Projects: Too Much Risk or Overreaction to Perceived Failure?,” IEEE Transactions on Engineering Management, vol. 62, no. 2, pp. 141-149, 2015.

[4] J. Behesti and J. Dupuis, “Problems with COTS Software: A Case Study,” in Proceedings of the Annual Conference of CAIS, 2013.

[5] J. Gansler and W. Lucyshyn, “Commercial-Off-The-Shelf (COTS): Doing it Right,” Center for Public Policy and Private Enterprise – School of Public Policy, 2008.

[6] P. Lindsay and G. Smith, “Safety Assurance of Commercial-Off-The-Shelf Software,” The University of Queensland – School of Information Technology, 2000.

[7] R. Chesterman, “Queensland Health Payroll System Commission of Inquiry,” Queensland Health Payroll System Commission of Inquiry, 2013.

[8] Committee of Public Accounts, “The dismantled National Programme for IT in the NHS,” House of Commons, London, 2013.

[9] The Treasury, “Independent Review of the Modernising Business Registers Program,” Australian Government, 2023. [Online]. Available: https://treasury.gov.au/review/independent-review-mbr-program. [Accessed 16 May 2023].

[10] Victorian Auditor General’s Office, “Implementing a New Infringement Management System,” Victorian Government Printer, Melbourne, 2021.

[11] A. Clough, “Commercial Off-The-Shelf (COTS) Hardware and Software for Train Control Applications: System Safety Considerations,” U.S. Department of Transportation, Cambridge, 2003.

[12] Research and Technology Organization North Atlantic Treaty Organization, “Commercial Off-the-Shelf Products in Defence Applications ‘The Ruthless Pursuit of COTS’,” in Information Systems Technology Panel (IST) Symposium, Brussels, 2000.
