In this interview, we examine a recent train disaster in India, discuss how such accidents can be prevented, and ask what lessons Industry 4.0 solution providers can learn from it.
Arti: Welcome, Akshay! You are an engineer and an industry veteran with over 30 years of experience across a variety of industries: CTO of Kovair, former Chief Architect at Siemens, and former industry analyst at Gartner. Taking the train disaster of early June 2023 as an example, how could Indian Railways use Industry 4.0 solutions to prevent such accidents? And what can Industrial IoT and Industry 4.0 solution providers learn from it?
Akshay: Thanks, Arti; glad to be here! Let’s first define Industry 4.0.
Industry 4.0 is the information-intensive transformation of manufacturing (and related industries) in a connected environment of big data, people, processes, services, systems, and IoT-enabled industrial assets with the generation, leverage, and utilization of actionable data and information as a way and means to realize smart industry and ecosystems of industrial innovation and collaboration.
On Industry 4.0 and the different Industrial Revolutions, I believe we have gone from the Steam Engine to the Combustion Engine, to the Cloud-based Search Engine, to now the IoT-enabled Automated Workflow and AI-based Value Engine.
Now, regarding Indian Railways: as all the major news outlets have reported, more than 275 people were killed and well over a thousand injured in the train crash in Odisha. My heartfelt condolences to all those affected!
But the engineer in me is appalled to hear that a triple train collision occurred on Friday, June 2, 2023, in which one passenger train is thought to have derailed, hitting a goods (freight) train, and was then struck by another train on the adjacent track. It has been reported that the third train crashed into the derailed trains because of an electronics issue with the control systems, that the Kavach safety system was not operational, and that the signaling system was not working properly. Shouldn’t this have been detected earlier, and other signaling methods used?
If a rogue engineer entered malicious inputs into the electronic interlocking system, why weren’t they verified, reported in real time, and blocked? Why couldn’t a validation system override a configuration that would lead a train into a stationary goods train? It has been reported that the passenger train was travelling at roughly 128 km/h and should never have been diverted onto the loop line: not only could the loop line not support a train at that speed, but, above all, the freight train was standing on it. A real-time validation system fed by real-time IoT sensor data was needed. We have autonomous cars, and even autonomous locomotives, with LIDAR and other technologies; why aren’t these technologies leveraged?
To solve this specifically, and for leaders in Industry 4.0 settings generally, individual Root Cause Analyses should be performed and recorded for each issue in a project. Formal hand-offs between teams need to be seamlessly integrated, so that siloed root cause analyses can be consolidated into the more holistic viewpoint of a Retrospective Analysis, revealing whether any macro trends are occurring. Once the root cause is determined, I would recommend putting in place a new, overarching platform with centralized governance!
On a personal note, in the early ’90s I was Manager of Networks Engineering at Bell Canada and then transferred to Nortel (BNR), in which Bell held a major stake at the time, to work on model-based engineering solutions: ObjecTime and Telelogic/DOORS, both of which later became part of IBM’s Rational suite. I was also a part-time RCA (Root Cause Analysis) facilitator, running over 50 sessions.
Later, in 1995, after joining Telelogic/DOORS, I advised ABB Signal (in meetings in Sweden and in Elmira, NY), Bombardier in Montreal, and Adtranz Westinghouse in Pittsburgh, PA (many of these companies are now part of Alstom) on using Telelogic’s SDT toolset for signaling in train control systems. The automated people-mover at Newark Airport is a result of that work. That same year, I also advised Boeing, the FAA, and ARINC on fly-by-wire standards in the avionics industry, leading to the ARINC 629 databus used in the Boeing 777 and the new 777X. In addition, I advised Ericsson, Motorola, Nortel, and Lucent on cellular standards like GSM, as well as chip companies like National Semiconductor, on what became DECT cordless phones, and Intersil, on what became 802.11, or WiFi.
Telelogic’s SDT tools provided SDL-based model simulation and code generation, as well as ISO 9646 test-case generation in TTCN (Tree and Tabular Combined Notation) for unit-, integration-, and system-level conformance testing. Beyond simulation, they also supported analysis using a technique known as state-space exploration, which uncovers unknown issues such as deadlocks, buffer overflows, and race conditions. These are the hairy bugs you can’t detect with unit-level testing, because finding them requires a holistic analysis of the communicating systems.
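To make state-space exploration concrete, here is a minimal Python sketch. It is not SDL and not how Telelogic SDT works internally; it is a toy explorer that assumes each process is a fixed sequence of resource acquire/release steps. The two processes and the resources "A" and "B" (think of two track sections) are hypothetical. The explorer enumerates every interleaving breadth-first and reports any reachable state in which no process can move even though work remains: a deadlock that neither process's unit tests would reveal.

```python
from collections import deque

# Hypothetical model: each process is a fixed list of steps that acquire or
# release a named resource (here, two track sections "A" and "B").
# P1 and P2 take the resources in opposite order, a classic deadlock pattern.
PROCESSES = {
    "P1": [("acquire", "A"), ("acquire", "B"), ("release", "B"), ("release", "A")],
    "P2": [("acquire", "B"), ("acquire", "A"), ("release", "A"), ("release", "B")],
}


def successors(state):
    """Yield every global state reachable in one step from `state`.

    A global state is (program_counters, owners), each a sorted tuple of
    pairs, so states are hashable and can be stored in a `seen` set.
    """
    pcs, owners = dict(state[0]), dict(state[1])
    for name, steps in PROCESSES.items():
        pc = pcs[name]
        if pc == len(steps):
            continue  # this process has finished
        op, res = steps[pc]
        if op == "acquire" and owners[res] is not None:
            continue  # resource is held: this transition is disabled
        new_owners = dict(owners)
        new_owners[res] = name if op == "acquire" else None
        new_pcs = dict(pcs)
        new_pcs[name] = pc + 1
        yield (tuple(sorted(new_pcs.items())), tuple(sorted(new_owners.items())))


def explore():
    """Breadth-first search over all interleavings; collect deadlocked states."""
    resources = sorted({res for steps in PROCESSES.values() for _, res in steps})
    initial = (tuple(sorted((p, 0) for p in PROCESSES)),
               tuple((res, None) for res in resources))
    seen, queue, deadlocks = {initial}, deque([initial]), []
    while queue:
        state = queue.popleft()
        nexts = list(successors(state))
        finished = all(pc == len(PROCESSES[p]) for p, pc in state[0])
        if not nexts and not finished:
            deadlocks.append(state)  # stuck, yet work remains: a deadlock
        for nxt in nexts:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen, deadlocks


if __name__ == "__main__":
    states, deadlocks = explore()
    print(f"{len(states)} reachable states, {len(deadlocks)} deadlock(s)")
    for pcs, owners in deadlocks:
        print("deadlock:", dict(pcs), "holders:", dict(owners))
```

For this model the search finds exactly one deadlock: P1 holds A while waiting for B, and P2 holds B while waiting for A. Industrial tools explore vastly larger state spaces and use reduction techniques, but the principle of exhaustively checking interleavings rather than single components is the same.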
Incidentally, regarding train control systems, I looked into recent work using the SDL/TTCN approach we worked on in the mid-to-late ’90s. Fortunately, there is updated work on what we developed: M. Banci, M. Becucci, A. Fantechi, and E. Spinicci published “Validation Coverage for a Component-based SDL model of a Railway Signaling System” (Electronic Notes in Theoretical Computer Science, Volume 116, 2005), in the context of General Electric Transportation Systems. I also found a more recent (2016) article on using TTCN for conformance testing of train systems: “Validation tool of functional safety for train control secure software” by J.-G. Hwang and H.-J. Jo of the Korea Railroad Research Institute (KRRI), Korea.
I believe we need to get back to basics: Model-based Systems Engineering tools, simulation solutions, and test management systems during development, and, for operational systems, real-time monitoring, control, and validation, fed by real-time IoT sensor data and under policy-based governance.
As for the Indian Railways disaster, whether the cause was the absence of the Kavach safety system, non-functional or erratic signaling, or the latest root cause cited in the media (electronic interlocking components configured incorrectly, by error or maliciously, which implies non-conformant behavior), a thorough analysis is needed of the entire train control and communications systems. And, dare I say, of the engineers in control: what they were doing, how they obtained their jobs, and how they were trained. A platform of policy controls, real-time monitoring and validation, enforcement, and governance must then be put in place.
As CTO of Kovair.com, a solutions provider to Indian Railways’ CRIS (Centre for Railway Information Systems), I find it somewhat ironic that we just hired new engineers from Bhubaneswar who would take that train to Kolkata, passing through Balasore, Odisha, where the disaster occurred. Many of them will work on integrated solutions with Kovair and partner toolsets such as IBM Rational, Siemens Teamcenter, and many more, precisely to prevent disasters like this!
Arti: Wow, Akshay, I hope your co-workers are all right! How can clients embark on Root Cause Analysis and Retrospective Analysis?
Akshay: Good question, Arti, and thankfully, our co-workers are all right!
I would recommend getting back to rigorous analysis, design, testing, and operations management, backed by processes, methods, tools, and education for everyone involved, and applying RCA (Root Cause Analysis) techniques.
On testing, my view is that the entire system needs to be reassessed, including the newer Vande Bharat high-speed trains. More generally, for any mission-critical setting, I would recommend the following:
- Unit-Level Conformance Testing: Unit- and component-level testing, which addresses functional compliance at the unit or component level.
- Integration Testing: Addresses interoperability between the components that are integrated.
- Systems Testing: Addresses conformance and interoperability of the entire system.
- Protocol Testing: Addresses bandwidth, conformance, performance, and real-time aspects (latency, queuing delays, deadlock detection, prioritization/pre-emption issues), through to resiliency testing.
- User-centric testing: Acceptance testing, UI testing, etc.
- Security testing: Addresses vulnerabilities in the system.
Additionally, Test Management systems are needed to document the test cases exercised and defects discovered, which can then be tracked, monitored, and managed.
Tools from established vendors like IBM and start-ups like Kovair are needed, providing complete Product Lifecycle Management and task-based process engines that let users automate and execute test runs, retrieve test results, and capture defects automatically. Full end-to-end traceability is required across multiple tools’ data, with test coverage and test results reporting that helps continuously improve product quality.
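As a minimal sketch of the traceability and coverage reporting described above, the Python below assumes requirements and executed tests have already been exported from the various tools into plain Python objects. The IDs (`REQ-1`, `TC-1`, ...) and the `ExecutedTest` record are hypothetical and do not reflect any Kovair or IBM API. The report flags uncovered requirements, requirements with failing tests, fully verified requirements, and "orphan" tests that trace to no known requirement.

```python
from dataclasses import dataclass


@dataclass
class ExecutedTest:
    test_id: str
    verifies: list  # requirement IDs this test traces to
    result: str     # "pass", "fail", or "not_run"


def coverage_report(requirements, tests):
    """Trace tests to requirements and summarise coverage and status."""
    trace = {req: [] for req in requirements}
    orphans = []  # tests that point only at unknown requirement IDs
    for t in tests:
        linked = [req for req in t.verifies if req in trace]
        for req in linked:
            trace[req].append(t)
        if not linked:
            orphans.append(t.test_id)
    uncovered = [r for r, ts in trace.items() if not ts]
    failing = [r for r, ts in trace.items() if any(t.result == "fail" for t in ts)]
    verified = [r for r, ts in trace.items()
                if ts and all(t.result == "pass" for t in ts)]
    covered = len(requirements) - len(uncovered)
    pct = 100.0 * covered / len(requirements) if requirements else 0.0
    return {"coverage_pct": pct, "uncovered": uncovered, "failing": failing,
            "verified": verified, "orphans": orphans}


# Hypothetical exported data from requirements and test-management tools.
REQUIREMENTS = ["REQ-1", "REQ-2", "REQ-3", "REQ-4"]
TESTS = [
    ExecutedTest("TC-1", ["REQ-1"], "pass"),
    ExecutedTest("TC-2", ["REQ-1", "REQ-2"], "fail"),
    ExecutedTest("TC-3", ["REQ-3"], "pass"),
    ExecutedTest("TC-4", ["REQ-9"], "pass"),  # traces to an unknown requirement
]

report = coverage_report(REQUIREMENTS, TESTS)

if __name__ == "__main__":
    print(report)
```

Here coverage is 75%: REQ-4 has no test, REQ-1 and REQ-2 have a failing test, only REQ-3 is fully verified, and TC-4 is an orphan. Keeping the trace as a plain mapping makes it easy to feed the same data into dashboards later.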
Actual operations data must be logged and assessed with predictive analytics to anticipate possible failures, so that preventive maintenance can be applied before issues occur.
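The simplest form of this kind of predictive analytics is trend extrapolation. The Python sketch below fits a least-squares line to a hypothetical daily sensor reading (here, assumed to be the peak motor current of a point machine, rising as it wears), estimates when the trend will cross an alarm threshold, and flags the asset if that falls within the maintenance lead time. The readings, threshold, and lead time are invented for illustration; production systems typically use richer models, but the decision structure is similar.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) < 2 or len(xs) != len(ys):
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_threshold(days, readings, threshold):
    """Extrapolate the linear trend to the alarm threshold.

    Returns estimated days remaining after the last observation,
    or None if the trend is flat or improving.
    """
    slope, intercept = fit_line(days, readings)
    if slope <= 0:
        return None
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - days[-1])


def needs_maintenance(days, readings, threshold, lead_time_days):
    """Flag the asset if the predicted crossing falls inside the lead time."""
    remaining = days_until_threshold(days, readings, threshold)
    return remaining is not None and remaining <= lead_time_days


if __name__ == "__main__":
    # Hypothetical daily peak motor current (amps) of one point machine.
    days = [0, 1, 2, 3, 4, 5]
    current = [10.0, 10.5, 11.0, 11.5, 12.0, 12.5]
    print(days_until_threshold(days, current, threshold=15.0))  # → 5.0
    print(needs_maintenance(days, current, threshold=15.0, lead_time_days=7))  # → True
```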
A holistic approach is needed: from requirements traceability of customer needs, to engineering designs (be they software, mechanical, or electrical), to supplier management, with quality metrics measured continuously. Within each phase, a feedback loop allows Root Cause Analysis of any issues found. But this alone doesn’t tell you whether the right product is being built or whether certain issues are trending; that requires a Retrospective Analysis! Requirements are another example, as they are often lost, misinterpreted, or scattered across isolated requirements and PLM platforms. Clients may capture basic requirements in Word documents, PowerPoints, PDFs, and Excel spreadsheets, while for more detailed functional specifications they may use:
- IBM DOORS for ALM requirements in classic Waterfall development,
- JAMA for product variants, capturing the requirements for each version,
- Jira for newer Agile requirements,
- PTC Windchill for mechanical requirements,
- Siemens Teamcenter for electrical requirements,
- Aras Innovator for systems-level and industrial engineering requirements.
Each individual, siloed Root Cause Analysis may seem fine at the time, yet it can miss the bigger picture or lack the justification to warrant a bigger solution; only when the recurring trends are seen collectively is the bigger solution found and justified.
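A Retrospective Analysis can start as simply as pooling RCA records from the siloed tools and counting which root causes recur across projects. The Python sketch below assumes each RCA has been exported as a (project, source tool, root-cause category) record; the records and category names are hypothetical. A cause that looks isolated within one project surfaces as a trend once it appears in several.

```python
from collections import Counter, defaultdict

# Hypothetical RCA records exported from separate, siloed tools:
# (project, source_tool, root_cause_category)
RCA_RECORDS = [
    ("signaling-upgrade", "DOORS", "requirement misinterpreted"),
    ("signaling-upgrade", "Jira", "config change without review"),
    ("rolling-stock", "Windchill", "requirement misinterpreted"),
    ("interlocking", "Teamcenter", "config change without review"),
    ("interlocking", "Jira", "config change without review"),
    ("ticketing", "Jira", "test environment mismatch"),
]


def recurring_root_causes(records, min_projects=2):
    """Return (cause, project_count, occurrence_count) for causes that recur
    in at least `min_projects` distinct projects, most widespread first."""
    projects_by_cause = defaultdict(set)
    for project, _tool, cause in records:
        projects_by_cause[cause].add(project)
    occurrences = Counter(cause for _, _, cause in records)
    trends = [
        (cause, len(projects), occurrences[cause])
        for cause, projects in projects_by_cause.items()
        if len(projects) >= min_projects
    ]
    # Sort by project spread, then by total occurrences, then by name.
    return sorted(trends, key=lambda t: (-t[1], -t[2], t[0]))


if __name__ == "__main__":
    for cause, n_projects, n_times in recurring_root_causes(RCA_RECORDS):
        print(f"{cause}: {n_times} occurrences across {n_projects} projects")
```

Counting distinct projects, not just occurrences, is the design choice that matters here: it distinguishes a problem repeated inside one team from one that cuts across the organization.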
In my advisory roles with firms in the avionics, automotive, railway, telecom, defense, and industrial arenas, I have often found requirements inadequately captured or misinterpreted, and spread across siloed platforms. This is frequently the case in Cybersecurity. Later in my career, I was asked to architect the Cybersecurity solutions for Siemens Mobile, with Cisco Systems, for Cingular Wireless (now AT&T), and we now know that Cybersecurity is more than patching a single hole: it requires a holistic approach.
The process flow is complex, running from ideation to design to verification/validation to delivery, with operational and end-to-end product lifecycle management aspects integrated, and, unfortunately, errors can creep in at every phase.
Arti: Akshay, Root Cause Analysis and Retrospective Analysis are backward-looking, while Industry 4.0 innovation teams are forward-looking. How should they embark on the journey of transitioning to Industry 4.0?
Akshay: Good question, Arti.
What Tools Are Needed Now, in the 2020s, for Industry 4.0?
In a nutshell, we need solutions that provide a platform for Root Cause Analysis of individual issues and macro-level trend analysis for Retrospective Analysis, while seamlessly integrating with all of the requirements tools above, Model-based Systems Engineering tools, and ERP and CRM systems, through development, test, and delivery, as well as Application Performance Monitoring in the cloud, and that also serve as a platform for innovation. This is not a single vendor but a best-of-breed platform of platforms from vendors like IBM, Siemens, PTC, Aras, JAMA, Kovair, SAP, Industrial IoT vendors, and the cloud providers.
The recommendations I would give to CIOs, CTOs, and IT leaders include the following:
- First: They need to form an Industry 4.0 team that identifies the volume, variety, and velocity of data coming from systems, business processes, and internal data lakes. The team can then capture which industrial applications are needed, their performance and resiliency requirements, and where they need to run.
- Second: They need to identify all data sources, in motion and at rest: where and how the data flows, the networks it runs over, and where it will be used and consumed, while protecting data privacy and addressing security vulnerabilities. A firewall alone is insufficient: Role-Based Access Controls, ZTNA (Zero Trust Network Access) methods, and strong encryption of all data, with policy-controlled APIs, need to be implemented.
- Third: Newer innovations can be explored, such as advanced risk management: building valid data models and scenarios, and identifying resiliencies and backup plans for supply chains, logistics, and so on. This can be accomplished through Product Lifecycle Management (PLM) with Risk and Hazard Management, by developing digital twins of processes, products, and user behavior patterns (including for societal and environmental risk awareness), and by applying newer AI/ML or predictive analytics to anticipate needs and mitigate risks when needed.
- Fourth: Monetization. Monetize data, people, processes, and services, as well as APIs in newer B2B systems, by contextualizing data exchanges so that the right data models fuel the right business models. Data and analytics governance lets organizations engaging in Industry 4.0 and its ecosystems harness the full value of their information assets, including newer APIs.
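The second recommendation (Role-Based Access Controls plus Zero Trust methods) can be illustrated at the application layer with the Python sketch below. It is a hypothetical example, not any vendor's ZTNA product: the roles, resources, and the `mfa_verified`/`device_compliant` signals are assumptions standing in for an identity provider and a device-posture service. Every request is evaluated on its own, nothing is trusted for being "inside" the network, and anything not explicitly granted is denied. Network brokering and encryption, which real ZTNA deployments also handle, are out of scope here.

```python
from dataclasses import dataclass

# Hypothetical role -> set of (action, resource) grants. Anything not listed is denied.
ROLE_GRANTS = {
    "signal_engineer": {("read", "interlock_config"), ("propose", "interlock_config")},
    "safety_approver": {("read", "interlock_config"), ("approve", "interlock_config")},
    "analyst": {("read", "telemetry")},
}


@dataclass(frozen=True)
class AccessRequest:
    user: str
    roles: frozenset
    action: str
    resource: str
    mfa_verified: bool      # assumed signal from the identity provider
    device_compliant: bool  # assumed signal from a device-posture check


def authorize(req):
    """Evaluate each request independently (zero-trust style).

    Identity, device posture, and role grants are all checked on every call;
    network location confers no trust. Returns (allowed, reason).
    """
    if not req.mfa_verified:
        return False, "identity not verified"
    if not req.device_compliant:
        return False, "device failed posture check"
    for role in sorted(req.roles):  # sorted for a deterministic reason string
        if (req.action, req.resource) in ROLE_GRANTS.get(role, set()):
            return True, f"granted via role {role}"
    return False, "no role grants this action"  # deny by default


if __name__ == "__main__":
    engineer = AccessRequest("engineer-1", frozenset({"signal_engineer"}),
                             "approve", "interlock_config", True, True)
    # The proposer role lacks "approve": separation of duties is enforced.
    print(authorize(engineer))
```

Splitting "propose" and "approve" across different roles is what lets the policy enforce a second pair of eyes on safety-critical configuration changes.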
Once the Industry 4.0 team is formed and the internal software solutions have been identified, those that need localized edge computing because of ultra-low-latency and resiliency requirements can be delivered as “5G Edge Computing Cloud-based” products provided as a service. Some applications will be ALM-based, like on-prem embedded systems, and some can run externally in public cloud offerings. If external APIs need to be developed, whether B2B (Business to Business) or B2C (Business to Consumer), they must be secure and process-compliant, with the correct dashboards, role-based policy controls to ensure correct usage, and ideally DRaaS (Disaster Recovery as a Service) built in.
Industry 4.0 will typically require industrial applications like SAP for logistics, and, for industrial engineering, may require solutions from Dassault Systèmes, Siemens Teamcenter, PTC Windchill, and Aras Innovator. All of these can be connected via an iPaaS (integration Platform as a Service) or an Enterprise Service Bus that offers pre-built integrations, with additional benefits including:
- Process Controls, with dashboards and metrics, including Revenue Management.
- Security with Resiliency: inter-application delivery of services and information is often required to be secure, with role-based policy controls and DRaaS: disaster recovery as a service, built-in.
All this should enable a more collaborative environment, with better quality, productivity, security, and operational resiliency.
Policy-based Data Synchronization and Integration Platform
Regarding the Indian Railways train disaster specifically, it has been reported that an erroneous configuration was entered into the electronic interlocking system, either by mistake or with malicious intent. What I believe is needed here, and can be applied generically, is the following:
- Role-Based Access Controls: ensure that only authorized engineers enter configurations and sign off on engineering activities, with quality assured through centralized governance.
- Behavioral Analysis and Security: monitor all engineering activities so that rogue actors are detected and stopped.
- Policy Rules Engine and Workflow Engine: real-time configuration and enforcement of policy rules, with real-time escalation if any policy is violated.
- Automated Electronic Configuration Checker: a real-time validation engine that validates every electronic configuration change and escalates if any policy is violated.
- Governance Controller: reports with actionable insights and escalation of any compliance violation, including enforcing a ground halt if required.
These recommendations can be applied to any industrial setting with mission-critical configurations.
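In its simplest form, the configuration checker and policy rules engine described above might look like the Python sketch below. It is a hypothetical illustration, not how Indian Railways’ electronic interlocking or Kavach actually work: certified interlockings are safety-critical systems developed to standards such as EN 50128, and the section names, speed limits, and occupancy flags here are invented. The idea is that every proposed route is checked against live occupancy and speed rules before it is accepted, and any violation is escalated instead of silently applied. The example request mirrors the reported scenario: a train at 128 km/h routed into an occupied loop line.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class TrackSection:
    name: str
    max_speed_kmh: float  # hypothetical permitted speed for this section
    occupied: bool        # assumed to come from real-time IoT / track-circuit data


@dataclass
class RouteRequest:
    train_id: str
    speed_kmh: float
    target: str           # section the proposed configuration routes the train into


Sections = Dict[str, TrackSection]
Rule = Callable[[RouteRequest, Sections], Optional[str]]


def target_is_clear(req: RouteRequest, sections: Sections) -> Optional[str]:
    sec = sections[req.target]
    return f"{sec.name} is occupied" if sec.occupied else None


def speed_within_limit(req: RouteRequest, sections: Sections) -> Optional[str]:
    sec = sections[req.target]
    if req.speed_kmh > sec.max_speed_kmh:
        return (f"{req.speed_kmh:.0f} km/h exceeds {sec.name} "
                f"limit of {sec.max_speed_kmh:.0f} km/h")
    return None


# Policy rules are plain functions; new policies are added to this list.
RULES: List[Rule] = [target_is_clear, speed_within_limit]


def validate_route(req: RouteRequest, sections: Sections,
                   escalate: Callable[[RouteRequest, List[str]], None]) -> bool:
    """Run every policy rule; reject the route and escalate on any violation."""
    violations = [msg for rule in RULES if (msg := rule(req, sections)) is not None]
    if violations:
        escalate(req, violations)
        return False
    return True


if __name__ == "__main__":
    sections = {
        "main": TrackSection("main", max_speed_kmh=130, occupied=False),
        "loop": TrackSection("loop", max_speed_kmh=30, occupied=True),
    }

    def alert(req, violations):
        print(f"ESCALATE {req.train_id}: " + "; ".join(violations))

    print(validate_route(RouteRequest("T1", 128, "loop"), sections, alert))
    print(validate_route(RouteRequest("T1", 128, "main"), sections, alert))
```

Because all rules run and all violations are collected, the escalation carries the full picture (here, both the occupied section and the overspeed), which is what a governance controller would need to decide on a halt.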
Arti: Akshay, how can Indian Railways innovate and ideate? How can any Industrial IoT provider do the same?
Akshay: Good question, Arti.
Now, about innovating for the future!
What if Indian Railways adopts the ISO standards for managing autonomous vehicles and leads by example, operationalizing them within the Vande Bharat high-speed trains?
What if the same platform described above supports the management of ISO 26262, SAE J3016, SAE J3018, and Automotive SPICE® (ASPICE), key standards in automotive development, now applied to locomotive trains?
Visual Monitoring through Reports and Dashboards
- What if Indian Railways leads the world with a wide range of customizable graphical and textual reports built from the artifacts used across the entire lifecycle? These reports can be embedded as gadgets in visual dashboards with drill-down capabilities to monitor KPIs and support decision-making.
- What if Indian Railways leads the world by using custom template-based Word reports to generate any compliance-related document with the click of a button, significantly reducing compliance audit time and costs?
- What if Indian Railways leads the world with real-time reports & dashboards, providing a complete audit trail of all changes?
- What if Indian Railways leads the world with real-time supply chain Risk and Hazard Management reports & dashboards, providing a complete audit trail of all suppliers?
Advantages of Automotive Software Engineering applied to Locomotive Software Engineering, and to Industry 4.0 generally:
- System modeling, testing, and simulation within a closed loop.
- Software architectures that combine quality requirements, such as security, usability, and performance, with governance and controls.
- Service-oriented OSs with secure communication platforms and disaster recovery built in.
- AI in multi-sensor IoT networks for driver-assistance systems and autonomous, full self-driving.
- Distributed end-to-end security for software updates to the system’s firmware.
- Connectivity of cloud technologies and IT back-ends, online apps, and emergency-call processing.
This was a major tragedy, but with innovative firms like Kovair and ecosystem partners like Siemens, IBM, and others, we can prevent it from happening again!
Arti: Thanks, Akshay. Any final thoughts? Where can our audience get more information on improving Industry 4.0? And how can the industry help Indian Railways in the wake of the train tragedy?
Akshay: Thanks, Arti. Kovair.com has a wealth of information; its whitepapers section includes a more detailed description of how Indian Railways can solve this problem.
These are challenging times, but with newer solutions and new thinking from platform providers like Kovair and its partners, as we discussed, I hope these problems can be alleviated. The result, I hope, will be better harmony within IT teams and better business outcomes from innovations by the Industry 4.0 community. So, hopefully, this is a game-changer!
Arti: Indeed it is! Thank you, Akshay!