Thermal Runaway Is Not Just a Cell Problem: The Missing Link in BESS Safety Testing
This article is authored by Dr. Pradyumna Gupta, Founder and Chief Scientist of Infinita Lab and Infinita Materials. In this article, he explores the importance of bridging materials science, reliability engineering, and system-level safety to enable more resilient and sustainable next-generation technologies.
In April 2019, the commissioning test of a battery energy storage system in Surprise, AZ, caused a thermal runaway event within the battery at the substation. The event ultimately caused an explosion that injured four firefighters and demolished the installation. Every cell in that system had passed its certification testing. Every module had been validated under its respective requirements. System-wide propagation behavior, the most critical scenario, had never been characterized.
This system failure is not a standalone incident. There have been numerous grid-scale battery energy storage fires in the U.S., South Korea, the UK, and Australia over the last six years, and their frequency has risen with total installed capacity. Nearly every post-fire investigation follows the same pattern: current cell and module certifications existed, but the behavior of the system under a propagating thermal event was never characterized.
This is not a manufacturing defect; it is a testing design failure.
What the Standards Actually Test
The main stationary battery storage certifications in place (IEC 62619, UL 1973, and IEC 62933) were built on an intuitive engineering idea: if the building block is qualified, then the system built with that block is also qualified. Cell-level abuse testing consists of overcharge, over-discharge, external short circuit, crush, and thermal testing, among others. Module-level qualification includes many of these for multi-cell systems. UL 9540A attempts to include propagation testing at the module and unit levels.
However, UL 9540A, even when conducted at its highest severity level, tests the propagation of thermal runaway within the boundaries of a test specimen, not a complete installed system. The standard was never designed to characterize the transport of heat, gas, and pressure in a given rack arrangement, with a given thermal configuration, at a given state-of-charge distribution, and no standard today provides for that.
Meanwhile, the devices being deployed are increasing in energy density and scale. A utility-scale BESS today can contain tens of thousands of cells across hundreds of racks in a single enclosure. The assumption that component-level safety data scales linearly to system behavior is not an engineering conclusion; it is an artifact of a testing framework that has been carried forward rather than re-examined.
Three Variables That Cell Tests Cannot Predict
Three system-level parameters dictate thermal runaway propagation, and none of them can be measured with current standalone module testing methodology:
- Pack Geometry: The way the modules are arranged thermally couples neighboring modules. Whether an event triggered by one cell stays a local, one-module runaway or becomes an adjacent-module fire depends on module spacing, orientation in the rack, the path of airflow through the enclosure, and similar factors. These are a function of the installation, not the component specification. Testing a qualified module individually in a calorimetric chamber tells a design engineer little about its propagation risk when it sits in the 12th position of a forty-rack enclosure with a common cooling plenum.
- Thermal Management Design: A design that uses active liquid cooling, passive air cooling, phase change materials, or a hybrid architecture reacts to the thermal output of an event in one of three ways: it can accelerate propagation, retard it, or redirect it. How a system designed to cool under normal operating conditions responds to a propagating thermal event is a system-level phenomenon and cannot be deduced from module-level tests.
- State of Charge (SOC) distribution at the time of failure initiation: This is arguably the least understood. Cells at a higher SOC store more energy and therefore reach the threshold for exothermic reaction at lower temperatures. In a practical system, the cells in a pack will never be at a uniform SOC.
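To make the SOC point concrete, the short Python sketch below estimates how much stored energy differs between cells of one module when their SOC is not uniform. It is an illustration only: the cell capacity, nominal voltage, and SOC values are hypothetical placeholders, and the estimate (SOC times capacity times nominal voltage) ignores the shape of the cell's voltage curve.

```python
# Rough per-cell stored-energy estimate for a non-uniform SOC distribution.
# All numbers are hypothetical placeholders, not data from any real system.

def stored_energy_wh(soc, capacity_ah=280.0, nominal_v=3.2):
    """Approximate stored energy (Wh) as SOC x capacity x nominal voltage."""
    if not 0.0 <= soc <= 1.0:
        raise ValueError("soc must be between 0 and 1")
    return soc * capacity_ah * nominal_v


def energy_spread_wh(socs, **cell):
    """Difference in stored energy between the fullest and emptiest cell."""
    energies = [stored_energy_wh(s, **cell) for s in socs]
    return max(energies) - min(energies)


# Hypothetical SOC snapshot for four cells in one module.
module_socs = [0.78, 0.82, 0.91, 0.85]
print(f"spread: {energy_spread_wh(module_socs):.1f} Wh")
```

A component test run at a single SOC captures only one point of this distribution, which is why the SOC at the moment of initiation matters.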
Why Propagation Is a Systems Problem
Thermal runaway in a lithium-ion cell generates heat, flammable gas, and potentially ejected material. The rate at which the cells adjacent to a runaway event subsequently enter runaway is determined by the heat transfer coefficient between cells, the thermal mass of intermediate structures, and the time-temperature history of the original event.
No individual cell test, module test, or unit-level UL 9540A data set can predict this combination. In every case investigated by fire authorities for significant BESS incidents, the path of the fire followed the geometry of the enclosure and the cooling system rather than the chemistry of the cell. The cell chemistry dictated what could occur, while the system determined what did occur.
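The dependence on coupling can be illustrated with a deliberately simple lumped-parameter model. The Python sketch below treats a row of identical cells as thermal masses linked by a conductance, with a fixed loss to ambient, and marks a cell as entering runaway once it crosses an onset temperature. Every parameter value and name here is a hypothetical assumption chosen for illustration; the model ignores gas, pressure, and ejecta, and it is not a substitute for propagation testing.

```python
# Minimal 1-D lumped thermal model of runaway propagation along a cell row.
# All parameters are hypothetical; explicit Euler time stepping.

def simulate_row(coupling_w_per_k, n_cells=5, heat_capacity_j_per_k=1000.0,
                 loss_w_per_k=0.5, onset_c=150.0, release_j=800_000.0,
                 release_s=30.0, ambient_c=25.0, dt_s=1.0, t_end_s=7200.0):
    """Return the runaway start time (s) of each cell, or None if it never starts."""
    temps = [ambient_c] * n_cells
    start = [None] * n_cells
    start[0] = 0.0  # cell 0 is the initiating cell

    for step in range(int(t_end_s / dt_s)):
        t = step * dt_s
        new_temps = temps[:]
        for i in range(n_cells):
            # Heat lost to the surroundings.
            q_w = -loss_w_per_k * (temps[i] - ambient_c)
            # Heat exchanged with neighbors through the coupling conductance.
            if i > 0:
                q_w += coupling_w_per_k * (temps[i - 1] - temps[i])
            if i < n_cells - 1:
                q_w += coupling_w_per_k * (temps[i + 1] - temps[i])
            # A cell in runaway releases its energy at a constant rate.
            if start[i] is not None and t - start[i] < release_s:
                q_w += release_j / release_s
            new_temps[i] = temps[i] + q_w * dt_s / heat_capacity_j_per_k
        temps = new_temps

        # Any cell that crossed the onset temperature enters runaway.
        for i in range(n_cells):
            if start[i] is None and temps[i] >= onset_c:
                start[i] = t + dt_s
    return start


# Same cells, two hypothetical layouts: tightly coupled vs. well separated.
tight = simulate_row(coupling_w_per_k=5.0)
loose = simulate_row(coupling_w_per_k=0.1)
print("tight coupling:", tight)
print("loose coupling:", loose)
```

With these assumed values, the tightly coupled row propagates from cell to cell while the loosely coupled row does not, even though the cell parameters are identical. Only the system-level coupling changed.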
What This Means for Procurement and Deployment
The downstream result of this divide is that asset owners, utilities, and project developers are making deployment decisions using certifications that do not describe the actual risk profile of the system they are installing.
Without understanding how propagation behaves in a specific system configuration, an underwriter evaluating a BESS facility cannot accurately estimate probable maximum loss. A fire department's pre-incident plan will be ineffective if the only data points available are module datasheets.
Treating Safety Testing as a Systems Discipline
The answer is not to abandon cell and module certification; that process has real, valuable uses. The answer is to institutionalize system-level propagation characterization as a distinct and mandatory step in BESS qualification, with the same level of rigor used for component testing.
This means performing propagation testing at scale appropriate to system size using the rack geometry and thermal management configuration of the actual installation. It means characterizing initial-event scenarios across the range of operating SOC, not just at design conditions.
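One way to picture the scope of such a program is as a test matrix over the installed configuration. The Python sketch below enumerates scenarios from a few hypothetical lists; the layout names, thermal management states, SOC levels, and initiating positions are placeholders an engineering team would replace with the values of its actual installation, and the initiating-position dimension is an added assumption rather than something prescribed by any standard.

```python
# Enumerate a system-level propagation test matrix.
# Every list entry below is a hypothetical placeholder.
from itertools import product

rack_layouts = ["as-installed rack geometry"]
thermal_configs = ["as-installed cooling, running", "as-installed cooling, off"]
soc_levels = [0.2, 0.5, 0.8, 1.0]                  # span of operating SOC
initiating_positions = ["rack edge", "rack center"]  # assumed extra dimension


def build_test_matrix(layouts, thermals, socs, positions):
    """Return one scenario dict per combination of the four dimensions."""
    return [
        {"layout": layout, "thermal": thermal, "soc": soc, "position": pos}
        for layout, thermal, soc, pos in product(layouts, thermals, socs, positions)
    ]


matrix = build_test_matrix(rack_layouts, thermal_configs,
                           soc_levels, initiating_positions)
print(len(matrix), "scenarios")
for scenario in matrix[:3]:
    print(scenario)
```

Even this small example shows how quickly the scenario count grows once SOC and configuration are treated as test variables rather than fixed design conditions.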
The most important shift, then, is conceptual: the safety of BESS must be viewed as a system property, not as a cell property. The same cells, configured within two different system designs, can have widely different risk profiles for thermal propagation.
A runaway does not begin at ignition. It begins when the test program fails to describe the system it was supposed to test.
