AI at Industrial Scale: How Industrial Data Science is Different from Other Sectors
Emeli Dral
Chief Data Scientist, Mechanica AI
As every data scientist knows, mathematics is quite universal. Whether you are building machine learning models for a retail, banking or telecom company, you are using pretty much the same set of algorithms. One might expect that, as a result, it is very easy to jump from one sector to another, with all your experience being fully transferable.

The truth is, though the mathematics is indeed the same, there are many nuances that matter in different domains - and sometimes they are even decisive for the success of a specific project. The industrial sector has its own unique traits. While some of them - such as an engineering culture and the presence of stable processes - make it even more attractive for data science, others - such as the high cost of mistakes and strict safety requirements - often turn it into a challenge.
Beyond mathematics
First of all, the business culture of those using the technology in the industrial sector is very different from, say, IT or retail. Industrial companies are built and run by engineers who are well-versed in using numbers and performing experiments. This makes the process of defining features for the models, choosing the success metrics or designing A/B tests much smoother than in other industries. You do not need to explain why a test is needed before going live - engineers are used to basing their decisions on data, not on gut feeling.

In addition, the expectations of what the technology can deliver differ. In e-commerce, for example, machine learning models are often tasked with improving conversions, visits and clicks, and the resulting uplift can be as high as 20 or 50%. Industrial companies typically do not have the luxury of expecting a 50% improvement in some process metric. There are simply not that many degrees of freedom in what can be done. The laws of physics are constant, the modernization of equipment is very expensive, and the "usual" process optimization methods have already been applied. Adding just one server with one AI-based service running on it and getting an uplift of 2% often sounds too good to be true. And this is exactly what machine learning can bring.
Safety first
For obvious reasons, the industrial sector has very strict requirements regarding safety and reliability. Imagine that your AI-based personal assistant suddenly does not recognise your voice, or an e-commerce engine gives you a wrong product recommendation or fails to display one. This may be frustrating, but it does not lead to any major consequences beyond some customer dissatisfaction. If the same happens in an industrial setting, the cost of a mistake is much higher. This results in very strict standards and specific approaches to developing AI models for the industrial sector - to ensure that the system never fails to display a recommendation and always acts strictly inside the permitted safety range.
Balancing model quality and safety compliance is often more of an art than data science.
At Mechanica AI, we use a number of approaches to incorporate domain knowledge and traditional models in machine learning systems. This way we ensure that the resulting solution behaves correctly and in accordance with known dependencies and physics of the process.
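The two requirements named above (a recommendation is always returned, and it always stays inside the safety range) can be illustrated with a minimal sketch. This is not Mechanica AI's implementation; the function name, the `fallback` value and the policy of falling back when the model fails are all assumptions made for illustration.

```python
import math
from typing import Callable, Sequence


def safe_recommendation(
    model: Callable[[Sequence[float]], float],
    features: Sequence[float],
    lower: float,
    upper: float,
    fallback: float,
) -> float:
    """Return a setpoint that always exists and always lies in [lower, upper]."""
    try:
        value = float(model(features))
    except Exception:
        # Assumed policy: if the model fails, use a known-safe value
        # (for example, the setpoint from the existing operating procedure).
        value = fallback
    if math.isnan(value) or math.isinf(value):
        # A non-finite prediction is treated the same way as a failure.
        value = fallback
    # Clamp to the range agreed with the process engineers.
    return min(max(value, lower), upper)
```

In such a design the limits `lower` and `upper` would come from domain experts rather than from the data, so the model can only move the process within bounds that are already known to be safe.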
All about data
Industrial data is also quite different. In many other domains where data science is applied, data is generated by humans - customers purchase goods, click and visit websites, make calls and so on. In the industrial sector, we mostly work with machine-generated data. Typically this data has high frequency and is created in a fully automatic fashion. On the one hand, having a lot of data is great for machine learning. On the other, you have to learn to deal with very specific challenges and limitations.

One example is systematic errors that accumulate over time. You may have a sensor that doesn't perform well at low temperatures, and you need to deduce the pattern and account for it without excluding the data entirely. Another sensor may be broken and not replaced until the next scheduled maintenance - so that at any point in time you may have one or more data sources missing. Even in production use, your models need to know how to act in this case.
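One simple way to keep a model running when a data source is missing is to fall back to the last good reading and tell the model that the value was imputed. The sketch below assumes this policy; the function and argument names are hypothetical, and a real system would likely also limit how old a reused reading may be.

```python
from typing import Dict, Optional, Tuple


def fill_missing(
    readings: Dict[str, Optional[float]],
    last_good: Dict[str, float],
    defaults: Dict[str, float],
) -> Tuple[Dict[str, float], Dict[str, bool]]:
    """Replace missing sensor values and report which ones were imputed.

    `last_good` is updated in place with every valid reading, so it can be
    passed to the next call. `defaults` lists the expected sensors and the
    value to use when a sensor has never reported.
    """
    filled: Dict[str, float] = {}
    imputed: Dict[str, bool] = {}
    for name, default in defaults.items():
        value = readings.get(name)
        if value is None:
            # Sensor is missing: reuse the last good value, else the default.
            filled[name] = last_good.get(name, default)
            imputed[name] = True
        else:
            last_good[name] = value
            filled[name] = value
            imputed[name] = False
    return filled, imputed
```

The `imputed` flags can be passed to the model as extra features, so that it can learn to trust a reused value less than a fresh one.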

Often, certain data points are received with a significant delay. For example, the results of lab analyses become known hours after the samples were taken. The models still need to operate in real time and return recommendations or predictions when not all parameters of the process are known with precision. We use a number of tricks to "reconstruct" missing values and make virtual "measurements" to ensure that models remain reliable even under uncertain conditions.
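The article does not describe these tricks in detail. One basic building block, shown here as an assumption rather than as the team's method, is to give the model only the lab results that had actually arrived by the time of the prediction, together with their age. The tuple layout and the function name are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# One lab result: (available_at, sampled_at, value).
# The list must be sorted by available_at; times are in hours.
LabResult = Tuple[float, float, float]


def latest_known_lab(
    results: List[LabResult], now: float
) -> Optional[Tuple[float, float]]:
    """Return (value, age_in_hours) of the newest result available at `now`.

    Selection uses the time the result arrived, not the time the sample was
    taken, so the model never sees a value it could not have known.
    """
    available_times = [r[0] for r in results]
    idx = bisect_right(available_times, now)
    if idx == 0:
        return None  # no lab result has arrived yet
    _, sampled_at, value = results[idx - 1]
    return value, now - sampled_at
```

Applying the same rule when building the training set keeps offline evaluation consistent with what the model will see in production.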

Accessing the data is a challenge in itself. It typically comes from process control and manufacturing execution (MES) systems that were not originally designed for ease of data analysis. The archiving systems are old, and the data is often stored in a variety of formats, is badly structured and includes an excessive number of repeated readings. Sometimes, you may even have to deal with "fake" data, generated because requests to the system are more frequent than the actual measurements. Knowing and understanding the process to a certain extent is essential to account for all such peculiarities.
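As an illustration of the repeated-readings problem, the sketch below collapses each run of consecutive identical values into its first sample. It assumes that an exact repeat means the system was polled again before a new measurement arrived. That assumption does not hold for a signal that is genuinely constant, which is one reason process knowledge is needed before applying a filter like this.

```python
from itertools import groupby
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp, value)


def collapse_repeats(samples: List[Sample]) -> List[Sample]:
    """Keep only the first sample of every run of identical consecutive values."""
    return [next(run) for _, run in groupby(samples, key=lambda s: s[1])]


raw = [(0, 1.0), (1, 1.0), (2, 1.0), (3, 1.5), (4, 1.5), (5, 1.0)]
cleaned = collapse_repeats(raw)
# cleaned == [(0, 1.0), (3, 1.5), (5, 1.0)]
```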
Still, all these challenges make the data science part a lot of fun too. The idea of taking abstract mathematics and bringing it to real industrial plants is a great motivator for our data science team. If you are curious to learn more about our work or eager to join - drop us a line!