Learnings from Europe’s largest joint research project on the Artificial Intelligence of Things (AIoT), which aims to create trust in AI-based industrial intelligent systems.
In this blog post, I elaborate on the findings from our work in the InSecTT consortium* project.
Cyber-attacks cost companies time and money. 72% of industrial sector organizations have experienced cyber disruption to their IT environment at least six times during the past 12 months, with average financial damage estimated at $2.8 million [source1]. Another study suggests that data breaches caused by cloud security vulnerabilities cost companies an average of $4.8 million to recover from [source2].
The Machine Learning components now integrated into cybersecurity solutions rely heavily on data. Storing their training data on a centralized server creates a new challenge, because it can make the whole system more vulnerable.
Tietoevry addressed these challenges in partnership with Mälardalen University by developing an automatic network intrusion detection solution that makes industrial systems less vulnerable to cyber threats.
Network intrusion detection is a long-standing challenge in cybersecurity. Detection solutions are often integrated into an industrial digital twin, where they detect cyber-attacks and classify their type. Today, Machine Learning-based solutions are the standard approach, and they are continually improved to monitor and detect potential threats more accurately.
However, most solutions rely on a centralized Machine Learning setting, where all the network traffic is collected and uploaded to one server. The main drawbacks of the centralized approach are:
Time criticality: By the time the centralized solution raises an alert, the attacker may already have taken control of other machines.
Data breach: An intruder sniffing the network over which the data is sent can steal or copy all of it.
To tackle these challenges, we proposed a decentralized Machine Learning solution in which the network intrusion detection system is deployed on each device. This improves monitoring reactivity and, by design, removes the data breaches that come from sending raw traffic data to a central server.
In the decentralized setup, an intruder can be stopped when trying to connect to a new machine, and all data stays on the local devices instead of being sent over the network. This makes it harder for an intruder to move quickly and gather all the data.
Our solution is based on the Federated Learning paradigm where the intrusion detection algorithm is trained across multiple devices holding local network traffic data. Network traffic data remain on each device and are never exchanged with others.
In the following sections, I explain our Federated Learning solution for tackling network intrusions in more detail.
Because Machine Learning algorithms need large amounts of training data, one of the main obstacles is keeping that data secure. Storing data and performing predictions on a centralized server raises various security issues, which a federated learning approach can address.
Federated learning is a decentralized learning technique that trains models locally and transfers only the model parameters to the central server. This makes it well suited to removing potential sources of data breaches in Intrusion Detection Systems.
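To make "transfers only the model parameters" concrete, here is a minimal, generic sketch of federated training in the style of Federated Averaging (FedAvg). This is a swapped-in technique for illustration, not our Random Forest method: each hypothetical client fits a small logistic-regression model on its own traffic features and sends back only its weight vector, which the server averages in proportion to each client’s number of samples. The toy data, the label encoding, and all function names are assumptions.

```python
import math
import random
from typing import List, Tuple

# (feature vector, label); toy encoding: 1 = attack, 0 = normal traffic
Sample = Tuple[List[float], int]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w: List[float], x: List[float]) -> float:
    # Linear score; the last weight is the bias term.
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]


def local_train(weights: List[float], data: List[Sample],
                lr: float = 0.1, epochs: int = 20) -> List[float]:
    """Plain SGD on log-loss using one client's private data.

    Only the returned weights leave the client; `data` never does.
    """
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            err = sigmoid(score(w, x)) - y  # d(log-loss)/d(score)
            for i, xi in enumerate(x):
                w[i] -= lr * err * xi
            w[-1] -= lr * err
    return w


def federated_round(global_w: List[float],
                    clients: List[List[Sample]]) -> List[float]:
    """One FedAvg round: local training, then a sample-weighted average on the server."""
    updates = [(local_train(global_w, data), len(data)) for data in clients]
    total = sum(n for _, n in updates)
    return [sum(w[i] * n for w, n in updates) / total
            for i in range(len(global_w))]


def predict(w: List[float], x: List[float]) -> int:
    return int(sigmoid(score(w, x)) >= 0.5)


if __name__ == "__main__":
    rng = random.Random(0)

    def make_client(n: int) -> List[Sample]:
        # Synthetic toy traffic: "attacks" cluster around +2, normal traffic around -2.
        samples = []
        for _ in range(n):
            y = rng.randint(0, 1)
            samples.append(([rng.gauss(2.0 if y else -2.0, 1.0) for _ in range(2)], y))
        return samples

    clients = [make_client(50) for _ in range(3)]
    w = [0.0, 0.0, 0.0]  # two feature weights + bias
    for _ in range(5):
        w = federated_round(w, clients)
    print(w, predict(w, [3.0, 3.0]), predict(w, [-3.0, -3.0]))
```

In this simulation, only `w` crosses the client/server boundary in `federated_round`; the `clients` lists stand in for traffic data that would stay on each device.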
Our approach was to develop an anomaly-based intrusion detection system that uses a network-based data source (NIDS). We found that the Random Forest* classification algorithm performed best for both intrusion detection and attack classification, so we proposed a federated learning version of Random Forest. The Random Forest models were trained on each device with local data and then transferred to the central server, where they were aggregated. We investigated several methods for merging the local Random Forests and evaluated them, for intrusion detection only, on well-known public datasets (KDD99, NSL-KDD, UNSW-NB15, and CIC-IDS-2017). The best performance was achieved by the model that aggregates only the best Decision Trees from each Random Forest* (see Figure 1).
Figure 1: Our approach utilized only the best Decision Trees from each Random Forest*
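The aggregation in Figure 1 can be sketched as follows, under several assumptions that the post does not state: trees are simplified to depth-1 stumps on a randomly chosen feature, each client ranks its trees by accuracy on a local hold-out split and keeps the top `k`, and the server simply pools the received trees and takes a majority vote. All names (`Stump`, `local_best_trees`, `GlobalForest`) are hypothetical; the article [source3] describes the merging methods we actually evaluated.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (features, label); toy encoding: 1 = attack, 0 = normal


@dataclass
class Stump:
    """Depth-1 decision tree, a hypothetical stand-in for a full Decision Tree."""
    feature: int
    threshold: float
    left: int   # label when x[feature] <= threshold
    right: int  # label otherwise

    def predict(self, x: List[float]) -> int:
        return self.left if x[self.feature] <= self.threshold else self.right


def majority(labels: List[int], default: int = 0) -> int:
    return Counter(labels).most_common(1)[0][0] if labels else default


def accuracy(model, data: List[Sample]) -> float:
    return sum(model.predict(x) == y for x, y in data) / len(data)


def fit_stump(data: List[Sample], rng: random.Random) -> Stump:
    """Pick one random feature, then the threshold with the fewest training errors."""
    f = rng.randrange(len(data[0][0]))
    fallback = majority([y for _, y in data])
    best = None
    for t in sorted({x[f] for x, _ in data}):
        left = [y for x, y in data if x[f] <= t]
        right = [y for x, y in data if x[f] > t]
        stump = Stump(f, t, majority(left, fallback), majority(right, fallback))
        errors = sum(stump.predict(x) != y for x, y in data)
        if best is None or errors < best[0]:
            best = (errors, stump)
    return best[1]


def local_best_trees(data: List[Sample], n_trees: int, k: int, seed: int) -> List[Stump]:
    """Client side: train a local forest and keep only its k best trees.

    Trees are ranked by accuracy on a 20% local hold-out split (an assumption).
    Only these trees are sent to the server; the raw data stays on the client.
    """
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    split = int(0.8 * len(shuffled))
    train, val = shuffled[:split], shuffled[split:]
    # Each tree is fitted on a bootstrap sample of the local training data.
    forest = [fit_stump([rng.choice(train) for _ in train], rng) for _ in range(n_trees)]
    forest.sort(key=lambda tree: accuracy(tree, val), reverse=True)
    return forest[:k]


class GlobalForest:
    """Server side: pool the trees received from all clients and take a majority vote."""

    def __init__(self, trees: List[Stump]):
        self.trees = trees

    def predict(self, x: List[float]) -> int:
        return majority([tree.predict(x) for tree in self.trees])


if __name__ == "__main__":
    data_rng = random.Random(42)

    def make_client(n: int) -> List[Sample]:
        # Toy data: feature 0 separates attacks from normal traffic; feature 1 is noise.
        samples = []
        for _ in range(n):
            y = data_rng.randint(0, 1)
            samples.append(([data_rng.gauss(3.0 * y, 0.7), data_rng.gauss(0.0, 1.0)], y))
        return samples

    clients = [make_client(100) for _ in range(3)]
    received = [local_best_trees(d, n_trees=20, k=5, seed=i) for i, d in enumerate(clients)]
    model = GlobalForest([tree for trees in received for tree in trees])
    print(len(model.trees), model.predict([4.0, 0.0]), model.predict([-1.0, 0.0]))
```

Ranking trees on local data keeps the selection step on each client, at the cost of scoring trees only against that client’s own traffic distribution.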
The research proposes an automatic intrusion detection approach that limits data breaches on centralized servers. The aim is to build a shared global model through decentralized learning on data generated by local devices or processes.
Tietoevry and Mälardalen University published a scientific article on this distributed machine learning approach: Random Forest Based on Federated Learning for Intrusion Detection [source3].
As part of future work, we are investigating ways to make our solution robust to potential data reconstruction attacks.
Even though the Federated Learning setting makes data breaches harder, attackers may still be able to reconstruct the data from the local models transferred to the central server. To avoid such a scenario, data privacy constraints can be enforced during local training. Tietoevry is running experiments of this kind to ensure that the data is not compromised, either globally or locally.
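The post does not say which privacy constraint is being tested. One widely used option, shown here purely as an assumption, is differential privacy: before a client sends a tree, it perturbs each leaf’s class counts with Laplace noise so that any single record has a bounded effect on what the server sees. The function names are hypothetical, and this is a sketch rather than the mechanism Tietoevry uses.

```python
import random
from typing import Dict, Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_leaf_counts(counts: Dict[int, int], epsilon: float,
                        rng: Optional[random.Random] = None) -> Dict[int, float]:
    """Release one leaf's class histogram under epsilon-differential privacy.

    Assumes add/remove-one neighbouring datasets, so the histogram has L1
    sensitivity 1 and the Laplace scale is 1 / epsilon. Pass a count for every
    class, including zeros, so the set of keys itself does not leak information.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    scale = 1.0 / epsilon
    return {label: c + laplace_noise(scale, rng) for label, c in counts.items()}


def leaf_label(noisy_counts: Dict[int, float]) -> int:
    """The label a client would ship for this leaf: the argmax of the noisy counts."""
    return max(noisy_counts, key=noisy_counts.get)


if __name__ == "__main__":
    rng = random.Random(7)
    noisy = private_leaf_counts({0: 40, 1: 3}, epsilon=1.0, rng=rng)
    print(noisy, leaf_label(noisy))
```

A smaller `epsilon` means more noise and stronger privacy. Note that this protects only the leaf statistics: the split thresholds are also chosen from local data, so an end-to-end guarantee would require privatizing those choices too (or making them data-independent).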
Tietoevry is a member of the Intelligent Secure Trustable Things (InSecTT) consortium, Europe’s largest joint effort on the Artificial Intelligence of Things, with a budget of over 40 million euros over three years.
The project aims to create trust in AI-based intelligent systems and solutions as a major part of the Artificial Intelligence of Things (AIoT), with applications in a variety of industries such as manufacturing, transport, logistics, and healthcare.
In this project, Tietoevry is involved in the Secure Industrial Communications System use case, which aims to develop a security strategy for the network systems that are increasingly present in factories and across the industrial sector.
A Random Forest (RF) is an ensemble of Decision Trees. For classification, its prediction is typically the average of the trees’ predicted class probabilities, or a majority vote over their predicted classes. In our approach, only a portion of the Decision Trees (the best ones) is kept in the final aggregation by the central model.
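Written out, assume a single forest has T trees, tree t returns class probabilities p_{t,c}(x), and each of the K clients keeps a subset S_k of its trees (the selection criterion, for example hold-out accuracy, is an assumption here). The full forest and the aggregated global model then predict:

```latex
\hat{y}_{\mathrm{RF}}(x) = \arg\max_{c}\; \frac{1}{T} \sum_{t=1}^{T} p_{t,c}(x)

\hat{y}_{\mathrm{global}}(x) = \arg\max_{c}\; \frac{1}{\sum_{k=1}^{K} \lvert S_k \rvert} \sum_{k=1}^{K} \sum_{t \in S_k} p_{t,c}(x)
```

For a hard majority vote, replace p_{t,c}(x) with the indicator \mathbb{1}[h_t(x) = c], where h_t(x) is the class predicted by tree t.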
[source1] Cyber attacks on industrial assets cost firms millions
[source2] Top 6 cloud vulnerabilities
[source3] Random Forest Based on Federated Learning for Intrusion Detection
David holds a PhD in Machine Learning. He has a strong academic background and extensive experience in industries such as manufacturing, financial services, information services, and human resources. He is passionate about AI, Data Science, and data-driven processes that ease business transformation.