The European Union defines a disease as rare when it affects fewer than five individuals in 10,000. This low prevalence is one of the reasons these diseases are so hard to identify. Getting the diagnosis right often takes many years of doctors’ visits and several misdiagnoses.
AI-based diagnostic decision-support systems could help doctors identify these rare diseases faster. With ChatGPT and other AI tools bringing us closer to the mass use of learning and generative models, we now need the courage to develop new healthcare use cases that increase the welfare of people in Finland. Patients would benefit from more accurate diagnoses and a better quality of care.
The HUS eCare for Me rare disease research team, which I am part of, studies how such diagnostic decision-support systems can help doctors identify rare diseases early.
We selected three groups of autoimmune diseases (glomerulonephritides, myositides and vasculitides) because they have the highest treatment costs for Helsinki University Hospital and enough patients to make it possible to train a machine-learning model. Glomerulonephritides affect the kidneys, myositides the muscles, and vasculitides the blood vessels.
Some vasculitides can cause glomerulonephritis and/or myositis, which makes diagnosis challenging. Our dataset therefore included patients with multiple diseases, which makes prediction even harder.
We studied a total of 114,897 patients. Of these, 14,897 had at least one of these rare diseases, and 100,000 were randomly selected control patients, included so that the model could also learn to identify patients without these rare diseases. For each patient, the study used laboratory data and diagnoses from multiple source systems.
The patient data was divided into three subsets: a training dataset for fitting the model itself, a test dataset for checking that the model was learning the right things, and a validation dataset, which the models had never seen, for confirming that the performance results were valid.
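A minimal sketch of such a three-way split in plain Python follows. The 70/15/15 proportions, the function name `three_way_split` and the toy data are assumptions, since the article does not give the proportions. The split is stratified by label, so each subset keeps roughly the same share of rare disease patients as the full data (about 13% in this study), and it works on patient IDs, so no patient lands in more than one subset. Subset names follow the article's usage: the test subset checks learning during development, and the validation subset is held out until the end.

```python
import random
from collections import defaultdict

def three_way_split(patient_labels, fractions=(0.7, 0.15, 0.15), seed=42):
    """Split patient IDs into training, test and validation subsets.

    patient_labels maps a patient ID to its label (1 = rare disease, 0 = control).
    Each label group is shuffled and split separately (stratification), and
    every patient ends up in exactly one subset.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for patient_id, label in patient_labels.items():
        by_label[label].append(patient_id)

    train, test, validation = [], [], []
    for label in sorted(by_label):
        ids = sorted(by_label[label])  # sort first so the shuffle is reproducible
        rng.shuffle(ids)
        n_train = round(len(ids) * fractions[0])
        n_test = round(len(ids) * fractions[1])
        train += ids[:n_train]
        test += ids[n_train:n_train + n_test]
        validation += ids[n_train + n_test:]  # held out until final evaluation
    return train, test, validation

# Toy data: 1,000 hypothetical patients, 13% of them with a rare disease.
labels = {f"patient_{i}": int(i < 130) for i in range(1000)}
train, test, validation = three_way_split(labels)
```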
The team's methods fall under 'supervised learning', because each data point has been given a label that the model learns to predict. XGBoost is an established, state-of-the-art decision tree model that is widely used because it is easy to implement and does not need much computational power. Simply put, each tree asks a series of questions that can be answered true or false, and each path of answers ends in what we call a leaf node, which holds the result. The model builds many such trees and combines their results.
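To make the leaf-node idea concrete, here is a minimal pure-Python sketch of how a boosted tree ensemble of the XGBoost kind turns true/false tests into a prediction. It does not use the XGBoost library: the two trees, the lab features `creatinine` and `crp`, and all thresholds and leaf scores are hypothetical and hand-written rather than learned. For binary classification, XGBoost sums the leaf scores reached in every tree and passes the total through a logistic (sigmoid) function to get a probability.

```python
import math

# Hypothetical trees: each internal node tests "feature > threshold" and
# follows the "yes" branch when the test is true. Leaves hold scores.
TREES = [
    {"feature": "creatinine", "threshold": 110.0,
     "yes": {"leaf": 0.8},
     "no": {"feature": "crp", "threshold": 10.0,
            "yes": {"leaf": 0.3},
            "no": {"leaf": -0.6}}},
    {"feature": "crp", "threshold": 50.0,
     "yes": {"leaf": 0.5},
     "no": {"leaf": -0.4}},
]

def leaf_score(tree, patient):
    """Follow the true/false tests down to a leaf and return its score."""
    node = tree
    while "leaf" not in node:
        is_true = patient[node["feature"]] > node["threshold"]
        node = node["yes"] if is_true else node["no"]
    return node["leaf"]

def predict_proba(trees, patient):
    """Sum leaf scores over all trees and squash the total into (0, 1).

    Real XGBoost also adds a base score; it is omitted here for simplicity.
    """
    margin = sum(leaf_score(tree, patient) for tree in trees)
    return 1.0 / (1.0 + math.exp(-margin))

high = {"creatinine": 150.0, "crp": 60.0}  # leaves 0.8 + 0.5 = 1.3
low = {"creatinine": 80.0, "crp": 5.0}     # leaves -0.6 + -0.4 = -1.0
print(round(predict_proba(TREES, high), 3))  # 0.786
print(round(predict_proba(TREES, low), 3))   # 0.269
```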
We also used a more complex model called InceptionVasGloMyotides. Its basic idea is easiest to explain with an image, for example a picture of a cat. A filter is a small window that slides across the image's pixels and transforms them, for example turning a color image into a black-and-white one. This is repeated many times with different filters, each picking out different features of the image. At the end, the filtered pixel values are averaged and used to calculate the probability that the image shows a cat. InceptionVasGloMyotides applies the same principle to disease analysis.
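The filter idea can be sketched in a few lines of Python. This is not the InceptionVasGloMyotides architecture, which is not described here in detail; it assumes, for illustration, that a patient's data is a single series of lab values over time, and the filters, weights and bias are hypothetical and hand-picked rather than learned. In an inception-style block, filters of different widths run side by side on the same input, each filtered output is averaged (global average pooling), and a final layer turns the averaged features into a probability.

```python
import math

def conv1d(signal, kernel):
    """Slide a small filter along the signal (no padding)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(values):
    """Keep positive filter responses and zero out the rest."""
    return [max(0.0, v) for v in values]

def inception_features(signal, filters):
    """Run filters of different widths in parallel, then average each output."""
    features = []
    for kernel in filters:
        response = relu(conv1d(signal, kernel))
        features.append(sum(response) / len(response))
    return features

def predict_proba(features, weights, bias):
    """Combine the averaged features into a probability with a logistic output."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical filters: a width-3 "rising trend" detector and a width-5 smoother.
FILTERS = [[-1.0, 0.0, 1.0], [0.2, 0.2, 0.2, 0.2, 0.2]]

# Hypothetical monthly values of one lab test for one patient.
lab_series = [10.0, 11.0, 13.0, 18.0, 25.0, 34.0, 40.0]
features = inception_features(lab_series, FILTERS)
probability = predict_proba(features, weights=[0.3, 0.05], bias=-3.0)
```

In a trained network the filter values are learned, many such blocks are stacked, and residual (shortcut) connections add a block's input to its output, which makes deep stacks easier to train.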
In the end, we developed a new custom-made inception-type residual neural network for identifying these three rare diseases. This machine learning model, which we named InceptionVasGloMyotides, performs on a par with XGBoost. The model could predict the diseases at least 30 days before a doctor's initial diagnosis, and in the best cases even years before. In most cases, the accuracy of these models exceeds 80%.
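How the 30-day margin was enforced is not specified. One common way to set up such an evaluation, assumed here, is to give the model only lab results taken at least 30 days before the diagnosis date, so it cannot rely on tests ordered around the diagnosis itself. The record format (date, test name, value) and the function name `censor_before_diagnosis` are hypothetical; for control patients, some other index date would have to stand in for the diagnosis date.

```python
from datetime import date, timedelta

def censor_before_diagnosis(lab_results, diagnosis_date, gap_days=30):
    """Keep lab results dated at least `gap_days` before the diagnosis.

    `lab_results` is a list of (sample_date, test_name, value) tuples.
    """
    cutoff = diagnosis_date - timedelta(days=gap_days)
    return [r for r in lab_results if r[0] <= cutoff]

# Hypothetical lab history for one patient.
results = [
    (date(2023, 1, 1), "creatinine", 95.0),
    (date(2023, 2, 13), "crp", 12.0),
    (date(2023, 2, 20), "creatinine", 140.0),
    (date(2023, 3, 10), "crp", 48.0),
]
visible = censor_before_diagnosis(results, diagnosis_date=date(2023, 3, 15))
# The cutoff is 2023-02-13, so only the first two results remain.
```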
Compared with existing disease-detection support systems, our models performed at least as well and sometimes better. The best results came from binary classification, which flags that the patient may have any of the rare diseases without specifying which one. Of the individual diseases, glomerulonephritides were the easiest for the model to predict.
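The difference between the two tasks shows up in how the labels are built. In this sketch (all names hypothetical), the binary label only asks whether a patient has any of the three disease groups, while the per-disease labels need one yes/no answer per group. Because some vasculitides occur together with glomerulonephritis or myositis, one patient can have several positive labels, which is part of what makes the per-disease task harder.

```python
DISEASE_GROUPS = ("glomerulonephritis", "myositis", "vasculitis")

def binary_label(diagnoses):
    """1 if the patient has any of the three disease groups, else 0."""
    return int(any(group in diagnoses for group in DISEASE_GROUPS))

def per_disease_labels(diagnoses):
    """One 0/1 flag per disease group; several can be 1 at once."""
    return [int(group in diagnoses) for group in DISEASE_GROUPS]

patient = {"vasculitis", "glomerulonephritis"}
print(binary_label(patient))        # 1
print(per_disease_labels(patient))  # [1, 0, 1]
```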
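Accuracy is one metric, but it is not always the best way to evaluate a model, especially when one group of patients is much larger than the other. In machine learning and medicine, the most important metrics for estimating the probability that a prediction is correct are the positive predictive value (the share of positive predictions that are correct) and the negative predictive value (the share of negative predictions that are correct).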
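A quick calculation with the study's own class sizes shows why accuracy alone can mislead. The positive predictive value (PPV) is TP / (TP + FP), the share of patients predicted to have a disease who actually have it; the negative predictive value (NPV) is TN / (TN + FN), the share predicted disease-free who actually are. The function name `evaluate` is hypothetical, and the "model" here is deliberately useless: it labels every patient a control.

```python
import math

def evaluate(tp, fp, tn, fn):
    """Accuracy, positive predictive value and negative predictive value."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    ppv = tp / (tp + fp) if tp + fp else math.nan  # undefined with no positive predictions
    npv = tn / (tn + fn) if tn + fn else math.nan
    return accuracy, ppv, npv

# Every patient labeled a control, applied to the study's
# 14,897 rare disease patients and 100,000 control patients.
accuracy, ppv, npv = evaluate(tp=0, fp=0, tn=100_000, fn=14_897)
print(f"accuracy={accuracy:.3f}, npv={npv:.3f}")  # accuracy=0.870, npv=0.870
```

This "model" reaches about 87% accuracy while missing every rare disease patient, and its PPV is undefined because it never makes a positive prediction.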
With these values, XGBoost gives over 92% certainty that a positive result is correct when the confidence-score threshold is set above 93%. InceptionVasGloMyotides achieves equal or better results for glomerulonephritides, with 98% certainty, and 99% for binary classification, over a longer prediction timeframe. This was not the case with vasculitides and myositides: although XGBoost reached 96% certainty for glomerulonephritides and 99% for binary classification in the same timeframe, it remains the better model for vasculitides and myositides. However, for both models the negative predictive value varies between 38% and 97%, which means that in cases with a lower negative predictive value the confidence threshold needs to be adjusted lower.
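The trade-off behind adjusting the threshold can be seen on a small made-up example. A patient is predicted positive when the model's confidence score is at or above the threshold; the ten scores and labels below are invented for illustration and are not study data, and `ppv_npv` is a hypothetical helper. Lowering the threshold turns some predicted negatives into predicted positives, which in this example raises the negative predictive value (NPV) at the cost of the positive predictive value (PPV).

```python
import math

def ppv_npv(scores, labels, threshold):
    """PPV and NPV when score >= threshold counts as a positive prediction."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        if score >= threshold:
            if label == 1:
                tp += 1
            else:
                fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    ppv = tp / (tp + fp) if tp + fp else math.nan
    npv = tn / (tn + fn) if tn + fn else math.nan
    return ppv, npv

# Invented confidence scores and true labels (1 = rare disease) for ten patients.
scores = [0.95, 0.90, 0.80, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]

for threshold in (0.93, 0.50, 0.25):
    ppv, npv = ppv_npv(scores, labels, threshold)
    print(f"threshold={threshold:.2f}  ppv={ppv:.2f}  npv={npv:.2f}")
# threshold=0.93  ppv=1.00  npv=0.56
# threshold=0.50  ppv=0.80  npv=0.80
# threshold=0.25  ppv=0.71  npv=1.00
```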
With these results, we could potentially speed up the diagnosis of these diseases, improving patient care and avoiding costly delayed or misdirected treatments and medications. Early diagnosis also means early treatment, which often means less treatment is needed. In this way, our work has the potential to benefit everyone in the care chain, especially patients.
The research team is continuing to develop the models to get the best performance out of them. One goal is to balance the positive and negative predictive values so that both reach at least about 80%. Minimizing false negatives is more important than minimizing false positives, because false negatives can disrupt a patient's treatment process and decrease their quality of life.
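One simple way to pursue the 80% target, assumed here rather than taken from the study, is to try each candidate threshold and keep the lowest one at which both predictive values reach the target: a lower threshold never adds false negatives, so this choice misses the fewest patients among the acceptable ones. The function names and the toy data are hypothetical; how the team will actually tune the models is not specified.

```python
import math

def ppv_npv(scores, labels, threshold):
    """PPV and NPV when score >= threshold counts as a positive prediction."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    tn = sum(1 for s, y in pairs if s < threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    ppv = tp / (tp + fp) if tp + fp else math.nan
    npv = tn / (tn + fn) if tn + fn else math.nan
    return ppv, npv

def choose_threshold(scores, labels, min_value=0.8):
    """Lowest observed score usable as a threshold with PPV and NPV >= min_value.

    Candidates are checked from lowest to highest, so the first match is the
    acceptable threshold with the fewest false negatives. Returns None if no
    candidate meets both targets.
    """
    for threshold in sorted(set(scores)):
        ppv, npv = ppv_npv(scores, labels, threshold)
        if ppv >= min_value and npv >= min_value:  # comparisons with NaN are False
            return threshold
    return None

# Invented confidence scores and true labels (1 = rare disease).
scores = [0.95, 0.90, 0.80, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
print(choose_threshold(scores, labels))  # 0.55
```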
We also plan to validate the models clinically. This is needed before we can introduce them into the disease decision-support system to aid the work of healthcare professionals.
In future development work, we want to test a two-step classification. Separating control patients from rare disease patients worked extremely well, so we want to use this as the first step. After that, we can flag potential individual rare diseases whenever their probability threshold is met. This gives us more room to find the best model for each disease.
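The planned two-step logic can be sketched as follows, under the assumption that both steps output probabilities and that each disease group gets its own threshold. All names, probabilities and thresholds are hypothetical. A patient who passes the first step but no individual disease threshold is still reported as a suspected rare disease patient, so the strong binary result is not lost.

```python
def two_step_classify(p_rare, p_by_disease, first_threshold, disease_thresholds):
    """Two-step classification sketch.

    Step 1: rare disease patient vs control, based on p_rare.
    Step 2: for suspected patients only, flag each disease whose probability
    meets that disease's own threshold.
    Returns (suspected, flagged_diseases).
    """
    if p_rare < first_threshold:
        return False, []  # classified as a control; step 2 is skipped
    flagged = sorted(disease for disease, p in p_by_disease.items()
                     if p >= disease_thresholds[disease])
    return True, flagged

thresholds = {"glomerulonephritis": 0.8, "myositis": 0.7, "vasculitis": 0.5}
probs = {"glomerulonephritis": 0.9, "myositis": 0.2, "vasculitis": 0.6}
print(two_step_classify(0.97, probs, 0.9, thresholds))
# (True, ['glomerulonephritis', 'vasculitis'])
print(two_step_classify(0.40, probs, 0.9, thresholds))
# (False, [])
```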
Rasmus Ryyppö works as a Data Scientist at Tietoevry Care and is a doctoral researcher at Tampere University. He is an expert in machine learning and data mining and is passionate about harvesting knowledge from data to implement new data solutions.