Date
Wednesday 12 March 2025, from 3:30 PM to 4:30 PM
Location
Neuron 0.262
Organizer
Industrial Engineering and Innovation Sciences
Co-organizer
Eindhoven Artificial Intelligence Systems Institute
Price
Free
Topic
Shortcuts and other shortcomings in machine learning for medical imaging
Abstract
The application of machine learning (ML) to medical imaging diagnosis has attracted a lot of attention in recent years, with numerous reports of algorithms recognising medical images more accurately than human experts. Yet progress in clinical practice has not been proportional to these claims. Studies for other clinical applications of ML have also failed to find reliable published prediction models.
The increased popularity of ML in recent years is often explained by two developments. First, there are several large publicly available datasets. Second, open source deep-learning toolboxes allow development of algorithms without specialised domain knowledge, allowing more researchers into a field. Despite these seemingly ideal conditions for reproducibility, the state of ML in medical imaging is not as positive as one might think. In this talk I will highlight two of these issues.
One issue is that large sample sizes are not a panacea. There is a tendency to expect that a clinical task can be “solved” if the dataset is large enough. However, not all clinical tasks translate neatly into ML tasks. Furthermore, creating larger datasets often comes at the expense of quality, leading algorithms to learn spurious correlations or “shortcuts”. For example, our recent results show that lung diseases can be diagnosed with high accuracy even if the lungs are hidden from the x-ray, because ML learns to associate the way the patient was scanned with the disease.
Another issue is that the availability of data and code, plus the theoretical option to “infinitely” repeat experiments (for example, with different subsets of data, different initializations of the algorithms, etc.), creates an illusion of generalization. Since there are many degrees of freedom in how such repetition can be done, researchers for practical reasons tend not to do this exhaustively, but might still be tempted to formulate their conclusions more generally.
In this talk I dive deeper into these problems and hopefully, with the help of the audience, also explore some solutions. I will also touch upon various incentives in ML and academia that interact with these findings.
About the speaker
Veronika Cheplygina is an Associate Professor at the IT University of Copenhagen. Her background is in machine learning in general, and machine learning for medical images in particular. She also thinks about how we do research, and about addressing the inefficiencies and inequalities involved. Before ITU, she was a faculty member at the Eindhoven University of Technology. Find more info on
Your host
Daniel Lakens of the department of Industrial Engineering and Innovation Sciences will host Professor Veronika Cheplygina of the IT University of Copenhagen.
Registration is required but free of charge.