Designing efficient algorithms for massive data analysis

Efficient algorithms for scalable and dynamic data analysis

November 12, 2025

Leyla Biabani developed new algorithms that enable the efficient and reliable analysis of massive, constantly evolving datasets, meeting a key challenge in today's data-driven world.

Image: iStockphoto.com

Every day, hundreds of millions of terabytes of data are generated by our online activity, sensors, and digital systems. Analyzing this massive and constantly changing data quickly and accurately is essential, because many technologies we rely on, like Netflix recommendations, traffic apps, or healthcare algorithms, depend on it. If algorithms are too slow, inaccurate, or unable to adapt to changes, the results can be delayed or misleading, affecting millions of people in their daily lives.

Traditional algorithms often assume that all data can fit on a single computer, but in reality, data is distributed, evolving, and far too large to process in full.

PhD researcher Leyla Biabani addressed this challenge by developing scalable algorithms that can handle these modern realities without sacrificing accuracy. She defended her PhD thesis on Tuesday, November 11.

Tackling two core problems in data analysis

Biabani focused on two fundamental problems in machine learning: k-center clustering, which groups similar data points, and submodular maximization, a general optimization framework that models many decision-making tasks.

Both play a central role in applications such as pattern recognition, resource allocation, and network analysis.
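To give a feel for what k-center clustering asks, here is a minimal sketch of the classic farthest-first traversal, a textbook heuristic for the problem. It is not one of the algorithms from the thesis, and it assumes all points fit in memory on one machine, which is exactly the assumption her work moves beyond. The function names and the sample points are made up for illustration.

```python
import math


def farthest_first_centers(points, k):
    """Pick up to k centers with the classic farthest-first traversal.

    Textbook heuristic for k-center, shown only for illustration.
    """
    if not points or k <= 0:
        return []
    centers = [points[0]]  # the first center is an arbitrary choice
    # dist[i] is the distance from points[i] to its nearest chosen center
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        # the next center is the point farthest from all current centers
        i = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[i])
        dist = [min(d, math.dist(p, points[i])) for d, p in zip(dist, points)]
    return centers


def clustering_radius(points, centers):
    """Largest distance from any point to its nearest center."""
    return max(min(math.dist(p, c) for c in centers) for p in points)


# Hypothetical sample data: two well-separated groups on a line
sample = [(0, 0), (1, 0), (10, 0), (11, 0)]
chosen = farthest_first_centers(sample, 2)
radius = clustering_radius(sample, chosen)
```

The goal in k-center is to keep that radius small: every data point should lie close to one of the k chosen centers.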

Algorithms for massive data models

Her research explored these problems within three major computational settings that reflect how real-world systems operate: distributed environments, where data is processed across many machines; dynamic models, where information changes over time; and streaming models, where data arrives continuously and memory is limited.

By designing efficient algorithms for these scenarios, Biabani demonstrated how large-scale data can be processed quickly and accurately.

Making clustering more robust

In the first part of her thesis, Biabani developed methods to group similar data points even when the dataset contains errors or irregular entries. This makes it possible to analyze large, irregular datasets more reliably, which is important for applications such as detecting patterns in social media, tracking traffic flows, or monitoring health data.

Her work is among the first to provide strong guarantees that these methods will perform well in real-world, large-scale systems.

Optimizing under change

In the second part of her thesis, Biabani developed methods that make decisions based on large datasets more adaptable. Her algorithms can quickly update solutions when new data is added or existing data changes, ensuring that outcomes remain accurate even as conditions evolve.

This is especially useful for applications such as optimizing resources, improving recommendation systems, or monitoring complex networks, where data is constantly changing and decisions need to stay up to date.
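As a rough illustration of submodular maximization, the sketch below runs the standard greedy rule on a small coverage problem: choose at most k sets that together cover as many items as possible. This is the well-known static greedy method, not one of Biabani's dynamic algorithms, and it recomputes everything from scratch rather than updating a solution as data changes. The function name and the sample sets are hypothetical.

```python
def greedy_max_coverage(sets, k):
    """Greedily choose up to k named sets to cover as many items as possible.

    `sets` maps a name to a set of items. Coverage is a standard example
    of a monotone submodular function; this is the textbook greedy rule.
    """
    chosen = []
    covered = set()
    remaining = dict(sets)  # shallow copy so the caller's dict is untouched
    for _ in range(min(k, len(remaining))):
        # pick the set that adds the most items not yet covered
        best = max(remaining, key=lambda name: len(remaining[name] - covered))
        if not remaining[best] - covered:
            break  # no remaining set adds anything new
        covered |= remaining.pop(best)
        chosen.append(best)
    return chosen, covered


# Hypothetical sample data: three sensors and the locations each one observes
sensors = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}}
picked, seen = greedy_max_coverage(sensors, 2)
```

Each step takes the option with the largest marginal gain, which reflects the diminishing-returns structure that makes a problem submodular.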

Advancing the theory and practice of scalable computing

Biabani's research delivered both practical methods and new theoretical insights into the limits of large-scale computation.

Her results include several first-of-their-kind algorithms that improve efficiency and accuracy across computational models.

Her work has been published in ten papers at leading international conferences, including SODA, NeurIPS, and ICML, highlighting its impact at the intersection of theoretical computer science and machine learning.

PhD researcher Leyla Biabani. Photo: Vincent van den Hoogen