Building more reliable language AI in low-resource and high-stakes settings
PhD researcher Iftitahu Ni’mah investigated how natural language processing (NLP) systems can be made more reliable when operating under limited data and challenging real-world conditions.
Artificial intelligence (AI) systems that process human language are used in search engines, chatbots, customer service systems, legal document analysis, and content moderation. While these systems have become increasingly powerful, their performance depends on large amounts of high-quality annotated data. In many real-world settings – such as under-resourced languages, specialized domains, or changing application contexts – high-quality data is scarce, expensive, or difficult to obtain. At the same time, evaluation methods are often unreliable under these conditions, making it difficult to assess system behavior and performance in a meaningful way. As a result, current language AI systems can be difficult to evaluate and may perform inconsistently in practice.
PhD researcher Iftitahu Ni’mah addressed these challenges by developing data-efficient learning methods and rigorous evaluation frameworks for NLP systems in low-resource settings.
Her dissertation investigated contrastive learning methods and evaluation frameworks that remain reliable when supervision is limited or imperfect, with the aim of improving NLP systems under data-constrained conditions. She defended her PhD thesis at the Department of Mathematics and Computer Science on Tuesday, April 14.
Data-efficient learning
A key part of Iftitahu Ni’mah’s research focused on data-efficient learning techniques for NLP systems in low-resource or noisy settings. Her thesis introduced a new learning method that helps models to better identify when new input differs from what they have seen during training. This improves their ability to detect unfamiliar or unexpected cases.
Ni’mah also developed a comprehensive training and evaluation framework for fake news detection in Indonesian. The results showed that contrastive learning can significantly improve performance even when data is limited, imbalanced, or noisy, helping to build more reliable language AI for underrepresented languages.
More reliable evaluation of language generation systems
Another part of Ni’mah’s dissertation studied evaluation methods for language generation tasks such as summarisation and dialogue systems. While human evaluation is considered the most reliable approach, it is expensive and difficult to scale, which has led to widespread use of automatic scoring methods.
The research presented a framework that allows these automatic methods to be evaluated more systematically. This makes it possible to assess whether they really reflect human judgement, particularly in low-resource settings where evaluation data is limited.
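To make concrete what it means for an automatic method to "reflect human judgement", the sketch below computes the Spearman rank correlation between an automatic metric and human ratings, which is a common baseline check. It is an illustration only, not the framework from the dissertation: the function names and the five example scores are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores for five generated summaries (made up for illustration).
metric_scores = [0.41, 0.55, 0.38, 0.72, 0.60]  # automatic metric
human_ratings = [2, 4, 1, 5, 3]                 # human quality ratings

print(round(spearman(metric_scores, human_ratings), 3))  # 0.9
```

A high value on a handful of examples like these says little on its own, which is why a more systematic assessment matters when evaluation data is limited.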
Legal domains
Iftitahu Ni’mah’s research also studied AI systems for legal question answering in Indonesian that combine retrieval and text generation. Legal documents are complex and highly structured, which makes them difficult for language models to process.
She introduced a new evaluation dataset and a controlled setup for testing this task, and showed that the way that documents are split into smaller parts has a strong impact on performance. These findings provide practical guidance for building more reliable AI systems in sensitive domains, such as law.
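Why splitting matters can be illustrated with a small sketch that compares two simple strategies: fixed-size word windows and splitting at article headings. This is not the setup used in the dissertation. The function names, the window sizes, the "Article" heading pattern, and the sample text are all assumptions made for illustration.

```python
import re


def chunk_fixed(text, size=8, overlap=2):
    """Split text into windows of `size` words that share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def chunk_by_article(text, heading="Article"):
    """Split text just before each '<heading> <number>' marker."""
    pattern = rf"(?={re.escape(heading)} \d+)"
    return [part.strip() for part in re.split(pattern, text) if part.strip()]


# Invented placeholder text, not taken from any real legal document.
sample = (
    "Article 1 Every worker is entitled to a written contract. "
    "Article 2 Wages are paid no later than the last day of each month."
)

for chunk in chunk_fixed(sample):
    print("fixed     :", chunk)
for chunk in chunk_by_article(sample):
    print("structural:", chunk)
```

In this toy example the fixed-size windows cut across the boundary between the two articles, while the structural split keeps each provision intact. A retrieval step would therefore return different passages depending on the strategy chosen.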
Towards trustworthy and data-efficient language AI
Overall, the research conducted by Iftitahu Ni’mah shows that increasing model size alone is not sufficient for reliable artificial intelligence in real-world applications. Instead, robust language technologies depend on data-efficient learning methods and evaluation frameworks suited to low-resource and realistic deployment settings. In particular, the work highlights the importance of data-efficient approaches for building trustworthy NLP systems in languages and domains with limited annotated data.
Her research also demonstrates that evaluating language generation systems should go beyond simple correlations with human judgement, as these can be misleading. While automatic metrics remain useful, they should be applied with caution and complemented by human evaluation. Together, these contributions support the development of more reliable, transparent, and practical AI systems, in line with broader efforts at the university to create AI that is robust, fair, and effective in real-world contexts.
PhD researcher Iftitahu Ni’mah. Photo: Vincent van den Hoogen
Supervisors
Mykola Pechenizkiy, Vlado Menkovski, Meng Fang