Building more reliable language AI in low-resource and high-stakes settings

Making AI language systems work better with less data and in challenging settings


15 April 2026

PhD researcher Iftitahu Ni’mah investigated how natural language processing (NLP) systems can be made more reliable when operating under limited data and challenging real-world conditions.

image: iStockphoto.com

Artificial intelligence (AI) systems that process human language are used in search engines, chatbots, customer service systems, legal document analysis, and content moderation. While these systems have become increasingly powerful, their performance depends on large amounts of high-quality annotated data. In many real-world settings, such as under-resourced languages, specialized domains, or changing application contexts, high-quality data is scarce, expensive, or difficult to obtain. At the same time, evaluation methods are often unreliable under these conditions, making it difficult to assess system behavior and performance in a meaningful way. As a result, current language AI systems can be difficult to evaluate and may perform inconsistently in practice.

PhD researcher Iftitahu Ni’mah addressed these challenges by developing data-efficient learning methods and rigorous evaluation frameworks for NLP systems in low-resource settings.

Her dissertation investigated contrastive learning methods and evaluation frameworks that remain reliable when supervision is limited or imperfect, with the aim of improving NLP systems under data-constrained conditions. She defended her PhD thesis at the Department of Mathematics and Computer Science on Tuesday, April 14.

Data-efficient learning

A key part of Iftitahu Ni’mah’s research focused on data-efficient learning techniques for NLP systems in low-resource or noisy settings. Her thesis introduced a new learning method that helps models better recognize when new input differs from the data they saw during training. This improves their ability to detect unfamiliar or unexpected cases.
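The article does not describe how the thesis method works. Purely as a point of reference, a widely used baseline for spotting unfamiliar input is the maximum softmax probability score: if a classifier's most likely class receives only a low probability, the input is flagged as possibly out-of-distribution. The sketch below illustrates that baseline, not Ni’mah’s method; the logits and the 0.7 threshold are hypothetical.

```python
import math
from typing import Sequence

# Illustration only: the maximum softmax probability baseline,
# NOT the method introduced in the thesis.


def softmax(logits: Sequence[float]) -> list[float]:
    """Convert raw classifier scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_softmax_score(logits: Sequence[float]) -> float:
    """Probability of the top class; low values suggest unfamiliar input."""
    return max(softmax(logits))


def is_unfamiliar(logits: Sequence[float], threshold: float = 0.7) -> bool:
    """Flag an input as possibly out-of-distribution.

    The 0.7 threshold is a hypothetical value; in practice it would be
    tuned on held-out data.
    """
    return max_softmax_score(logits) < threshold


# Hypothetical logits from a three-class classifier.
confident = [4.0, 0.5, 0.2]  # one class clearly dominates
uncertain = [1.0, 0.9, 1.1]  # scores are nearly flat
print(is_unfamiliar(confident))  # False
print(is_unfamiliar(uncertain))  # True
```

Because the score only reads the classifier's output, it needs no extra training data, which is one reason it is a common starting point before more specialized methods.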

Ni’mah also developed a comprehensive training and evaluation framework for fake news detection in Indonesian. The results showed that contrastive learning can significantly improve performance even when data is limited, uneven, or noisy, helping to build more reliable language AI for underrepresented languages.
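Contrastive learning trains a model to place related examples close together in its representation space and unrelated ones far apart. The article does not specify which contrastive objective the thesis used; the sketch below shows InfoNCE, one common choice, for a single anchor. The 2-D embeddings and the idea that "related" means "same label" (for example, two fake news headlines) are assumptions made for illustration.

```python
import math
from typing import Sequence

# Assumption: a generic InfoNCE objective; the article does not say
# which contrastive loss the thesis used.


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: Sequence[Sequence[float]],
    temperature: float = 0.1,
) -> float:
    """Negative log-probability of picking the positive among all candidates.

    Candidates are scored by cosine similarity divided by the temperature.
    The loss is small when the anchor is close to its positive and far
    from the negatives.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]


# Hypothetical embeddings: two items sharing a label vs. one with another label.
anchor = [1.0, 0.1]
same_label = [0.9, 0.2]
other_label = [-0.8, 0.5]
good_pair = info_nce_loss(anchor, same_label, [other_label])
bad_pair = info_nce_loss(anchor, other_label, [same_label])
print(good_pair < bad_pair)  # True
```

Minimizing this loss over many such pairs pushes the encoder toward representations in which labels are separable, which is why contrastive objectives can help when labelled examples are few.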

More reliable evaluation of language generation systems

Another part of Ni’mah’s dissertation studied evaluation methods for language generation tasks such as summarisation and dialogue systems. While human evaluation is considered the most reliable approach, it is expensive and difficult to scale, which has led to widespread use of automatic scoring methods.

The research presented a framework that allows these automatic methods to be evaluated more systematically. This makes it possible to assess whether they really reflect human judgement, particularly in low-resource settings where evaluation data is limited.
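A common way to check whether an automatic metric reflects human judgement is to score the same system outputs with both the metric and human raters, then measure how strongly the two sets of scores agree. The article does not say which statistics the framework relies on; the sketch below uses Pearson (linear agreement) and Spearman (rank agreement) correlation as typical examples, with hypothetical scores for five outputs.

```python
from typing import Sequence

# Assumption: Pearson and Spearman are shown as typical agreement
# measures; the framework in the thesis may use others.


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: how linearly the two score lists move together."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical scores for five generated summaries.
metric_scores = [0.42, 0.55, 0.61, 0.30, 0.78]
human_ratings = [2.0, 3.0, 3.0, 1.0, 5.0]
print(round(pearson(metric_scores, human_ratings), 3))
print(round(spearman(metric_scores, human_ratings), 3))
```

With only five data points, as here, such coefficients are very noisy, which is one reason a single correlation figure can be misleading when evaluation data is limited.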

Legal domains

Iftitahu Ni’mah’s research also studied AI systems for legal question answering in Indonesian that combine retrieval and text generation. Legal documents are complex and highly structured, which makes them difficult for language models to process.

She introduced a new evaluation dataset and a controlled setup for testing this task, and showed that how documents are split into smaller parts has a strong impact on performance. These findings provide practical guidance for building more reliable AI systems in sensitive domains such as law.

Towards trustworthy and data-efficient language AI

Overall, Iftitahu Ni’mah’s research shows that increasing model size alone is not sufficient for reliable artificial intelligence in real-world applications. Robust language technologies also depend on data-efficient learning methods and on evaluation frameworks suited to low-resource and realistic deployment settings. This matters most for building trustworthy NLP systems in languages and domains with limited annotated data.

Her research also demonstrates that evaluating language generation systems should go beyond simple correlations with human judgement, as these can be misleading. Automatic metrics remain useful, but they should be applied with caution and complemented by human evaluation. Together, these contributions support the development of more reliable, transparent, and practical AI systems, in line with broader efforts to create AI that is robust, fair, and effective in real-world contexts.

PhD researcher Iftitahu Ni’mah. Photo: Vincent van den Hoogen