
AI Recognizes Emotions – Even in Low-Resource Languages

Whether joy, anger, fear, or sadness – language models such as ChatGPT have long been able to recognize emotions in texts. At least, that's true for English and other languages for which extensive training data is available. But what about less common languages?

There is still some catching up to do here, says Dr. Daryna Dementieva: “Large language models – known as LLMs – basically work in many languages. But for certain downstream tasks, they don't perform with sufficient precision – including emotion recognition in low-resource languages,” explains the research assistant at Prof. Alexander Fraser's Chair of Data Analytics & Statistics at TUM Campus Heilbronn. “The reason is often a lack of data sets. That's why it's necessary to collect specific regional, cultural, and language-specific data for the respective target language.”

This is exactly what is happening in the multilingual, multinational project “BRIGHTER”, in which Dementieva is involved: researchers from numerous countries are working together to develop a data set covering 28 languages from Europe, Asia, Africa, and the Americas. Dementieva's contribution is EMOBENCH-UA, the first publicly available data set for emotion detection in Ukrainian texts.

The reason she devoted herself to the Ukrainian language is obvious: “Ukrainian is my mother tongue. That's why I was invited to participate in this joint project.” EMOBENCH-UA is based on a corpus of nearly 5,000 Ukrainian-language posts on the platform X (formerly Twitter). There were several reasons for choosing this data, the most important being: “We were interested in texts that express feelings. Such content is particularly common on social media. At the same time, we needed short, freely accessible texts in Ukrainian.”

 

Original Data Works Best

 

Dementieva explains the next steps: “From the data set, which comprises several million tweets, we first made a preliminary selection: We identified posts with emojis, translated them into English, and analyzed them with an emotion classifier. This allowed us to filter out the texts that were highly likely to contain an emotion.” The actual labeling was then carried out via the crowdsourcing platform Toloka.ai: the annotators, all native Ukrainian speakers, were asked to assign each post to one of six emotions (anger, fear, joy, disgust, surprise, or sadness) or to a seventh category, “no emotion”.
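To make the three-step preselection more concrete, here is a minimal Python sketch. It is not the BRIGHTER or EMOBENCH-UA code: the emoji check is a rough code-point heuristic, while the translation function, the emotion classifier and the 0.8 threshold are hypothetical placeholders that a real pipeline would replace with an actual machine-translation system and an English emotion model.

```python
from typing import Callable, Iterable, List

# Rough emoji code-point ranges: an approximation, not the full Unicode emoji specification
EMOJI_RANGES = [
    (0x1F300, 0x1FAFF),  # pictographs, emoticons, transport and supplemental symbols
    (0x2600, 0x27BF),    # miscellaneous symbols and dingbats
]


def has_emoji(text: str) -> bool:
    """Return True if any character falls into one of the emoji ranges above."""
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in EMOJI_RANGES)


def preselect(
    posts: Iterable[str],
    translate: Callable[[str], str],       # hypothetical: Ukrainian -> English translator
    emotion_prob: Callable[[str], float],  # hypothetical: P(text expresses an emotion)
    threshold: float = 0.8,                # assumed cut-off, not taken from the study
) -> List[str]:
    """Keep original posts that contain an emoji and whose English translation
    the classifier considers likely to express an emotion."""
    selected = []
    for post in posts:
        if not has_emoji(post):
            continue  # step 1: only posts with emojis
        english = translate(post)  # step 2: translate into English
        if emotion_prob(english) >= threshold:  # step 3: classify the translation
            selected.append(post)  # keep the original text for human labeling
    return selected


if __name__ == "__main__":
    # Toy stand-ins for the translator and the classifier
    identity = lambda text: text
    toy_classifier = lambda text: 0.95 if "love" in text else 0.2
    posts = ["love this 😍", "meeting at 10", "ok 👍"]
    print(preselect(posts, identity, toy_classifier))  # ['love this 😍']
```

Note that only the English translation is passed to the classifier; the untranslated post is what goes on to the human annotators.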

In the next step, the researchers had a range of models perform the same emotion analysis, from language-based approaches through so-called transformers, which analyze texts in their overall context, to LLMs. They then compared the results to find out which type of model is best suited for emotion detection in Ukrainian.
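How might such a comparison look in practice? The article does not name the evaluation metric, so the following Python sketch assumes macro-averaged F1, a common choice for emotion classes of very different sizes, and scores each model's predicted labels against the crowd-sourced gold labels. The model names and toy labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# The seven categories used in the labeling task
LABELS = ("anger", "fear", "joy", "disgust", "surprise", "sadness", "no emotion")


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over classes seen in gold or predictions."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    for label in list(gold) + list(pred):
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[g] += 1  # missed the true class g
    classes = set(gold) | set(pred)
    # F1 = 2TP / (2TP + FP + FN); the denominator is > 0 for every class in `classes`
    scores = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in classes]
    return sum(scores) / len(scores) if scores else 0.0


def rank_models(gold: Sequence[str], predictions: Dict[str, List[str]]) -> List[Tuple[str, float]]:
    """Score every model on the same gold labels and sort best first."""
    scored = [(name, macro_f1(gold, pred)) for name, pred in predictions.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    gold = ["joy", "anger", "joy", "no emotion"]
    predictions = {
        "model_a": ["joy", "anger", "fear", "no emotion"],  # hypothetical model output
        "model_b": ["joy", "anger", "joy", "no emotion"],   # hypothetical model output
    }
    for name, score in rank_models(gold, predictions):
        print(f"{name}: {score:.3f}")  # model_b: 1.000, then model_a: 0.667
```

Macro averaging gives each emotion equal weight, so a model cannot score well simply by predicting the most frequent category.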

The most important findings: Despite strong results from the latest LLMs, the leaner transformer models proved to be a competitive and efficient alternative. In general, models trained on original Ukrainian-language data performed significantly better than those trained on synthetic (i.e., artificially generated or automatically derived) data. However, Dementieva was surprised by the strong performance of the Chinese-developed models Qwen and DeepSeek. Although they were trained primarily for English and Chinese, they performed significantly better than some models developed specifically for Ukrainian. This may be due to their strong reasoning abilities: “These models have particularly well-developed mechanisms for processing and structuring information. This enables them to achieve good results in other languages as well.”

 

Numerous Areas of Application

 

How can Dementieva's findings be put into practice? “I hope that Ukrainian developers in the field of NLP will benefit from this data set and make use of it,” says the scientist. One possible application is the analysis of product reviews: “If companies want to understand how their products are being discussed in the media or in feedback forms, they can analyze specifically whether positive or negative emotions predominate.” Dementieva also sees great potential at the societal level, for example in the fight against disinformation: fake news often goes hand in hand with strong negative feelings or hatred, and emotion analysis can help to identify and filter out harmful content at an early stage. Last but not least, there is growing demand for chatbots that respond sensitively to emotions, and EMOBENCH-UA could be an important building block for developing such bots for Ukrainian.

The researcher will make all collected data and the best-performing trained model publicly available, and the process used to build the data set can also be transferred to other languages: “Of course, the definition of emotions and some examples will have to be adapted, but then the design can be reused to a large extent. In any case, interested parties can translate our instructions and the user interface design into their respective languages and add language- and culture-specific examples. Then it should not be too much effort to make the model work.”