When language models learn like babies

ID: 732852

(PresseBox) - Whether in digital assistance systems, text summarization, or programming—wherever language needs to be processed efficiently, AI-supported large language models (LLMs) are used. But these supposed all-rounders have their weaknesses. One of them is that trillions of words are sometimes needed to train a model. This has significant disadvantages, ranging from high costs and enormous energy consumption to greater sensitivity to bias.

In addition, LLMs often fail at tasks that seem trivial to us humans, explains Dr. Lukas Edman, postdoc at Prof. Alexander Fraser's Chair of Data Analytics and Statistics at TUM Campus Heilbronn: “They have difficulty with long-term contexts. For example, if you talk to ChatGPT for a very long time, it often no longer understands what was said some time ago. They have problems with logical thinking—complex tasks have to be broken down into smaller steps. They even fail at very simple tasks: they often can't insert a specific letter in a particular place in a word, or they don't recognize that a sentence can be grammatically correct even though it doesn't make sense in terms of content.”

The young scientist is researching Masked Language Modeling (MLM) – a training method in which individual words in a sentence are masked, i.e., left out. The model is supposed to predict the missing words and thus learn to figure out the meaning from the context. MLM improves the general understanding of sentences and makes it possible to get by with significantly less training data than before. The biggest advantage, in Edman's view, is that “the method is very similar to human learning: when we listen to someone, our brain constantly tries to predict the next word. If our prediction is wrong, we have to adapt and learn from it. This is exactly how the training of the models works – and that makes it easy to implement.”
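The masking step described above can be sketched in a few lines of Python. This is an illustrative toy, not Edman's training code; the `[MASK]` token and the 15% default rate follow common MLM practice and are assumptions here:

```python
import random

def mask_tokens(tokens, mask_rate=0.15, rng=None):
    """Replace a random subset of tokens with [MASK] and return
    (masked sequence, {position: original token}) as prediction targets."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append("[MASK]")
            targets[i] = tok  # the model must recover this from context
        else:
            masked.append(tok)
    return masked, targets

tokens = "i like to go shopping".split()
masked, targets = mask_tokens(tokens, mask_rate=0.4)
# The training loss compares the model's guess at each [MASK]
# position against targets[i]; wrong guesses drive weight updates.
```

In a real system the predictor is a transformer and the loss is cross-entropy over the vocabulary; the sketch only shows how training examples are constructed.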

Refinement Through Selective Masking

But MLM also has disadvantages: with simple sentences, the model learns very quickly which word is missing. For example, if you leave out the word “to” in the sentence “I like to go shopping,” it fills in the gap correctly after just a few attempts. “If such a passage continues to be masked, it does not provide any new insights and costs unnecessary computing time,” says Edman. 

This is where Adaptive MLM comes in – a refinement of standard MLM in which the masked words are specifically selected. “First, we leave out randomly selected words. During training, we check whether the model predicts them correctly. We weight all correctly predicted words lower so that they are masked less frequently in the future. Instead, the training focuses on the difficult cases,” explains Edman. For example, versatile adjectives or adverbs are more difficult to predict than very common words such as “the” or “and”.
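A minimal sketch of that weighting idea, assuming per-word sampling weights used when choosing which positions to mask. The decay factor and weight floor are invented for illustration and are not taken from the published method:

```python
def update_masking_weights(weights, predictions, targets, decay=0.5, floor=0.05):
    """Lower the masking weight of words the model already predicts
    correctly, so future masking focuses on the harder cases.

    weights:     dict word -> sampling weight (default 1.0)
    predictions: dict position -> word the model guessed
    targets:     dict position -> word that was actually masked
    """
    for pos, gold in targets.items():
        if predictions.get(pos) == gold:
            # Correct prediction: this word teaches little, mask it less often.
            weights[gold] = max(floor, weights.get(gold, 1.0) * decay)
        else:
            # Wrong prediction: keep (or restore) full masking weight.
            weights[gold] = 1.0
    return weights

# An easy word ("to") gets down-weighted; a missed word keeps full weight.
w = update_masking_weights({}, {2: "to", 4: "bags"}, {2: "to", 4: "shopping"})
# w["to"] == 0.5, w["shopping"] == 1.0
```

The floor keeps even easy words from disappearing from training entirely, so the model is still occasionally checked on them.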

Recognizing Connections Without Large Data Sets

It is often helpful to break words down into tokens—smaller units—or even finer components called subtokens. By splitting the word “walking” into the tokens “walk” and “ing,” the model can recognize the connection between “walk” and “walking” without relying on extremely large amounts of training data. “In fact, there has been some progress here, especially in adjective nominalization – that is, when an adjective such as ‘laughable’ is converted into a noun such as ‘laughability’. We often work with invented adjectives such as ‘wuggable’, which the model is supposed to convert into the noun ‘wuggability’. This teaches it the rule that ‘able’ typically becomes ‘ability’ and not ‘ness,’ for example,” explains Edman. 
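A toy illustration of such splitting. Real systems learn their subword inventory from data (for example with byte-pair encoding) rather than using a hand-written suffix list; the suffix set and the `##` continuation marker here are assumptions for the sketch:

```python
# Ordered so longer suffixes ("ability") match before shorter ones ("able").
SUFFIXES = ["ability", "able", "ing", "ness"]

def split_subtokens(word):
    """Split a word into [stem, suffix] when a known suffix matches,
    so related forms like "walk" and "walking" share a stem token."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) > len(suf):
            return [word[: -len(suf)], "##" + suf]
    return [word]

split_subtokens("walking")      # ["walk", "##ing"]
split_subtokens("wuggable")     # ["wugg", "##able"]
split_subtokens("wuggability")  # ["wugg", "##ability"]
```

Because “wuggable” and “wuggability” share the stem token “wugg”, the model can learn the “-able” to “-ability” rule once and apply it to words it has never seen.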

The goal is to develop a model that can access all letters in every word: “We humans can do that. We normally ignore this information when reading. But when we see that something is misspelled, we notice it. Language models should be able to do this too.” To achieve this goal, adaptive training approaches need to be systematically investigated further: “For example, we could analyze how the model behaves on a large scale. To do this, we would use larger data sets and compare whether the advantages of adaptive MLM only come into play with smaller amounts of data.” Edman also wants to test the simultaneous masking of several words that are related: “This could help convey grammatical concepts even more effectively.”

Opportunity for Stronger Cooperation

Last fall, Edman achieved a major success: at the Conference on Empirical Methods in Natural Language Processing (EMNLP) in Suzhou, China, a leading international conference in the field of empirical language processing and machine language understanding, he won first prize in the Baby Language Modeling (BabyLM) Challenge. BabyLM refers to a research approach that investigates how language models learn languages with very little training data – similar to a baby, which does not have an infinite amount of data at its disposal.

“The Challenge Award means a lot to me,” says Edman. “It helps to publicize my research and hopefully convinces other people that it is worth looking into this topic. At the same time, it offers the opportunity to collaborate with other expert researchers. Research in this area is particularly computationally intensive, so it helps enormously that we have found an efficient method.”

Further information on this press release:

Provided by: PresseBox
Date: 17.02.2026 - 08:51
Original language: German

Contact person: Kerstin Besemer
Heilbronn, Germany
Phone: +49 (7131) 26418-501

The press release “When language models learn like babies” was published under the journalistic-editorial responsibility of Die TUM Campus Heilbronn gGmbH.
