When language models learn like babies

ID: 732852

(PresseBox) - Whether in digital assistance systems, text summarization, or programming—wherever language needs to be processed efficiently, AI-supported large language models (LLMs) are used. But these supposed all-rounders have their weaknesses. One of them is that trillions of words are sometimes needed to train a model. This has significant disadvantages, ranging from high costs and enormous energy consumption to greater sensitivity to bias.

In addition, LLMs often fail at tasks that seem trivial to us humans, explains Dr. Lukas Edman, postdoc at Prof. Alexander Fraser's Chair of Data Analytics and Statistics at TUM Campus Heilbronn: “They have difficulty with long-term contexts. For example, if you talk to ChatGPT for a very long time, it often no longer understands what was said some time ago. They have problems with logical thinking—complex tasks have to be broken down into smaller steps. They even fail at very simple tasks: they often can't insert a specific letter in a particular place in a word, or they don't recognize that a sentence can be grammatically correct even though it doesn't make sense in terms of content.”

The young scientist is researching Masked Language Modeling (MLM) – a training method in which individual words in a sentence are masked, i.e., left out. The model is supposed to predict the missing words and thus learn to figure out the meaning from the context. MLM improves the general understanding of sentences and makes it possible to get by with significantly less training data than before. The biggest advantage, in Edman's view, is that “the method is very similar to human learning: when we listen to someone, our brain constantly tries to predict the next word. If our prediction is wrong, we have to adapt and learn from it. This is exactly how the training of the models works – and that makes it easy to implement.”
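The masking step described above can be sketched in a few lines of Python. This is an illustrative toy, not Edman's training code; the `[MASK]` token and the 15% default rate follow common MLM practice and are assumptions here:

```python
import random

def mask_tokens(tokens, mask_rate=0.15, rng=None):
    """Replace a random subset of tokens with [MASK] and return
    (masked sequence, {position: original token}) as prediction targets."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append("[MASK]")
            targets[i] = tok  # the model must recover this from context
        else:
            masked.append(tok)
    return masked, targets

tokens = "i like to go shopping".split()
masked, targets = mask_tokens(tokens, mask_rate=0.4)
# The training loss compares the model's guess at each [MASK]
# position against targets[i]; wrong guesses drive weight updates.
```

In a real system the predictor is a transformer and the loss is cross-entropy over the vocabulary; the sketch only shows how training examples are constructed.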

Refinement Through Selective Masking

But MLM also has disadvantages: with simple sentences, the model learns very quickly which word is missing. For example, if you leave out the word “to” in the sentence “I like to go shopping,” it fills in the gap correctly after just a few attempts. “If such a passage continues to be masked, it does not provide any new insights and costs unnecessary computing time,” says Edman. 

This is where Adaptive MLM comes in – a refinement of standard MLM in which the masked words are specifically selected. “First, we leave out randomly selected words. During training, we check whether the model predicts them correctly. We weight all correctly predicted words lower so that they are masked less frequently in the future. Instead, the training focuses on the difficult cases,” explains Edman. For example, versatile adjectives or adverbs are more difficult to predict than very common words such as “the” or “and”.
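A minimal sketch of that weighting idea, assuming per-word sampling weights used when choosing which positions to mask. The decay factor and weight floor are invented for illustration and are not taken from the published method:

```python
def update_masking_weights(weights, predictions, targets, decay=0.5, floor=0.05):
    """Lower the masking weight of words the model already predicts
    correctly, so future masking focuses on the harder cases.

    weights:     dict word -> sampling weight (default 1.0)
    predictions: dict position -> word the model guessed
    targets:     dict position -> word that was actually masked
    """
    for pos, gold in targets.items():
        if predictions.get(pos) == gold:
            # Correct prediction: this word teaches little, mask it less often.
            weights[gold] = max(floor, weights.get(gold, 1.0) * decay)
        else:
            # Wrong prediction: keep (or restore) full masking weight.
            weights[gold] = 1.0
    return weights

# An easy word ("to") gets down-weighted; a missed word keeps full weight.
w = update_masking_weights({}, {2: "to", 4: "bags"}, {2: "to", 4: "shopping"})
# w["to"] == 0.5, w["shopping"] == 1.0
```

The floor keeps even easy words from disappearing from training entirely, so the model is still occasionally checked on them.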

Recognizing Connections Without Large Data Sets

It is often helpful to break words down into tokens—smaller units—or even finer components called subtokens. By splitting the word “walking” into the tokens “walk” and “ing,” the model can recognize the connection between “walk” and “walking” without relying on extremely large amounts of training data. “In fact, there has been some progress here, especially in adjective nominalization – that is, when an adjective such as ‘laughable’ is converted into a noun such as ‘laughability’. We often work with invented adjectives such as ‘wuggable’, which the model is supposed to convert into the noun ‘wuggability’. This teaches it the rule that ‘able’ typically becomes ‘ability’ and not ‘ness,’ for example,” explains Edman. 
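A toy illustration of such splitting. Real systems learn their subword inventory from data (for example with byte-pair encoding) rather than using a hand-written suffix list; the suffix set and the `##` continuation marker here are assumptions for the sketch:

```python
# Ordered so longer suffixes ("ability") match before shorter ones ("able").
SUFFIXES = ["ability", "able", "ing", "ness"]

def split_subtokens(word):
    """Split a word into [stem, suffix] when a known suffix matches,
    so related forms like "walk" and "walking" share a stem token."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) > len(suf):
            return [word[: -len(suf)], "##" + suf]
    return [word]

split_subtokens("walking")      # ["walk", "##ing"]
split_subtokens("wuggable")     # ["wugg", "##able"]
split_subtokens("wuggability")  # ["wugg", "##ability"]
```

Because “wuggable” and “wuggability” share the stem token “wugg”, the model can learn the “-able” to “-ability” rule once and apply it to words it has never seen.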

The goal is to develop a model that can access all letters in every word: “We humans can do that. We normally ignore this information when reading. But when we see that something is misspelled, we notice it. Language models should be able to do this too.” To achieve this goal, adaptive training approaches need to be systematically investigated further: “For example, we could analyze how the model behaves on a large scale. To do this, we would use larger data sets and compare whether the advantages of adaptive MLM only come into play with smaller amounts of data.” Edman also wants to test the simultaneous masking of several words that are related: “This could help convey grammatical concepts even more effectively.”

Opportunity for Stronger Cooperation

Last fall, Edman achieved a major success: at the Conference on Empirical Methods in Natural Language Processing (EMNLP) in Suzhou, China, a leading international conference in the field of empirical language processing and machine language understanding, he won first prize in the Baby Language Modeling (BabyLM) Challenge. BabyLM refers to a research approach that investigates how language models learn languages with very little training data – similar to a baby, which does not have an infinite amount of data at its disposal.

“The Challenge Award means a lot to me,” says Edman. “It helps to publicize my research and hopefully convinces other people that it is worth looking into this topic. At the same time, it offers the opportunity to collaborate with other expert researchers. Research in this area is particularly computationally intensive, so it helps enormously that we have found an efficient method.”

Further information on this press release:

Provided by: PresseBox
Date: 17.02.2026 - 08:51
Original language: German

Contact person: Kerstin Besemer
Heilbronn, Germany
Phone: +49 (7131) 26418-501

The press release “When language models learn like babies” was published under the journalistic-editorial responsibility of Die TUM Campus Heilbronn gGmbH.
