Today’s AI systems have a limitation that is easy to miss. You can take two high-performance language models, both brilliant and both perfectly fluent, and ask them to collaborate, only to watch them fail. It’s not a clash of opinions or a lack of power; at a fundamental level, they don’t speak the same internal language.
What’s the deal? It’s to do with something called "tokens."
Large language models don’t process text word-by-word; they chop it into smaller chunks (parts of words, numbers, or symbols) and read and generate text one chunk at a time. Every model has its own unique vocabulary of these tokens, shaped by its specific training. For instance, two models might both understand the sentence "The cat sat on the mat," but internally, they’ve sliced that sentence into entirely different digital pieces.
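To make the mismatch concrete, here is a minimal sketch in Python. The two vocabularies and the greedy longest-match rule are hypothetical stand-ins chosen for illustration; real models learn far larger vocabularies with methods such as byte-pair encoding, and nothing here reflects any particular model’s tokenizer.

```python
def greedy_tokenize(text, vocab):
    """Split text by repeatedly taking the longest vocabulary entry that matches.

    Falls back to a single character when nothing in the vocabulary matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for length in range(len(text) - i, 0, -1):
            piece = text[i:i + length]
            if piece in vocab or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens


# Two hypothetical vocabularies standing in for two different models.
vocab_a = {"The", " cat", " sat", " on", " the", " mat"}
vocab_b = {"Th", "e", " c", "at", " s", " o", "n", " t", "he", " m"}

sentence = "The cat sat on the mat"
tokens_a = greedy_tokenize(sentence, vocab_a)
tokens_b = greedy_tokenize(sentence, vocab_b)

print(tokens_a)  # word-sized pieces
print(tokens_b)  # much smaller pieces
```

Both lists spell out exactly the same sentence, yet they have different lengths and different boundaries, so "the third token" means something different to each model.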
And this mismatch matters!
Most of the advanced ways to boost AI, such as having models double-check each other or sharing knowledge between them, rely on comparing their predictions step-by-step. If the tokens don’t line up, however, the conversation breaks down. It’s a vocabulary barrier that has, until now, acted as a ceiling on how well multi-model systems can actually function.
NTT’s latest research is trying to break through that ceiling. The company has developed a way for language models to adjust their token vocabularies on the fly during inference, that is, at the moment the model is actually coming up with the next word. Essentially, a model can temporarily "shrink" its vocabulary to match a partner model or a shared set of tokens, all while preserving its original output characteristics.
Crazy, right? On paper, shrinking a vocabulary sounds like a recipe for degraded performance. Usually, if you strip tokens away, the model’s probabilities shift, the phrasing gets weird, and performance drops.
NTT’s breakthrough is both theoretical and experimental: the researchers show that a model’s predictions can be transformed so that a smaller vocabulary produces the same output distribution, and their testing backs this up. The model behaves as if it’s at full strength, even though it’s working with fewer building blocks.
The trick lies in how the model predicts the next token.
Normally, a model weighs every possible option in its massive vocabulary before picking one. NTT’s method recalculates those weights in real time so that the next-token probabilities are expressed over a smaller, shared vocabulary. NTT calls it “lossless vocabulary reduction,” which simply means that the quality doesn’t drop just because the dictionary got smaller.
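The difference between simply deleting tokens and a lossless re-expression can be seen in a toy example. The sketch below is not NTT’s algorithm: it is a brute-force illustration that assumes a hypothetical three-token "model" small enough to list every possible output, which no real language model allows. All names and numbers are invented for the example.

```python
from collections import defaultdict
from fractions import Fraction

EOS = "<eos>"  # hypothetical end-of-text marker

# Hypothetical toy "model" over the full vocabulary {"a", "b", "ab"}:
# the probability of each complete token sequence (values sum to 1).
full_model = {
    ("ab",): Fraction(5, 10),
    ("a", "b"): Fraction(1, 10),
    ("a", "a"): Fraction(2, 10),
    ("b", "a"): Fraction(2, 10),
}
reduced_vocab = {"a", "b"}


def text_distribution(seq_probs):
    """Probability of each final text, summed over token sequences that spell it."""
    out = defaultdict(Fraction)
    for tokens, p in seq_probs.items():
        out["".join(tokens)] += p
    return dict(out)


def reduced_tokens(text):
    """Tokenize with the reduced vocabulary: one token per character, then EOS."""
    return tuple(text) + (EOS,)


def reduced_next_token_probs(text_probs, prefix):
    """Next-token probabilities over the reduced vocabulary, given a reduced prefix.

    Computed by summing the probability of every text consistent with the prefix.
    """
    mass = defaultdict(Fraction)
    n = len(prefix)
    for text, p in text_probs.items():
        toks = reduced_tokens(text)
        if toks[:n] == tuple(prefix) and len(toks) > n:
            mass[toks[n]] += p
    total = sum(mass.values())
    return {tok: p / total for tok, p in mass.items()}


def regenerate(text_probs):
    """Rebuild the text distribution by chaining the reduced next-token probabilities."""
    result = {}
    stack = [((), Fraction(1))]
    while stack:
        prefix, p = stack.pop()
        for tok, q in reduced_next_token_probs(text_probs, prefix).items():
            if tok == EOS:
                result["".join(prefix)] = p * q
            else:
                stack.append((prefix + (tok,), p * q))
    return result


def naive_first_token_probs(seq_probs, keep):
    """Naive baseline: drop first tokens outside `keep` and renormalise the rest."""
    first = defaultdict(Fraction)
    for tokens, p in seq_probs.items():
        if tokens[0] in keep:
            first[tokens[0]] += p
    total = sum(first.values())
    return {tok: p / total for tok, p in first.items()}


original = text_distribution(full_model)
lossless_first = reduced_next_token_probs(original, ())
naive_first = naive_first_token_probs(full_model, reduced_vocab)
regenerated = regenerate(original)

print("naive   :", naive_first)
print("lossless:", lossless_first)
print("same text distribution:", regenerated == original)
```

In this toy, texts beginning with "a" carry a total probability of 4/5, which is what the re-expressed first step assigns to "a". Deleting the token "ab" and renormalising gives only 3/5, so the naive approach changes what the model says. Chaining the re-expressed probabilities reproduces the original text distribution exactly, which is the property the word "lossless" refers to here.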
Once this reduction is applied, new possibilities open up. Different models can finally meet in the middle, using what NTT researchers call a maximum common vocabulary. Instead of one model being forced to adopt another’s system, both can slim down to a shared subset and share predictions there. This allows for ensemble inference, where a team of models collaborates on an answer, even if those models were never originally designed to play nice together.
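As a rough sketch of what meeting in the middle could look like, the Python below takes the shared vocabulary to be the plain intersection of two token sets and averages two next-token distributions over it. Both choices are assumptions made for illustration, as are the vocabularies and probabilities; the distributions are presumed to have already been re-expressed over the shared tokens.

```python
# Hypothetical vocabularies for two independently trained models.
vocab_a = {"a", "b", "c", "ab", "bc"}
vocab_b = {"a", "b", "c", "ab", "ca"}

# Assumption: the shared vocabulary is the set of tokens both models know.
shared_vocab = vocab_a & vocab_b

# Hypothetical next-token distributions, already expressed over shared_vocab.
probs_a = {"a": 0.5, "b": 0.2, "c": 0.1, "ab": 0.2}
probs_b = {"a": 0.3, "b": 0.4, "c": 0.1, "ab": 0.2}


def ensemble_next_token(distributions, vocab):
    """Average several next-token distributions defined over the same vocabulary."""
    return {
        token: sum(d[token] for d in distributions) / len(distributions)
        for token in vocab
    }


combined = ensemble_next_token([probs_a, probs_b], shared_vocab)
best = max(combined, key=combined.get)

print(sorted(shared_vocab))
print(best, combined[best])
```

Averaging is only one way to combine predictions. The point is that it becomes possible at all once both models assign probabilities to the same set of tokens.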
This also changes the game for moving knowledge between systems. NTT has previously worked on portable tuning, which is the ability to transplant specialized expertise without retraining a whole model from scratch. That process lives or dies on how well the models align. By removing the vocabulary mismatch, the new technique makes transferring skills between independent models far more flexible.
Most companies don’t just use one model; they have different systems for legal, customer service, or data analytics. Getting the models to talk to each other has usually meant building complex pipelines. A shared token layer offers a shortcut, letting models cooperate at the level of their next-token predictions.
And there’s an efficiency win here, too. Previous attempts to bridge models often broke everything down into raw bytes, the tiniest possible units of data. That works in principle, but a byte sequence is much longer than a token sequence for the same text, so in practice it increases computational cost. By allowing models to reduce to an efficient subset of tokens, NTT’s method keeps the speed up while making the connection possible.
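A back-of-the-envelope comparison shows why bytes are expensive. The sketch below assumes, for simplicity, that generating each unit (byte or token) costs one model evaluation, and it uses a hypothetical word-sized token split; real costs depend on the model and its vocabulary.

```python
sentence = "The cat sat on the mat"

# Byte-level bridging: one generation step per UTF-8 byte.
byte_steps = len(sentence.encode("utf-8"))

# Hypothetical shared token split: one generation step per token.
shared_tokens = ["The", " cat", " sat", " on", " the", " mat"]
token_steps = len(shared_tokens)

print(byte_steps, token_steps)
```

Under these assumptions the byte-level route needs 22 steps where the token-level route needs 6 for the same sentence.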
Plus, it doesn’t require sending models back to school: no retraining is needed. The adjustment happens entirely at inference time, making it easy to plug into existing systems. So models such as NTT’s own tsuzumi can keep their optimized, proprietary vocabularies while still interacting with external models.
NTT is focused on fixing a stubborn, under-the-hood incompatibility. Models no longer need to be built together to work together; they can keep their unique structures and data but still collaborate in a meaningful way.
Think of how convenient a more connected future might be. As AI becomes more specialized, the ability to link models without forcing them to be identical could become more and more important; NTT is working to make that possible.
For further information, please see this link:
https://group.ntt/en/newsrelease/2026/04/22/260422a.html
If you have any questions on the content of this article, please contact:
Public Relations
NTT
https://tools.group.ntt/en/news/contact/index.php
Daniel O'Connor joined the NTT Group in 1999 when he began work as the Public Relations Manager of NTT Europe. While in London, he liaised with the local press, created the company's intranet site, wrote technical copy for industry magazines and managed exhibition stands from initial design to finished displays.
Later seconded to the headquarters of NTT Communications in Tokyo, he contributed to the company's first-ever global telecoms award wins and to the digitalisation of internal company information exchange.
Since 2015 Daniel has created content for the Group's Global Leadership Institute and the One NTT Network, and he is currently working with NTT R&D teams to grow public understanding of the cutting-edge research undertaken by the NTT Group.