Keynote Speakers

Yuval Pinter
Ben-Gurion University of the Negev
Beat them? Join them? Fix them? Tokenization Research in a Downstream World
Despite the crucial position of tokenization in the NLP pipeline, the large-scale attention it has recently been receiving, and its growing scientific community, researchers and users of LLMs appear reluctant to adopt and incorporate innovations from the tokenization literature into major systems. In this bird's-eye-view talk, I will classify approaches in technical research into those seeking to innovate within the confines of current practice ("join them"), those seeking to break through these confines ("beat them"), and those that embrace a restorative philosophy ("fix them"). Drawing from previous examples of such contrasts in NLP history and the ways they played out, I will offer my recommendation for a way forward, hopefully opening up a meaningful discussion for this workshop and beyond. The talk will feature a wide variety of research artifacts, including work from my lab, covering algorithm development, evaluation studies, analysis, and theoretical contributions.

Desmond Elliott
University of Copenhagen
Insights from Pixel Language Modeling
Language models typically have a multilingual representation bottleneck that starts with their tokenizers. As an alternative vision, I will talk about pixel language models, which avoid this bottleneck by rendering text as images. Pixel language models can transfer within and across writing systems without paying a computational cost for their flexibility. I will present a series of models, pretrained on English-only or multilingual text, at scales ranging from just 5M up to 86M parameters, and evaluated on downstream tasks in 14 scripts. I will conclude by reflecting on the lessons learned from the early years of this type of tokenization-free representation.
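
To make the core idea concrete, the following is a minimal sketch (not the speaker's implementation) of how a string can be rendered to pixels and cut into fixed-size patches that take the place of subword tokens. The Pillow-based rendering, the 16-pixel patch size, and the canvas width are illustrative assumptions.

# Illustrative sketch only: render text to a grayscale strip and slice it into
# fixed-size patches, the input units a pixel language model consumes instead of
# subword tokens. Patch size and canvas width are assumed values.
import numpy as np
from PIL import Image, ImageDraw, ImageFont

PATCH = 16  # assumed square patch size, in pixels

def text_to_patches(text: str, height: int = PATCH, width: int = 512) -> np.ndarray:
    """Render `text` on a white canvas and return an array of flattened patches."""
    canvas = Image.new("L", (width, height), color=255)        # white background
    ImageDraw.Draw(canvas).text((0, 0), text, fill=0,          # black glyphs
                                font=ImageFont.load_default())
    pixels = np.asarray(canvas, dtype=np.float32) / 255.0      # shape: (height, width)
    # Split the rendered strip into non-overlapping PATCH x PATCH patches.
    columns = np.split(pixels, width // PATCH, axis=1)         # 32 patches of (16, 16)
    return np.stack(columns).reshape(-1, PATCH * PATCH)        # (32, 256)

patches = text_to_patches("Tokenization-free input for any script with a suitable font.")
print(patches.shape)  # (32, 256)

In a pixel language model, such patches would typically be linearly embedded and fed to a transformer encoder, and rendering text in other writing systems would require a font that covers the target script.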

Adrian Łańcucki
NVIDIA
Learning Dynamic Segmentation and Compression of Sequences in Transformer LLMs
Transformer-based LLMs excel at language tasks, but their efficiency hinges on input sequence length. Typically, input resolution—imposed by a tokenizer—remains unchanged across all layers. In this talk, we introduce methods that enable end-to-end learning to dynamically pool, compress, or sparsify input or key-value token sequences. By effectively tracking down and removing redundancies, these methods deliver performance gains during training or inference. We arrive at a surprisingly practical method—Dynamic Memory Sparsification—that allows a model to achieve 8x KV cache compression within just a few hundred training steps. The resulting savings can be used not only to improve throughput and latency, but also to boost accuracy, as demonstrated across several reasoning tasks.
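
As a rough illustration of KV cache compression (a generic top-k eviction sketch, not Dynamic Memory Sparsification itself), the snippet below assumes the cached keys and values of one attention head are ranked by accumulated attention mass and only the most important fraction is kept; the shapes and the scoring rule are assumptions chosen to make the idea concrete.

# Illustrative sketch only: shrink a per-head KV cache by keeping the top-k
# cached positions, ranked by the total attention they received.
import torch

def compress_kv_cache(keys, values, attn_weights, compression_ratio: int = 8):
    """
    keys, values : (seq_len, head_dim) cached tensors for one attention head
    attn_weights : (num_queries, seq_len) attention probabilities over the cache
    Returns a cache shrunk by `compression_ratio`, keeping the most-attended positions.
    """
    seq_len = keys.shape[0]
    keep = max(1, seq_len // compression_ratio)
    importance = attn_weights.sum(dim=0)                        # attention mass per cached position
    kept = torch.topk(importance, keep).indices.sort().values   # keep top-k, preserve original order
    return keys[kept], values[kept]

# Usage: an 8x smaller cache that downstream attention can operate on unchanged.
seq_len, head_dim, num_queries = 1024, 64, 4
k, v = torch.randn(seq_len, head_dim), torch.randn(seq_len, head_dim)
attn = torch.softmax(torch.randn(num_queries, seq_len), dim=-1)
k_small, v_small = compress_kv_cache(k, v, attn)
print(k_small.shape)  # torch.Size([128, 64])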