TildeLM: foundational LLM for a multilingual Europe

Large AI model for Europe's languages

We’re building an open foundational LLM (large language model) for underrepresented European languages – a base you’ll be able to fine-tune for your specific needs. Customisable, secure, and built with European language data at its core.

June 2024: Tilde wins Large AI Grand Challenge 🙌
September 2024: Access to LUMI supercomputer obtained
March 2025: Model training begins
October 2025: Model goes live on Hugging Face 🎉

Your language deserves better AI

Most AI models are built for the world’s major languages – and over 90% of LLM training data is in English. That means Baltic, Slavic, and other European languages are left behind, leading to lower accuracy, weaker cultural understanding, and limited access to high-quality AI tools.

We're making it happen

That’s why we’re developing TildeLM – an open-source foundational large language model with over 30 billion parameters, built to support all European languages. Once released, you’ll be able to fine-tune it to your own needs and deploy it securely – locally or in the cloud – to build trustworthy AI that actually speaks your language.

Why TildeLM?

Customisable with your own data
Secure and fully in your control
Deployable on-premises or in the cloud
Integrates with existing systems and workflows
Built as a foundation for advanced AI solutions

The AI foundation you can trust

TildeLM is more than a technological achievement. It’s an open-source foundation for custom AI, benefiting over 155 million Europeans.

Custom AI solutions for businesses and organisations 💼

Adapt TildeLM to your industry, data, and workflows – from virtual assistants to secure translation, speech tech, and more.

National language model development for governments 🏛️

Build inclusive language models that serve public needs, promote digital sovereignty, and support all official EU languages.

Powered by LUMI, backed by Europe

The development of TildeLM is supported by the European Commission and powered by LUMI – the fastest supercomputer in Europe. By winning the Large AI Grand Challenge, we’ve been granted 2 million GPU hours on LUMI to execute this ambitious project.

Contribute to a multilingual future

To build a strong multilingual LLM with over 30B parameters, we’re looking for language data from across Europe. We welcome contributions from authors, publishers, state libraries, and other partners – with flexible terms that work for you.

Our promise

Committing to open collaboration

Governments can leverage TildeLM to create tailored language models that improve public service accessibility for all citizens.

Open access

TildeLM will be available for both commercial and non-commercial use under a permissive licence, published on Hugging Face and ELRC-SHARE.

Integrity and security

We guarantee that TildeLM is safe and free from harmful or inaccurate content, ensuring its reliability for a variety of public use cases.

Knowledge sharing

We are committed to collaboration and sharing insights, inviting partners to work with us in advancing TildeLM for the benefit of all.

Frequently asked questions

What is TildeLM?

The TildeLM project aims to create a multilingual foundational large language model that focuses on underrepresented Baltic and Eastern European languages to promote digital equity and enhance access to advanced AI technologies for these communities.

Why is language equity in LLMs important?

Most LLMs are trained predominantly on English data, and this imbalance has efficiency and cost consequences. For instance, longer sequences are required to encode the same amount of information in lower-resourced languages compared to English, making models less efficient and more expensive to run. Additionally, the English-centricity of these models can introduce undesirable cultural biases. TildeLM will be trained to ensure equity for all supported languages.
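
The point about longer sequences can be illustrated with a rough proxy. The sketch below is not TildeLM's tokenizer and says nothing about how TildeLM works. It assumes a byte-level starting point, where text is first split into UTF-8 bytes, and counts how many bytes each character needs. The function name and the sample words are hypothetical, chosen only for illustration (the words are not translations of each other). Real subword tokenizers merge bytes into larger units, so actual token counts will differ.

```python
import unicodedata


def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per character, after NFC normalisation.

    Rough proxy only: it shows how much longer a raw byte sequence is
    than the character sequence, not how any real tokenizer behaves.
    """
    if not text:
        raise ValueError("text must not be empty")
    # Normalise so that letters with diacritics count as single characters.
    normalised = unicodedata.normalize("NFC", text)
    return len(normalised.encode("utf-8")) / len(normalised)


# Hypothetical sample words, one per language (not translations of each other).
samples = {
    "English": "language",
    "Latvian": "māja",
    "Lithuanian": "ačiū",
    "Ukrainian": "мова",
}

for language, word in samples.items():
    print(f"{language}: {utf8_bytes_per_char(word):.2f} bytes per character")
```

With these samples the ratio rises from 1.0 for the ASCII-only English word to 2.0 for the Cyrillic word, which is one simple reason the same amount of text can start out as a longer sequence in some languages.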

What languages does the TildeLM project focus on?

The project targets Eastern European and Baltic languages such as Bulgarian, Croatian, Czech, Estonian, Finnish, Latvian, Lithuanian, Macedonian, Montenegrin, Polish, Serbian, Slovak, Slovene, and Ukrainian. The model will also support bigger languages such as English, French, German and Russian in balanced proportions to support translation and related multilingual tasks.

What does a “foundational model” mean?

A foundational model is a large, general-purpose AI model trained on a broad range of data. It serves as the “base” for building more specialised tools like internal virtual assistants, chatbots, or industry-specific AI applications. Once trained, it can be fine-tuned with specific data to perform targeted tasks more accurately and reliably.

What is the LUMI supercomputer?

The LUMI (Large Unified Modern Infrastructure) supercomputer is the fifth fastest supercomputer globally and the fastest in Europe. It is part of the EuroHPC Joint Undertaking, a collaborative effort involving the European Union and European countries to create a world-class high-performance computing (HPC) ecosystem in Europe. The LUMI supercomputer is located in Kajaani, Finland.

What is the Large AI Grand Challenge?

The purpose of the Large AI Grand Challenge, funded by the European Commission, is to expand European AI frontiers by harnessing the potential of large-scale AI models. The participants in the competition were innovative startups and SMEs with the technical capacity to develop AI models that boost Europe’s competitiveness in Generative AI. The European Commission has announced the winners of the Large AI Grand Challenge. Four innovative AI companies from Europe, including Tilde, will share a prize of €1 million and 8 million computational hours to advance Europe's leadership in AI development.

What is Tilde?

Tilde is a leading European language technology innovator and service provider with a mission to promote language diversity in the digital age. Tilde has over 150 employees in three offices located in Riga, Vilnius, and Tallinn. Tilde’s research team comprises nine PhDs and their research associates and has authored over 260 scientific publications. Over the years, Tilde has developed a vast R&D partnership network with leading EU research centres and universities and serves as a language technology research hub for the Baltic region. Tilde’s most recent research and development activities focus on foundational large language models (LLMs), fine-tuning of LLMs for downstream applications, and integration of instruction-tuned LLMs in natural language processing applications (e.g., machine translation, virtual assistants, retrieval-augmented generation systems, processing of spoken language, and summarisation).