Lithuanian audio and video recordings converted to text by AI


The Client

With its 125 employees and full-service expertise, Kantar TNS Lithuania is one of the best-established research companies in the Lithuanian market.

The Challenge

Tilde and Kantar TNS have developed an innovative solution – robots will now help to monitor Lithuanian audio and video recordings and identify words. This AI technology will not only significantly speed up the monitoring of TV and radio recordings, but also improve the efficiency of journalists and stenographers in court hearings and meetings, as well as call center staff.

This automatic speech recognition technology is based on a deep neural network model that recognizes words and certain sentence structures. More than 250 hours of speech, 58 million Lithuanian sentences, and over 600 thousand words, including special business keywords, were used to train the system developed for Kantar TNS.

“In the past, monitoring TV and radio broadcasts and videos on social networks and the Internet was very labour-intensive. We were physically watching dozens of hours of recordings every day. But monitoring needs have grown exponentially, as the volume of video content is steadily increasing and audiences prefer various channels. Thus, this cost-efficient technology will help to quickly analyse the ever-increasing flow of non-textual information and make it possible to see the big picture of the communications market,” said Deividas Butkus, Head of the Communications Monitoring and Analysis Department at Kantar TNS.

Solution: Work Eight Times Faster

Renata Špukienė, Director of Tilde IT, the company that created the solution, says that the biggest challenge was the uniqueness of the language: unlike in English, Lithuanian words change with grammatical case, which greatly expands the range of possible meanings. The new technology can quickly convert sound to text and insert punctuation marks. Furthermore, recognition accuracy for news broadcasts is about 90 percent.

“It used to take one person about four hours to listen to recordings and transcribe them. But the new AI system can do the same work in just half an hour. In addition, multiple channels can be transcribed simultaneously, thus further increasing the volume of the processed material. Once transcribed, the text can be searched by means of keywords, archived, subtitled, etc.,” explained R. Špukienė.

According to D. Butkus, the technology will also considerably simplify the use of archived audio and TV recordings.

“Audio recordings that are not converted to text will be forgotten sooner or later. After all, search engines generally find only what is included in the title or the description. Thus, for example, it used to be extremely difficult to find, say, a five-year-old video using just a keyword such as VAT relief. With this innovation, it will be easy. It will give staff more time for analysis and consultation instead of monotonous technical operations,” said the Head of the Communications Monitoring and Analysis Department at Kantar TNS. Furthermore, the technology makes it easy to find a specific place in a recording and can even show a timeline with time markers indicating which sections contain the required keywords.
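The keyword search over transcribed recordings described above can be illustrated with a minimal sketch. It is not the Tilde or Kantar TNS implementation: the Segment structure, the function names, and the sample transcript are all hypothetical, and the sketch assumes only that the transcription step yields text segments with start and end times.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed stretch of a recording (hypothetical structure)."""
    start_s: float  # offset from the start of the recording, in seconds
    end_s: float
    text: str


def find_keyword(segments, keyword):
    """Return the segments whose text contains the keyword, ignoring case."""
    needle = keyword.casefold()
    return [seg for seg in segments if needle in seg.text.casefold()]


def format_timestamp(seconds):
    """Format an offset in seconds as HH:MM:SS."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Made-up sample data, not output of the actual system.
transcript = [
    Segment(0.0, 12.5, "Good evening, here are today's headlines."),
    Segment(12.5, 31.0, "Parliament debated a VAT relief for heating."),
    Segment(31.0, 48.0, "In sports, the national team won again."),
    Segment(3725.0, 3740.0, "Critics say the VAT relief comes too late."),
]

hits = find_keyword(transcript, "vat relief")
for seg in hits:
    # Each hit points to a place in the recording where the keyword is spoken.
    print(format_timestamp(seg.start_s), seg.text)
```

With this sample data the loop prints the segments starting at 00:00:12 and 01:02:05. Plain substring matching is a simplification: it would miss inflected Lithuanian word forms, so a real system would presumably need some form of morphological normalization before matching.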

Based on the success of Tilde.AI in revolutionizing their multi-channel media monitoring and analysis workflows, Kantar TNS is now looking into ways of using speech recognition to change how media advertising is monitored.

Powerful tool for journalists, courts and call centers

R. Špukienė is convinced that this system can do much more than just increase the speed of media monitoring: “Now that we have seen how useful this system can be at Kantar TNS, it is clear that it has a much wider range of application: faster subtitling of news broadcasts and transcription of interviews or post-event texts for journalists, easier stenography of meetings and hearings – these are just a few areas where this technology can save you time and money.”

According to R. Špukienė, this neural network-based AI technology, adapted for the recognition of continuous spoken Lithuanian, is the first of its kind in Lithuania. A prototype of a similar system developed by Tilde for the recognition of spoken Latvian is already in use in Latvia.

Experts from Kantar TNS and Tilde IT predict that, with the growing volume of audio and video information, companies and organizations working with data will definitely benefit from this technology. However, the experts note that human participation will still be needed, even though the technology will speed up the processing of audio and video material and make it more efficient.