In partnership with CGI Estonia, Tilde has developed Salme, a custom automatic speech recognition (ASR) solution for Estonian court hearings. Salme converts speech to text, providing automated real-time and offline transcription for court hearings at all levels.
User: Estonian Courts (technical customer: the Ministry of Justice’s Centre of Registers and Information Systems, RIK).
Industry: Legal and judicial
Challenge
Traditionally, court hearings have relied on manual transcription by stenographers and court secretaries, which is a labour-intensive and time-consuming process. Although speech recognition technology has existed for decades, only recent developments and the introduction of machine learning algorithms have made it possible to generate reliable, automated transcripts. This advancement significantly reduces the time and effort needed to prepare court documents.
Recognising this, RIK initiated the development of Salme – their custom-made speech recognition solution.
Development
To create Salme, Tilde partnered with CGI Estonia – a global IT innovation company with a strong local presence in Estonia. CGI specialises in software development and system integration, while Tilde is experienced in developing custom AI-driven language technology solutions.
As part of the solution, Tilde developed the ASR system that transcribes court hearings both offline and in real time. CGI was responsible for the Windows Presentation Foundation (WPF) user interface and for integrating the audio recording system with Tilde’s ASR system. CGI also oversaw Salme’s integration with the Court Information System over X-Road®, an open-source software and ecosystem solution that facilitates unified and secure data exchange between organisations. This integration enabled automated data exchange between Salme and the Court Information System.
Salme was developed by training the speech recognition model on domain-specific content provided by the client: more than 800 hours of transcribed audio and over 800 million words of text.
Since its launch, the solution has been implemented across all national and regional courts in Estonia.
Solution
To ensure efficient and accurate transcription, the ASR solution for Estonian Courts, Salme, incorporates several advanced features:
- Integration into the official Court Information System for smooth operation
- Real-time transcription with minimal delay
- Offline mode for transcribing past court hearings with higher accuracy
- Speaker identification through pre-assigned microphones during court hearings
- Automatic speaker identification if speaker voice samples are available beforehand
- Context-based transcript structuring
- Post-editing capabilities, including adding timestamped notes and adjusting transcripts
- Built-in spell check for enhanced accuracy
- Compliance with strict confidentiality standards and secure transcript storage
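The microphone-based speaker identification listed above can be pictured with a minimal Python sketch. It is not Salme's implementation: the `Segment` type, the `MIC_ROLES` mapping, the role names and the sample utterances are all hypothetical, and the sketch assumes each recognised segment arrives tagged with the microphone channel it was recorded on.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One recognised chunk of speech (hypothetical structure)."""
    channel: int   # microphone channel the audio came from
    start: float   # start time in seconds
    text: str      # recognised text


# Hypothetical pre-assignment of microphones to courtroom roles.
MIC_ROLES = {0: "Judge", 1: "Prosecutor", 2: "Defence counsel"}


def label_segments(segments, mic_roles):
    """Order segments by time, label each with the role assigned to its
    microphone, and merge consecutive segments from the same speaker."""
    turns = []
    for seg in sorted(segments, key=lambda s: s.start):
        speaker = mic_roles.get(seg.channel, "Unknown speaker")
        if turns and turns[-1][0] == speaker:
            # Same speaker continues: extend the current turn.
            turns[-1] = (speaker, turns[-1][1] + " " + seg.text)
        else:
            turns.append((speaker, seg.text))
    return turns


segments = [
    Segment(channel=1, start=4.2, text="Yes, Your Honour."),
    Segment(channel=0, start=0.0, text="The hearing is open."),
    Segment(channel=0, start=2.1, text="Is the prosecution ready?"),
]

for speaker, text in label_segments(segments, MIC_ROLES):
    print(f"{speaker}: {text}")
```

A fixed channel-to-role table is the simplest case. Automatic identification from voice samples would replace the table lookup with a speaker recognition model, which this sketch does not attempt.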
ASR quality factors
Salme’s effectiveness and the quality of its transcripts depend on the quality of the audio recording and on how the ASR model is trained.
Adapting the ASR model requires extensive data, appropriate software development tools, and thorough training to ensure that Salme can produce high-quality transcripts with the correct terminology, phrasing and jargon.
The speech recognition model was trained extensively to produce accurate transcripts, with word error rates (WER) between 8% and 15%. However, this level of accuracy is achievable only with high-quality audio equipment and participants speaking in turns. Results may be less accurate in less controlled hearings where all microphones are on and people speak simultaneously. Additionally, audio quality depends on the ability of the equipment to handle acoustic challenges such as background noise.
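For readers unfamiliar with the metric, WER is conventionally computed as (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions needed to turn the ASR output into the reference transcript, and N is the number of words in the reference. The Python sketch below illustrates that standard definition. It is not the evaluation tooling used for Salme; the function name and the English sample sentences are made up for illustration, and real evaluations usually normalise case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between hypothesis and reference,
    divided by the number of words in the reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one wrong word in a ten-word reference.
reference = "the hearing is adjourned until ten o'clock on monday morning"
hypothesis = "the hearing is adjourned until then o'clock on monday morning"
print(f"WER: {word_error_rate(reference, hypothesis):.0%}")
```

On this definition, a WER of 8–15% corresponds to roughly one word in twelve to one word in seven needing correction during post-editing.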
Results
The implementation of Salme has significantly improved the efficiency and accuracy of Estonian court hearing transcription.
RIK states, “If a court hearing is conducted with high-quality audio equipment and a word-for-word transcript is needed afterwards, Salme is a valuable tool that helps save time. The technology in all courtrooms is not of the same quality, which directly affects the transcription results. With high-quality audio equipment, the session’s transcription is prepared immediately, requiring only post-transcription checks and corrections if necessary. Feedback from the courts has been positive, as they can access session data as needed, provide information immediately after the hearing, and avoid manually uploading files to the court information system.”
Conclusion
Through the collaborative efforts of CGI Estonia and Tilde, Estonian courts can now get high-quality transcripts in seconds. This is a clear case of how leveraging advanced speech recognition technology can boost efficiency and accuracy where manual labour was previously the norm.
It’s high time for businesses, individuals, and the public sector to shift to automation wherever possible and make better use of their time and resources.
About CGI
Founded in 1976, CGI is among the largest independent IT and business consulting services firms in the world. With 90,000 consultants and professionals across the globe, CGI delivers an end-to-end portfolio of capabilities, from strategic IT and business consulting to systems integration, managed IT and business process services and intellectual property solutions. CGI works with clients through a local relationship model complemented by a global delivery network that helps clients digitally transform their organisations and accelerate results. CGI’s Fiscal 2023 reported revenue is CA$14.30 billion, and CGI shares are listed on the TSX (GIB.A) and the NYSE (GIB). Learn more at https://www.cgi.com/ee/et.