Today, quality transcription is essential for extracting relevant information and implementing strategic actions, such as improving the customer experience and detecting trends. Beyond simply minimising errors, effective transcription means precise adaptation to the specific context of each brand.
At uh!ive, we have developed the Language Model Factory (LMF), a fundamental innovation for us and our customers. This tailor-made solution is based on language models adapted to each customer, ensuring accurate transcription and a detailed understanding of the nuances specific to each conversation.
The importance of quality transcription also extends to the field of natural language speech recognition systems and the interpretation of conversations between humans and robots, where accurate transcription becomes essential to ensure fluid communication and mutual understanding.
The uh!ive Language Model Factory is thus positioned as an essential lever for relevant analysis of telephone interactions, while meeting the challenges posed by constantly evolving human-machine interfaces.
The evolution of language models at uh!ive
The transformation of a speech signal into a usable transcription, through automatic speech recognition (ASR), involves a succession of crucial stages. To simplify matters, the starting point is phonemes, the sound units of a language. Sequences of these units (referred to as n-grams) are assigned statistical weights, which ultimately yield expressions containing the words of the final transcription. The language model (LM) plays a central role in converting these phonemes into expressions, and therefore into words, thus generating the sentences retrieved in our APIs* or Web interfaces.
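To make the idea of statistical weights on n-grams more tangible, here is a minimal sketch of a bigram language model. It is an illustration only and not uh!ive's implementation: it works on words rather than phonemes, the three training sentences are invented, and add-one smoothing is an assumption chosen for simplicity.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over whitespace-tokenised sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Log-probability of a sentence under an add-one smoothed bigram model."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        numerator = bigrams[(prev, word)] + 1
        denominator = unigrams[prev] + vocab_size
        total += math.log(numerator / denominator)
    return total


# Invented toy corpus standing in for real training data.
corpus = [
    "i would like to check in",
    "please check in online",
    "the check in desk is open",
]
unigrams, bigrams = train_bigram(corpus)

# Two candidate transcriptions that sound alike and have the same length.
expected = log_prob("i would like to check in", unigrams, bigrams)
unlikely = log_prob("i would like to check inn", unigrams, bigrams)
best = "check in" if expected > unlikely else "check inn"
```

The candidate whose word sequences were seen in the training data receives the higher score, which is the role the language model plays when choosing between acoustically similar hypotheses.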
Let’s explore the importance of these adapted models further after a brief look back:
There was a time, a few years ago, when the manual creation of a specific set of linguistic data for each customer was unavoidable. In those days, this involved not only painstakingly creating suitable data, but also subjecting it to a machine-learning process that involved servers running non-stop for days on end.
This dual constraint meant that the operation was not only excessively costly for our customers, but also subject to considerable delays, limiting the responsiveness required in an environment where speed is essential.
Fortunately, in 2021 we successfully introduced our Language Model Factory (LMF). This innovation now enables us to develop customised language models in just a few hours, once the user’s needs and use case have been clarified.
The Language Model Factory
How did we manage to reduce computing time to such an extent, and above all do away with the need to compile corpora by hand? For each language (French, English, Spanish…) we have a basic model that serves as an easily reusable framework, to which we add a corpus for the relevant industry (travel, energy, insurance…). But as each customer is unique, we can also easily add brand-specific lingo to this cocktail: product names, competitor names, idiomatic expressions specific to the sector, etc. It is also possible to add borrowings from foreign languages (e.g. the English check-in, often used instead of the French enregistrement in the airline or hotel industry).
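One common way to layer a base model, an industry corpus and brand vocabulary is linear interpolation of their probabilities. The sketch below assumes that technique purely for illustration; the text does not say how the LMF combines its sources. The corpora, the weights and the product name "skyflex" are all invented, and unigram models are used to keep the example short.

```python
from collections import Counter


def unigram_model(corpus):
    """Relative word frequencies for a list of sentences."""
    counts = Counter(
        word for sentence in corpus for word in sentence.lower().split()
    )
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(models_with_weights):
    """Linearly interpolate several models; weights are assumed to sum to 1."""
    combined = Counter()
    for model, weight in models_with_weights:
        for word, prob in model.items():
            combined[word] += weight * prob
    return dict(combined)


# Invented corpora: generic language, an industry layer, and a brand layer.
base = unigram_model(["i would like to change my booking", "thank you for calling"])
industry = unigram_model(["my flight check-in is tomorrow", "i lost my boarding pass"])
brand = unigram_model(["i have a skyflex ticket"])  # "skyflex" is a made-up product

# Hypothetical weights: mostly base language, plus smaller specialised layers.
adapted = interpolate([(base, 0.6), (industry, 0.3), (brand, 0.1)])
```

In this sketch the base model is computed once and reused, so adapting to a new customer only requires processing the small industry and brand layers.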
It is important to note that the LMF produces a true “decoding graph”. This means that the customer’s specific expressions are an integral part of the automated learning, and are therefore present in the statistical model that this decoding graph constitutes. The customisation does not take place in a post-processing phase after transcription, which would replace pieces of the initial text with the customer’s expressions whenever there is homophony.
This significant advance in the efficiency and accessibility of language model customisation removes the long lead times previously associated with the process. It lets us respond precisely to our customers’ requirements and remain flexible over time as they evolve (new products, new processes, etc.). The Language Model Factory thus enables us to continuously improve our models.
Exploring practical use cases
The adaptation of language models is crucial in different situations, to ensure optimal contextual understanding and fluid interactions between individuals or with robots.
- Industry-specific: When conversations are linked to specific areas of activity, such as the medical, legal, financial or other specialist industries, adapting the language model enables technical and specialist terms specific to each sector to be accurately recognised and transcribed.
- Business lingo: In professional environments where specific jargon is used, such as in energy, telecoms or other specialist fields, adapting the language model ensures correct recognition of these particular terms, contributing to more accurate transcription.
- Proper nouns and products: When proper nouns, such as company names, product names or personal names, are frequently mentioned, adapting the language model ensures a high-quality transcription of these elements, thus avoiding recognition errors.
- Casual language and localised expressions: During informal discussions, or to take account of local expressions in certain countries (flat versus apartment…), adapting the language model is crucial to capturing and transcribing these linguistic subtleties accurately and contextually.
- Multilingual environments: When conversations involve the frequent use of foreign terms (faux pas, wanderlust, etc.), adapting the language model makes it easier to transcribe these elements accurately, thus avoiding confusion and recognition errors.
- Evolving vocabulary and concepts: For sectors where terms and concepts evolve rapidly, such as technology or emerging industries, continuous adaptation of the language model ensures that it remains up to date, providing a relevant and accurate transcription over time.
In more concrete terms, here are a few case studies illustrating the benefits of adapting language models.
If you want to effectively identify the products most frequently mentioned in telephone conversations, whether to capitalise on a success or to solve a production problem, accurate recognition of product names, or terms used in your industry, is crucial, eliminating the need to use alternatives such as aliases.
For example, depending on your organisation’s activity, transcribing a phrase as “your right to protest” or “you’re right to protest” can have different implications for your analyses. With a suitable model created via the Language Model Factory (LMF), there is no need to resort to an alias, i.e. to keep a known mis-transcription and accept it as a stand-in for the correct term. The correct spelling is produced according to context, which removes any confusion between real terms and aliases.
Another example is the use of a phone-bot to handle requests outside working hours: in this case, the use of an adapted language model offers a significant advantage in correctly interpreting the caller’s intention.
For example, as an insurance company, it is in your interest to guarantee accurate recognition of the names of toll gates or towns in order to offer high-quality roadside assistance. Similarly, a transport authority in a metropolis would benefit from having the names of the stops on its network available in a dedicated language model.
By enriching the LMF with a list of relevant keywords, ranging from a few dozen to a few hundred, we statistically reinforce these expressions. As a result, they stand out in the transcription, making it easier to orchestrate the phone-bot. This approach guarantees better contextual understanding, contributing to more precise and effective interactions.
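As a rough illustration of what statistically reinforcing a keyword list can mean, the sketch below adds pseudo-occurrences for each keyword to a word-count table. This is an assumed mechanism, not a description of how the LMF works internally; the function names, the boost value and the place names are hypothetical.

```python
from collections import Counter


def boost_keywords(counts, keywords, boost=5):
    """Return new counts where each keyword gets `boost` extra pseudo-occurrences."""
    boosted = Counter(counts)  # copy, so the original counts stay untouched
    for keyword in keywords:
        boosted[keyword.lower()] += boost
    return boosted


def probability(word, counts):
    """Relative frequency of a word; 0.0 if the table is empty."""
    total = sum(counts.values())
    return counts[word] / total if total else 0.0


# Invented base counts from a single toy sentence.
counts = Counter("i need a tow truck near the toll gate please".split())

# Placeholder keyword list, e.g. toll gate or stop names supplied by a customer.
keywords = ["Fleury", "Saint-Arnoult"]
boosted = boost_keywords(counts, keywords)

before = probability("fleury", counts)   # keyword unseen in the base data
after = probability("fleury", boosted)   # keyword now carries real weight
```

A word that the base data never contained moves from zero probability to a non-negligible one, so a recogniser using these weights becomes far more likely to output it when the audio supports it.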
The quest for excellence in automated conversational transcription
At uh!ive, we are working on several aspects to constantly improve the quality of our transcriptions. In particular, we are making progress on improving the application of grammatical rules, such as conjugation and plurals. We are also exploring the process of converting sound directly into words.
We are also determined to involve our customers more in our process. We want them to understand how we build our models, making our approach more transparent. By working closely with our customers, we aim to make updating our models faster and more flexible, adapting to their changing needs.
In short, at uh!ive, innovation and continuous improvement are essential. Our aim is to push back the boundaries of technology to provide ever better transcriptions, better adapted to context, and more easily incorporating feedback from our customers. We are determined to evolve constantly, convinced that each advance contributes to redefining the standards of speech analysis and automated transcription.
Footnotes
*API: Application Programming Interface, https://en.wikipedia.org/wiki/API