

This article is brought to you by the Eden AI team. We allow you to test and use in production a large number of AI engines from different providers directly through our API and platform. In this article, we test several pre-trained Speech-to-Text APIs on a range of relevant use cases.

In recent years, speech recognition has become one of the most popular applications of Artificial Intelligence. This popularity comes from the wide diversity of applications and needs: call centers, broadcasting, translation, health care, banking, voice assistants, etc.

Speech recognition includes various functionalities:

- Speech-to-text: transcribes audio into text.
- Text-to-speech: converts text into audio.
- Speech analysis: analyzes an audio speech to extract information such as the gender, age or emotions of the speaker.
- Speech diarization: identifies and differentiates the speakers talking in the same audio (by accent, voice characteristics, etc.).
- Speech translation: translates an audio speech in one language into an audio speech in another language.

This is not an exhaustive list of speech recognition functionalities, and many solutions combine several of them.

This article briefly covers pre-trained Speech-to-Text APIs. The aim is to answer three questions: which problems can be solved with this kind of API? Who are the main providers on the market? What is the optimal process when using pre-trained APIs?

During our study of pre-trained Speech-to-Text APIs, we chose 6 provider APIs that deliver high performance according to many blog articles and rankings: Google Cloud Platform (GCP), AWS, Azure, Watson, Rev.ai and Assembly AI. This is the pool of provider APIs we are going to test. Note that other solutions, including open source ones, also exist.

As said previously, Speech-to-Text APIs are used in hundreds of fields, for a wide variety of use cases. In this article, we test different Speech-to-Text APIs on different types of audio representing common use cases. We chose 3 use cases with different speakers and types of speech. For each use case, we tested the Speech-to-Text APIs of the 6 providers on one audio file. Of course, for a real project you will need to test on a representative sample of your own data (not just one audio file) to get an accurate view of how performance differs between providers.

For GCP, AWS, Azure and Watson, we do not need to call each provider's API directly: the Eden AI Speech-to-Text API returns the results of these 4 providers with a single request, so a few lines of code are enough to access all four. Rev.ai and Assembly AI are not implemented on Eden AI yet, so we use their APIs directly. The API response is text only.
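
Because each provider returns plain text, comparing providers on a representative set of audio files (rather than a single one) comes down to comparing transcripts against a reference. The sketch below shows one possible way to do that with word error rate (WER), a common metric for speech recognition. This article does not state which metric was used, so WER is an assumption here, and the function names, provider labels and sample transcripts are all made up for illustration.

```python
from statistics import mean


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # word missing from the hypothesis
                curr[j - 1] + 1,     # extra word in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def rank_providers(references, transcripts):
    """Return (provider, mean WER) pairs sorted from best to worst.

    references:  {audio_id: reference transcript}
    transcripts: {provider: {audio_id: transcript returned by that provider}}
    """
    scores = {
        provider: mean(
            word_error_rate(references[audio_id], outputs[audio_id])
            for audio_id in references
        )
        for provider, outputs in transcripts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical data: two audio files and two anonymous providers.
SAMPLE_REFERENCES = {
    "call_1": "thank you for calling how can i help",
    "call_2": "please hold the line",
}
SAMPLE_TRANSCRIPTS = {
    "provider_a": {
        "call_1": "thank you for calling how can i help",
        "call_2": "please hold the lime",
    },
    "provider_b": {
        "call_1": "thank you for calling how can help",
        "call_2": "please hold line",
    },
}

for provider, score in rank_providers(SAMPLE_REFERENCES, SAMPLE_TRANSCRIPTS):
    print(f"{provider}: mean WER = {score:.3f}")
```

Averaging over many audio files per use case, instead of scoring one file, is what makes a ranking like this meaningful. In practice the transcripts would usually be normalized first (punctuation, numbers, casing) so that formatting differences between providers are not counted as errors.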
