From sgcWebSockets 2023.5.0, building real-time translators is easier using the OpenAI APIs and the Text-To-Speech APIs from Windows, Google or Amazon.
Translation applications built on OpenAI offer several advantages. They provide fast, accurate translations across multiple languages, which helps people communicate across language barriers. They rely on state-of-the-art machine learning models for high-quality output, and they can be integrated into a variety of platforms, making them accessible to a wide range of users.
Translator Component
To build a Translator with voice commands, the following steps are required:
- The microphone audio must be captured, so a speech-to-text system is needed to obtain the text that will be sent to OpenAI.
  - The microphone audio is captured using the TsgcAudioRecorderMCI component.
  - Once the audio has been captured, it is sent to the OpenAI Whisper API, which converts the audio file to text.
- Once the speech has been converted to text, the text is sent to OpenAI using the ChatCompletion API.
- The response from OpenAI is then converted to speech using one of the following components:
  - TsgcTextToSpeechSystem: (currently Windows only) uses the text-to-speech engine of the Windows operating system.
  - TsgcTextToSpeechGoogle: sends the response from OpenAI to the Google Cloud servers, which return an mp3 file that is played by the TsgcAudioPlayerMCI.
  - TsgcTextToSpeechAmazon: sends the response from OpenAI to the Amazon AWS servers, which return an mp3 file that is played by the TsgcAudioPlayerMCI.
The Translator component exposes the following properties and events:
- OpenAIOptions: configures the OpenAI properties.
  - ApiKey: an API key is required to interact with the OpenAI APIs.
  - LogOptions
    - Enabled: if set to true, the API requests are logged to a text file.
    - FileName: the filename of the log.
  - Organization: an optional OpenAI API field.
- TranslatorOptions: configures the Translator properties.
  - Translation: configures the OpenAI Translation API settings.
    - Model: by default whisper-1.
- AudioRecorder: assign a TsgcAudioRecorder component to capture the microphone audio.
- TextToSpeech: assign a TsgcTextToSpeech component to play back the response from OpenAI as speech.
- OnAudioStart: this event is called when audio recording starts.
- OnAudioStop: this event is called after audio recording stops.
- OnTranslation: this event is called when a response with the translation result is received from the OpenAI Translation API.
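As an illustration, the properties listed above could be set in code roughly as follows. This is a sketch only: the class name TsgcAIOpenAITranslator is an assumption (the text does not name the component's class), the nesting of LogOptions under OpenAIOptions and of Model under TranslatorOptions.Translation is inferred from the list, and the key, organization and file name values are placeholders. The sgcWebSockets units that declare these types must be added to the uses clause; their names depend on the installed version.

```delphi
// Sketch only: TsgcAIOpenAITranslator is an assumed class name, and the
// property paths follow the nesting inferred from the property list.
procedure ConfigureTranslator(const aTranslator: TsgcAIOpenAITranslator);
begin
  // Credentials for the OpenAI APIs (placeholder values).
  aTranslator.OpenAIOptions.ApiKey := 'your_api_key';
  aTranslator.OpenAIOptions.Organization := 'your_organization'; // optional

  // Log the API requests to a text file.
  aTranslator.OpenAIOptions.LogOptions.Enabled := True;
  aTranslator.OpenAIOptions.LogOptions.FileName := 'translator.log';

  // Model used by the Translation API (whisper-1 is already the default).
  aTranslator.TranslatorOptions.Translation.Model := 'whisper-1';
end;
```

The same values can also be set at design time in the Object Inspector if the component is dropped on a form.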
Delphi Code Example
Create a new Translator that uses the default Text-To-Speech engine of Microsoft Windows. Call Start to begin recording audio, and call Stop to end the recording, send the audio to the OpenAI API and translate it.
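A minimal sketch of these steps might look like the following. The component names TsgcAudioRecorderMCI and TsgcTextToSpeechSystem, the properties OpenAIOptions.ApiKey, AudioRecorder and TextToSpeech, and the methods Start and Stop come from the description above; the translator class name TsgcAIOpenAITranslator is an assumption, the API key is a placeholder, and the required sgcWebSockets units (whose names depend on the installed version) must be added to the uses clause.

```delphi
// Sketch only: TsgcAIOpenAITranslator is an assumed class name.
var
  oTranslator: TsgcAIOpenAITranslator;

procedure StartTranslator;
begin
  oTranslator := TsgcAIOpenAITranslator.Create(nil);
  oTranslator.OpenAIOptions.ApiKey := 'your_api_key'; // placeholder
  // Capture the microphone audio; the translator owns the helper component.
  oTranslator.AudioRecorder := TsgcAudioRecorderMCI.Create(oTranslator);
  // Speak the response with the Windows text-to-speech engine.
  oTranslator.TextToSpeech := TsgcTextToSpeechSystem.Create(oTranslator);
  // Begin recording audio from the microphone.
  oTranslator.Start;
end;

procedure StopTranslator;
begin
  // End the recording and send the audio to OpenAI for translation.
  oTranslator.Stop;
end;
```

In a real application, StartTranslator and StopTranslator would typically be called from button click handlers, and the translator would be freed when the form is destroyed.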
Delphi Real-Time Translation AI Video
Delphi Translator Application Demo
Find below the source code of the Translator Application Demo, which shows the main features of the Real-Time Translator built with the sgcWebSockets library for Windows.