From sgcWebSockets 2023.5.0, building Voice ChatBots is easier using the OpenAI APIs and the Text-To-Speech APIs from Windows, Google or Amazon.
Chatbots and Virtual Assistants are Applications that can converse with humans in a natural, human-like manner. These can be used for customer support, handling queries, and providing information on a website or mobile app.
ChatBot Component
To build a ChatBot with voice commands, the following steps are required:
- The microphone audio must be captured, so a speech-to-text system is needed to obtain the text that will be sent to OpenAI.
- The microphone audio is captured using the TsgcAudioRecorderMCI component.
- Once the audio has been captured, it is sent to the OpenAI Whisper API to convert the audio file to text.
- Once the speech has been transcribed, the text is sent to OpenAI using the ChatCompletion API.
- The response from OpenAI must then be converted to speech using one of the following components (see the sketch after this list):
- TsgcTextToSpeechSystem: (currently Windows only) uses the Text-To-Speech engine of the Operating System.
- TsgcTextToSpeechGoogle: sends the response from OpenAI to the Google Cloud servers; an mp3 file is returned, which is played by the TsgcAudioPlayerMCI component.
- TsgcTextToSpeechAmazon: sends the response from OpenAI to the Amazon AWS servers; an mp3 file is returned, which is played by the TsgcAudioPlayerMCI component.
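As a minimal sketch, any of the three components can be assigned to the chatbot's TextToSpeech property described below (sgcChatBot refers to the TsgcAIOpenAIChatBot instance used in the example further down; the Google and Amazon components also need their cloud credentials configured, which is omitted here):
// use the voice of the Operating System (Windows only)
sgcChatBot.TextToSpeech := TsgcTextToSpeechSystem.Create(nil);
// ... or use Google Cloud instead (credentials setup not shown)
// sgcChatBot.TextToSpeech := TsgcTextToSpeechGoogle.Create(nil);
// ... or use Amazon AWS instead (credentials setup not shown)
// sgcChatBot.TextToSpeech := TsgcTextToSpeechAmazon.Create(nil);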
Properties
- OpenAIOptions: configure the OpenAI properties here (see the configuration sketch after this list).
- ApiKey: an API key is required to interact with the OpenAI APIs.
- LogOptions
- Enabled: if set to true, the API requests will be logged to a text file.
- FileName: the filename of the log.
- Organization: an optional OpenAI API field.
- ChatBotOptions: configure the ChatBot properties here.
- Transcription: configure the OpenAI Transcription API settings here.
- Model: by default whisper-1.
- Language: the language code of the transcription (helps the model transcribe the speech more accurately).
- ChatCompletion: configure the OpenAI ChatCompletion API settings here.
- Model: by default gpt-3.5-turbo.
- AudioRecorder: assign a TsgcAudioRecorder component to capture the microphone audio.
- TextToSpeech: assign a TsgcTextToSpeech component to play the response from OpenAI as speech.
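For reference, a minimal configuration sketch using the properties above (the values are placeholders, the property names follow the list above, and anything not set keeps its default):
sgcChatBot.OpenAIOptions.ApiKey := 'your_openai_api_key';
sgcChatBot.OpenAIOptions.Organization := 'your_organization_id'; // optional
sgcChatBot.OpenAIOptions.LogOptions.Enabled := True;
sgcChatBot.OpenAIOptions.LogOptions.FileName := 'openai_requests.log';
sgcChatBot.ChatBotOptions.Transcription.Model := 'whisper-1';
sgcChatBot.ChatBotOptions.Transcription.Language := 'en';
sgcChatBot.ChatBotOptions.ChatCompletion.Model := 'gpt-3.5-turbo';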
Events
- OnAudioStart: the event is called when the audio starts being recorded.
- OnAudioStop: the event is called after the audio recording stops.
- OnTranscription: the event is called when a response from the OpenAI Transcription API is received with the Speech-To-Text result.
- OnChatCompletion: the event is called when a response from the OpenAI ChatCompletion API is received with the Content text (see the handler sketch below).
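As an illustration, the OnTranscription and OnChatCompletion events can be used to show the conversation in the UI. The handler signatures below are assumed for this sketch (a Sender plus the resulting text), and TMainForm and memoChat are hypothetical names; check the component's declaration for the exact types:
procedure TMainForm.ChatBotTranscription(Sender: TObject; const Text: string);
begin
  // show what the user said (the Speech-To-Text result)
  memoChat.Lines.Add('You: ' + Text);
end;

procedure TMainForm.ChatBotChatCompletion(Sender: TObject; const Text: string);
begin
  // show the ChatGPT answer; the chatbot also speaks it through the TextToSpeech component
  memoChat.Lines.Add('Bot: ' + Text);
end;

// ... assign the handlers (event names from the list above, signatures assumed)
sgcChatBot.OnTranscription := ChatBotTranscription;
sgcChatBot.OnChatCompletion := ChatBotChatCompletion;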
Delphi Code Example
Create a new ChatBot using the default Text-To-Speech from Microsoft Windows. Call Start to start recording the audio, and Stop to stop the recording, send the audio to the OpenAI API and get a response from ChatGPT.
var
  sgcChatBot: TsgcAIOpenAIChatBot;
  sgcAudioRecorder: TsgcAudioRecorderMCI;
  sgcTextToSpeech: TsgcTextToSpeechSystem;

// ... create the chatbot component
sgcChatBot := TsgcAIOpenAIChatBot.Create(nil);
sgcChatBot.OpenAIOptions.ApiKey := 'your_openai_api_key';
sgcChatBot.ChatBotOptions.Transcription.Language := 'en';
// ... create the audio recorder and text-to-speech components
sgcAudioRecorder := TsgcAudioRecorderMCI.Create(nil);
sgcTextToSpeech := TsgcTextToSpeechSystem.Create(nil);
// ... assign the audio components to the chatbot
sgcChatBot.AudioRecorder := sgcAudioRecorder;
sgcChatBot.TextToSpeech := sgcTextToSpeech;
// ... start the chatbot, speak into the microphone to capture the audio and stop to process it
sgcChatBot.Start;
// ... speak
sgcChatBot.Stop;
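Since the components in this example are created with a nil owner, remember to free them when the chatbot is no longer needed (or create them with an owner, such as the form, so they are freed automatically):
sgcChatBot.Free;
sgcAudioRecorder.Free;
sgcTextToSpeech.Free;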
Delphi ChatGPT ChatBot Video
Delphi ChatBot Application Demo
Find below the source code of the ChatBot Application Demo, showing the main features of the ChatBot built with the sgcWebSockets library for Windows.