Translating audio to text is straightforward with the OpenAI API: upload an audio file in one of the following formats (mp3, mp4, mpeg, mpga, m4a, wav, or webm) and the API translates its spoken content into English text.
The request must include the audio file and the model. You can also set optional parameters such as the temperature, which controls how random the output is. The available parameters are:
- Filename: (Required) The audio file to translate, in one of these formats: mp3, mp4, mpeg, mpga, m4a, wav, or webm.
- Model: (Required) ID of the model to use. Only whisper-1 is currently available.
- Prompt: An optional text to guide the model's style or continue a previous audio segment. For translations, the prompt should be in English.
- ResponseFormat: The format of the transcript output, in one of these options: json, text, srt, verbose_json, or vtt.
- Temperature: The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.
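Before sending a request, you may want to check the inputs locally against the constraints listed above. The following is a minimal, self-contained Delphi console sketch. The helper names IsSupportedAudioFile and IsValidTemperature are hypothetical and not part of the sgcWebSockets library. The file check only looks at the extension, not at the actual audio content.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
program CheckAudioParams;

{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  // Extensions matching the formats listed in the parameter description
  SupportedExtensions: array[0..6] of string =
    ('.mp3', '.mp4', '.mpeg', '.mpga', '.m4a', '.wav', '.webm');

// Hypothetical helper: returns True when the file extension (compared
// case-insensitively) is one of the supported audio formats.
function IsSupportedAudioFile(const aFileName: string): Boolean;
var
  i: Integer;
  vExt: string;
begin
  Result := False;
  vExt := LowerCase(ExtractFileExt(aFileName));
  for i := Low(SupportedExtensions) to High(SupportedExtensions) do
    if vExt = SupportedExtensions[i] then
      Exit(True);
end;

// Hypothetical helper: returns True when the temperature lies in the
// documented range 0..1 (inclusive).
function IsValidTemperature(const aValue: Double): Boolean;
begin
  Result := (aValue >= 0) and (aValue <= 1);
end;

begin
  Writeln('meeting.MP3 supported: ', BoolToStr(IsSupportedAudioFile('meeting.MP3'), True));
  Writeln('notes.txt supported: ', BoolToStr(IsSupportedAudioFile('notes.txt'), True));
  Writeln('temperature 0.2 valid: ', BoolToStr(IsValidTemperature(0.2), True));
  Writeln('temperature 1.5 valid: ', BoolToStr(IsValidTemperature(1.5), True));
end.
```

The program accepts meeting.MP3 because the extension check ignores case. It rejects notes.txt and a temperature of 1.5.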
The following example translates an audio file using the whisper-1 model.
procedure DoTranslateAudio(const aFileName: string);
var
  oRequest: TsgcOpenAIClass_Request_Translation;
  oResponse: TsgcOpenAIClass_Response_Translation;
begin
  oRequest := TsgcOpenAIClass_Request_Translation.Create;
  try
    // Required parameters: the audio file and the model
    oRequest.Filename := aFileName;
    oRequest.Model := 'whisper-1';
    // Send the request; the response text holds the English translation
    oResponse := OpenAI.CreateTranslationFromFile(oRequest);
    try
      DoLog(oResponse.Text);
    finally
      oResponse.Free;
    end;
  finally
    oRequest.Free;
  end;
end;
Find below the compiled Demo for Windows using the sgcWebSockets OpenAI Delphi Library.