Transcribing audio to text (also known as speech to text) is very easy using the OpenAI API: just upload an audio file in one of the following formats: mp3, mp4, mpeg, mpga, m4a, wav, or webm, and the API returns the transcribed text as a string.
OpenAI requires you to build a request where you pass the audio file, the model, the temperature (to get a more or less random output) and a few other options. Find below a list of the available parameters; a sketch that also sets the optional ones follows the list.
- Filename: (Required) The audio file to transcribe, in one of these formats: mp3, mp4, mpeg, mpga, m4a, wav, or webm.
- Model: (Required) ID of the model to use. Only whisper-1 is currently available.
- Prompt: Optional text to guide the model's style or to continue a previous audio segment. The prompt should match the audio language.
- ResponseFormat: The format of the transcript output, in one of these options: json, text, srt, verbose_json, or vtt.
- Temperature: The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.
- Language: The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.
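As an illustration of how these parameters map onto a request, the sketch below fills in the optional fields as well. Only Filename and Model are confirmed by the example further down; the Prompt, ResponseFormat, Temperature and Language property names (and their types) are assumed here to mirror the parameter names above, so check them against your version of the library.

procedure DoFileTranscriptionWithOptions(const aFilename: string);
var
  oRequest: TsgcOpenAIClass_Request_Transcription;
  oResponse: TsgcOpenAIClass_Response_Transcription;
begin
  oRequest := TsgcOpenAIClass_Request_Transcription.Create;
  try
    // Required parameters
    oRequest.Filename := aFilename;    // audio file in a supported format
    oRequest.Model := 'whisper-1';     // only whisper-1 is currently available
    // Optional parameters: property names and types assumed from the list above
    oRequest.Language := 'en';         // ISO-639-1 code of the spoken language
    oRequest.Temperature := 0.2;       // lower values give more deterministic output
    oRequest.ResponseFormat := 'text'; // json, text, srt, verbose_json or vtt
    oRequest.Prompt := 'Glossary: sgcWebSockets, Delphi.'; // style/terminology hint
    oResponse := OpenAI.CreateTranscriptionFromFile(oRequest);
    try
      DoLog(oResponse.Text);
    finally
      oResponse.Free;
    end;
  finally
    oRequest.Free;
  end;
end;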
Find below a simpler, minimal example that transcribes an audio file using whisper-1.
procedure DoFileTranscription(const aFilename: string);
var
  oRequest: TsgcOpenAIClass_Request_Transcription;
  oResponse: TsgcOpenAIClass_Response_Transcription;
begin
  // Build the transcription request with the required parameters
  oRequest := TsgcOpenAIClass_Request_Transcription.Create;
  try
    oRequest.Filename := aFilename;
    oRequest.Model := 'whisper-1';
    // Send the request and log the transcribed text
    oResponse := OpenAI.CreateTranscriptionFromFile(oRequest);
    try
      DoLog(oResponse.Text);
    finally
      oResponse.Free;
    end;
  finally
    oRequest.Free;
  end;
end;
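Note that OpenAI in these examples is assumed to be a client instance that has already been created and configured with your API key; the exact class and property names depend on the library version. With that in place, the call site is a one-liner:

DoFileTranscription('C:\audio\meeting.mp3'); // hypothetical path, illustrative only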
Find below the compiled demo for Windows, built with the sgcWebSockets OpenAI Delphi Library.