When we ask OpenAI a question that requires some specific context, for example:
Who is my father?
OpenAI may either hallucinate an answer or reply that it doesn't know.
To help OpenAI answer specific questions, we can provide extra contextual information in the prompt itself:
My father lives in Barcelona and is 50 years old.
If we ask OpenAI the same question again, it will answer using the contextual information provided in the prompt.
Embeddings
OpenAI provides a capability known as text embeddings to measure the relatedness of text strings. For every block of text, chapter, or subject, we can send that information to OpenAI's Embeddings service and receive back its embedding data (i.e. a vector, a list of floating-point numbers). Example of request:
TsgcHTTP_API_OpenAI._CreateEmbeddings('text-embedding-ada-002', 'My father lives in Barcelona and is 50 years old.');
And the response from OpenAI will be something like this:
{
  "data": [
    {
      "embedding": [
        -0.006929283495992422,
        -0.005336422007530928,
        ...
        -4.547132266452536e-05,
        -0.024047505110502243
      ],
      "index": 0,
      "object": "embedding"
    }
  ]
}
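As a reference, here is a minimal Delphi sketch of extracting the vector from a response like the one above, using the standard System.JSON unit (the helper name ExtractEmbedding is illustrative):

uses
  System.SysUtils, System.JSON;

// Extract the "data[0].embedding" array from the JSON response shown above
// and return it as a dynamic array of doubles.
function ExtractEmbedding(const aResponse: string): TArray<Double>;
var
  vRoot: TJSONValue;
  vEmbedding: TJSONArray;
  i: Integer;
begin
  Result := nil;
  vRoot := TJSONObject.ParseJSONValue(aResponse);
  try
    vEmbedding := vRoot.GetValue<TJSONArray>('data[0].embedding');
    SetLength(Result, vEmbedding.Count);
    for i := 0 to vEmbedding.Count - 1 do
      Result[i] := vEmbedding.Items[i].AsType<Double>;
  finally
    vRoot.Free;
  end;
end;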
Once we have collected the embedding vectors that represent the different pieces of information we want our chatbot to understand, we need to save them in a safe place (like a vector database). Remember, this step is done only once: we compute the embeddings for our information a single time and update them only when the information changes.
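For a small example like the one below, the "vector database" can be as simple as an in-memory list of records; a possible layout (the type names are illustrative):

type
  // One stored item: the original prompt plus its embedding vector.
  TEmbeddingRow = record
    Prompt: string;
    Vector: TArray<Double>;
  end;

  // A minimal in-memory stand-in for a vector database.
  TEmbeddingTable = TArray<TEmbeddingRow>;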
Finally, when we want to ask the chatbot a question, we first convert the query into a vector, then search the previously created database for the stored vector that is most similar to it; once found, we add the prompt of the most similar vector to the question as contextual information.
Simple Example
Let's create a simple example using embeddings and the sgcWebSockets library. First, we describe our family and calculate the vector for each member.
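A sketch of that step, reusing the _CreateEmbeddings call shown earlier (how the response is returned depends on the component version, so parsing and storage are only outlined in the comments; the procedure name is illustrative):

procedure CreateFamilyEmbeddings;
const
  cFamily: array[0..2] of string = (
    'My father lives in Barcelona and is 50 years old.',
    'My mother lives in Berlin and is 47 years old.',
    'My sister lives in Seoul and is 28 years old.'
  );
var
  i: Integer;
begin
  for i := Low(cFamily) to High(cFamily) do
    // Same call as in the request example above; parse each response with
    // ExtractEmbedding and store the prompt + vector pair as a TEmbeddingRow.
    TsgcHTTP_API_OpenAI._CreateEmbeddings('text-embedding-ada-002', cFamily[i]);
end;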
The previous results can be stored in a table where every row contains the prompt and its vector data.
| Prompt | Vector |
|--------|--------|
| My father lives in Barcelona and is 50 years old. | [0.000742552, -0.0049907574, ...] |
| My mother lives in Berlin and is 47 years old. | [-0.027452856, -0.0023051118, ...] |
| My sister lives in Seoul and is 28 years old. | [-0.007873567, -0.014787777, ...] |
Now that we've stored our vectors, we convert the question we want to send to ChatGPT into a vector.
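For example, with a hypothetical question (the question text is illustrative):

// Embed the question with the same model used for the stored prompts.
TsgcHTTP_API_OpenAI._CreateEmbeddings('text-embedding-ada-002', 'Where does my father live?');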
Then we search the database for the stored vector that is most similar to the question's vector. Find below a pseudo-code example:
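The following is a minimal Delphi sketch of that search, using cosine similarity over the TEmbeddingRow records defined earlier (the function names are illustrative):

// Cosine similarity between two vectors of equal length.
function CosineSimilarity(const A, B: TArray<Double>): Double;
var
  i: Integer;
  vDot, vNormA, vNormB: Double;
begin
  vDot := 0;
  vNormA := 0;
  vNormB := 0;
  for i := Low(A) to High(A) do
  begin
    vDot := vDot + A[i] * B[i];
    vNormA := vNormA + A[i] * A[i];
    vNormB := vNormB + B[i] * B[i];
  end;
  Result := vDot / (Sqrt(vNormA) * Sqrt(vNormB));
end;

// Return the stored prompt whose vector is most similar to the question vector.
function FindMostSimilar(const aQuestion: TArray<Double>;
  const aTable: TEmbeddingTable): string;
var
  i: Integer;
  vBest, vScore: Double;
begin
  Result := '';
  vBest := -1;
  for i := Low(aTable) to High(aTable) do
  begin
    vScore := CosineSimilarity(aQuestion, aTable[i].Vector);
    if vScore > vBest then
    begin
      vBest := vScore;
      Result := aTable[i].Prompt;
    end;
  end;
end;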
Finally, we ask ChatGPT the question, adding the prompt of the most similar embedding as contextual information.
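A sketch of how the retrieved prompt can be combined with the question before sending it to the chat endpoint (the question text and the AskWithContext name are illustrative; the exact chat call depends on your sgcWebSockets version):

procedure AskWithContext(const aQuestionVector: TArray<Double>;
  const aTable: TEmbeddingTable);
var
  vContext, vPrompt: string;
begin
  // Pick the most similar stored prompt and prepend it as context.
  vContext := FindMostSimilar(aQuestionVector, aTable);
  vPrompt := 'Answer using the following context.' + sLineBreak +
             'Context: ' + vContext + sLineBreak +
             'Question: Where does my father live?';
  // Send vPrompt to the chat/completions endpoint of your OpenAI client.
end;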