Understanding the Certara Generative AI Parameters
Introduction
The Certara Generative AI endpoint accepts a number of parameters that let you tailor responses to your needs. This guide covers each of those parameters.
Swagger UI
All Certara AI endpoints and their parameters can be found at
https://YOUR_LAYAR_ENVIRONMENT/layar/swagger-ui.html
Parameters
The parameters are as follows:
{
"content": "string",
"task": "string",
"messageHistory": [
{
"type": "string",
"content": "string"
}
],
"sources": [{
"rawText": "string",
"documentId": "string",
"savedListId": "string",
"provider": "string"
}],
"max_tokens": 0,
"temperature": 0,
"top_k": 0,
"top_p": 0,
"retriever": {
"document": {
"index_name": "string",
"lookup_filter": "string",
"reranker": true,
"num_hits": 0,
"heat_ratio": 0,
"enable_sectioning": true,
"search_ratio": 0.5,
"relevance_threshold" : 0.5
},
"table": {
"index_name": "string",
"lookup_filter": "string",
"reranker": true,
"num_hits": 0,
"heat_ratio": 0,
"enable_sectioning": false,
"search_ratio": 0,
"relevance_threshold" : 0.5
}
},
"conversation": {
"chunk_size": 0
},
"summarize": {
"chunk_size": 0
},
"prompts": {
"system": "string",
"summarization": "string"
},
"transientData": true,
"rag_query": "string",
"frequency_penalty": 0.5,
"rag_skip_enabled": false
}
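As a minimal sketch, a request body with the two required fields can be assembled and validated in Python before sending it to the endpoint. The helper function and field values below are illustrative, not part of the API itself:

```python
import json

# The two required fields and the allowed task values, per the
# parameter list above.
ALLOWED_TASKS = {"generate", "summarize"}

def build_request(content: str, task: str, **optional) -> str:
    """Assemble a JSON request body for the generative AI endpoint.

    Any optional parameters (max_tokens, temperature, sources, ...)
    can be passed as keyword arguments.
    """
    if task not in ALLOWED_TASKS:
        raise ValueError(f"task must be one of {sorted(ALLOWED_TASKS)}")
    body = {"content": content, "task": task}
    body.update(optional)
    return json.dumps(body)

request_body = build_request(
    "What is the drug being studied?", "generate", max_tokens=256
)
print(request_body)
```

Validating the body locally catches mistakes like an unsupported task value before the request ever reaches the server.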
content (required)
The question you are asking Certara Generative AI to answer.
Content Word Limit
The prompt can't be arbitrarily long. Keeping the content to a few paragraphs helps ensure accuracy.
task (required)
Two possible strings, either "generate" or "summarize". This indicates to Certara Generative AI what kind of response you want.
messageHistory (optional)
A list of dictionaries that provides the context of previous prompts, which helps shape the answer to the current prompt. Each entry requires two values, type and content.
type
Two possible strings, either "user" or "system". This categorizes the content as given by the user or by Certara Generative AI.
content
Depending on type, this string is either a prompt previously given to Certara Generative AI or an answer to a previous prompt.
messageHistory Example
Here is an example using messageHistory
.
{
"content": "What was the frequency of the dosing of Mesalamine?",
"task": "generate",
"messageHistory": [
{
"type": "user",
"content": "What is the drug or drugs being studied?"
},
{
"type": "system",
"content":"The drug being studied is Mesalamine."
}
]
}
sources (optional)
A list that lets you specify which document or set is used to generate an answer to your prompt. There are several sub-values that can be used.
rawText
Can take any text as a string, Certara Generative AI will use this raw text to generate a response.
documentId
A string that lets you define a specific document you want Certara Generative AI to use. You can use the guide Document Search to find specific document IDs.
savedListId
A string that lets you define a specific set you want Certara Generative AI to use. You can use the guide Create Set to acquire a set ID.
provider
A string that lets you specify the provider you want Certara Generative AI to use. This can be your whole environment or a specific provider like PubMed. If a document ID is specified, the provider of that document must be added to this field.
Sources Optional
While
sources
is optional, giving Certara Generative AI specific content to work with will lead to better responses.
Sources Word Limit
Using documents or raw text that are very long may result in inconsistent responses.
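For example, a request that pins the answer to a single document might look like the sketch below. The document ID and provider values are hypothetical placeholders, to be replaced with values from your own environment:

```python
import json

# Hypothetical IDs -- substitute values from your own environment.
payload = {
    "content": "What was the primary endpoint of the study?",
    "task": "generate",
    "sources": [
        {
            "documentId": "DOCUMENT_ID",   # see the Document Search guide
            "provider": "PROVIDER_NAME",   # must match the document's provider
        }
    ],
}
print(json.dumps(payload, indent=2))
```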
max_tokens (optional)
An integer that caps the length of responses. This can be used to make responses shorter or more verbose.
temperature (optional)
A floating-point number between 0 and 1. You can use this field to give Certara Generative AI more leeway in how it responds. Running the same prompt with a temperature of 0 will cause Certara Generative AI to return the same response, while a temperature of 1 will allow Certara Generative AI to change the response.
top_k (optional)
An integer that restricts sampling to the top k candidates at each step. A higher value allows Certara Generative AI to use more diverse phrases in the response.
Nonsensical Responses
A higher top_k can result in responses that are nonsensical. Experimenting with different values for top_k will help curate your responses.
top_p (optional)
A floating-point number between 0 and 1. A higher top_p tells Certara Generative AI to provide a more diverse response, while a lower value causes a safer response.
Top_P Suggestions
Adding top_p isn't necessary if you are just looking for a yes or no response. If you are looking for more verbose or detailed responses a higher top_p will help achieve this.
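Putting the four sampling controls together, here is a sketch of two request bodies: one tuned for a short, deterministic answer, and one allowing a longer, more exploratory response. All values are illustrative starting points, not recommended settings:

```python
import json

# Short, repeatable answer: temperature 0 returns the same response
# for the same prompt, and max_tokens keeps it brief.
deterministic = {
    "content": "Was the primary endpoint met? Answer yes or no.",
    "task": "generate",
    "max_tokens": 16,
    "temperature": 0,
}

# Longer, more varied answer: higher temperature, top_k, and top_p
# give the model more leeway in wording.
exploratory = {
    "content": "Summarize the safety findings.",
    "task": "generate",
    "max_tokens": 512,
    "temperature": 0.8,
    "top_k": 50,
    "top_p": 0.95,
}
print(json.dumps(exploratory, indent=2))
```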
retriever (optional)
A dictionary containing two sub-dictionaries. These parameters determine how information is pulled from RAG, which helps the system retrieve relevant content to assist in answering the prompt. Documents and tables are stored separately, so retriever has a set of parameters for each.
document
index_name
A string specifying where the document is being stored. This value will always be CGPTTextDocumentStore when retrieving documents from RAG.
Index_name Considerations
In most use cases you will not need to provide an index_name. When not provided, the default document or table store will be used.
lookup_filter
A string specifying a documentId or savedListId. For more information on how to obtain these, see Upload Document or Create Document Set.
reranker
A boolean value that triggers a second sorting pass over relevant tokens before they are sent to the LLM. Turning this off can reduce the time it takes to generate a response.
num_hits
An integer that determines how many relevant sections to pull from RAG.
Num_Hits Suggestion
Using higher values can cause unwanted information to be forwarded to the LLM, resulting in unwanted responses. Testing various values will help curate responses to your specific needs.
enable_sectioning
A boolean value that dictates if RAG should return sections that bookend the relevant chunks. This helps ensure relevant information isn't cut off before being forwarded to the LLM.
heat_ratio
A floating-point number between .5 and 1. A lower value causes RAG to return more text from below the original chunk and append that text.
For example, if you use a value of 1, the returned chunk will have no fill from the following section, while a ratio of .5 will fill with the following text.
This value is closely related to num_hits
and chunk_size
since it's relying on extra token space in order to fill with surrounding text related to a chunk.
Reliance on Enable_Sectioning
For the heat_ratio value to work, enable_sectioning needs to be set to true.
search_ratio
A floating-point number between 0 and 1. This value determines what sort of RAG search is used. There are two types of searches: keyword search and vector search.
Keyword search uses words from the given prompt and looks for phrases in RAG, which dictates what chunks to return to the LLM.
Vector search puts weight on the relevance of a specific token, even if it doesn't fully match words given in the prompt.
If this value isn't supplied, response generation will use a default of 0.5, which indicates balanced use of both search types. The lower the value, the more RAG will favor keyword search and higher will favor vector search.
Score Normalization
When using search_ratio, the ending score that is seen in the API response is normalized. This is because vector and keyword search both return different scores. The value of the search ratio is then used to produce a new score that favors either vector or keyword, depending on the value given.
Search_Ratio Suggestions
It's best to experiment with this value to help curate responses to your specific needs.
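As an illustration only (this is not necessarily Layar's exact formula), the score blending described above can be pictured as a weighted mix of two already-normalized scores, where a higher search_ratio favors the vector score:

```python
def blend_scores(keyword_score: float, vector_score: float,
                 search_ratio: float = 0.5) -> float:
    """Hypothetical normalized blend: 0 favors keyword search,
    1 favors vector search, and 0.5 weighs both equally."""
    return (1 - search_ratio) * keyword_score + search_ratio * vector_score

# search_ratio=0 ignores the vector score entirely.
print(blend_scores(0.9, 0.4, search_ratio=0))
```

The key intuition is the same as the real behavior: moving search_ratio toward 0 makes keyword matches dominate the final score, and moving it toward 1 makes vector relevance dominate.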
relevance_threshold
A floating-point number that filters the chunks returned by RAG. The value defaults to .5 if not provided. Increasing it causes RAG to return only chunks whose relevance score exceeds the threshold, which can result in fewer chunks being returned.
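Pulling the document retriever settings together, here is a sketch of a request body. The lookup_filter value is a placeholder, and the numeric settings are illustrative starting points:

```python
import json

payload = {
    "content": "What dosing regimen was used?",
    "task": "generate",
    "retriever": {
        "document": {
            "lookup_filter": "DOCUMENT_ID",  # a documentId or savedListId
            "reranker": True,                # second sorting pass before the LLM
            "num_hits": 5,                   # relevant sections to pull from RAG
            "enable_sectioning": True,       # allow bookending sections
            "heat_ratio": 0.75,              # needs enable_sectioning = true
            "search_ratio": 0.5,             # balanced keyword/vector search
            "relevance_threshold": 0.5,      # drop chunks scoring below this
        }
    },
}
print(json.dumps(payload, indent=2))
```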
table
Table Parameters
The table parameters are identical to the document parameters. The only difference is that enable_sectioning must be false, which means that heat_ratio can't be set.
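A corresponding sketch for the table store looks like this: the same fields as the document retriever, except enable_sectioning is false and heat_ratio is omitted. The lookup_filter value is a placeholder:

```python
import json

payload = {
    "content": "What values appear in the demographics table?",
    "task": "generate",
    "retriever": {
        "table": {
            "lookup_filter": "SAVED_LIST_ID",  # a documentId or savedListId
            "reranker": True,
            "num_hits": 5,
            "enable_sectioning": False,  # must be false for tables
            "search_ratio": 0.5,
            "relevance_threshold": 0.5,
        }
    },
}
print(json.dumps(payload, indent=2))
```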
conversation (optional)
A dictionary value that includes one key, chunk_size.
chunk_size
An integer that determines the number of tokens fed to Certara Generative AI at a time. To generate a response, Certara Generative AI will break up the text given to it. Making chunk_size too large or too small can result in unwanted responses.
summarize (optional)
A dictionary value that includes one key, chunk_size. This parameter works exactly like conversation but for summarization.
chunk_size
An integer that determines the number of tokens fed to Certara Generative AI at a time. To generate a summary, Certara Generative AI will break up the text given to it. Making chunk_size too large or too small can result in incorrect summaries.
Chunk_size Optional
If you have already created an embedding of your sources, you do not have to provide chunk_size when forwarding a generate request. If the embedding hasn't been made yet, chunk_size will go towards making it, and all generation using this data going forward will not need a chunk_size.
Chunk_size Considerations
You do not need to supply chunk_size if an embedding has already been made for the document or set. For more information on how to create an embedding, review Create Embedding.
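For example, a summarization request with an explicit chunk_size might look like the sketch below. The chunk size, document ID, and provider are illustrative placeholders:

```python
import json

payload = {
    "content": "Summarize this study.",
    "task": "summarize",
    "sources": [
        {"documentId": "DOCUMENT_ID", "provider": "PROVIDER_NAME"}
    ],
    "summarize": {
        "chunk_size": 1024  # tokens per chunk fed to the model
    },
}
print(json.dumps(payload, indent=2))
```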
prompts (optional)
A dictionary that allows you to provide additional context for Certara Generative AI without inflating the contents of your initial prompt.
system
A string that gives Certara Generative AI further direction on how to respond to your prompt, e.g. "Use a scientific writer's tone." or "Use yes or no responses."
summarization
Two possible strings, either "verbose" or "brief". This will result in a longer or shorter summary.
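For instance, a generate request that constrains the model's style through the system prompt, leaving the question itself uncluttered (values illustrative):

```python
import json

payload = {
    "content": "Did the trial meet its primary endpoint?",
    "task": "generate",
    "prompts": {
        # Extra direction for the model, kept out of the main prompt.
        "system": "Use yes or no responses."
    },
}
print(json.dumps(payload, indent=2))
```

For a summarize task, the summarization key ("verbose" or "brief") would go in the same prompts dictionary.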
transientData (optional)
A boolean value. If set to true, any provided documents are discarded after a response is returned. Defaults to false if not provided.
Increased Response Time
If you are setting
transientData
to true, be prepared for increased CGPT response times.
rag_query (optional)
A string that allows you to directly control the retrieval of relevant context passed to the model during generation. For example, if your prompt is "What is the patient's potassium mmol/l? Only respond in the format of this typescript definition {"answer":float}", you could provide a rag_query of "potassium mmol/l".
Rag_query Tip
rag_query is good to use when you want to give Certara Generative AI related text to help guide responses without complicating your prompt. Remember that additional phrases added to the prompt can result in varying responses.
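Mirroring the potassium example above, the detailed formatting instructions stay in the prompt while rag_query carries only the retrieval phrase:

```python
import json

payload = {
    "content": (
        "What is the patient's potassium mmol/l? "
        "Only respond in the format of this typescript definition "
        '{"answer":float}'
    ),
    "task": "generate",
    # Retrieval searches on this phrase instead of the full prompt.
    "rag_query": "potassium mmol/l",
}
print(json.dumps(payload, indent=2))
```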
frequency_penalty (optional)
A floating-point number between -2 and 2. The value defaults to 0.5 if not provided. The higher the value, the less likely Certara Generative AI is to reuse phrases that appear often. Lower values will result in repeated phrases being used when a response is generated.
frequency_penalty affecting throughput
Using a low frequency_penalty can result in more tokens being generated in a response, which will result in lower throughput. Override the default with caution.
rag_skip_enabled (optional)
A boolean value that tells Certara Generative AI to skip using RAG. This can be useful when you have documents that are short enough to fit the whole text into the context window.
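As a sketch, a request that bypasses RAG entirely and sends a short document as raw text (the rawText here is a stand-in for your own content):

```python
import json

payload = {
    "content": "What adverse events are mentioned?",
    "task": "generate",
    # For a document short enough to fit in the context window,
    # send the full text and skip retrieval.
    "sources": [{"rawText": "Full text of a short document..."}],
    "rag_skip_enabled": True,
}
print(json.dumps(payload, indent=2))
```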