Understanding the Certara Generative AI Parameters

Introduction

The Certara Generative AI endpoint offers many parameters you can use to tailor responses to your needs. This guide describes each of them.

📘
Swagger UI
All Certara AI endpoints and their parameters can be found at https://YOUR_LAYAR_ENVIRONMENT/layar/swagger-ui.html

Parameters

The parameters are as follows:

{
  "content": "string",
  "task": "string",
  "messageHistory": [
    {
      "type": "string",
      "content": "string"
    }
  ],
  "sources": [{
    "rawText": "string",
    "documentId": "string",
    "savedListId": "string",
    "provider": "string"
  }],
  "max_tokens": 0,
  "temperature": 0,
  "top_k": 0,
  "top_p": 0,
  "retriever": {
    "document": {
      "index_name": "string",
      "lookup_filter": "string",
      "reranker": true,
      "num_hits": 0,
      "heat_ratio": 0,
      "enable_sectioning": true,
      "search_ratio": 0.5,
      "relevance_threshold": 0.5,
      "recreate_index": true
    },
    "table": {
      "index_name": "string",
      "lookup_filter": "string",
      "reranker": true,
      "num_hits": 0,
      "heat_ratio": 0,
      "enable_sectioning": false,
      "search_ratio": 0,
      "relevance_threshold": 0.5,
      "recreate_index": true
    }
  },
  "conversation": {
    "chunk_size": 0
  },
  "summarize": {
    "chunk_size": 0
  },
  "prompts": {
    "system": "string",
    "summarization": "string"
  },
  "transientData": true,
  "rag_query": "string",
  "frequency_penalty": 0.5,
  "rag_skip_enabled": false
}
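Most requests send only a subset of the fields above. The following Python sketch builds a minimal JSON body and wraps it in a POST request using only the standard library. The base URL and endpoint path are placeholders, and the bearer-token header is an assumption for illustration; check the Swagger UI for your environment's actual generate path and authentication scheme.

```python
import json
import urllib.request

# Placeholder values: replace with your environment and the generate
# endpoint path listed in your Swagger UI (this path is hypothetical).
BASE_URL = "https://YOUR_LAYAR_ENVIRONMENT"
GENERATE_PATH = "/layar/YOUR_GENERATE_ENDPOINT"


def build_generate_request(token, content, task="generate", **options):
    """Build (but do not send) a POST request for a generate call.

    The required fields are always included; extra keyword arguments
    (for example max_tokens or temperature) are merged into the body as-is.
    """
    body = {"content": content, "task": task, **options}
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + GENERATE_PATH,
        data=data,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed bearer-token auth; confirm the scheme for your environment.
            "Authorization": f"Bearer {token}",
        },
    )


req = build_generate_request("MY_TOKEN", "What drug is being studied?", max_tokens=256)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(req)
```

Building the request separately from sending it makes the body easy to inspect before it reaches the service.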

content (required)

The question you want Certara Generative AI to answer.

🚧
Content Word Limit
The prompt can't be arbitrarily long. Keeping the content to a few paragraphs helps ensure accurate responses.

task (required)

One of two strings, "generate" or "summarize". This tells Certara Generative AI what kind of response you want.
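A client-side check of the two required fields can catch mistakes before a request is sent. The helper below is hypothetical, not part of any Certara SDK; it only enforces the rules stated in this guide.

```python
VALID_TASKS = {"generate", "summarize"}


def check_required(body):
    """Raise ValueError if content or task is missing or invalid."""
    content = body.get("content")
    if not isinstance(content, str) or not content.strip():
        raise ValueError("content must be a non-empty string")
    if body.get("task") not in VALID_TASKS:
        raise ValueError(f"task must be one of {sorted(VALID_TASKS)}")
    return body


check_required({"content": "Summarize the study design.", "task": "summarize"})
```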

messageHistory (optional)

A list of dictionaries that provides the context of previous prompts and answers, which helps shape the answer to the current prompt. Each dictionary requires two keys, type and content.

type

One of two strings, "user" or "system". This indicates whether the content came from the user or from Certara Generative AI.

content

The text of the message. Depending on type, this is either a prompt previously given to Certara Generative AI (user) or an answer it gave to a previous prompt (system).

messageHistory Example

Here is an example using messageHistory.

{
  "content": "What was the frequency of the dosing of Mesalamine?",
  "task": "generate",
  "messageHistory": [
    {
      "type": "user",
      "content": "What is the drug or drugs being studied?"
    },
    {
      "type": "system",
      "content": "The drug being studied is Mesalamine."
    }
  ]
}
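When a client keeps a running conversation, messageHistory can be assembled programmatically. This sketch uses a hypothetical helper that turns question/answer pairs into user and system entries, producing the same body as the example above.

```python
def build_message_history(turns):
    """Convert (question, answer) pairs into messageHistory entries.

    Each question becomes a "user" entry and each answer a "system"
    entry, kept in conversation order.
    """
    history = []
    for question, answer in turns:
        history.append({"type": "user", "content": question})
        history.append({"type": "system", "content": answer})
    return history


body = {
    "content": "What was the frequency of the dosing of Mesalamine?",
    "task": "generate",
    "messageHistory": build_message_history([
        ("What is the drug or drugs being studied?",
         "The drug being studied is Mesalamine."),
    ]),
}
```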

sources (optional)

A list that specifies which documents or sets are used to generate an answer to your prompt. Each entry can use the following keys.

rawText

Any text, as a string. Certara Generative AI will use this raw text to generate a response.

documentId

A string that specifies a particular document you want Certara Generative AI to use. See the Document Search guide to find document IDs.

savedListId

A string that specifies a particular set you want Certara Generative AI to use. See the Create Set guide to acquire a set ID.

provider

A string that specifies the provider you want Certara Generative AI to use. This can be your whole environment or a specific provider, such as PubMed. If a documentId is specified, the provider of that document must also be set in this field.

🚧
Sources Optional
While sources is optional, giving Certara Generative AI specific content to work with will lead to better responses.

🚧
Sources Word Limit
Using documents or raw text that are very long may result in inconsistent responses.
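The pairing rule for documentId and provider is easy to forget, so a small builder can enforce it. The helpers below are hypothetical, and the IDs are placeholders rather than real values.

```python
def document_source(document_id, provider):
    """Build a sources entry for one document.

    provider is required because a documentId must be accompanied by
    the provider of that document.
    """
    if not provider:
        raise ValueError("provider is required when documentId is set")
    return {"documentId": document_id, "provider": provider}


def raw_text_source(text):
    """Build a sources entry from raw text."""
    return {"rawText": text}


sources = [
    document_source("YOUR_DOCUMENT_ID", "YOUR_PROVIDER"),
    raw_text_source("Paste any reference text here."),
]
```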

max_tokens (optional)

An integer that caps the length of the response, in tokens. Use it to make responses shorter or to allow more verbose ones.

temperature (optional)

A floating-point number between 0 and 1 that gives Certara Generative AI more or less leeway in how it responds. Running the same prompt with a temperature of 0 returns the same response each time, while a temperature of 1 allows the response to vary.

top_k (optional)

An integer that limits Certara Generative AI to choosing among the top_k most likely next tokens at each step. A higher value allows Certara Generative AI to use more diverse phrasing in the response.

❗️
Temperature with Top_k
If temperature is set to 0, top_k is automatically set to -1.

🚧
Nonsensical Responses
A higher top_k can result in nonsensical responses. Experimenting with different values for top_k will help curate your responses.

top_p (optional)

A floating-point number between 0 and 1. A higher top_p tells Certara Generative AI to give a more diverse response, while a lower value produces a safer, more predictable response.

👍
Top_P Suggestions
Adding top_p isn't necessary if you are only looking for a yes or no response. If you want more verbose or detailed responses, a higher top_p will help.
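The sketch below gathers the sampling fields and applies the temperature and top_k rule on the client side, so the request shows the top_k value the service would use anyway. It is a hypothetical helper; the 0 to 1 range check for top_p is an assumption based on standard nucleus sampling.

```python
def sampling_options(temperature=None, top_k=None, top_p=None):
    """Assemble optional sampling fields for a generate request.

    - temperature must be between 0 and 1.
    - At temperature 0, top_k is forced to -1, matching the documented
      service behavior.
    - top_p is checked against 0..1 (assumed nucleus-sampling range).
    Fields left as None are omitted so the service defaults apply.
    """
    opts = {}
    if temperature is not None:
        if not 0 <= temperature <= 1:
            raise ValueError("temperature must be between 0 and 1")
        opts["temperature"] = temperature
        if temperature == 0:
            top_k = -1
    if top_k is not None:
        opts["top_k"] = top_k
    if top_p is not None:
        if not 0 <= top_p <= 1:
            raise ValueError("top_p must be between 0 and 1")
        opts["top_p"] = top_p
    return opts


print(sampling_options(temperature=0, top_k=40))
print(sampling_options(temperature=0.7, top_k=40, top_p=0.9))
```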

retriever (optional)

A dictionary containing two sub-dictionaries, document and table, each with multiple values. These parameters determine how information is retrieved through RAG (retrieval-augmented generation), which supplies relevant content to help answer the prompt. Documents and tables are stored separately, which is why retriever has parameters for both.

document

index_name

A string naming the index where documents are stored. When retrieving documents from RAG, this value is always CGPTTextDocumentStore.

🚧
Index_name Considerations
In most use cases you will not need to provide an index_name. When not provided, the default document or table store will be used.

lookup_filter

A string specifying a documentId or savedListId. For information on how to obtain these IDs, see Upload Document or Create Document Set.

reranker

A boolean value that triggers a second sorting (reranking) of the retrieved chunks before they are sent to the LLM. Turning this off can reduce the time it takes to generate a response.

num_hits

An integer that determines how many relevant sections to pull from RAG.

👍
Num_Hits Suggestion
Using higher values can cause unwanted information to be forwarded to the LLM, resulting in unwanted responses. Testing various values will help curate responses to your specific needs.

enable_sectioning

A boolean value that determines whether RAG also returns the sections surrounding each relevant chunk. This helps ensure relevant information isn't cut off before being forwarded to the LLM.

heat_ratio

A floating-point number between .5 and 1. A lower value causes RAG to append more text from below (after) the original chunk.

For example, with a value of 1, the returned chunk has no fill from the following section, while a ratio of .5 fills it with the text that follows.

This value is closely related to num_hits and chunk_size, since it relies on extra token space to fill in the text surrounding a chunk.

📘
Reliance on Enable_Sectioning
For heat_ratio to take effect, enable_sectioning must be set to true.
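A client can check heat_ratio against these rules before sending a request. This hypothetical helper enforces the .5 to 1 range and, to be conservative, requires enable_sectioning to be set explicitly to true whenever heat_ratio is present, since the default for enable_sectioning is not stated here.

```python
def check_heat_ratio(document_cfg):
    """Validate heat_ratio in a retriever.document dict.

    heat_ratio must be between 0.5 and 1, and it only takes effect when
    enable_sectioning is true, so this sketch requires enable_sectioning
    to be set explicitly whenever heat_ratio is present.
    """
    if "heat_ratio" not in document_cfg:
        return document_cfg
    ratio = document_cfg["heat_ratio"]
    if not 0.5 <= ratio <= 1:
        raise ValueError("heat_ratio must be between 0.5 and 1")
    if document_cfg.get("enable_sectioning") is not True:
        raise ValueError("heat_ratio requires enable_sectioning to be true")
    return document_cfg


check_heat_ratio({"enable_sectioning": True, "heat_ratio": 0.75})
```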

search_ratio

A floating-point number between 0 and 1 that determines which type of RAG search is used: keyword search, vector search, or a blend of the two.

Keyword search matches words from the given prompt against phrases stored in RAG to decide which chunks to return to the LLM.

Vector search ranks chunks by their semantic relevance to the prompt, even if they don't exactly match the words in the prompt.

If this value isn't supplied, response generation uses a default of 0.5, which balances the two search types equally. Lower values make RAG favor keyword search, and higher values make it favor vector search.

🚧
Score Normalization
When using search_ratio, the final score shown in the API response is normalized, because vector search and keyword search return scores on different scales. The search_ratio value is then used to combine them into a new score that favors either vector or keyword search, depending on the value given.
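The normalization method itself is not specified in this guide. As an illustration only, the sketch below assumes min-max normalization of each score list followed by a weighted average controlled by search_ratio; the actual scoring in Certara Generative AI may differ.

```python
def min_max(scores):
    """Rescale scores to the 0-1 range (if all are equal, all become 1.0)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(keyword_scores, vector_scores, search_ratio=0.5):
    """Blend per-chunk keyword and vector scores (assumed scheme).

    search_ratio = 0 uses keyword scores only, 1 uses vector scores only,
    and 0.5 weights them equally, matching the direction described above.
    """
    kw = min_max(keyword_scores)
    vec = min_max(vector_scores)
    return [(1 - search_ratio) * k + search_ratio * v for k, v in zip(kw, vec)]


# Three chunks: raw keyword scores and raw vector similarities (made-up inputs).
print(hybrid_scores([12.0, 3.0, 7.5], [0.62, 0.91, 0.40], search_ratio=0.5))
```

Normalizing first keeps a keyword score like 12.0 from swamping a similarity like 0.91 simply because of its scale.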

👍
Search_Ratio Suggestions
It's best to experiment with this value to help curate responses to your specific needs.

relevance_threshold

A floating-point number that sets the minimum relevance a chunk must have to be returned by RAG. It defaults to .5 if not provided. Increasing the value causes RAG to return only chunks above the threshold, which can result in fewer chunks being returned.
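To illustrate how relevance_threshold and num_hits interact, the sketch below filters scored chunks by the threshold and then keeps the best num_hits of them. The order of these steps, and whether a score exactly equal to the threshold is kept, are assumptions for illustration.

```python
def select_chunks(scored_chunks, relevance_threshold=0.5, num_hits=5):
    """Keep chunks scoring at or above the threshold, best first, up to num_hits.

    scored_chunks is a list of (chunk_text, normalized_score) pairs.
    """
    kept = [c for c in scored_chunks if c[1] >= relevance_threshold]
    kept.sort(key=lambda c: c[1], reverse=True)
    return kept[:num_hits]


chunks = [("dosing", 0.82), ("adverse events", 0.47), ("study design", 0.66)]
print(select_chunks(chunks))  # default threshold 0.5
print(select_chunks(chunks, relevance_threshold=0.7))
```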

recreate_index

A boolean value that, when true, deletes all existing vector chunks related to the sources used in the generate request.

table

📘
Table Parameters
The table parameters are identical to the document parameters. The only difference is that enable_sectioning must be false. This means that heat_ratio can't be set.
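Because the table settings mirror the document settings, a client can derive one from the other. This hypothetical helper copies the document settings, forces enable_sectioning to false, and drops heat_ratio. It also drops index_name, since documents and tables are stored separately and omitting index_name falls back to the default table store.

```python
def table_retriever_from(document_cfg):
    """Derive a retriever.table dict from a retriever.document dict."""
    # heat_ratio can't be set for tables; index_name points at the document
    # store, so it is omitted to use the default table store instead.
    table_cfg = {
        k: v for k, v in document_cfg.items()
        if k not in ("heat_ratio", "index_name")
    }
    table_cfg["enable_sectioning"] = False
    return table_cfg


document_cfg = {
    "reranker": True,
    "num_hits": 5,
    "enable_sectioning": True,
    "heat_ratio": 0.75,
    "search_ratio": 0.5,
    "relevance_threshold": 0.5,
}
retriever = {"document": document_cfg, "table": table_retriever_from(document_cfg)}
```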

conversation (optional)

A dictionary that includes one key, chunk_size.

chunk_size

An integer that sets the number of tokens in each chunk of text fed to Certara Generative AI. To generate a summary, Certara Generative AI breaks the text given to it into chunks of this size. Making chunk_size too large or too small can result in unwanted responses.
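To show what chunk_size controls, the sketch below splits text into consecutive fixed-size chunks. It uses whitespace-separated words as a stand-in for tokens and is not the service's actual chunking, which this guide does not describe.

```python
def chunk_tokens(text, chunk_size):
    """Split text into consecutive chunks of at most chunk_size tokens.

    Whitespace-separated words stand in for tokens here; a real model
    tokenizer will count tokens differently.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    tokens = text.split()
    return [
        " ".join(tokens[i:i + chunk_size])
        for i in range(0, len(tokens), chunk_size)
    ]


print(chunk_tokens("one two three four five six seven", 3))
# ['one two three', 'four five six', 'seven']
```

A very small chunk_size fragments related sentences across chunks, while a very large one packs unrelated material together, which is one reason extreme values can produce unwanted responses.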

summarize (optional)

A dictionary value that includes one key, chunk_size. This parameter works exactly like conversation but for summarization.

chunk_size

📘
Chunk_size Optional
If you have already created an embedding of your sources, you do not need to provide chunk_size in a generate request. If the embedding hasn't been made yet, chunk_size is used to create it, and later generation requests using this data will not need a chunk_size.

🚧
Chunk_size Considerations
You do not need to supply chunk_size if an embedding has already been made for the document or set. For more information on how to create an embedding, review Create Embedding.

prompts (optional)

A dictionary that allows you to provide additional context for Certara Generative AI without inflating the contents of your initial prompt.

system

A string that gives Certara Generative AI further direction on how to respond to your prompt, for example "Use a scientific writer tone." or "Use yes or no responses."

summarization

Two possible strings, either "verbose" or "brief". This will result in a longer or shorter summary.
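The sketch below shows a summarize request that uses both prompts keys. The set ID is a placeholder, and the system text reuses the example above.

```python
import json

body = {
    "content": "Summarize the efficacy results.",
    "task": "summarize",
    "sources": [{"savedListId": "YOUR_SET_ID"}],  # placeholder set ID
    "prompts": {
        "system": "Use a scientific writer tone.",
        "summarization": "brief",  # or "verbose" for a longer summary
    },
}

print(json.dumps(body, indent=2))
```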

transientData (optional)

A boolean value. If true, any provided documents are discarded after a response is returned. Defaults to false if not provided.

🚧
Increased Response Time
If you are setting transientData to true, be prepared for increased CGPT response times.

rag_query (optional)

A string that lets you directly control which context is retrieved and passed to the model during generation. For example, if your prompt is "What is the patient's potassium mmol/l? Only respond in the format of this typescript definition {"answer":float}", you can provide the rag_query "potassium mmol/l" so retrieval focuses on that phrase rather than on the formatting instructions.
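Here is that example expressed as a request body, built in Python. The document ID and provider are placeholders.

```python
import json

prompt = (
    "What is the patient's potassium mmol/l? "
    'Only respond in the format of this typescript definition {"answer":float}'
)

body = {
    "content": prompt,
    "task": "generate",
    # Placeholder source; replace with a real documentId and its provider.
    "sources": [{"documentId": "YOUR_DOCUMENT_ID", "provider": "YOUR_PROVIDER"}],
    # Retrieval uses this short phrase, so the output-format instructions
    # in the prompt don't influence which chunks are found.
    "rag_query": "potassium mmol/l",
}

print(json.dumps(body, indent=2))
```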

👍
Rag_query Tip
rag_query is good to use when you want to give Certara Generative AI related text to help guide responses, without complicating your prompt. Remember that additional phrases added to the prompt can result in varying responses.

frequency_penalty (optional)

A number between -2 and 2 that defaults to 0.5 if not provided. The higher the value, the less likely Certara Generative AI is to reuse phrases that have already been repeated often; lower values make repeated phrases more likely.
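This guide doesn't describe how the penalty is applied internally. A common formulation in LLM APIs subtracts the penalty multiplied by the number of times a token has already appeared from that token's logit. The sketch below shows that formulation as a general illustration, not as Certara's implementation.

```python
from collections import Counter


def apply_frequency_penalty(logits, generated_tokens, frequency_penalty=0.5):
    """Lower each token's logit by penalty x the number of times it already appeared.

    logits maps candidate tokens to raw scores; generated_tokens is the
    output produced so far. Positive penalties discourage repetition,
    negative ones encourage it.
    """
    if not -2 <= frequency_penalty <= 2:
        raise ValueError("frequency_penalty must be between -2 and 2")
    counts = Counter(generated_tokens)
    return {
        tok: score - counts[tok] * frequency_penalty
        for tok, score in logits.items()
    }


logits = {"dose": 3.0, "daily": 1.5, "weekly": 1.2}
print(apply_frequency_penalty(logits, ["dose", "dose", "daily"], 0.5))
```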

🚧
frequency_penalty affecting throughput
Using a low frequency_penalty can result in more tokens being generated in a response, which will result in lower throughput. Override the default with caution.

rag_skip_enabled (optional)

A boolean value that tells Certara Generative AI to skip RAG. This can be useful when your documents are short enough for the whole text to fit into the context window.
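Deciding whether a document fits requires a token estimate. The sketch below is a hypothetical heuristic: it estimates tokens as characters divided by 4 (a rough rule of thumb for English text) and reserves room for the prompt and answer. The 8192-token context window is an assumed value; check the limit of the model behind your environment.

```python
def should_skip_rag(text, context_window_tokens, reserved_tokens=1024, chars_per_token=4):
    """Return True if the whole text likely fits in the context window.

    Tokens are estimated as len(text) / chars_per_token; reserved_tokens
    leaves room for the prompt, message history, and generated answer.
    """
    estimated_tokens = len(text) / chars_per_token
    return estimated_tokens + reserved_tokens <= context_window_tokens


short_doc = "word " * 2000  # about 2,500 estimated tokens
body = {
    "content": "Summarize the key findings.",
    "task": "generate",
    "sources": [{"rawText": short_doc}],
    "rag_skip_enabled": should_skip_rag(short_doc, context_window_tokens=8192),
}
```

Because the estimate is approximate, leaving a generous reserved_tokens margin is safer than sizing the text right up to the limit.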
