Document Search
Pre-Reqs
Before a document search can be done the API requests must be authenticated. Make sure you have already followed the instructions for importing dependencies and authentication from the Getting Started Guide.
Check Your Imported Modules
Make sure you have imported the
requests
andjson
module before proceeding with this guide.
header = {'Accept': 'application/json',
'Content-Type': 'application/json',
'Authorization': f"Bearer {token}",
'X-Vyasa-Client': 'layar',
'X-Vyasa-Data-Providers' : 'sandbox.certara.ai',
'X-Vyasa-Data-Fabric' : 'YOUR_FABRIC_ID'
}
The next thing you'll need is a list of all the data providers you would like to search within. You can search for data providers using a GET request to /layar/node
endpoint. The JSON of the request would look like this.
searchProviderUri = f'{envUrl}/layar/node'
body = {
'all' : True
}
response = requests.get(searchProviderUri,
headers = header,
json = body
)
providers = response
We can now use the list of data providers for the x-vyasa-data-providers
parameter in the header
that gets forwarded with each request. We've been using a pre-defined header for the previous guides, let's review that header again.
header = {'Accept': 'application/json',
'Content-Type': 'application/json',
'Authorization': f"Bearer {token}",
'X-Vyasa-Client': 'layar',
'X-Vyasa-Data-Providers' : 'sandbox.certara.ai',
'X-Vyasa-Data-Fabric' : 'YOUR_FABRIC_ID'
}
'X-Vyasa-Client': 'layar'
indicates that we are querying Layar. 'X-Vyasa-Data-Providers' : 'sandbox.certara.ai'
denotes that we are querying the provider, sandbox.certara.ai, for data. 'X-Vyasa-Data-Fabric' : 'YOUR_FABRIC_ID'
takes the specific data fabric you want to perform the search on. In order to get the fabric ID, follow the Fabric ID Recipe
Searching for Documents
Now that we have our providers added to the header
we can create the body of the search. The endpoint we will be using is /layar/sourceDocument/search
. Utilizing Swagger we can see what possible values can be used to filter the search.
There are a lot of parameters we use to dictate the contents of our search; a generic request would look like this.
searchDocUri = f'{envUrl}/layar/sourceDocument/search'
response = requests.post(searchDocUri,
headers = header
)
Caution
Notice that we are not forwarding any JSON data with the header, this means we will get back ALL documents under that data provider. This could take some time.
The sections below will go over how to search on specific parameters. Over the examples we will build out the body variable
with JSON, each part narrowing down the search.
Search By IDs
If you have access to the specific document Layar IDs you would like to request, perhaps from another API call, you can add them as a list of strings to the ids
property.
body = {
'ids' : [
'id1','id2'
]
}
Search By Terms
in most cases you won't already have document ids to go off of, so you may want to search on terms in order to acquire documents. To do so we will use the termOperator
and terms
properties.
body = {
'termOperator' : 'AND', #You can also use OR as a boolean operator
'terms' : [
'term1','term2'
]
}
Search By Saved Lists
If you would like to limit your search to the documents in a saved list or a few saved lists, you can add the list IDs to the savedListIds
property.
body = {
'termOperator' : 'AND', #You can also use OR as a boolean operator
'terms' : [
'term1','term2'
],
'savedListIds' : [
'listId1','listId2'
]
}
Search By NER
With NER filtering you can find documents that are tagged with specific concepts or with concept types. The namedEntities
property accepts a list of NamedEntity objects that you can use to filter your document search results.
body = {
'termOperator' : 'AND', #You can also use OR as a boolean operator
'terms' : [
'term1','term2'
],
'savedListIds' : [
'listId1','listId2'
],
'namedEntities' : [{
'concept' : 'fibromyalgia',
'typeId' : 'DISEASE'
}]
}
Perform the Search
You can now use this body variable to request the filtered documents using requests.post
.
docSearchUri = f'{envUrl}/layar/sourceDocument/search'
response = requests.post(docSearchUri,
headers = header,
json = body
)
pprint(response) #optional
Updated 3 months ago
Now that we know how to search for specific documents, we can search for specific paragraphs within document.