HomeGuidesRecipesAPI EndpointsRelease NotesCommunity
Log In

Document Search

Pre-Reqs

Before a document search can be done the API requests must be authenticated. Make sure you have already followed the instructions for importing dependencies and authentication from the Getting Started Guide.

👍

Check Your Imported Modules

Make sure you have imported the requests and json module before proceeding with this guide.

The next thing you'll need is a list of all the data providers you would like to search within. You can search for data providers using a GET request to /layar/node endpoint. The JSON of the request would look like this.

searchProviderUri = f'{envUrl}/layar/node'

body = {
  			'all' : True
        }

response = requests.get(searchProviderUri,
                        headers = header,
                        json = body
                       )

providers = response

We can now use the list of data providers for the x-vyasa-data-providers parameter in the header that gets forwarded with each request. We've been using a pre-defined header for the previous guides, let's review that header again.

header = {'Accept': 'application/json',
          'Content-Type': 'application/json',
          'Authorization': f"Bearer {token}",
          'X-Vyasa-Client': 'layar',
          'X-Vyasa-Data-Providers' : 'sandbox.certara.ai'}

'X-Vyasa-Client': 'layar'\ indicates that we are querying Layar. 'X-Vyasa-Data-Providers' : 'sandbox.certara.ai'\ denotes that we are querying the provider, sandbox.certara.ai, for data.

Searching for Documents

Now that we have our providers added to the header we can create the body of the search. The endpoint we will be using is /layar/sourceDocument/search. Utilizing Swagger we can see what possible values can be used to filter the search.

There are a lot of parameters we use to dictate the contents of our search; a generic request would look like this.

searchDocUri = f'{envUrl}/layar/sourceDocument/search'
response = requests.post(searchDocUri,
                         headers = header
                        )

🚧

Caution

Notice that we are not forwarding any JSON data with the header, this means we will get back ALL documents under that data provider. This could take some time.

The sections below will go over how to search on specific parameters. Over the examples we will build out the body variable with JSON, each part narrowing down the search.

Search By IDs

If you have access to the specific document Layar IDs you would like to request, perhaps from another API call, you can add them as a list of strings to the ids property.

body = {
  			'ids' : [
          			'id1','id2'
                ]
       }

Search By Terms

in most cases you won't already have document ids to go off of, so you may want to search on terms in order to acquire documents. To do so we will use the termOperator and terms properties.

body = {
  			'termOperator' : 'AND', #You can also use OR as a boolean operator
  			'terms' : [
          			'term1','term2'
                ]
       }

Search By Saved Lists

If you would like to limit your search to the documents in a saved list or a few saved lists, you can add the list IDs to the savedListIds property.

body = {
  			'termOperator' : 'AND', #You can also use OR as a boolean operator
  			'terms' : [
          			'term1','term2'
                  ],
  			'savedListIds' : [
          								'listId1','listId2'
                         ]
       }

Search By NER

With NER filtering you can find documents that are tagged with specific concepts or with concept types. The namedEntities property accepts a list of NamedEntity objects that you can use to filter your document search results.

body = {
  			'termOperator' : 'AND', #You can also use OR as a boolean operator
  			'terms' : [
          			'term1','term2'
                  ],
  			'savedListIds' : [
          								'listId1','listId2'
                         ],
  			'namedEntities' : [{
          								'concept' : 'fibromyalgia',
                          'typeId' : 'DISEASE'
                          }]
       }

Perform the Search

You can now use this body variable to request the filtered documents using requests.post.

docSearchUri = f'{envUrl}/layar/sourceDocument/search'

response = requests.post(docSearchUri,
                         headers = header,
                         json = body
                        )
pprint(response) #optional

Up Next

Now that we know how to search for specific documents, we can search for specific paragraphs within document.