Using Statement Search to Find Table Details

Now that we know how to search for statements in a document, we can use this same method to get table details that may be in the document.

Pre-Req

Make sure you have already followed the instructions for importing dependencies and authentication from the Getting Started Guide. You can use your token variable to authenticate your requests.

👍
Check Your Imported Modules
Make sure you have imported the requests, json, and pprint modules before proceeding with this guide.

The following header can be used in your request.

header = {'Accept': 'application/json',
          'Content-Type': 'application/json',
          'Authorization': f"Bearer {token}",
          'X-Vyasa-Client': 'layar',
          'X-Vyasa-Data-Providers': 'sandbox.certara.ai',
          'X-Vyasa-Data-Fabric': 'YOUR_FABRIC_ID'
         }

Get A Document ID

In order to perform the statement search, we need a document ID that contains a table. For the sake of this guide, I will be using AY9YvS8mU0xLUebQHJVB which was uploaded using Upload a Document.

📘
Is a Document ID Always Needed?
The /layar/statement/search endpoint does not require you to provide a document ID in the request body. You can look at all statements across the whole data fabric, if desired.
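
As a minimal sketch of the point above, the request body can be built with or without the documentIds field. The helper name build_statement_body is hypothetical and not part of the Layar API; it only uses the rows and documentIds fields that appear in this guide.

```python
def build_statement_body(document_ids=None, rows=500):
    """Build a statement search body (hypothetical helper, not part of Layar)."""
    body = {'rows': rows}
    # Only restrict the search when document IDs are supplied;
    # leaving the field out searches across the whole data fabric.
    if document_ids:
        body['documentIds'] = list(document_ids)
    return body

# Search every statement in the data fabric.
fabric_wide_body = build_statement_body()

# Search only the statements that belong to one document.
single_doc_body = build_statement_body(['AY9YvS8mU0xLUebQHJVB'])
```

Either dictionary can be passed as the json argument of the request shown later in this guide.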

Create the Request Body

Along with documentIds, we will set rows to a value large enough (500 here) to ensure all statements related to the document are returned.

body = {
  'rows' : 500,
  'documentIds' : ['AY9YvS8mU0xLUebQHJVB']
       }

Request Statement Details

Now that we have the body, we can use the header defined above along with the statement search endpoint.

stateSearchUrl = f'{envUrl}/layar/statement/search'

response = requests.post(stateSearchUrl,
                         headers = header,
                         json = body).json()

pprint(response) #optional

Analyzing the Response

The output will be a list of dictionaries, each dictionary pertaining to a specific statement. Let's look at one of those dictionaries that relates back to a table.

{'chemNerMetadata': [],
  'columnKeys': ['column_0_string',
                 'column_1_string',
                 'column_2_string',
                 'column_3_string',
                 'column_4_string'],
  'columns': {'column_3_string': 'Cycle 2 Day 1'},
  'dataFabricId': 'fabric_7RCTRVCE7G1BMRTDM362VPQ3TN_249',
  'dateIndexed': '2024-05-08T15:05:59.521+0000',
  'datePublished': '2024-05-08T15:05:59.519+0000',
  'dateUpdated': '2024-05-08T15:05:59.521+0000',
  'detectedConcepts': {},
  'detectedConceptsWithPosition': [],
  'detectedTypes': [],
  'documentId': 'AY9YvS8mU0xLUebQHJVB',
  'highlightedQueryTerms': [],
  'id': 'AY9YvWniU0xLUebQHJXy',
  'namedEntities': [],
  'provider': 'sandbox.certara.ai',
  'startOffset': 0,
  'taggedConcepts': [],
  'taggedRelationships': [],
  'tweet': {},
  'type': 'delimited'}

The most important values in the response are columnKeys, columns, and startOffset.

`columnKeys`

A list of strings. The length of the list corresponds to the number of columns in the table. In the above example the length is 5, which means the table has 5 columns.

`columns`

A dictionary that contains all the values for a row of the table. Each key relates to the specific value of a column in the current row.

`startOffset`

This indicates which row the data is in. The above example shows an offset of 0, which means this is the first row of data in the table.
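
To illustrate how these three values fit together, here is a rough sketch that collects table rows from a list of statements. It assumes table statements carry the type 'delimited', as in the example above, and it merges the columns of any statements that share a startOffset, since this guide does not say whether a row arrives as one statement or several. The function name rows_from_statements and the sample values marked as made up are illustrative only.

```python
from collections import defaultdict

def rows_from_statements(statements):
    """Merge the 'columns' of table statements, keyed by startOffset."""
    rows = defaultdict(dict)
    for statement in statements:
        # Assumption: table rows are the statements typed 'delimited'.
        if statement.get('type') != 'delimited':
            continue
        rows[statement['startOffset']].update(statement.get('columns', {}))
    # Return the rows in table order, first row (offset 0) first.
    return [rows[offset] for offset in sorted(rows)]

sample_statements = [
    # Made-up second row, listed first to show that ordering is restored.
    {'type': 'delimited', 'startOffset': 1,
     'columns': {'column_3_string': 'Cycle 2 Day 8'}},
    # Trimmed copy of the statement shown above.
    {'type': 'delimited', 'startOffset': 0,
     'columns': {'column_3_string': 'Cycle 2 Day 1'}},
    # Hypothetical non-table statement, which is skipped.
    {'type': 'text', 'startOffset': 0, 'columns': {}},
]

table_rows = rows_from_statements(sample_statements)
```

With a real response, the list of statement dictionaries returned by the search would take the place of sample_statements.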

Finding the Column Headers

Now that we know how to get the row details for the table, we need the column headers to put it all together. To get them, we can use /layar/sourceDocument/{documentID}.

Create the Request Body

A request body isn't needed for this request.

Request Source Details

We can use the header from our previous request and the source document endpoint.

docDetailUrl = f'{envUrl}/layar/sourceDocument/AY9YvS8mU0xLUebQHJVB'

response = requests.get(docDetailUrl,
                        headers = header).json()

pprint(response) #optional

Analyzing the Response

Querying the document details gives us back a lot of information. We are only interested in the columnDefinitions key that is returned in the JSON.

'columnDefinitions': [{'dataType': 'string',
                        'key': 'column_0_string',
                        'name': 'Unnamed: 0',
                        'order': 0},
                       {'dataType': 'string',
                        'key': 'column_1_string',
                        'name': 'Unnamed: 1',
                        'order': 1},
                       {'dataType': 'string',
                        'key': 'column_2_string',
                        'name': 'Unnamed: 2',
                        'order': 2},
                       {'dataType': 'string',
                        'key': 'column_3_string',
                        'name': 'Visit',
                        'order': 3}]

`columnDefinitions`

A list of dictionaries. This includes all the details we need for the columns contained in the table. key can be correlated directly to the columnKeys listed in the statement search. name gives us the actual name of the column. The order value details the order the columns appear in the table.
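
As a sketch of how the two responses might be combined, the snippet below relabels one row's columns dictionary with the column names from columnDefinitions. The helper names header_map and label_row are hypothetical, and the sample data is copied from the excerpts in this guide. Columns with no value in the row are filled with None, which is a choice made for this example rather than documented behavior.

```python
def header_map(column_definitions):
    """Map each column key to its display name, in table order."""
    ordered = sorted(column_definitions, key=lambda d: d['order'])
    return {d['key']: d['name'] for d in ordered}

def label_row(columns, column_definitions):
    """Replace column keys with column names for a single row."""
    names = header_map(column_definitions)
    # Missing values become None so every row has the same fields.
    return {name: columns.get(key) for key, name in names.items()}

column_definitions = [
    {'dataType': 'string', 'key': 'column_0_string', 'name': 'Unnamed: 0', 'order': 0},
    {'dataType': 'string', 'key': 'column_1_string', 'name': 'Unnamed: 1', 'order': 1},
    {'dataType': 'string', 'key': 'column_2_string', 'name': 'Unnamed: 2', 'order': 2},
    {'dataType': 'string', 'key': 'column_3_string', 'name': 'Visit', 'order': 3},
]

# The 'columns' value from the statement shown earlier in this guide.
row_columns = {'column_3_string': 'Cycle 2 Day 1'}

labeled_row = label_row(row_columns, column_definitions)
```

Here labeled_row pairs the Visit header with 'Cycle 2 Day 1' and leaves the three unnamed columns empty.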
