Using Statement Search to Find Table Details
Now that we know how to search for statements in a document, we can use the same method to retrieve the details of any tables the document contains.
Setting Up
Make sure you have already followed the instructions for importing dependencies and authentication from the Getting Started Guide. You can use your token variable to authenticate your requests.
Check Your Imported Modules
Make sure you have imported the requests, json, and pprint modules before proceeding with this guide.
Get A Document ID
To perform the statement search, we need the ID of a document that contains a table. For the sake of this guide, I will be using AY9YvS8mU0xLUebQHJVB, which was uploaded using Upload a Document.
Is a Document ID Always Needed?
The /layar/statement/search endpoint does not require you to provide a document ID in the request body. You can look at all statements across the whole data fabric, if desired.
Create the Request Body
Along with documentIds, we will make use of rows to ensure all statements related to the document are returned.
body = {
    'rows': 500,  # large enough to return all statements for this document
    'documentIds': ['AY9YvS8mU0xLUebQHJVB']
}
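The body above is scoped to a single document. As a minimal sketch of both cases, the hypothetical helper below builds a body with or without the document filter. The function name `build_statement_search_body` is an assumption made for illustration. Only the `rows` and `documentIds` keys come from this guide, and leaving `documentIds` out is assumed to be how a search across the whole data fabric is expressed.

```python
def build_statement_search_body(rows=500, document_ids=None):
    """Build a body for /layar/statement/search (hypothetical helper)."""
    body = {'rows': rows}
    # Only add the filter when document IDs are supplied; per this guide
    # the endpoint does not require one.
    if document_ids:
        body['documentIds'] = list(document_ids)
    return body


# Scoped to the document used in this guide.
scoped_body = build_statement_search_body(document_ids=['AY9YvS8mU0xLUebQHJVB'])

# No document filter: assumed to search across the whole data fabric.
fabric_wide_body = build_statement_search_body(rows=100)
```

Either dictionary can be passed as the json argument of the request in the next step.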
Request Statement Details
Now that we have the body, we can use our header from the previous guide along with the statement endpoint.
stateSearchUrl = f'{envUrl}/layar/statement/search'
# POST the search body and parse the JSON response
response = requests.post(stateSearchUrl,
                         headers=header,
                         json=body).json()
pprint(response)  # optional
Analyzing the Response
The output will be a list of dictionaries, each one describing a specific statement. Let's look at one of those dictionaries that relates back to a table.
{'chemNerMetadata': [],
'columnKeys': ['column_0_string',
'column_1_string',
'column_2_string',
'column_3_string',
'column_4_string'],
'columns': {'column_3_string': 'Cycle 2 Day 1'},
'dataFabricId': 'fabric_7RCTRVCE7G1BMRTDM362VPQ3TN_249',
'dateIndexed': '2024-05-08T15:05:59.521+0000',
'datePublished': '2024-05-08T15:05:59.519+0000',
'dateUpdated': '2024-05-08T15:05:59.521+0000',
'detectedConcepts': {},
'detectedConceptsWithPosition': [],
'detectedTypes': [],
'documentId': 'AY9YvS8mU0xLUebQHJVB',
'highlightedQueryTerms': [],
'id': 'AY9YvWniU0xLUebQHJXy',
'namedEntities': [],
'provider': 'sandbox.certara.ai',
'startOffset': 0,
'taggedConcepts': [],
'taggedRelationships': [],
'tweet': {},
'type': 'delimited'}
The most important values in the response are columnKeys, columns, and startOffset.
columnKeys
A list of strings. The length of the list corresponds to the number of columns in the table. In the above example the length is 5, which means the table has 5 columns.
columns
A dictionary that contains the values for one row of the table. Each key identifies a column, and its value is that column's value in the current row. In the above example, the row holds the value 'Cycle 2 Day 1' under column_3_string.
startOffset
This indicates which row the data is in. The above example shows an offset of 0, which means this is the first row of data in the table.
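The three fields above can be combined to rebuild the table rows. The following is a minimal sketch, not the official approach. It assumes the search response is a list of statement dictionaries shaped like the example, that startOffset identifies a row within a single table, and that statements with an empty columnKeys list are not table rows. The function name `rows_from_statements` and the sample values are made up for illustration.

```python
from collections import defaultdict


def rows_from_statements(statements):
    """Group table statements into rows keyed by startOffset (hypothetical helper)."""
    merged = defaultdict(dict)
    column_keys = []
    for statement in statements:
        # Skip statements that carry no table data.
        if not statement.get('columnKeys'):
            continue
        # Keep the longest columnKeys list seen as the column layout.
        if len(statement['columnKeys']) > len(column_keys):
            column_keys = list(statement['columnKeys'])
        # Merge values that share the same row offset.
        merged[statement.get('startOffset', 0)].update(statement.get('columns', {}))
    # One list per row, ordered by offset; cells with no value become None.
    return [[merged[offset].get(key) for key in column_keys]
            for offset in sorted(merged)]


# Invented sample data in the same shape as the response above.
sample = [
    {'columnKeys': ['column_0_string', 'column_1_string'],
     'columns': {'column_1_string': 'Cycle 2 Day 1'}, 'startOffset': 1},
    {'columnKeys': ['column_0_string', 'column_1_string'],
     'columns': {'column_0_string': 'Screening'}, 'startOffset': 0},
    {'columnKeys': [], 'columns': {}, 'startOffset': 0},
]
table_rows = rows_from_statements(sample)
```

If a document contains more than one table, the rows would need an additional grouping key, which this sketch does not attempt.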
Finding the Column Headers
Now that we know how to get the row details for the table, we need the column headers to put it all together. To do this, we can use /layar/sourceDocument/{documentID}.
Create the Request Body
A request body isn't needed for this request.
Request Source Details
We can use the header from our previous request and the source document endpoint.
docDetailUrl = f'{envUrl}/layar/sourceDocument/AY9YvS8mU0xLUebQHJVB'
# GET the source document details; no request body is needed
response = requests.get(docDetailUrl,
                        headers=header).json()
pprint(response)  # optional
Analyzing the Response
Querying the document details returns a lot of information. We are only interested in the columnDefinitions key that is returned in the JSON.
'columnDefinitions': [{
'dataType': 'string',
'key': 'column_0_string',
'name': 'Unnamed: 0',
'order': 0},
{'dataType': 'string',
'key': 'column_1_string',
'name': 'Unnamed: 1',
'order': 1},
{'dataType': 'string',
'key': 'column_2_string',
'name': 'Unnamed: 2',
'order': 2},
{'dataType': 'string',
'key': 'column_3_string',
'name': 'Visit',
'order': 3
}]
columnDefinitions
A list of dictionaries. This includes all the details we need for the columns contained in the table. key can be correlated directly to the columnKeys listed in the statement search. name gives us the actual name of the column. The order value gives the position in which the column appears in the table.
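To put the headers and row values together, the sketch below maps each column key to its name and relabels one statement's columns dictionary. It assumes columnDefinitions entries have the key, name, and order fields shown above. The helper names `header_names` and `label_row` are hypothetical, and duplicate column names would overwrite each other in this simple version.

```python
def header_names(column_definitions):
    """Return a {key: name} mapping ordered by each definition's 'order'."""
    ordered = sorted(column_definitions, key=lambda d: d['order'])
    return {d['key']: d['name'] for d in ordered}


def label_row(columns, names):
    """Rename one statement's 'columns' keys to readable column names."""
    # Columns with no value in this row are reported as None.
    return {name: columns.get(key) for key, name in names.items()}


# Two of the definitions from the response above, deliberately out of order.
column_definitions = [
    {'dataType': 'string', 'key': 'column_3_string', 'name': 'Visit', 'order': 3},
    {'dataType': 'string', 'key': 'column_0_string', 'name': 'Unnamed: 0', 'order': 0},
]
names = header_names(column_definitions)

# The 'columns' value from the statement example earlier in this guide.
labeled = label_row({'column_3_string': 'Cycle 2 Day 1'}, names)
```

Sorting by order keeps the relabeled row in the same left-to-right sequence as the original table.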