Section-driven question answering systems
Introduction
Some users are interested in specific sections of a document. For example, a scientist may be interested in finding experimental protocols in the "Methods" section of an article. A pharmaceutical company may be interested in the "Inclusion Criteria" of a clinical trial.
In this tutorial, you'll learn how to pull the sections for a set of documents, and how to use these sections as a criterion when submitting a bulk question QA job. This will create a QA batch on a set of documents, on the sections you've specified.
Pre-Reqs
We will be using the same dependencies as in previous guides.
# add dependencies
from __future__ import print_function
from pprint import pprint
import time
import requests
import json
If you haven't already created your authentication token and header variables, please review the Getting Started Guide.
The following header can be used in your request.
header = {
    'Accept': 'application/json',
    'Content-Type': 'application/json',
    'Authorization': f"Bearer {token}",
    'X-Vyasa-Client': 'layar',
    'X-Vyasa-Data-Providers': 'sandbox.certara.ai',
    'X-Vyasa-Data-Fabric': 'YOUR_FABRIC_ID'
}
Finding A Document
If you don't already have a documentId, you can review the Upload Documents Guide or Document Search Guide to obtain one.
Finding Headings of Sections
Before we can create the question, we need to get the headings that exist for each section of our desired document. As in the previous guides, this requires a request to a specific endpoint, in this case /layar/sourceDocument/{documentId}. We don't need a body for this requests.get call, so let's dive into getting the document details.
docDetailsUri = f'{envUrl}/layar/sourceDocument/{documentId}'
response = requests.get(docDetailsUri,
headers = header)
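A request like this can fail (for example with an expired token or an unknown documentId), so it may help to check the status before reading the body. The following is a minimal sketch, not part of the Layar API: fetch_sections and its http_get parameter are hypothetical names, and it assumes the response JSON carries a top-level sections key, as used in this guide.

```python
def fetch_sections(env_url, document_id, header, http_get):
    """Return the 'sections' list for one document, or [] if it is absent.

    http_get is any callable shaped like requests.get (hypothetical
    parameter, passed in so the function can be tested without a network).
    """
    uri = f"{env_url}/layar/sourceDocument/{document_id}"
    response = http_get(uri, headers=header)
    # raise_for_status() is the standard requests.Response method;
    # it raises an exception on 4xx/5xx status codes.
    response.raise_for_status()
    return response.json().get("sections") or []
```

With the variables from this guide, it could be called as fetch_sections(envUrl, documentId, header, requests.get).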
Pro Tip
A lot of JSON data is returned in the response. If you want to see what information is returned, you can review the response details in Swagger at https://YOUR_LAYAR_ENVIRONMENT/layar/swagger-ui.html
Specifically, we are looking for the sections detail. We can do the following to pull that information into a variable that can be further altered.
sections = response.json().get('sections')
sectionsJson = json.dumps(sections, indent = 6) #Optional, debugging line to review returned data.
print(sectionsJson) #Optional, debugging line to review returned data
If you ran the optional debugging line, you'll notice that the information returned is very large. We are only interested in the heading for our desired sections.
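Since the full dump is large, one option is to print only the headings. The sketch below assumes each entry in sections is a dictionary that may carry a heading key, as described in this guide; list_headings is a hypothetical helper name and sample_sections is made-up data for illustration.

```python
def list_headings(sections):
    """Return the distinct, non-empty headings in document order."""
    seen = set()
    headings = []
    for section in sections or []:
        heading = section.get("heading")
        if heading and heading not in seen:
            seen.add(heading)
            headings.append(heading)
    return headings


# Made-up sample data; real entries contain more keys than 'heading'.
sample_sections = [
    {"heading": "abstract"},
    {"heading": "Study Sites"},
    {"heading": "abstract"},   # duplicate heading is reported once
    {"other": "no heading"},   # entries without a heading are skipped
]
print(list_headings(sample_sections))
```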
For the sake of this guide, we will look for three headings: abstract, Subject Information and Consent, and Study Sites. Let's check whether those headings exist in the returned JSON. Since we called json().get, our sections variable is now a list of dictionaries, each dictionary containing a section and heading. Let's iterate through these dictionaries to see if our desired headings exist.
for i in sections:
    for key in i.keys():
        if key == 'heading' and i[key] in ['abstract','Subject Information and Consent','Study Sites']:
            print(i[key])
Here are the headings that are returned.
abstract
Subject Information and Consent
Study Sites
Since we got results, we know the headings exist. Now we just need to put the headings into a variable that we can use with our request body. We can alter our previous iteration to add the headings to a list variable for further use in our request. Note that we create an empty list, desiredHeadings, before the for statement.
desiredHeadings = []
for i in sections:
    for key in i.keys():
        if key == 'heading' and i[key] in ['abstract','Subject Information and Consent','Study Sites']:
            # list.append() modifies the list in place and returns None, so do not reassign
            desiredHeadings.append(i[key])
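When working with several documents, it can also be useful to know which of the wanted headings were not found. The following sketch does the same exact-match check as the loop above and additionally reports the missing headings; collect_headings is a hypothetical helper name and the sample data is made up.

```python
def collect_headings(sections, wanted):
    """Split `wanted` into headings present in `sections` and headings missing."""
    available = {section.get("heading") for section in sections}
    found = [h for h in wanted if h in available]
    missing = [h for h in wanted if h not in available]
    return found, missing


# Made-up sample data for illustration.
sample_sections = [{"heading": "abstract"}, {"heading": "Study Sites"}]
wanted = ["abstract", "Subject Information and Consent", "Study Sites"]
found, missing = collect_headings(sample_sections, wanted)
print("found:", found)
print("missing:", missing)
```

The match is case-sensitive, like the loop in this guide, so 'Abstract' and 'abstract' are treated as different headings.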
Now that we have our headings in a list, we can use this list to form the body for our question request.
Creating the Request Body
The body is very similar to what was used in Submitting a Bulk Question Job. The one difference is that we add our desiredHeadings list to the body under the key sectionKeywords.
body = {
    'bulkQuestions' : [{
        'conceptTypes' : [
            'CHEMICAL' , 'CHEBI'
        ],
        'questionKey' : 'Study Drug',
        'questionStringVariations' : [
            'Where were the Study Sites?',
            'What was the summary of the abstract?',
            'Who were the subjects of the study?'
        ],
        'deepLearningModelId' : 'AYz45Lw7gz6XyJQfUcsx',
        'sectionKeywords' : desiredHeadings
    }],
    'sourceDocumentSearchCommand' : {
        'rows' : 500,
        'savedListIds' : setId
    },
    'questionGroupingKey' : 'Demo Section QA Batch'
}
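If you build this body more than once, a small function can keep the structure in one place and catch an empty heading list before the job is submitted. This is only a sketch: build_section_qa_body and its parameter names are hypothetical, the dictionary keys are the ones used in this guide, and saved_list_ids is passed through unchanged because its expected type is not specified here.

```python
def build_section_qa_body(headings, question_key, variations, model_id,
                          saved_list_ids, grouping_key,
                          concept_types=("CHEMICAL", "CHEBI"), rows=500):
    """Assemble the bulk QA request body, restricted to the given section headings."""
    if not headings:
        # Catches both an empty list and None (e.g. from a failed heading lookup).
        raise ValueError("headings must be a non-empty list of section headings")
    return {
        "bulkQuestions": [{
            "conceptTypes": list(concept_types),
            "questionKey": question_key,
            "questionStringVariations": list(variations),
            "deepLearningModelId": model_id,
            "sectionKeywords": list(headings),
        }],
        "sourceDocumentSearchCommand": {
            "rows": rows,
            "savedListIds": saved_list_ids,
        },
        "questionGroupingKey": grouping_key,
    }


example_body = build_section_qa_body(
    headings=["abstract", "Study Sites"],
    question_key="Study Drug",
    variations=["Where were the Study Sites?"],
    model_id="AYz45Lw7gz6XyJQfUcsx",
    saved_list_ids="YOUR_SET_ID",  # placeholder, replace with your own value
    grouping_key="Demo Section QA Batch",
)
```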
Submitting the Batch Create Request
Now we can use requests.post to submit our QA batch to /layar/question/startBulkQuestionAnswerJob.
submitBatchQaUri = f'{envUrl}/layar/question/startBulkQuestionAnswerJob'
response = requests.post(submitBatchQaUri,
                         headers = header,
                         json = body)
pprint(response) #Optional
Get Batch Answers
To retrieve the answers, we can use the questionGroupingKey value that we entered in Creating the Request Body as part of the body.
body = {
'rows' : 50,
'batchGroupingKey' : 'INSERT questionGroupingKey'
}
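Because the retrieval body must carry the same key that was used when the batch was created, one option is to define the key once and reuse it in both bodies. A minimal sketch, assuming the key names shown in this guide; QUESTION_GROUPING_KEY and build_answers_body are hypothetical names.

```python
# Single source of truth for the grouping key used at submission and retrieval.
QUESTION_GROUPING_KEY = "Demo Section QA Batch"


def build_answers_body(grouping_key, rows=50):
    """Build the body used to look up answers for one QA batch."""
    return {
        "rows": rows,
        "batchGroupingKey": grouping_key,
    }


answers_body = build_answers_body(QUESTION_GROUPING_KEY)
print(answers_body)
```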
Submitting the Batch Answers Request
Now we can use requests.post with the body to obtain answers for the batch of questions.
searchBatchAnswers = f'{envUrl}/layar/answer'
response = requests.post(searchBatchAnswers,
                         headers = header,
                         json = body)
pprint(response) #Optional