Section-driven question answering systems

Introduction

Some users are interested in specific sections of a document. For example, a scientist may be interested in finding experimental protocols in the "Methods" section of an article. A pharmaceutical company may be interested in the "Inclusion Criteria" of a clinical trial.

In this tutorial, you'll learn how to pull the sections for a set of documents, and how to use these sections as a criterion when submitting a bulk question QA job. This will create a QA batch on a set of documents, on the sections you've specified.

Pre-Reqs

We will be using the same dependencies as in previous guides.

# add dependencies
from __future__ import print_function
from pprint import pprint
import time
import requests
import json

If you haven't already created your authentication token and header variable, please review the Getting Started Guide.

The following header can be used in your request.

header = {'Accept': 'application/json',
          'Content-Type': 'application/json',
          'Authorization': f"Bearer {token}",
          'X-Vyasa-Client': 'layar',
          'X-Vyasa-Data-Providers': 'sandbox.certara.ai',
          'X-Vyasa-Data-Fabric': 'YOUR_FABRIC_ID'
         }

Finding A Document

If you don't already have a documentId you can review the Upload Documents Guide or Document Search Guide to obtain one.

Finding Headings of Sections

Before we can create the question, we need the headings that exist for each section of our desired document. As in the previous guides, this means sending a request to a specific endpoint, in this case /layar/sourceDocument/{documentId}. A requests.get call to this endpoint doesn't need a body, so let's dive into getting the document details.

docDetailsUri = f'{envUrl}/layar/sourceDocument/{documentId}'

response = requests.get(docDetailsUri,
                        headers = header)

👍
Pro Tip
The response contains a lot of JSON data. To see which fields are returned, you can review the response details in Swagger at https://YOUR_LAYAR_ENVIRONMENT/layar/swagger-ui.html

Specifically, we are looking for the sections detail. We can do the following to pull that information into a variable that we can work with further.

sections = response.json().get('sections')
sectionsJson = json.dumps(sections, indent = 6) #Optional, debugging line to review returned data.
print(sectionsJson) #Optional, debugging line to review returned data

If you ran the optional debugging line, you'll notice that the information returned is very large. We are only interested in the heading for our desired sections.
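
Rather than reading the full dump, it can help to print only the headings. The following is a minimal sketch, assuming each entry in sections is a dictionary with an optional 'heading' key as described in this guide. The sample data and the list_headings helper are hypothetical and exist only for illustration; the 'text' key in particular is an assumption.

```python
# Hypothetical sample shaped like the list this guide describes:
# a list of dictionaries, each normally carrying a 'heading' key.
# The 'text' key is an assumption made for this illustration.
sample_sections = [
    {'heading': 'abstract', 'text': 'placeholder'},
    {'heading': 'Study Sites', 'text': 'placeholder'},
    {'text': 'a section without a heading'},
]

def list_headings(sections):
    """Return the headings in document order, skipping entries without one."""
    return [s['heading'] for s in (sections or []) if s.get('heading')]

print(list_headings(sample_sections))
```

In the guide's own flow you would call list_headings(sections) on the list pulled from the response.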

For this guide, we will look for three headings: abstract, Subject Information and Consent, and Study Sites. Let's check whether those headings exist in the returned JSON. Because we called response.json().get('sections'), our sections variable is a list of dictionaries, each containing a section and its heading. Let's iterate through these dictionaries to see whether our desired headings exist.

for i in sections:
    for key in i.keys():
        if key == 'heading' and i[key] in ['abstract','Subject Information and Consent','Study Sites']:
            print(i[key])

Here are the headings that are returned.

abstract
Subject Information and Consent
Study Sites

Since we got results, that means the headings exist. Now we just need to put the headings into a variable that we can use with our request body. We can alter our previous iteration to add the headings to a list variable, for further use in our request. Note that we are making an empty list desiredHeadings before the for statement.

desiredHeadings = []
for i in sections:
    for key in i.keys():
        if key == 'heading' and i[key] in ['abstract','Subject Information and Consent','Study Sites']:
            desiredHeadings.append(i[key])  # append() modifies the list in place and returns None

Now that we have our headings in a list, we can use this list to form the body for our question request.
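
The loop above matches headings exactly, so a heading stored as "Abstract" would be skipped when you ask for "abstract". The sketch below is one way to match while ignoring case and surrounding whitespace, and to report which requested headings were not found. The match_headings helper and the sample data are hypothetical. It returns each heading as it is spelled in the document, on the assumption that the exact spelling is what should be passed along; this guide does not say how the service compares keywords.

```python
def match_headings(sections, wanted):
    """Split wanted headings into (found, missing), ignoring case and spaces.

    Found headings are returned as they are spelled in the document.
    """
    available = {}
    for section in sections:
        heading = section.get('heading')
        if heading:
            # Keep the first spelling seen for each normalized heading.
            available.setdefault(heading.strip().lower(), heading)

    found, missing = [], []
    for name in wanted:
        key = name.strip().lower()
        if key in available:
            found.append(available[key])
        else:
            missing.append(name)
    return found, missing

# Hypothetical data for illustration only.
example_sections = [{'heading': 'Abstract'}, {'heading': 'Study Sites'}]
example_wanted = ['abstract', 'Subject Information and Consent', 'Study Sites']
found, missing = match_headings(example_sections, example_wanted)
print('found:', found)
print('missing:', missing)
```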

Creating the Request Body

The body is very similar to the one used in Submitting a Bulk Question Job. The one difference is that we add our desiredHeadings list to the body under the key sectionKeywords.

body = {
  'bulkQuestions' : [{
    'conceptTypes' : ['CHEMICAL', 'CHEBI'],
    'questionKey' : 'Study Drug',
    'questionStringVariations' : [
      'Where were the Study Sites?',
      'What was the summary of the abstract?',
      'Who were the subjects of the study?'
    ],
    'deepLearningModelId' : 'AYz45Lw7gz6XyJQfUcsx',
    'sectionKeywords' : desiredHeadings  # the list of headings built above
  }],
  'sourceDocumentSearchCommand' : {
    'rows' : 500,
    'savedListIds' : setId  # setId is assumed to be defined already
  },
  'questionGroupingKey' : 'Demo Section QA Batch'
}
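
If you build this body more than once, a small function keeps the structure in one place and can catch an empty or missing heading list before anything is submitted. This is a sketch only: build_section_qa_body and its parameter names are hypothetical, while the dictionary keys are the ones used in the body above. Rejecting an empty list reflects the purpose of this tutorial, not a documented rule of the API.

```python
def build_section_qa_body(headings, questions, set_id, model_id,
                          question_key, grouping_key, concept_types, rows=500):
    """Assemble a bulk QA request body restricted to the given section headings."""
    if not headings:
        # Also catches None, e.g. from writing `x = x.append(...)` by mistake.
        raise ValueError('headings is empty; no sections would be targeted')
    return {
        'bulkQuestions': [{
            'conceptTypes': list(concept_types),
            'questionKey': question_key,
            'questionStringVariations': list(questions),
            'deepLearningModelId': model_id,
            'sectionKeywords': list(headings),
        }],
        'sourceDocumentSearchCommand': {
            'rows': rows,
            'savedListIds': set_id,
        },
        'questionGroupingKey': grouping_key,
    }

# Example call with placeholder values (not real IDs).
example_body = build_section_qa_body(
    headings=['abstract', 'Study Sites'],
    questions=['Where were the Study Sites?'],
    set_id='YOUR_SET_ID',
    model_id='YOUR_MODEL_ID',
    question_key='Study Drug',
    grouping_key='Demo Section QA Batch',
    concept_types=['CHEMICAL', 'CHEBI'],
)
```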

Submitting the Batch Create Request

Now we can use requests.post to submit our QA batch to /layar/question/startBulkQuestionAnswerJob.

submitBatchQaUri = f'{envUrl}/layar/question/startBulkQuestionAnswerJob'

response = requests.post(submitBatchQaUri,
                        headers = header,
                        json = body)
pprint(response) #Optional
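
Printing the response object shows only the status code. A slightly more defensive pattern is to fail loudly on an HTTP error and return the parsed JSON. The sketch below assumes the endpoint returns a JSON body, which this guide does not state explicitly. The post_json helper is hypothetical; it takes the HTTP client as an argument so that either the requests module or a requests.Session can be passed in.

```python
def post_json(http, url, headers, body, timeout=30):
    """POST body as JSON through http and return the parsed JSON reply.

    http is anything with a requests-style post() method, such as the
    requests module itself or a requests.Session.
    """
    response = http.post(url, headers=headers, json=body, timeout=timeout)
    response.raise_for_status()  # raises on 4xx/5xx status codes
    return response.json()

# In this guide the call would look like:
# result = post_json(requests, submitBatchQaUri, header, body)
# pprint(result)
```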

Get Batch Answers

We can use the questionGroupingKey value that we entered in Creating the Request Body as the batchGroupingKey in the body of the answers request.

body = {
  'rows' : 50,
  'batchGroupingKey' : 'INSERT questionGroupingKey'
       }

Submitting the Batch Answers Request

Now we can use requests.post with the body to obtain answers for the batch of questions.

searchBatchAnswers = f'{envUrl}/layar/answer'

response = requests.post(searchBatchAnswers,
                        headers = header,
                        json = body)
pprint(response) #Optional
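
If the batch job has not finished when you request answers, you may need to ask again after a pause. The sketch below is a generic polling loop built on the time module imported earlier. It rests on assumptions: that the job runs asynchronously, and that you can tell from the reply when answers are ready. The guide does not describe the shape of the answers response, so the is_done check is left to you. The poll helper and its parameters are hypothetical.

```python
import time

def poll(fetch, is_done, attempts=10, delay=5.0, sleep=time.sleep):
    """Call fetch() until is_done(result) is true, pausing between tries.

    Raises TimeoutError if no result passes is_done within `attempts` tries.
    """
    for attempt in range(attempts):
        result = fetch()
        if is_done(result):
            return result
        if attempt < attempts - 1:
            sleep(delay)  # wait before the next try
    raise TimeoutError(f'no finished result after {attempts} attempts')

# Hypothetical usage in this guide; adapt is_done to the actual reply:
# answers = poll(
#     fetch=lambda: requests.post(searchBatchAnswers, headers=header, json=body).json(),
#     is_done=lambda reply: bool(reply),
# )
```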
