
Section-driven question answering systems

Some users are interested in specific sections of a document. For example, a scientist may be interested in finding experimental protocols in the "Methods" section of an article. A pharmaceutical company may be interested in the "Inclusion Criteria" of a clinical trial.

As part of the SourceDocumentApi, you can find the sections for an input document list, and use our deep learning question answering pipeline to ask questions of those sections!

In this tutorial, you'll learn how to pull the sections for a set of documents and how to use those sections as criteria when submitting a bulk QA job. The result is a QA batch that runs only on the sections you've specified within your chosen documents.

Let's begin!

Part 1. Set Up

Import Dependencies

To start, let's import our dependencies.

# add dependencies
import layar_api
from layar_api.rest import ApiException
import requests
from pprint import pprint

Configure Authentication

Next, we'll want to configure our session with our authentication keys. Copy the following commands and only swap out the strings for base_host, client_id, and client_secret. The base_host is the Layar instance you're working within (e.g. 'demo.vyasa.com'), and the client ID and secret are your provided authentication keys.

To learn how to get your authentication keys, please reference this document.

# set up your authentication credentials
base_host = 'BASE_URL' # your Layar instance (e.g. 'demo.vyasa.com')
client_id = 'AbcDEfghI3' # example developer API key
client_secret = '1ab23c4De6fGh7Ijkl8mNoPq9' #example developer API secret

# configure oauth access token for authorization
configuration = layar_api.Configuration()
configuration.host = f"https://{base_host}"
configuration.access_token = configuration.fetch_access_token(
    client_id, client_secret)

# Make your life easier for the next task: instantiating APIs!
client = layar_api.ApiClient(configuration)
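
If you'd rather not hardcode credentials in a script, one option is to read them from environment variables before building the configuration. The snippet below is a minimal sketch of that idea only: the variable names (LAYAR_BASE_HOST, LAYAR_CLIENT_ID, LAYAR_CLIENT_SECRET) and the helper load_layar_credentials are hypothetical and not part of the Layar SDK.

```python
import os

# Hypothetical variable names; choose whatever fits your environment.
REQUIRED_VARS = ("LAYAR_BASE_HOST", "LAYAR_CLIENT_ID", "LAYAR_CLIENT_SECRET")


def load_layar_credentials(env=os.environ):
    """Return (base_host, client_id, client_secret) read from a mapping of settings."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in REQUIRED_VARS)


# Demonstration with an explicit mapping instead of the real environment
example_env = {
    "LAYAR_BASE_HOST": "demo.vyasa.com",
    "LAYAR_CLIENT_ID": "AbcDEfghI3",
    "LAYAR_CLIENT_SECRET": "1ab23c4De6fGh7Ijkl8mNoPq9",
}
base_host, client_id, client_secret = load_layar_credentials(example_env)
```

The returned values can then be passed to the configuration code above in place of the hardcoded strings.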

Instantiate Your APIs

# Instantiate APIs
sourceDocApi = layar_api.SourceDocumentApi(client) # Needed for the full tutorial
groupText = layar_api.GroupApi(client) # Only required if using the grouping API in Part 2
questionApi = layar_api.QuestionApi(client) # Required for asking your QA query
answerApi = layar_api.AnswerApi(client) # Required for retrieving the answers from your bulk QA job

Part 2. Gathering Your Sections

To find the sections of your documents, we'll be using the get_doc_field_counts method from the SourceDocumentApi. This method is available as a recipe for quick access (see below).

Identify Your Documents

To find out which sections you have, first identify the documents you're interested in. You can select these documents using any of the parameters found in the SourceDocumentSearchCommand. Here, we've used the saved_list_ids parameter (which is the same thing as the Layar Set ID if you've created a Layar Set with all of your documents).

# create an instance of the api class
body = layar_api.SourceDocumentSearchCommand(
    saved_list_ids = ['AYBrf_ZWKBnV9heqHzEu', 'AYBszFceKBnV9heqH0pT'], # Provide Layar Set ID if you want to query a specific set of Documents
    ids = ['AYBrfsyQKBnV9heqHzBS'], # Delete after testing
    )

Identify the SourceDoc Field of Interest (Section)

The get_doc_field_counts method allows you to search and count many different fields found in the SourceDocument domain object. A running list of these fields is automatically updated in the Swagger documentation for this endpoint.

The current list of available field options as of writing this article, as seen in the API Endpoints documentation.

For this tutorial, we want the sectionHeading as our field parameter.

field = 'sectionHeading' # str | the SourceDocument field to count

The body for the get_doc_field_counts method can also use some additional parameters (annotation_key and value_type). However, these are optional and not relevant to our task of identifying the sections.

Run the API Call

Finally, we run the API call, passing in both the request body we created above and the field we defined.

try:
    # Get document counts by field type
    api_response = sourceDocApi.get_doc_field_counts(body, field)
    pprint(api_response)
except ApiException as e:
    print("Exception when calling SourceDocumentApi->get_doc_field_counts: %s\n" % e)

You should see a response in your terminal containing a list of dictionaries, one per section.
Each dictionary contains:

  • a count for how many times that section existed in your input set of documents
  • a key for the section's heading (what the actual section was titled)
  • a name, populated only if a user has given that section additional metadata (e.g. if a user wanted to group the keys 'criteria for evaluation:', 'duration of treatment:', and 'subject disposition' under a uniform 'name': Trial Design); otherwise it is None.

For example, here's a sample of what you might see:

[{'count': 6, 'key': 'abstract', 'name': None},
 {'count': 6, 'key': 'duration of treatment:', 'name': None},
 {'count': 5, 'key': 'safety results:', 'name': None},
 {'count': 4, 'key': 'criteria for evaluation:', 'name': None},
 {'count': 2, 'key': 'statistical methods:', 'name': None},
 {'count': 2, 'key': 'subject disposition', 'name': None},
 {'count': 1, 'key': 'antithrombotic prophylaxis', 'name': None},
 {'count': 1, 'key': 'auc t', 'name': None},
 {'count': 1, 'key': 'efficacy analysis:', 'name': None},
 {'count': 1, 'key': 'efficacy conclusions', 'name': None},
 {'count': 1, 'key': 'efficacy:', 'name': None},
 ]
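
To pick out the sections worth querying, it can help to filter and label this response locally. The following is a rough sketch that assumes the response is a list of plain dictionaries shaped like the sample above (if your SDK version returns model objects instead, they would need converting first). The helper frequent_sections and its min_count threshold are hypothetical names introduced for illustration.

```python
# A subset of the sample response shown above
sample_response = [
    {'count': 6, 'key': 'abstract', 'name': None},
    {'count': 6, 'key': 'duration of treatment:', 'name': None},
    {'count': 5, 'key': 'safety results:', 'name': None},
    {'count': 4, 'key': 'criteria for evaluation:', 'name': None},
    {'count': 2, 'key': 'statistical methods:', 'name': None},
    {'count': 1, 'key': 'efficacy:', 'name': None},
]


def frequent_sections(section_counts, min_count=2):
    """Return (label, count) pairs for sections seen at least min_count times.

    The label is the user-provided name when present, otherwise the raw key.
    Results are ordered by descending count, then alphabetically by label.
    """
    pairs = [
        (entry['name'] or entry['key'], entry['count'])
        for entry in section_counts
        if entry['count'] >= min_count
    ]
    return sorted(pairs, key=lambda pair: (-pair[1], pair[0]))


common = frequent_sections(sample_response, min_count=4)
for label, count in common:
    print(f"{count:>3}  {label}")
```

Headings that appear in most of your documents, such as 'abstract' here, are usually the safest choices for a section-driven QA job because few documents will be skipped.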

Sometimes, section headings are similar but not identical, and you'll want to group them into general clusters (e.g. 'efficacy analysis:', 'efficacy conclusions', and 'efficacy:' above). To group these results, use the GroupingApi.

We have a recipe available below as an example of how to use the Grouping API:

Since the grouping API groups on the entire string it is given, you'll want to break the keys out into their own list before passing them to the grouping API. Otherwise, the entries will be grouped on count, key, and name together, which will make the clustering less exact for your purposes.
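
Breaking the keys out into their own list might look like the sketch below, again assuming the response is a list of dictionaries shaped like the sample above. The group_by_first_word helper is a purely local, hypothetical stand-in for previewing obvious clusters; it is not the Grouping API and does not reproduce its behavior.

```python
from collections import defaultdict

section_counts = [
    {'count': 5, 'key': 'safety results:', 'name': None},
    {'count': 1, 'key': 'efficacy analysis:', 'name': None},
    {'count': 1, 'key': 'efficacy conclusions', 'name': None},
    {'count': 1, 'key': 'efficacy:', 'name': None},
]

# Keep only the heading strings, dropping count and name
section_keys = [entry['key'] for entry in section_counts]


def group_by_first_word(headings):
    """Bucket headings by their first word, ignoring case and trailing colons."""
    groups = defaultdict(list)
    for heading in headings:
        words = heading.strip().rstrip(':').split()
        first_word = words[0].lower() if words else ''
        groups[first_word].append(heading)
    return dict(groups)


local_groups = group_by_first_word(section_keys)
```

The section_keys list is what you would hand to the grouping step; local_groups is only a quick sanity check on which headings are likely to cluster together.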

👍

Success!

You've got a list of sections! You can now use these sections as inputs when submitting a bulk QA job. We'll briefly describe the new portions of that workflow here, but if you need a general refresher on how to submit a bulk QA job, check it out here!

Part 3. Create the Bulk QA Job Request

🚧

Shortened Tutorial

As a reminder, this is just a quick demo of how to include sections in your QA job request. For a longer explanation of how submitting a bulk question answering job works, please reference the QA job request tutorial.

Identify Your Document Batch

You can find the documents based on a number of parameters in the SourceDocumentSearchCommand.

Since we're looking for specific sections within a Layar Set, we've used the saved_list_ids and section_searches parameters.

# Identify which Documents & Sections will be submitted for QA
docSet = layar_api.SourceDocumentSearchCommand(
    saved_list_ids = ['AYBrf_ZWKBnV9heqHzEu', 'AYBszFceKBnV9heqH0pT'], # Provide Layar Set ID if you want to query a specific set of Documents
    section_searches = layar_api.SectionSearch(heading = "abstract") # Restrict the search to sections with this heading
    )

Input "Ask a Question" Parameters

This section has been described in more detail previously.

Below is a sample code snippet to test with the tutorial:

# Provide the Question Parameters
q = layar_api.BulkQuestion(
    question_key = ['Disease on Sets'], #The question key you would like to name this job as
    question_string_variations = ['What is the disease or indication being evaluated?'] # If there are multiple variations of the question you wish to try under the same bulk job
    )

Build the Request Body

This section has been described in more detail previously.

Below is a sample code snippet to test with the tutorial:

# Build the BulkQuestion Body
job = layar_api.BulkQuestionCommand(
    bulk_questions = [q], 
    source_document_search_command = docSet, 
    question_grouping_key='Clinical Trial API Demo')

Submit Your BulkQuestion Job

Now that you have your request body prepped (job), it's time to submit your API request! Call the start_batch method on your questionApi instance (see SDK documentation).

# Submit BulkQuestion Job
try:
    section_qa = questionApi.start_batch(body=job)
    pprint(section_qa)
    print("Success!") 
except ApiException as e:
    print("Exception when calling QuestionApi->start_batch: %s\n" % e)

You should see the following console response. The job_id identifies the question batch job you just submitted; you can check its status with the QuestionBatchSearch command, using the job_id as a search parameter.

{'job_id': 'd6f1c63b-d191-46e1-86e2-dc6533130bd2'}
Success!
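
If you plan to look the job up later, you may want to pull the job_id out of the response and store it. The sketch below assumes the response exposes job_id either as a dictionary key (as printed above) or as an attribute, and that job IDs are UUIDs, which is inferred only from the sample output. The helper extract_job_id is hypothetical and not part of the SDK.

```python
import uuid


def extract_job_id(response):
    """Return the job_id from a start_batch-style response as a string.

    Accepts either a dict or an object with a job_id attribute, and raises
    ValueError if the ID is missing or is not a well-formed UUID.
    """
    if isinstance(response, dict):
        job_id = response.get('job_id')
    else:
        job_id = getattr(response, 'job_id', None)
    if not job_id:
        raise ValueError("Response did not contain a job_id")
    uuid.UUID(str(job_id))  # raises ValueError when the string is malformed
    return str(job_id)


# Using the sample console response from above
job_id = extract_job_id({'job_id': 'd6f1c63b-d191-46e1-86e2-dc6533130bd2'})
print(f"Submitted batch job: {job_id}")
```

If your instance issues job IDs in a different format, drop the UUID check and keep only the missing-value guard.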

👍

Success!

You've just submitted a section-driven question answering job! You'll be able to find it under the batch and question key you defined.


Up Next

Now that you've submitted a bulk question job (with sections instead of whole documents), learn how you can retrieve the answers (model and human curated) for each document within a given batch!