Automate Creation of New Document Sets
Create New Document Sets
First, identify which Curate Set(s) you intend to extract answers from. Using the RESTful API, you can call the savedList POST method to create a new Curate Set.
Skip this step and proceed to the next section if you're working with an existing Curate Set.
X-Vyasa-Client Configurations
By default, the savedList endpoint saves the "set" to the Layar data fabric/user interface (under "Sets" in the left-hand navigation menu). If you are working with sets in the Synapse Curate user interface, you will need to specify which client application to use ('X-Vyasa-Client' : 'curate'). See example below.
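## Create New Set
import json
import requests

# envUrl (your Layar base URL) and token (a valid bearer token) are assumed to be defined already
response = requests.post(f"{envUrl}/layar/savedList",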
data = json.dumps({
"name":"API Test - Create Set"}), # This is the title for the document set, which you will see in the Curate UI under the "Document Sets" tab
headers = {
'accept':'application/json',
'content-Type':'application/json',
'authorization':f"Bearer {token}",
'X-Vyasa-Client': 'curate' # This piece is what defines creating a Layar Set vs. a Curate Set to the system
}
)
response.json()
{'vyasaClient': 'curate',
'listType': 'SEARCH',
'sharedWithUsers': [],
'savedListTerms': [],
'id': 'AYXweEvEEs8gbQkuEVT1',
'name': 'API Test - Create Set',
'createdByUser': 25036}
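Every call in this tutorial repeats the same headers and reuses the id from the create response, so it can help to factor both out. The following is a minimal sketch, not part of the Layar API: the helper names curate_headers and set_id_from_response are hypothetical, and env_url is a placeholder. It assumes, as described above, that leaving out X-Vyasa-Client falls back to a Layar set.

```python
def curate_headers(token, client="curate"):
    """Build the request headers used throughout this tutorial.

    Pass client=None to omit X-Vyasa-Client (assumed to create a Layar set).
    """
    headers = {
        "accept": "application/json",
        "content-type": "application/json",
        "authorization": f"Bearer {token}",
    }
    if client:
        headers["X-Vyasa-Client"] = client
    return headers


def set_id_from_response(body):
    """Return the set ID from a savedList response body, or raise if missing."""
    set_id = body.get("id")
    if not set_id:
        raise ValueError("response body has no 'id'; was the set created?")
    return set_id


# Example using the response body shown above (trimmed).
created = {"vyasaClient": "curate", "id": "AYXweEvEEs8gbQkuEVT1",
           "name": "API Test - Create Set"}
env_url = "https://your-instance.vyasa.com"  # placeholder, replace with your environment
set_id = set_id_from_response(created)
update_url = f"{env_url}/layar/savedList/{set_id}"
print(update_url)
```

The resulting update_url is the address the PUT calls in the following sections are sent to.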
Updating Existing Sets w/ Filters
Use the PUT method to apply the desired filters & columns of interest that you would like in your Curate Set. This call allows you to:
- Programmatically set the columns & filters for end users in the UI or
- Subset your data fabric to the target documents you wish to run QA or Classification jobs on
You must specify the set ID you are working in when running the /layar/savedList/{setId} PUT method. Replace {setId} with the unique Layar identifier assigned to that set. If you created a set in Part I, the {setId} will be the id parameter found in the response body.
Skip if you have no interest in adding filters or metadata columns to the view in the UI.
Here's a simple version of the call, where we only change the name from "API Test - Create Set" to "Update Set Test"
response = requests.put(f"{envUrl}/layar/savedList/AYXweEvEEs8gbQkuEVT1", # Put the Set ID here
data = json.dumps({
"id":"AYXweEvEEs8gbQkuEVT1", # Put the Set ID here
"name":"Update Set Test" # Optional, if you wish to change the Set name
}),
headers = {
'accept':'application/json',
'content-Type':'application/json',
'authorization':f"Bearer {token}",
'X-Vyasa-Client': 'curate'
}
)
response.json()
{'vyasaClient': 'curate',
'listType': 'SEARCH',
'searches': [{'dataProviders': ['master-pubmed.vyasa.com',
'master-pmc-oa.vyasa.com',
'master-clinicaltrials.vyasa.com'],
'filters': [{'op': 'AND',
'conditions': [{'field': 'datePublished',
'minDate': '2022-11-13T00:00:00.000+0000',
'maxDate': '2022-12-13T00:00:00.000+0000'}]}],
'filterOp': 'AND',
'highlight': False,
'highlightPreTag': '<em>',
'highlightPostTag': '</em>',
'logSearch': True,
'randomize': False,
'q': '"osteonecrosis" OR "osteonecroses"',
'start': 0,
'sortOrder': 'asc'}],
'sharedWithUsers': [],
'savedListTerms': [],
'id': 'AYXweEvEEs8gbQkuEVT1',
'name': 'API Test - Create Set',
'dateIndexed': '2023-01-26T23:44:53.698+0000',
'datePublished': '2023-01-26T23:44:53.698+0000',
'dateUpdated': '2023-01-26T23:50:52.544+0000',
'createdByUser': 25036}
Using the same PUT method, you can also specify filters that should be applied to the document set you are creating.
Anything you can filter on through the Curate UI is also available as a filter in the API call.
- Adding searches.filters.conditions allows you to filter documents in your Curate set to those that meet any annotation or metadata criteria (e.g. datePublished, trialData.phases, etc.).
- Specifying searches.dataProviders filters your document set down to those that exist in a specific instance, such as a Vyasa Canonical Instance (e.g. PubMed, Clinical Trials, etc.) or client instances (e.g. "your-instance.vyasa.com").
- Search can be done by adding searches.q and inputting your boolean query string. Note: This does change if a user is looking to use section-specific search (e.g. when searching the 'Introduction' section of the document rather than the entire document). Section-specific search is not covered in this tutorial.
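The three pieces listed above (conditions, dataProviders, and q) all live inside one entry of the searches array. As a rough sketch, a small builder can assemble that entry and format the date bounds consistently. The function names build_search and iso_midnight are hypothetical helpers for illustration, not part of the Layar API; the field names and timestamp format are taken from the requests in this tutorial.

```python
from datetime import date


def iso_midnight(day):
    """Format a date as midnight UTC, matching the timestamps used in this tutorial."""
    return day.strftime("%Y-%m-%dT00:00:00.000Z")


def build_search(query, data_providers, start, end, date_field="datePublished"):
    """Assemble one entry of the `searches` array with a query, providers, and a date range."""
    return {
        "dataProviders": list(data_providers),
        "q": query,
        "filters": [{
            "conditions": [{
                "field": date_field,
                "minDate": iso_midnight(start),
                "maxDate": iso_midnight(end),
            }]
        }],
    }


search = build_search(
    '"osteonecrosis" OR "osteonecroses"',
    ["master-pubmed.vyasa.com", "master-pmc-oa.vyasa.com",
     "master-clinicaltrials.vyasa.com"],
    date(2022, 11, 13),
    date(2022, 12, 13),
)
put_body = {"id": "AYXweEvEEs8gbQkuEVT1", "searches": [search]}
```

put_body can then be passed through json.dumps as the data argument of the PUT call.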
For example, below is a call to update the existing set with documents published from November 13 to December 13, 2022, that mention "osteonecrosis" OR "osteonecroses". For this example, we only want the documents to come from PubMed, PubMed Central Open Access, and Clinical Trials.
Include All Filters When Using PUT Method
If you call the PUT method of /layar/savedList to update the Set with an additional filter, make sure to include all other filters in your request. Filter settings do not persist across PUT calls, so the Set would drop all prior filters and only keep the filters provided in the new request. In this example, I'm removing all past filters and only using the Document ID, so I'm not concerned with having those other filters in my set.
response = requests.put(f"{envUrl}/layar/savedList/AYXweEvEEs8gbQkuEVT1", # Required: place Set ID here
data = json.dumps({
"id":"AYXweEvEEs8gbQkuEVT1", # Required: place Set ID here
"searches":[{ # Optional Filters & Searches
"dataProviders":["master-pubmed.vyasa.com", "master-pmc-oa.vyasa.com","master-clinicaltrials.vyasa.com"], # Restrict to PubMed, PMC OA, and Clinical Trials documents
"q": "\"osteonecrosis\" OR \"osteonecroses\"", # Boolean search for "osteonecrosis" OR "osteonecroses"
"filters":[{
"conditions":[{
"field":"datePublished", # This is the same filter key you would see in Curate when adding a filter
"minDate":"2022-11-13T00:00:00.000Z", # Here we've added a value for the date filter - only docs published AFTER Nov 13, 2022
"maxDate":"2022-12-13T00:00:00.000Z"}]}]}]}), # Filter of Doc Max Date, specifying publishing BEFORE December 13. Use both to implement a date range.
headers = {
'accept':'application/json',
'content-Type':'application/json',
'authorization':f"Bearer {token}",
'X-Vyasa-Client': 'curate'
}
)
response.json()
{'vyasaClient': 'curate',
'listType': 'SEARCH',
'searches': [{'dataProviders': ['master-pubmed.vyasa.com',
'master-pmc-oa.vyasa.com',
'master-clinicaltrials.vyasa.com'],
'filters': [{'op': 'AND',
'conditions': [{'field': 'datePublished',
'minDate': '2022-11-13T00:00:00.000+0000',
'maxDate': '2022-12-13T00:00:00.000+0000'}]}],
'filterOp': 'AND',
'highlight': False,
'highlightPreTag': '<em>',
'highlightPostTag': '</em>',
'logSearch': True,
'randomize': False,
'q': '"osteonecrosis" OR "osteonecroses"',
'start': 0,
'sortOrder': 'asc'}],
'sharedWithUsers': [],
'savedListTerms': [],
'id': 'AYXweEvEEs8gbQkuEVT1',
'name': 'API Test - Create Set',
'dateIndexed': '2023-01-26T23:44:53.698+0000',
'datePublished': '2023-01-26T23:44:53.698+0000',
'dateUpdated': '2023-01-26T23:50:52.544+0000',
'createdByUser': 25036}
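Because a PUT replaces the filters rather than adding to them, one way to avoid losing earlier filters is to start from the searches in a previous response and append to them. The sketch below shows that idea on plain dictionaries. The helper with_added_condition is hypothetical, and it assumes the API accepts the searches object from a response when it is sent back in a request, which this tutorial does not confirm.

```python
import copy


def with_added_condition(saved_list, condition):
    """Return a PUT body that keeps the set's existing searches and adds one condition."""
    # Copy so the original response dictionary is left untouched.
    searches = copy.deepcopy(saved_list.get("searches") or [{}])
    filters = searches[0].setdefault("filters", [])
    if not filters:
        filters.append({"conditions": []})
    filters[0].setdefault("conditions", []).append(condition)
    return {"id": saved_list["id"], "searches": searches}


# Trimmed version of the response shown above.
existing = {
    "id": "AYXweEvEEs8gbQkuEVT1",
    "searches": [{
        "dataProviders": ["master-pubmed.vyasa.com"],
        "q": '"osteonecrosis" OR "osteonecroses"',
        "filters": [{"op": "AND", "conditions": [{
            "field": "datePublished",
            "minDate": "2022-11-13T00:00:00.000+0000",
            "maxDate": "2022-12-13T00:00:00.000+0000"}]}],
    }],
}

# Add a document ID filter while keeping the date range, query, and providers.
body = with_added_condition(existing, {"field": "id", "values": ["pubmed_36555184"]})
print(len(body["searches"][0]["filters"][0]["conditions"]))
```

The returned body carries both the original date condition and the new one, so sending it in a single PUT keeps the earlier filters in place.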
Adding Specialized Columns to Sets
Specifying 'viewConfig' Parameters
The viewConfig allows developers to specify which columns of a document set are presented to the user working in the Curate UI.
The viewConfig you see in the responses above is the default configuration assigned to any new Curate set, without any column customizations. This can be modified during creation of a set, or during updates to an existing set. See the below example request for how you can format this config in your calls.
# Create a New Set With the Default viewConfig
response = requests.post(f"{envUrl}/layar/savedList",
data = json.dumps({
"name":"Default Created Set Example",
"viewConfig":{
"curateSettingsV1":{
"queryParams":{"qSearchParams":[],"extraC":[],"extraS":[]}}}}), # This is where we'll add additional columns, see further steps
headers = {
'accept':'application/json',
'content-Type':'application/json',
'authorization':f"Bearer {token}",
'X-Vyasa-Client': 'curate' # This piece is what defines creating a Layar Set vs. a Curate Set to the system
}
)
response.json()
Just as you can add several different types of columns via the UI, you can add these same types of columns programmatically using the viewConfig.
For example:
- Metadata Columns, such as ClinicalTrials.gov trial data, PubMed metadata, and custom annotations.
- Section Columns, which are automatically parsed from long unstructured documents (e.g. "Abstract", "Methods", "Inclusion/Exclusion Criteria", etc.)
- Classification Columns, which are predictions from a completed classification job that was submitted either via "Classify Documents" within the user interface or programmatically. Learn more about how to submit a classification job here.
- QA Columns, which are predictions from a completed question-answering (QA) job that was submitted either via "Ask a Question" within the user interface or programmatically. Learn more about how to submit a QA job here.
Metadata & Section Columns
The same viewConfig applies to metadata and section columns (noticing a trend?), except this time, there's no deep learning job you need to submit. That's because this data has already been generated on the backend for each of these documents, and it's just a matter of hooking it up in the view.
- Metadata columns are defined with viewConfig.curateSettingsV1.queryParams.extraC
- Section columns are defined with viewConfig.curateSettingsV1.queryParams.extraS
Annotations are considered a metadata column, so you can add them as a column using the annotationSearches prefix (see the example below, where we've created custom annotations for PubMed ID).
Sections have already been automatically parsed by a Layar deep learning model, and you can retrieve a list of the sections generated for each document using the SourceDocumentApi.
response = requests.put(f"{envUrl}/layar/savedList/AYXweEvEEs8gbQkuEVT1", # note the set ID
data = json.dumps({
"viewConfig":{
"curateSettingsV1":{
"queryParams":{
"extraC":[ # Add Metadata Columns
"annotationSearches.Pubmed ID", # Annotation PubMed ID - can find annotations via API
"dateUpdated"],
"extraS":[ # Add Section Columns
"abstract"]
}}}}),
headers = {
'accept':'application/json',
'content-Type':'application/json',
'authorization':f"Bearer {token}",
'X-Vyasa-Client': 'curate'
}
)
response.json()
{'vyasaClient': 'curate',
'listType': 'SEARCH',
'searches': [{'dataProviders': ['master-pubmed.vyasa.com',
'master-pmc-oa.vyasa.com',
'master-clinicaltrials.vyasa.com'],
'filters': [{'op': 'AND',
'conditions': [{'field': 'datePublished',
'minDate': '2022-11-13T00:00:00.000+0000',
'maxDate': '2022-12-13T00:00:00.000+0000'}]}],
'filterOp': 'AND',
'highlight': False,
'highlightPreTag': '<em>',
'highlightPostTag': '</em>',
'logSearch': True,
'randomize': False,
'q': '"osteonecrosis" OR "osteonecroses"',
'start': 0,
'sortOrder': 'asc'}],
'sharedWithUsers': [],
'savedListTerms': [],
'viewConfig': {'curateSettingsV1': {'queryParams': {'qSearchParams': [{'questionKey': 'What is the disease or indication being studied?',
'batchGroupingKey': 'AYXweEvEEs8gbQkuEVT1'},
{'questionKey': 'Where is the osteonecrosis located on the body?',
'batchGroupingKey': 'AYXweEvEEs8gbQkuEVT1'}]},
'extraC': ['annotationSearches.Pubmed ID', 'dateUpdated'],
'extraS': ['abstract']}},
'id': 'AYXweEvEEs8gbQkuEVT1',
'name': 'Update Set Test',
'dateIndexed': '2023-01-26T23:44:53.698+0000',
'datePublished': '2023-01-26T23:44:53.698+0000',
'dateUpdated': '2023-01-31T17:55:11.561+0000',
'createdByUser': 25036}
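To keep the nesting straight when adding metadata and section columns, the viewConfig can be generated rather than typed by hand. This is an illustrative sketch only: column_view_config is a hypothetical helper, and it places extraC and extraS under queryParams, following the paths given in this section.

```python
def column_view_config(metadata_columns=(), section_columns=(), annotations=()):
    """Build a viewConfig with metadata (extraC) and section (extraS) columns.

    Annotation names get the `annotationSearches.` prefix described above.
    """
    extra_c = [f"annotationSearches.{name}" for name in annotations]
    extra_c += list(metadata_columns)
    return {
        "viewConfig": {
            "curateSettingsV1": {
                "queryParams": {
                    "extraC": extra_c,
                    "extraS": list(section_columns),
                }
            }
        }
    }


config = column_view_config(
    metadata_columns=["dateUpdated"],
    section_columns=["abstract"],
    annotations=["Pubmed ID"],
)
print(config["viewConfig"]["curateSettingsV1"]["queryParams"]["extraC"])
```

The resulting dictionary matches the shape of the request body above and can be passed to json.dumps directly.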
Classification Columns
The classification and QA columns both work under the viewConfig.curateSettingsV1.queryParams.qSearchParams parameter. To add the column, you need to specify the batchGroupingKey, which is the ID of the Curate Set on which the classification job was run, and the questionKey, which is the name of the specific model job whose predictions you wish to view.
To find the questionKey, please reference the "Submitting a bulk classification job" documentation here.
Required: Submit a BulkDoc Classification Job First
You must submit a bulkDoc classification job before you can add the classification column to your view. To learn how to submit a classification job, please reference the tutorial here.
Below is an example where we ran two classification jobs on a document set (batchGroupingKey: AYXweEvEEs8gbQkuEVT1): a binary cancer classification model (questionKey: Cancer - Binary Text Classification), and a heme classification model (questionKey: Heme - Binary Text Classification).
We want to view the predictions of both of those jobs within the document set they were run on (id: AYXweEvEEs8gbQkuEVT1).
# Add Predictions from a Classification Run as a Column in Curate Set
response = requests.put(f"{envUrl}/layar/savedList/AYXweEvEEs8gbQkuEVT1", # Note the Set ID
data = json.dumps({
"viewConfig":{ # viewConfig is where we update the UI with additional QA/metadata/section columns, see below
"curateSettingsV1":{
"queryParams":{
"qSearchParams":[{
"batchGroupingKey":"AYXweEvEEs8gbQkuEVT1", # Curate document set ID where the classification job was run
"questionKey":"Cancer - Binary Text Classification"}, # Cancer Classification Model's Question Key
{"batchGroupingKey":"AYXweEvEEs8gbQkuEVT1", # Curate document set ID where the classification job was run
"questionKey":"Heme - Binary Text Classification"}], # Heme Classification Model's Question Key
}}}}),
headers = {
'accept':'application/json',
'content-Type':'application/json',
'authorization':f"Bearer {token}",
'X-Vyasa-Client': 'curate'
}
)
response.json()
{'vyasaClient': 'curate',
'listType': 'SEARCH',
'searches': [{'dataProviders': ['master-pubmed.vyasa.com',
'master-pmc-oa.vyasa.com',
'master-clinicaltrials.vyasa.com'],
'filters': [{'op': 'AND',
'conditions': [{'field': 'datePublished',
'minDate': '2022-11-13T00:00:00.000+0000',
'maxDate': '2022-12-13T00:00:00.000+0000'}]}],
'filterOp': 'AND',
'highlight': False,
'highlightPreTag': '<em>',
'highlightPostTag': '</em>',
'logSearch': True,
'randomize': False,
'q': '"osteonecrosis" OR "osteonecroses"',
'start': 0,
'sortOrder': 'asc'}],
'sharedWithUsers': [],
'savedListTerms': [],
'viewConfig': {'curateSettingsV1': {'queryParams': {'qSearchParams': [{'questionKey': 'Cancer - Binary Text Classification',
'batchGroupingKey': 'AYXweEvEEs8gbQkuEVT1'},
{'questionKey': 'Heme - Binary Text Classification',
'batchGroupingKey': 'AYXweEvEEs8gbQkuEVT1'}]}}},
'id': 'AYXweEvEEs8gbQkuEVT1',
'name': 'Update Set Test',
'dateIndexed': '2023-01-26T23:44:53.698+0000',
'datePublished': '2023-01-26T23:44:53.698+0000',
'dateUpdated': '2023-01-26T23:59:08.443+0000',
'createdByUser': 25036}
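Each classification or QA column is one batchGroupingKey/questionKey pair, so the qSearchParams list can be built from a list of question keys. A minimal sketch, assuming all jobs were run on the same set; question_columns and question_view_config are hypothetical helper names.

```python
def question_columns(set_id, question_keys):
    """Pair each questionKey with the set (batchGroupingKey) its job was run on."""
    return [{"batchGroupingKey": set_id, "questionKey": key} for key in question_keys]


def question_view_config(set_id, question_keys):
    """Wrap the column entries in the viewConfig structure used by the PUT call."""
    return {
        "viewConfig": {
            "curateSettingsV1": {
                "queryParams": {
                    "qSearchParams": question_columns(set_id, question_keys)
                }
            }
        }
    }


body = question_view_config(
    "AYXweEvEEs8gbQkuEVT1",
    ["Cancer - Binary Text Classification", "Heme - Binary Text Classification"],
)
print(len(body["viewConfig"]["curateSettingsV1"]["queryParams"]["qSearchParams"]))
```

The same helper works for QA columns, since they use the same qSearchParams structure with the question text as the questionKey.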
QA Columns
Similarly to classification columns, QA columns use the viewConfig.curateSettingsV1.queryParams.qSearchParams parameter. To add the column, you need to specify the batchGroupingKey, which is the ID of the Curate Set on which the QA job was run, and the questionKey, which is the name of the specific model job you've run and whose predictions you wish to view.
To find the questionKey, please reference the "Submitting a bulk QA job" documentation here. More often than not, however, the questionKey will be the query you used when submitting the QA job (e.g. "What is the drug of this study?").
Required: Submit a BulkDoc QA Job First
You must submit a bulkDoc QA job before you can add the QA column to your view. To learn how to submit a QA job, please reference the tutorial here.
# Add Predictions from a QA Run as a Column in Curate Set
response = requests.put(f"{envUrl}/layar/savedList/AYXweEvEEs8gbQkuEVT1", # Note the Set ID
data = json.dumps({
"viewConfig":{ # viewConfig is where we update the UI with additional QA/metadata/section columns, see below
"curateSettingsV1":{
"queryParams":{
"qSearchParams":[{
"batchGroupingKey":"AYXweEvEEs8gbQkuEVT1", # Curate document set ID where the QA job was run
"questionKey":"What is the disease or indication being studied?"},
{"batchGroupingKey":"AYXweEvEEs8gbQkuEVT1", # Curate document set ID where the QA job was run
"questionKey":"Where is the osteonecrosis located on the body?"}]}}}}), # The question key for the QA job
headers = {
'accept':'application/json',
'content-Type':'application/json',
'authorization':f"Bearer {token}",
'X-Vyasa-Client': 'curate'
}
)
response.json()
{'vyasaClient': 'curate',
'listType': 'SEARCH',
'searches': [{'dataProviders': ['master-pubmed.vyasa.com',
'master-pmc-oa.vyasa.com',
'master-clinicaltrials.vyasa.com'],
'filters': [{'op': 'AND',
'conditions': [{'field': 'datePublished',
'minDate': '2022-11-13T00:00:00.000+0000',
'maxDate': '2022-12-13T00:00:00.000+0000'}]}],
'filterOp': 'AND',
'highlight': False,
'highlightPreTag': '<em>',
'highlightPostTag': '</em>',
'logSearch': True,
'randomize': False,
'q': '"osteonecrosis" OR "osteonecroses"',
'start': 0,
'sortOrder': 'asc'}],
'sharedWithUsers': [],
'savedListTerms': [],
'viewConfig': {'curateSettingsV1': {'queryParams': {'qSearchParams': [{'questionKey': 'What is the disease or indication being studied?',
'batchGroupingKey': 'AYXweEvEEs8gbQkuEVT1'},
{'questionKey': 'Where is the osteonecrosis located on the body?',
'batchGroupingKey': 'AYXweEvEEs8gbQkuEVT1'}]}}},
'id': 'AYXweEvEEs8gbQkuEVT1',
'name': 'Update Set Test',
'dateIndexed': '2023-01-26T23:44:53.698+0000',
'datePublished': '2023-01-26T23:44:53.698+0000',
'dateUpdated': '2023-01-27T00:03:11.323+0000',
'createdByUser': 25036}
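After an update, it can be useful to confirm which columns the set now carries by reading them back out of the response body. The sketch below walks the viewConfig structure shown in the response above. configured_columns is a hypothetical helper, and it treats any missing level as "no columns configured".

```python
def configured_columns(saved_list):
    """Summarize the question, metadata, and section columns in a savedList response."""
    params = (
        saved_list.get("viewConfig", {})
        .get("curateSettingsV1", {})
        .get("queryParams", {})
    )
    return {
        "questions": [p["questionKey"] for p in params.get("qSearchParams", [])],
        "metadata": list(params.get("extraC", [])),
        "sections": list(params.get("extraS", [])),
    }


# Trimmed version of the response shown above.
response_body = {
    "id": "AYXweEvEEs8gbQkuEVT1",
    "viewConfig": {"curateSettingsV1": {"queryParams": {"qSearchParams": [
        {"questionKey": "What is the disease or indication being studied?",
         "batchGroupingKey": "AYXweEvEEs8gbQkuEVT1"},
        {"questionKey": "Where is the osteonecrosis located on the body?",
         "batchGroupingKey": "AYXweEvEEs8gbQkuEVT1"}]}}},
}
columns = configured_columns(response_body)
print(columns["questions"])
```

In practice, response_body would be the result of response.json() from the PUT call.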
Adding All Column Types
Here is an example using all of the different column types: QA columns, classification columns, metadata columns, and section columns.
# Example Request Using All Four Types of Columns (Classification, QA, Metadata, & Sections)
response = requests.put(f"{envUrl}/layar/savedList/AYXweEvEEs8gbQkuEVT1", # note the set ID
data = json.dumps({
"viewConfig":{
"curateSettingsV1":{
"queryParams":{
"qSearchParams":[{ # Add Question Columns (QA & Classification)
"batchGroupingKey":"AYXweEvEEs8gbQkuEVT1",
"questionKey":"Cancer - Binary Text Classification"},{
"batchGroupingKey":"AYXweEvEEs8gbQkuEVT1",
"questionKey":"What drug is being evaluated in this study?"}],
"extraC":[ # Add Metadata Columns
"annotationSearches.Pubmed ID", # Annotation PubMed ID - can find annotations via API
"dateUpdated"],
"extraS":[ # Add Section Columns
"abstract"]
}}}}),
headers = {
'accept':'application/json',
'content-Type':'application/json',
'authorization':f"Bearer {token}",
'X-Vyasa-Client': 'curate'
}
)
response.json()
{'vyasaClient': 'curate',
'listType': 'SEARCH',
'searches': [{'dataProviders': ['master-pubmed.vyasa.com'],
'filters': [{'op': 'AND',
'conditions': [{'field': 'id',
'values': ['pubmed_36555184',
'pubmed_35190930',
'pubmed_35932370',
'pubmed_36463478',
'pubmed_36447263']}]}],
'filterOp': 'AND',
'highlight': False,
'highlightPreTag': '<em>',
'highlightPostTag': '</em>',
'logSearch': True,
'randomize': False,
'start': 0,
'sortOrder': 'asc'}],
'sharedWithUsers': [],
'savedListTerms': [],
'viewConfig': {'curateSettingsV1': {'queryParams': {'qSearchParams': [{'questionKey': 'Cancer - Binary Text Classification',
'batchGroupingKey': 'AYXu8wBvEs8gbQkuEVPz'},
{'questionKey': 'What drug is being evaluated in this study?',
'batchGroupingKey': 'AYXu8wBvEs8gbQkuEVPz'}],
'extraC': ['annotationSearches.Pubmed ID', 'dateUpdated'],
'extraS': ['abstract']}}},
'id': 'AYXu8wBvEs8gbQkuEVPz',
'name': 'Update Set Test',
'dateIndexed': '2023-01-26T16:39:40.906+0000',
'datePublished': '2023-01-26T16:39:40.906+0000',
'dateUpdated': '2023-01-26T16:42:13.684+0000',
'createdByUser': 25036}
In the next tutorial, you will learn how to submit a bulk QA or classification job. That guide also includes a section on how to submit a chained question.