Text Clustering with Document Search

Text clustering organizes documents based on the sentences and phrases they contain. Layar lets you run text clustering on the results of a document search, which limits the set of documents that are clustered.

Pre-Reqs

Before a document search can be run, API requests must be authenticated. Make sure you have already followed the instructions for importing dependencies and authenticating in the Getting Started Guide.

👍

Check Your Imported Modules

Make sure you have imported the requests and json modules before proceeding with this guide.

Create Header

The header for this request uses X-Vyasa-Advanced-Parameters, which takes a dictionary of values. Its task value selects the text clustering method, either bertopic or umaphdbscan, and each method accepts a different set of advanced parameters. This example uses bertopic.

📘

Advanced Parameters Details

You can find details on the various parameters here: Clustering Parameters

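# Advanced parameters select the clustering method and its settings.
advanced_parameters = {'task': 'bertopic',
                       'remove_stops': True,
                       'stopwords': [],
                       'ngram_range': (1, 1),
                       'min_topic_size': 20,
                       'num_keywords': 1,
                       'ngram_weight': 4}

header = {'Accept': 'application/json',
          'Content-Type': 'application/json',
          'Authorization': f"Bearer {token}",
          'X-Vyasa-Client': 'layar',
          'X-Vyasa-Data-Providers': 'sandbox.certara.ai',
          'X-Vyasa-Data-Fabric': 'YOUR_FABRIC_ID',
          # requests only accepts string header values, so the dictionary is
          # serialized here (assumes the server reads this header as JSON).
          'X-Vyasa-Advanced-Parameters': json.dumps(advanced_parameters)}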
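HTTP header values have to be strings, so the advanced parameters dictionary needs to be serialized before it is sent. The following is a minimal sketch of one way to do that, assuming the server parses X-Vyasa-Advanced-Parameters as JSON. The helper name build_headers and its arguments are hypothetical and are not part of the Layar API.

```python
import json


def build_headers(token, fabric_id, advanced_parameters=None,
                  data_provider='sandbox.certara.ai'):
    """Return request headers; build_headers is a hypothetical helper."""
    headers = {'Accept': 'application/json',
               'Content-Type': 'application/json',
               'Authorization': f'Bearer {token}',
               'X-Vyasa-Client': 'layar',
               'X-Vyasa-Data-Providers': data_provider,
               'X-Vyasa-Data-Fabric': fabric_id}
    if advanced_parameters is not None:
        # Assumption: the server decodes this header as JSON.
        # json.dumps turns True into true and the tuple (1, 1) into [1, 1].
        headers['X-Vyasa-Advanced-Parameters'] = json.dumps(advanced_parameters)
    return headers


bertopic_parameters = {'task': 'bertopic',
                       'remove_stops': True,
                       'stopwords': [],
                       'ngram_range': (1, 1),
                       'min_topic_size': 20,
                       'num_keywords': 1,
                       'ngram_weight': 4}

# Header for the clustering request (with advanced parameters).
cluster_header = build_headers('YOUR_TOKEN', 'YOUR_FABRIC_ID', bertopic_parameters)
# Header for requests that do not need advanced parameters.
status_header = build_headers('YOUR_TOKEN', 'YOUR_FABRIC_ID')
```

Calling the helper without advanced_parameters produces the plain header, which is the form used for requests that do not take X-Vyasa-Advanced-Parameters.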

Create Request Body

The request body has the same shape as the body of a document search. For details on the values you can use, see Search Documents. This example uses a simple query search to find the documents to cluster.

body = {
    'q': 'JAK',
    'rows': 20
}

Clustering Request

We can use the /sourceDocument/startClusterDocsJob endpoint to create the project from a document search and get the project's ID.

textClusterUrl = f'{envUrl}/sourceDocument/startClusterDocsJob'

projectId = requests.post(textClusterUrl,
                         headers = header,
                         json = body
                        ).json().get('id')                         
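The one-line request above leaves projectId as None if the response has no id field, and it does not check the HTTP status code. A hedged sketch of a more defensive version follows. start_cluster_job is a hypothetical helper, the session argument is assumed to be a requests.Session (or any object with a compatible post method), and the 30-second timeout is an arbitrary choice.

```python
def start_cluster_job(session, env_url, headers, body, timeout=30):
    """Start a clustering job and return its project ID.

    start_cluster_job is a hypothetical helper; session is expected to
    behave like requests.Session.
    """
    url = f"{env_url.rstrip('/')}/sourceDocument/startClusterDocsJob"
    response = session.post(url, headers=headers, json=body, timeout=timeout)
    # Raise on 4xx/5xx responses instead of failing later on a missing ID.
    response.raise_for_status()
    project_id = response.json().get('id')
    if project_id is None:
        raise ValueError("Response did not contain an 'id' field")
    return project_id
```

With the variables from this guide, a call would look like projectId = start_cluster_job(requests.Session(), envUrl, header, body).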

Project Status Request

Now that the project has been created and we have its projectId, we can use /projectComputation/{projectId} to get the status of the project. X-Vyasa-Advanced-Parameters and a request body are not needed for this request.

projectStatusUrl = f'{envUrl}/projectComputation/{projectId}'

header = {'Accept': 'application/json',
          'Content-Type': 'application/json',
          'Authorization': f"Bearer {token}",
          'X-Vyasa-Client': 'layar',
          'X-Vyasa-Data-Providers': 'sandbox.certara.ai',
          'X-Vyasa-Data-Fabric': 'YOUR_FABRIC_ID'}

status = requests.get(projectStatusUrl,
                      headers = header
                     ).json().get('status')

print(status)
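A single status request only reports the project's state at that moment. If you want to wait for the job to finish, one option is to repeat the request until a final status appears. The sketch below shows that polling pattern under stated assumptions: this guide does not list the possible status values, so the caller must supply them, and wait_for_status and fetch_status are hypothetical names.

```python
import time


def wait_for_status(fetch_status, terminal_statuses, poll_seconds=5.0,
                    max_polls=60, sleep=time.sleep):
    """Call fetch_status() until it returns one of terminal_statuses.

    wait_for_status is a hypothetical helper. Returns the terminal status,
    or raises TimeoutError after max_polls attempts.
    """
    for attempt in range(max_polls):
        status = fetch_status()
        if status in terminal_statuses:
            return status
        # Do not sleep after the final attempt.
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    raise TimeoutError(f'No terminal status after {max_polls} polls')
```

Here fetch_status can wrap the status request from this section, for example lambda: requests.get(projectStatusUrl, headers=header).json().get('status'), and terminal_statuses should contain whatever final status values your Layar instance reports.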