NLP Class
What's the NLP class?
NLP inherits from TextChunk.
NLP is a class designed to give intuitive access to the relevant data at each level (document, sentence, subsentence). It lets you quickly explore, manipulate and analyse data. It is also used to send requests to the API, and it can save and load results as JSON.
When a response is received from the API, it is stored in a hierarchy of classes: NLP (all data) => Document => Sentence => Subsentence => Token
Each level gives direct access to the levels below it: nlp.sentences returns a list of every Sentence in the current data, while nlp.documents[0].sentences returns only the Sentence instances of the first Document.
NLP is iterable and will yield Document instances.
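The hierarchy and the flattened access described above can be sketched with a few stand-in classes. This is only an illustration of the idea: the dataclasses and the make_sentence helper below are hypothetical and are not the library's own implementation.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


# Hypothetical stand-ins for illustration; not the library's own classes.
@dataclass
class Token:
    text: str


@dataclass
class Subsentence:
    tokens: List[Token] = field(default_factory=list)


@dataclass
class Sentence:
    subsentences: List[Subsentence] = field(default_factory=list)

    @property
    def tokens(self) -> List[Token]:
        return [tok for sub in self.subsentences for tok in sub.tokens]


@dataclass
class Document:
    sentences: List[Sentence] = field(default_factory=list)

    @property
    def subsentences(self) -> List[Subsentence]:
        return [sub for sent in self.sentences for sub in sent.subsentences]

    @property
    def tokens(self) -> List[Token]:
        return [tok for sent in self.sentences for tok in sent.tokens]


@dataclass
class NLP:
    documents: List[Document] = field(default_factory=list)

    def __iter__(self) -> Iterator[Document]:
        # Iterating over the top-level object yields Document instances.
        return iter(self.documents)

    @property
    def sentences(self) -> List[Sentence]:
        return [sent for doc in self.documents for sent in doc.sentences]

    @property
    def subsentences(self) -> List[Subsentence]:
        return [sub for doc in self.documents for sub in doc.subsentences]

    @property
    def tokens(self) -> List[Token]:
        return [tok for doc in self.documents for tok in doc.tokens]


def make_sentence(text: str) -> Sentence:
    # One subsentence per sentence keeps the toy data small.
    return Sentence([Subsentence([Token(word) for word in text.split()])])


nlp = NLP([
    Document([make_sentence("The cat sleeps"), make_sentence("It purrs")]),
    Document([make_sentence("Dogs bark")]),
])

print(len(nlp.sentences))               # 3: every Sentence in the data
print(len(nlp.documents[0].sentences))  # 2: only the first Document
print(len(nlp.tokens))                  # 7: every Token in the data
```

Each level flattens the level below it, so the top-level object never has to know how deep the hierarchy goes.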
Attributes & Properties
Name | Type | Description |
---|---|---|
documents | list of Document instances | List of all the Document instances |
sentences | list of Sentence instances | Direct access to all of the Sentence instances. |
subsentences | list of Subsentence instances | Direct access to all of the Subsentence instances. |
tokens | list of Token instances | Direct access to all of the Token instances. |
fields | list of strings | List of all common properties accessible at all levels (token, span, cluster, pos etc.) |
client | instance of Client | Client used for performing requests to Lettria's API |
Common properties | depends on property | Properties allowing access to specific data (pos, token etc.) |
NLP methods
Data analysis
The methods below are used to submit, save, load and export data.
Method | Description |
---|---|
add_documents() | Submits a batch of documents to the API |
add_document() | Submits a document to the API |
save_results() | Saves data to a JSON file |
load_results() | Loads data from a JSON file |
reset_data() | Erases data and reinitialises the object |
add_client() | Adds a new client / api_key |
to_annotation_format() | Exports the data to the input format of our annotation platform. |
add_documents()
add_documents(documents, batch_size, skip_document = False, id=None, verbose=True)
This method sends the documents to the API in batches, which speeds up the calls. Prefer it whenever you have several documents to submit.
Parameters:
Name | Type | Description | Optional |
---|---|---|---|
documents | list of strings | Data to be sent to the API | False |
batch_size | int | Number of documents to be sent at the same time to the API. Reduce if documents are too big. | True (default: 32) |
skip_document | bool | Whether to skip the document if there is a problem during processing | True |
document_ids | list of str | List of Id to identify the documents, by default an incrementing integer is assigned. | True |
verbose | bool | Whether to print additional statements about document processing. | True |
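To make the role of batch_size concrete, here is a minimal sketch of how a list of documents can be split into batches. The iter_batches helper is hypothetical and only illustrates the slicing; it does not call the API and is not the library's own code.

```python
from typing import Iterator, List, Sequence


def iter_batches(documents: Sequence[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size documents (illustrative helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(documents), batch_size):
        yield list(documents[start:start + batch_size])


docs = [f"document {i}" for i in range(70)]
batch_sizes = [len(batch) for batch in iter_batches(docs, batch_size=32)]
print(batch_sizes)  # [32, 32, 6]
```

A smaller batch_size produces more, lighter requests, which is why reducing it helps when individual documents are large.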
add_document()
add_document(document, skip_document = False, id=None, verbose=True)
This method submits a single document to the API. To submit several documents, prefer add_documents(), which batches the calls.
Parameters:
Name | Type | Description | Optional |
---|---|---|---|
document | string or list of strings | Data to be sent to the API | False |
skip_document | bool | Whether to skip the document if there is a problem during processing | True |
id | str | Id to identify the document, by default an incrementing integer is assigned. | True |
verbose | bool | Whether to print additional statements about document processing. | True |
save_results()
save_results(file = '')
Writes the current results to a JSON file. If no file is specified, the default path is results_X.json, where X is the next unused integer.
Parameters:
Name | Type | Description | Optional |
---|---|---|---|
file | string | Path of file to write in. | True |
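The default naming rule can be sketched as follows. This is an assumption-laden illustration rather than the library's code: next_free_results_path and save_results_sketch are hypothetical helpers, and the sketch assumes that numbering starts at 0 and that files are looked up in a single directory.

```python
import json
import os
import tempfile


def next_free_results_path(directory: str = ".") -> str:
    """Return results_X.json for the smallest X not already used (assumes X starts at 0)."""
    index = 0
    while os.path.exists(os.path.join(directory, f"results_{index}.json")):
        index += 1
    return os.path.join(directory, f"results_{index}.json")


def save_results_sketch(data, file: str = "", directory: str = ".") -> str:
    """Write data as JSON to file, or to the next free default path if file is empty."""
    path = file or next_free_results_path(directory)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(data, handle)
    return path


with tempfile.TemporaryDirectory() as tmp:
    first = save_results_sketch([{"id": 0}], directory=tmp)
    second = save_results_sketch([{"id": 1}], directory=tmp)
    saved_names = [os.path.basename(first), os.path.basename(second)]

print(saved_names)  # ['results_0.json', 'results_1.json']
```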
load_results()
load_results(path = 'results_0', reset = False)
Loads results from a JSON file.
Parameters:
Name | Type | Description | Optional |
---|---|---|---|
path | string | Path of file to load. | True |
reset | bool | Whether to erase current data. | True |
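The effect of reset can be illustrated with a small sketch. It assumes that, when reset is False, the loaded results are appended to the data already held; that behaviour is an assumption, and load_results_sketch is a hypothetical helper working on plain lists rather than on an NLP instance.

```python
import json
import os
import tempfile


def load_results_sketch(current: list, path: str, reset: bool = False) -> list:
    """Return the data after loading path; reset=True discards current first."""
    with open(path, encoding="utf-8") as handle:
        loaded = json.load(handle)
    if reset:
        return list(loaded)
    # Assumption: without reset, loaded results are added to the existing data.
    return list(current) + list(loaded)


with tempfile.TemporaryDirectory() as tmp:
    results_path = os.path.join(tmp, "results_0.json")
    with open(results_path, "w", encoding="utf-8") as handle:
        json.dump([{"id": 1}], handle)

    kept = load_results_sketch([{"id": 0}], results_path, reset=False)
    replaced = load_results_sketch([{"id": 0}], results_path, reset=True)

print(kept)      # [{'id': 0}, {'id': 1}]
print(replaced)  # [{'id': 1}]
```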
reset_data()
reset_data()
Erases all data inside NLP and reinitialises the document ids.
Parameters: none.
add_client()
add_client(client = None, api_key = None)
Replaces current client with provided one, or creates a new client using the provided api_key.
Parameters:
Name | Type | Description | Optional |
---|---|---|---|
client | instance of Client class | Client instance to replace the current one. | True |
api_key | string | Key to use for the new client. | True |
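Since both parameters are optional, the choice between them can be sketched as below. The Client class and the resolve_client helper are hypothetical stand-ins, and two behaviours are assumptions rather than documented facts: that a provided client takes precedence over an api_key, and that the current client is kept when neither is given.

```python
from typing import Optional


class Client:
    """Hypothetical stand-in that only stores the key; not the library's Client."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key


def resolve_client(current: Optional[Client],
                   client: Optional[Client] = None,
                   api_key: Optional[str] = None) -> Optional[Client]:
    if client is not None:
        return client           # assumption: an explicit client wins
    if api_key is not None:
        return Client(api_key)  # otherwise build a new client from the key
    return current              # assumption: nothing provided, nothing changes


old = Client("old-key")
from_key = resolve_client(old, api_key="new-key")
explicit = Client("explicit-key")
from_client = resolve_client(old, client=explicit, api_key="ignored-key")

print(from_key.api_key)     # new-key
print(from_client.api_key)  # explicit-key
```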
to_annotation_format()
to_annotation_format(output_file, attribute=None, filter_list = [], verbose=True)
Writes data to a file in the annotation format for Lettria's platform.
Parameters:
Name | Type | Description | Optional |
---|---|---|---|
output_file | string | Name of the file to write to. | False |
attribute | string | Attribute to be used to preselect tokens to annotate. Defaults to None. | True |
filter_list | list | Filter used to compare to the chosen attribute. Defaults to []. | True |
verbose | bool | Turns on/off verbosity. Defaults to True. | True |
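How attribute and filter_list might work together to preselect tokens can be sketched as follows. This is a guess at the selection logic only: the preselect helper, the dictionary representation of tokens and the use of simple membership as the comparison are all assumptions, and the actual annotation file format is not shown.

```python
from typing import Any, Dict, List, Optional


def preselect(tokens: List[Dict[str, Any]],
              attribute: Optional[str] = None,
              filter_list: Optional[List[Any]] = None) -> List[Dict[str, Any]]:
    """Keep tokens whose value for attribute appears in filter_list (assumed rule)."""
    if attribute is None:
        return []  # assumption: no attribute means nothing is preselected
    allowed = list(filter_list or [])
    return [tok for tok in tokens if tok.get(attribute) in allowed]


tokens = [
    {"token": "Paris", "pos": "NP"},
    {"token": "is", "pos": "V"},
    {"token": "beautiful", "pos": "JJ"},
]

selected = preselect(tokens, attribute="pos", filter_list=["NP", "JJ"])
print([tok["token"] for tok in selected])  # ['Paris', 'beautiful']
```

The sketch uses None instead of [] as the default for filter_list, because a mutable default argument in Python is shared between calls.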