Version: 2.0

NLP Class

What's the NLP class?

NLP inherits from TextChunk.

NLP is a class designed to give intuitive access to the relevant data at each level (document, sentence, subsentence). It allows you to perform quick data exploration, manipulation and analysis. It is also used to send requests to the API, and it can save results to and load results from JSON files.

When a response is received from the API, it is stored in a hierarchy of classes: NLP (all data) => Document => Sentence => Subsentence => Token

Each level gives direct access to the levels below it: nlp.sentences returns a list of every Sentence in the current data, while nlp.documents[0].sentences returns only the Sentence instances of the first Document.

NLP is iterable and will yield Document instances.
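
The access pattern described above can be pictured with a small stand-in model. The classes below (ToySentence, ToyDocument, ToyNLP) are hypothetical simplifications written for this illustration. They are not the SDK's classes and hold plain strings instead of API results. They only show how a top-level sentences property can flatten what each document holds, and how iterating the top-level object yields documents.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class ToySentence:
    text: str


@dataclass
class ToyDocument:
    sentences: List[ToySentence] = field(default_factory=list)


@dataclass
class ToyNLP:
    documents: List[ToyDocument] = field(default_factory=list)

    @property
    def sentences(self) -> List[ToySentence]:
        # Flatten: every sentence of every document, in document order.
        return [s for doc in self.documents for s in doc.sentences]

    def __iter__(self) -> Iterator[ToyDocument]:
        # Iterating the top-level object yields documents.
        return iter(self.documents)


toy = ToyNLP([
    ToyDocument([ToySentence("Hello."), ToySentence("How are you?")]),
    ToyDocument([ToySentence("Fine, thanks.")]),
])

# Top-level access sees the sentences of all documents.
all_sentences = [s.text for s in toy.sentences]
# Going through one document restricts the view to that document.
first_doc_sentences = [s.text for s in toy.documents[0].sentences]
# A for-loop over the top-level object walks the documents.
sentences_per_document = [len(doc.sentences) for doc in toy]
```

With the real class, the same two expressions from the text (nlp.sentences and nlp.documents[0].sentences) play the roles of all_sentences and first_doc_sentences here.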

Attributes & Properties

| Name | Type | Description |
| --- | --- | --- |
| documents | list of Document instances | List of all the Document instances. |
| sentences | list of Sentence instances | Direct access to all of the Sentence instances. |
| subsentences | list of Subsentence instances | Direct access to all of the Subsentence instances. |
| tokens | list of Token instances | Direct access to all of the Token instances. |
| fields | list of string | List of all common properties accessible at all levels (token, span, cluster, pos etc.). |
| client | instance of Client | Client used for performing requests to Lettria's API. |
| Common properties | depends on property | Properties allowing access to specific data (pos, token etc.). |

NLP methods

Data analysis

The methods below are used to manage data with the API.

| Method | Description |
| --- | --- |
| add_documents() | Submits a batch of documents to the API. |
| add_document() | Submits a document to the API. |
| save_results() | Saves data to a JSON file. |
| load_results() | Loads data from a JSON file. |
| reset_data() | Erases data and reinitialises the object. |
| add_client() | Adds a new client / api_key. |
| to_annotation_format() | Exports the data to the input format of our annotation platform. |

add_documents()

add_documents(documents, batch_size, skip_document = False, id=None, verbose=True)

This method sends the documents to the API in batches in order to speed up the calls. Prefer it over add_document() when submitting several documents.

Parameters:

| Name | Type | Description | Optional |
| --- | --- | --- | --- |
| documents | list of string | Data to be sent to the API. | False |
| batch_size | int | Number of documents to be sent at the same time to the API. Reduce if documents are too big. | Default: 32 |
| skip_document | bool | Whether to skip the document if there is a problem during processing. | True |
| document_ids | list of str | List of ids to identify the documents. By default an incrementing integer is assigned. | True |
| verbose | bool | Whether to print additional statements about document processing. | True |
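
To make the role of batch_size concrete, here is a minimal sketch of splitting a list of documents into batches and assigning default incrementing ids. make_batches is a hypothetical helper written for this page, not part of the SDK, and the SDK's own batching may work differently. Reading the 32 in the table as the default batch size, starting the ids at 0, and rendering them as strings are all assumptions.

```python
from typing import Iterator, List, Optional, Tuple


def make_batches(
    documents: List[str],
    batch_size: int = 32,  # assumed default, taken from the table above
    document_ids: Optional[List[str]] = None,
) -> Iterator[List[Tuple[str, str]]]:
    """Yield lists of (id, document) pairs, at most batch_size per list."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if document_ids is None:
        # Assumed default: an incrementing integer, rendered as a string.
        document_ids = [str(i) for i in range(len(documents))]
    if len(document_ids) != len(documents):
        raise ValueError("need exactly one id per document")
    pairs = list(zip(document_ids, documents))
    for start in range(0, len(pairs), batch_size):
        yield pairs[start:start + batch_size]


docs = [f"Document number {i}." for i in range(70)]
batches = list(make_batches(docs, batch_size=32))
batch_lengths = [len(batch) for batch in batches]
```

Seventy documents with a batch size of 32 give two full batches and a final batch of six. A smaller batch_size produces more, lighter requests, which is the adjustment the table suggests when documents are too big.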

add_document()

add_document(document, skip_document = False, id=None, verbose=True)

This method submits a single document to the API. To submit several documents, use add_documents(), which batches the calls.

Parameters:

| Name | Type | Description | Optional |
| --- | --- | --- | --- |
| document | string or list of string | Data to be sent to the API. | False |
| skip_document | bool | Whether to skip the document if there is a problem during processing. | True |
| id | str | Id to identify the document. By default an incrementing integer is assigned. | True |
| verbose | bool | Whether to print additional statements about document processing. | True |

save_results()

save_results(file = '')

Writes the current results to a JSON file. If no file is specified, the default path is results_X.json, where X is the next unused integer.

Parameters:

| Name | Type | Description | Optional |
| --- | --- | --- | --- |
| file | string | Path of file to write in. | True |
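
The default file name rule can be sketched as follows. next_free_results_path is a hypothetical helper, not the SDK's code. It assumes that numbering starts at 0 and that the first unused integer is taken, which may differ from how the SDK picks X.

```python
import os
import tempfile


def next_free_results_path(directory: str = ".") -> str:
    """Return the first results_X.json path in directory that does not exist."""
    index = 0
    while os.path.exists(os.path.join(directory, f"results_{index}.json")):
        index += 1
    return os.path.join(directory, f"results_{index}.json")


with tempfile.TemporaryDirectory() as tmp:
    # Nothing saved yet: the first free name uses 0.
    first = os.path.basename(next_free_results_path(tmp))
    # Simulate two earlier saves, then ask again.
    for name in ("results_0.json", "results_1.json"):
        open(os.path.join(tmp, name), "w").close()
    third = os.path.basename(next_free_results_path(tmp))
```

Passing an explicit file to save_results() avoids relying on this numbering.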

load_results()

load_results(path = 'results_0', reset = False)

Loads results from a JSON file.

Parameters:

| Name | Type | Description | Optional |
| --- | --- | --- | --- |
| path | string | Path of file to load. | True |
| reset | bool | Whether to erase current data. | True |
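
The effect of reset can be illustrated with plain JSON files. load_into is a hypothetical helper, not the SDK's implementation. It assumes that the file holds a JSON list of documents and that loading without reset appends to the data already in memory. The actual file layout written by save_results() is not described here and may differ.

```python
import json
import os
import tempfile
from typing import List


def load_into(current: List[dict], path: str, reset: bool = False) -> List[dict]:
    """Return the data after loading path, optionally discarding current first."""
    with open(path, encoding="utf-8") as handle:
        loaded = json.load(handle)
    # reset=True drops what was in memory; reset=False keeps it (assumed).
    kept = [] if reset else list(current)
    return kept + loaded


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results_0.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump([{"id": "a"}, {"id": "b"}], handle)

    in_memory = [{"id": "x"}]
    appended = load_into(in_memory, path, reset=False)
    replaced = load_into(in_memory, path, reset=True)

appended_ids = [doc["id"] for doc in appended]
replaced_ids = [doc["id"] for doc in replaced]
```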

reset_data()

reset_data()

Erases all data inside NLP and reinitialises document ids.

This method takes no parameters.

add_client()

add_client(client = None, api_key = None)

Replaces current client with provided one, or creates a new client using the provided api_key.

Parameters:

| Name | Type | Description | Optional |
| --- | --- | --- | --- |
| client | instance of Client class | Client instance to replace the current one. | True |
| api_key | string | Key to use for the new client. | True |

to_annotation_format()

to_annotation_format(output_file, attribute=None, filter_list = [], verbose=True)

Writes data to a file in the annotation format for Lettria's platform.

Parameters:

| Name | Type | Description | Optional |
| --- | --- | --- | --- |
| output_file | string | Name of the file to write to. | False |
| attribute | string | Attribute to be used to preselect tokens to annotate. Defaults to None. | True |
| filter_list | list | Filter used to compare to the chosen attribute. Defaults to []. | True |
| verbose | bool | Turns on/off verbosity. Defaults to True. | True |