Version: 2.0

Document Class

Document inherits from TextChunk.

Document stores the information for a document (for example an online review for a product or a news article). The class is iterable and will yield instances of Sentence.

Attributes / Properties

Name | Type | Description
sentences | list of Sentence instances | List of Sentences in the document.
subsentences | list of Subsentence instances | Direct access to the list of Subsentences in the document.
id | integer | Id of the document. Defaults to a sequential integer if not provided.
common properties | depends on property | Properties giving access to specific data (pos, token, etc.).
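To make the iteration behaviour and the attributes above concrete, here is a minimal sketch built on stand-in classes. It is not the library's implementation: the dataclasses, the `tokens` field, and the id counter starting at 0 are all assumptions made for illustration. Only the attribute names `sentences`, `subsentences`, and `id` come from the table above.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Iterator, List

# Hypothetical source of sequential ids (the real starting value is not documented here).
_next_id = count()


@dataclass
class Subsentence:
    tokens: List[str]  # hypothetical field, for illustration only


@dataclass
class Sentence:
    subsentences: List[Subsentence]


@dataclass
class Document:
    sentences: List[Sentence]
    # Falls back to a sequential integer when no id is given.
    id: int = field(default_factory=lambda: next(_next_id))

    @property
    def subsentences(self) -> List[Subsentence]:
        # Flatten the subsentences of every sentence into one list.
        return [sub for sentence in self.sentences for sub in sentence.subsentences]

    def __iter__(self) -> Iterator[Sentence]:
        # Iterating over a document yields its sentences in order.
        return iter(self.sentences)


doc = Document(sentences=[
    Sentence([Subsentence(["The", "phone", "is", "great"]),
              Subsentence(["but", "it", "is", "pricey"])]),
    Sentence([Subsentence(["I", "recommend", "it"])]),
])

for sentence in doc:
    print(len(sentence.subsentences))  # prints 2, then 1
```

In this sketch, `doc.subsentences` saves the caller a nested loop over `doc.sentences`, which is the convenience the "direct access" wording suggests.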

Document methods

The table below lists the methods available on Document.

Method | Description
replace_coreference() | Performs coreference resolution, replaces spans by the head of their cluster.

replace_coreference

replace_coreference(self, attribute='source', replace=['CLS']) -> list:

Replaces coreference mentions in the text with the head of their cluster.

Parameters:

Name | Type | Description
attribute | string | Attribute to get. Defaults to 'source'.
replace | list | Defines what kind of spans get replaced by head span. Defaults to ['CLS'].

Return:

Type | Description
List | List of sentences with the desired attribute information and replacement.
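The idea of replacing mentions with the head of their cluster can be sketched with a toy function over plain token lists. This is an illustration of the concept only, not the library's code: `replace_with_head`, `Cluster`, and the span layout are hypothetical, and the `attribute` and `replace` parameters (including the meaning of 'CLS') are deliberately not modelled.

```python
from typing import List, NamedTuple, Tuple

# Hypothetical span layout: (sentence index, start token, end token exclusive).
Span = Tuple[int, int, int]


class Cluster(NamedTuple):
    head: Span            # the span every mention is replaced with
    mentions: List[Span]  # the other spans that refer to the same entity


def replace_with_head(sentences: List[List[str]],
                      clusters: List[Cluster]) -> List[List[str]]:
    # Read every head span from the original text before changing anything.
    edits = []
    for cluster in clusters:
        h_sent, h_start, h_end = cluster.head
        head_tokens = sentences[h_sent][h_start:h_end]
        for sent, start, end in cluster.mentions:
            edits.append((sent, start, end, head_tokens))

    # Work on a copy so the input sentences are left untouched.
    resolved = [list(tokens) for tokens in sentences]

    # Apply edits from right to left so earlier token indices stay valid.
    for sent, start, end, head_tokens in sorted(
            edits, key=lambda e: (e[0], e[1]), reverse=True):
        resolved[sent][start:end] = head_tokens
    return resolved


sentences = [
    ["The", "phone", "is", "great"],
    ["It", "has", "a", "sharp", "screen"],
]
clusters = [Cluster(head=(0, 0, 2), mentions=[(1, 0, 1)])]

resolved = replace_with_head(sentences, clusters)
print(resolved[1])  # ['The', 'phone', 'has', 'a', 'sharp', 'screen']
```

Like the documented method, the sketch returns a list of sentences. Here each sentence is a list of token strings, standing in for whatever the chosen attribute would yield.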

For a demo of document_class, check out our tutorial.