Version: 2.0

Document Class

Document inherits from TextChunk.

Document stores the information for a document (for example an online review for a product or a news article). The class is iterable and will yield instances of Sentence.
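The iteration behaviour described above can be pictured with a small stand-in. The two classes below are hypothetical toy versions written only for illustration; they borrow the names Document and Sentence from this page but are not the library's implementation, and a real Sentence carries far more than raw text.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Sentence:
    # Toy stand-in (assumption): only keeps the raw text of the sentence.
    text: str


@dataclass
class Document:
    # Toy stand-in (assumption): holds Sentence objects in reading order.
    sentences: List[Sentence] = field(default_factory=list)

    def __iter__(self) -> Iterator[Sentence]:
        # Iterating over a document yields its sentences one by one.
        return iter(self.sentences)


doc = Document([Sentence("The phone is great."), Sentence("It arrived late.")])

# A plain for-loop or comprehension over the document visits each Sentence.
sentence_texts = [sentence.text for sentence in doc]
print(sentence_texts)
```

Because iteration is delegated to the underlying list, the document can be looped over repeatedly and the sentences always come back in their original order.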

Attributes / Properties

sentences (list of Sentence instances): List of Sentences in the document.
subsentences (list of Subsentence instances): Direct access to list of Subsentence for the document.
id (integer): Id of document, by default sequential integer if not provided.
common properties (depends on property): Properties allowing access to specific data (pos, token, etc.).
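As a rough sketch of how two of these attributes could behave, the toy classes below model a default sequential id and a subsentences property that flattens the subsentences of every sentence. ToyDocument, ToySentence and the module-level counter are hypothetical names, and both the starting value of 0 and the flattening order are assumptions rather than documented behaviour.

```python
import itertools
from typing import List, Optional

# Hypothetical counter used to hand out ids when none is supplied.
_next_id = itertools.count()


class ToySentence:
    def __init__(self, subsentences: List[str]):
        # Assumption: a sentence simply owns an ordered list of subsentences.
        self.subsentences = subsentences


class ToyDocument:
    def __init__(self, sentences: List[ToySentence], id: Optional[int] = None):
        self.sentences = sentences
        # Use the caller's id if given, otherwise the next sequential integer.
        self.id = next(_next_id) if id is None else id

    @property
    def subsentences(self) -> List[str]:
        # Direct access: flatten the subsentences of all sentences, in order.
        return [sub for sentence in self.sentences for sub in sentence.subsentences]


first = ToyDocument([ToySentence(["The screen is sharp", "but the battery is weak"])])
second = ToyDocument([ToySentence(["Fast delivery"])])
custom = ToyDocument([], id=42)

print(first.id, second.id, custom.id)
print(first.subsentences)
```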

Document methods

Below is an overview of the methods that can be used to manage data with the API.

replace_coreference(): Performs coreference resolution and replaces spans with the head of their cluster.


replace_coreference(self, attribute='source', replace=['CLS']) -> list

Replaces coreference mentions in the text with the head of their cluster.


attribute (string): Attribute to get. Defaults to 'source'.
replace (list): Defines what kind of spans get replaced by the head span. Defaults to ['CLS'].


Returns (list): List of sentences with the desired attribute information and replacement.
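To make the idea of replacing mentions by the head of their cluster concrete, here is a minimal sketch on plain token lists. It is not the library's algorithm: the clusters are written by hand instead of being predicted, the function replace_mentions and the cluster layout (a "head" token list plus "mentions" given as sentence index, start and exclusive end) are hypothetical, and the replace parameter is not modelled at all.

```python
from typing import Dict, List, Tuple

# (sentence index, start token, end token exclusive); hypothetical layout.
Span = Tuple[int, int, int]


def replace_mentions(sentences: List[List[str]], clusters: List[Dict]) -> List[List[str]]:
    # Copy every sentence so the caller's data is left untouched.
    result = [list(tokens) for tokens in sentences]
    # Pair each mention span with the head tokens of its cluster.
    edits = [(span, cluster["head"]) for cluster in clusters for span in cluster["mentions"]]
    # Apply right-most spans first so earlier token indices stay valid.
    # Overlapping mentions are not handled in this sketch.
    for (sent_idx, start, end), head in sorted(edits, key=lambda e: e[0], reverse=True):
        result[sent_idx][start:end] = head
    return result


sentences = [
    ["Marie", "bought", "a", "phone", "."],
    ["She", "loves", "it", "."],
]
clusters = [
    {"head": ["Marie"], "mentions": [(1, 0, 1)]},
    {"head": ["a", "phone"], "mentions": [(1, 2, 3)]},
]

resolved = replace_mentions(sentences, clusters)
print(resolved)
```

Like the documented method, the sketch returns one entry per sentence; here each entry is a token list, with "She" and "it" swapped for the head of their cluster.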

For a demo of document_class, check out our tutorial 👨🏻‍💻