Version: 2.0

TextChunk

What is TextChunk?

TextChunk is the base class of NLP, Document, Sentence and Subsentence. It provides methods that are accessible from each of these child classes.

Data analysis:

Method | Description
vocabulary() | Returns vocabulary from current data.
word_count() | Returns word count from current data.
word_frequency() | Returns word frequency of current data.
list_entities() | Returns dictionaries of detected entities by type.
statistics() | Returns statistics about the data contained in the object.
get_emotion() | Returns emotion results at the specified hierarchical level.
get_sentiment() | Returns sentiment results at the specified hierarchical level.
word_sentiment() | Returns average sentiment for each word of the whole vocabulary.
word_emotion() | Returns average emotion for each word of the whole vocabulary.
meaning_sentiment() | Returns average sentiment for each meaning.
meaning_emotion() | Returns average emotion for each meaning.
filter_polarity() | Filters Sentence or Subsentence of the specified polarity.
filter_emotion() | Filters Sentence or Subsentence of the specified emotions.
filter_type() | Filters Sentence of the specified types.
match_pattern() | Returns matches from given patterns.

TextChunk methods

vocabulary

vocabulary(filter_pos = None, lemma=False)

Returns the vocabulary of the current data, with each word paired with its POS tag. For example, a word that appears both as a verb and as a noun yields two tuples: (word, 'V') and (word, 'N'). Supports filtering by POS tags.

Parameters:

Name | Type | Description | Optional
filter_pos | list of string | POS tags to use for filtering. If provided, only words whose POS tag is in the given list are included. Defaults to None. | True
lemma | bool | Whether to use lemmas instead of plain words. Defaults to False. | True

Return:

Type | Description
list of tuple | List of unique tuples (token, POStag).

For a demo of vocabulary, check out our tutorial 🧑🏻‍💻
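
The deduplication by (word, POS tag) described above can be illustrated in plain Python. This is a minimal sketch, not the library's implementation: `sketch_vocabulary` and the `(token, lemma, pos)` input triples are hypothetical stand-ins for the data a TextChunk holds.

```python
def sketch_vocabulary(tokens, filter_pos=None, lemma=False):
    """Return unique (word, POS tag) tuples in first-seen order.

    `tokens` is a hypothetical list of (token, lemma, pos) triples;
    the real library reads this information from its own objects.
    """
    seen = set()
    vocab = []
    for token, token_lemma, pos in tokens:
        # Skip words whose POS tag is not in the requested list.
        if filter_pos is not None and pos not in filter_pos:
            continue
        entry = (token_lemma if lemma else token, pos)
        if entry not in seen:
            seen.add(entry)
            vocab.append(entry)
    return vocab


tokens = [
    ("walks", "walk", "N"),
    ("walks", "walk", "V"),
    ("walks", "walk", "V"),
    ("dog", "dog", "N"),
]
vocab_all = sketch_vocabulary(tokens)
vocab_verbs = sketch_vocabulary(tokens, filter_pos=["V"], lemma=True)
```

Here the same surface form "walks" is kept once per POS tag, and the filtered call keeps only the verb entry in its lemma form.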

word_count

word_count(filter_pos = None, lemma=False)

Returns the word count of the current data, with each word paired with its POS tag. For example, a word that appears both as a verb and as a noun is counted under two separate keys: (word, 'V') and (word, 'N'). Supports filtering by POS tags.

Parameters:

Name | Type | Description | Optional
filter_pos | list of string | If provided, only words whose POS tag is in the given list are included. Defaults to None. | True
lemma | bool | Whether to use lemmas instead of base words. Defaults to False. | True

Return:

Type | Description
dictionary | Dictionary of word counts { (token, POStag): occurrences }.

For a demo of word_count, check out our tutorial 👨🏻‍💻
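
The shape of the returned dictionary can be reproduced with `collections.Counter`. This is only an illustrative sketch under assumptions: `sketch_word_count` and the `(token, lemma, pos)` triples are hypothetical and do not come from the library.

```python
from collections import Counter


def sketch_word_count(tokens, filter_pos=None, lemma=False):
    """Count occurrences per (word, POS tag) key.

    `tokens` is a hypothetical list of (token, lemma, pos) triples.
    """
    counts = Counter()
    for token, token_lemma, pos in tokens:
        if filter_pos is None or pos in filter_pos:
            counts[(token_lemma if lemma else token, pos)] += 1
    return dict(counts)


tokens = [
    ("cures", "cure", "V"),
    ("cure", "cure", "N"),
    ("cure", "cure", "N"),
]
counts = sketch_word_count(tokens, lemma=True)
noun_counts = sketch_word_count(tokens, filter_pos=["N"])
```

With `lemma=True`, the verb and the noun share the lemma "cure" but remain separate keys because their POS tags differ.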

word_frequency

word_frequency(filter_pos = None, lemma=False)

Returns word or lemma frequencies. Supports filtering by POS tag.

Parameters:

Name | Type | Description | Optional
filter_pos | list of string | If provided, only words whose POS tag is in the given list are included. Defaults to None. | True
lemma | bool | Whether to use lemmas instead of base words. Defaults to False. | True

Return:

Type | Description
dictionary | Dictionary of word frequency.

For a demo of word_frequency, check out our tutorial 👨🏻‍💻

list_entities

list_entities()

Returns dictionaries of detected entities by type.

Return:

Type | Description
list of dictionary | List of dictionaries of different entities at the specified level.

For a demo of list_entities, check out our tutorial 👨🏻‍💻

statistics

statistics()

Returns statistics about the data contained in the object.

Return:

Name | Type | Description
document | integer | Number of documents contained in the object.
sentence | integer | Number of sentences contained in the object.
subsentence | integer | Number of subsentences contained in the object.
token | integer | Number of tokens contained in the object.

For a demo of statistics, check out our tutorial 🧑🏻‍💻

get_emotion

get_emotion(granularity = 'sentence')

Returns emotion results. The granularity parameter sets whether emotions are reported by sentence or by subsentence.

Parameters:

Name | Type | Description | Optional
granularity | string | Whether to use emotion by 'sentence' or 'subsentence'. Defaults to 'sentence'. | True

Return:

Type | Description
list of dict | List of dictionaries with emotions as keys and dict {'occurences','sum','average'} as values.

For a demo of get_emotion, check out our tutorial 👨🏻‍💻

get_sentiment

get_sentiment(granularity = 'sentence')

Returns sentiment results. The granularity parameter sets whether sentiment is reported by sentence or by subsentence.

Parameters:

Name | Type | Description | Optional
granularity | string | Whether to use sentiment by 'sentence' or 'subsentence'. Defaults to 'sentence'. | True

Return:

Type | Description
list of dict | List of dictionaries with polarity as keys and dict {'occurences','sum','average'} as values.

For a demo of get_sentiment, check out our tutorial 👨🏻‍💻
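
To make the {'occurences','sum','average'} structure concrete, here is a small sketch that builds one such dictionary from labelled scores. It is an assumption-based illustration: `sketch_aggregate` and the `(label, score)` input pairs are hypothetical, and the way the library actually derives these values is not documented here.

```python
def sketch_aggregate(scored_chunks):
    """Group scores by label and compute occurrences, sum and average.

    `scored_chunks` is a hypothetical list of (label, score) pairs, one per
    sentence or subsentence. The key spelling 'occurences' follows the
    return description in this page.
    """
    totals = {}
    for label, score in scored_chunks:
        entry = totals.setdefault(label, {"occurences": 0, "sum": 0.0})
        entry["occurences"] += 1
        entry["sum"] += score
    for entry in totals.values():
        entry["average"] = entry["sum"] / entry["occurences"]
    return totals


scored = [("positive", 0.5), ("positive", 0.25), ("negative", -0.5)]
summary = sketch_aggregate(scored)
```

The same shape would apply to emotion labels, with emotions as keys instead of polarities.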

word_sentiment

word_sentiment(granularity = 'sentence', lemma = False, filter_pos = None, average=True)

Returns an average sentiment score for each word or lemma. For each sentence or subsentence (granularity parameter), the sentiment score is added to each of the words present. The scores are divided by the number of sentences or subsentences to get an average.

Parameters:

Name | Type | Description | Optional
granularity | string | Whether to use sentiment by 'sentence' or 'subsentence' for scoring. Defaults to 'sentence'. | True
lemma | bool | Whether to use lemmas instead of plain words. Defaults to False. | True
filter_pos | list of string | POS tags to use for filtering. Defaults to None. | True
average | bool | Whether to return the average or the list of values. Defaults to True. | True

Return:

Type | Description
dictionary | Dictionary with words as keys and sentiment as value.

For a demo of word_sentiment, check out our tutorial 👨🏻‍💻
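
The averaging described above can be sketched as follows. This is not the library's code: `sketch_word_sentiment` and the `(words, score)` input pairs are hypothetical, and the sketch assumes each word's total is divided by the number of sentences that contain that word, which the description does not state explicitly.

```python
from collections import defaultdict


def sketch_word_sentiment(sentences, average=True):
    """Attach each sentence's sentiment score to the words it contains.

    `sentences` is a hypothetical list of (words, score) pairs, where
    `words` is a list of (word, POS tag) tuples.
    """
    scores = defaultdict(list)
    for words, score in sentences:
        # Count each word once per sentence, even if it repeats.
        for key in set(words):
            scores[key].append(score)
    if average:
        return {key: sum(vals) / len(vals) for key, vals in scores.items()}
    return dict(scores)


sentences = [
    ([("cure", "N"), ("works", "V")], 0.5),
    ([("cure", "N"), ("fails", "V")], -0.25),
]
averaged = sketch_word_sentiment(sentences)
raw_values = sketch_word_sentiment(sentences, average=False)
```

With `average=False`, the sketch returns the list of per-sentence scores for each word instead of their mean.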

word_emotion

word_emotion(granularity = 'sentence', lemma = False, filter_pos = None, average=True)

Returns the average score for each emotion for each word or lemma in the vocabulary. For each sentence or subsentence (granularity parameter), the emotion scores are added to each of the words present. The scores are divided by the number of sentences or subsentences to get an average (or list of values if 'average' == False).

Parameters:

Name | Type | Description | Optional
granularity | string | Whether to use emotion by 'sentence' or 'subsentence' for scoring. Defaults to 'sentence'. | True
lemma | bool | Whether to use lemmas instead of base words. Defaults to False. | True
filter_pos | list of string | If provided, only words whose POS tag is in the given list are included. Defaults to None. | True
average | bool | Whether to return the average or the list of values. Defaults to True. | True

Return:

Type | Description
dictionary | Dictionary with (word, POS tag) as keys and a dictionary with emotion scores as value.

Example return

{
('patients', 'N'): -0.4917,
('male', 'N'): -0.4275,
('age', 'N'): -0.5167,
('cure', 'N'): 0.6421
}

For a demo of word_emotion, check out our tutorial 👨🏻‍💻

meaning_sentiment

meaning_sentiment(granularity='sentence', filter_meaning=None, average=True)

Returns the average sentiment score for each meaning. For each sentence or subsentence (granularity parameter), the sentiment score is added to each of the meanings present. The scores are divided by the number of sentences or subsentences to get an average. This can be used with custom meanings to get the sentiment associated with a particular meaning, for example 'customer service' or 'pricing' when analyzing customer reviews.

Parameters:

Name | Type | Description | Optional
granularity | string | Whether to use sentiment by 'sentence' or 'subsentence'. Defaults to 'sentence'. | True
filter_meaning | list of string | If provided, only meanings in the given list are included. Defaults to None. | True
average | bool | Whether to return the average or the list of values. Defaults to True. | True

Return:

Type | Description
dictionary | Dictionary with meanings as keys and sentiment as value.

Example return

{
('patients', 'N'): {'surprise': 0.753, 'neutral': 0.445},
('male', 'N'): {'neutral': 0.8},
('surgery', 'N'): {'sadness': 0.79}
}

For a demo of meaning_sentiment, check out our tutorial 👨🏻‍💻
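
The customer-review use case mentioned above can be sketched in a few lines. Everything here is hypothetical: `sketch_meaning_sentiment` and the `(meanings, score)` input pairs stand in for data the library computes itself, and the sketch assumes the average is taken over the sentences in which a meaning occurs.

```python
def sketch_meaning_sentiment(chunks, filter_meaning=None, average=True):
    """Attach each chunk's sentiment score to the meanings detected in it.

    `chunks` is a hypothetical list of (meanings, score) pairs, one per
    sentence or subsentence.
    """
    scores = {}
    for meanings, score in chunks:
        for meaning in set(meanings):
            if filter_meaning is not None and meaning not in filter_meaning:
                continue
            scores.setdefault(meaning, []).append(score)
    if average:
        return {meaning: sum(vals) / len(vals) for meaning, vals in scores.items()}
    return scores


reviews = [
    (["customer service"], -0.5),
    (["pricing", "customer service"], 0.25),
    (["pricing"], 0.75),
]
all_meanings = sketch_meaning_sentiment(reviews)
pricing_only = sketch_meaning_sentiment(reviews, filter_meaning=["pricing"])
```

Passing `filter_meaning` restricts the result to the listed meanings, which keeps the output focused when only a few topics matter.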

meaning_emotion

meaning_emotion(granularity='sentence', filter_meaning=None, average=True)

Returns the average emotion scores for each meaning. For each sentence or subsentence (granularity parameter), the score for each emotion is added to each of the meanings present. The scores are divided by the number of sentences or subsentences to get an average. This can be used with custom meanings to get the emotion associated with a particular meaning, for example 'customer service' or 'pricing' when analyzing customer reviews.

Parameters:

Name | Type | Description | Optional
granularity | string | Whether to use emotion by 'sentence' or 'subsentence'. Defaults to 'sentence'. | True
filter_meaning | list of string | If provided, only meanings in the given list are included. Defaults to None. | True
average | bool | Whether to return the average or the list of values. Defaults to True. | True

Return:

Type | Description
dictionary | Dictionary with meanings as keys and emotion scores as value.

For a demo of meaning_emotion, check out our tutorial 👨🏻‍💻

filter_polarity

filter_polarity(polarity, granularity='sentence')

Filters Sentence or Subsentence of the specified polarity.

Parameters:

Name | Type | Description | Optional
polarity | string or list of string | Polarity to filter, one of 'neutral', 'positive', 'negative'. | False
granularity | string | Whether to use sentiment by 'sentence' or 'subsentence'. Defaults to 'sentence'. | True

Return:

Type | Description
list of instances of Sentence or Subsentence | List of instances of objects with the specified polarity.

For a demo of filter_polarity, check out our tutorial 👨🏻‍💻
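
Since `polarity` accepts either a single string or a list of strings, a filter of this kind has to normalize its argument first. The sketch below illustrates that idea only: `FakeChunk` and `sketch_filter_polarity` are hypothetical names, and the real Sentence and Subsentence objects are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class FakeChunk:
    """Hypothetical stand-in for a Sentence or Subsentence."""
    text: str
    polarity: str


def sketch_filter_polarity(chunks, polarity):
    # Accept a single polarity string or a list of polarities.
    wanted = {polarity} if isinstance(polarity, str) else set(polarity)
    return [chunk for chunk in chunks if chunk.polarity in wanted]


chunks = [
    FakeChunk("The staff was lovely.", "positive"),
    FakeChunk("The room was cold.", "negative"),
    FakeChunk("We arrived at noon.", "neutral"),
]
negative_only = sketch_filter_polarity(chunks, "negative")
not_neutral = sketch_filter_polarity(chunks, ["positive", "negative"])
```

The result is a list of the original objects, so their other attributes stay available after filtering.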

filter_emotion

filter_emotion(emotions, granularity='sentence')

Filters Sentence or Subsentence of the specified emotions.

Parameters:

Name | Type | Description | Optional
emotions | string or list of string | Emotions to filter, one of 'joy', 'love', 'surprise', 'anger', 'sadness', 'fear' or 'neutral'. | False
granularity | string | Whether to use emotion by 'sentence' or 'subsentence'. Defaults to 'sentence'. | True

Return:

Type | Description
list of instances of Sentence or Subsentence | List of instances of objects with the specified emotion.

For a demo of filter_emotion, check out our tutorial 👨🏻‍💻

filter_type

filter_type(sentence_type)

Filters Sentence of the specified types.

Parameters:

Name | Type | Description | Optional
sentence_type | string or list of string | Types to filter, one of 'assert', 'command', 'question_open', 'question_closed'. | False

Return:

Type | Description
list of instances of Sentence | List of instances of Sentence with the specified type.

For a demo of filter_type, check out our tutorial 👨🏻‍💻

match_pattern

match_pattern(patterns_json, level = None, print_tree=False, skip_errors=False)

Match given pattern (either Token Pattern or Dependency Pattern) on the current TextChunk object.

The 'level' argument specifies the level at which matching is done: the document level (matches are returned per document), the sentence level, or the subsentence level. The default is one level below the current object in the hierarchy: 'document' for the NLP class, 'sentence' for the Document class and 'subsentence' for the Sentence class.
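
The default-level rule can be written out as a small lookup. This is a sketch of the rule as stated, not the library's code: `sketch_resolve_level` is a hypothetical helper, and the default for the Subsentence class is left out because this page does not specify it.

```python
# Default matching level per class, as described in the text above.
DEFAULT_LEVEL = {
    "NLP": "document",
    "Document": "sentence",
    "Sentence": "subsentence",
}
VALID_LEVELS = ("document", "sentence", "subsentence")


def sketch_resolve_level(class_name, level=None):
    """Return the level to match on, falling back to the class default."""
    if level is None:
        if class_name not in DEFAULT_LEVEL:
            # Not covered by the documentation this sketch is based on.
            raise ValueError(f"no documented default level for {class_name!r}")
        return DEFAULT_LEVEL[class_name]
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {VALID_LEVELS}, got {level!r}")
    return level


default_for_nlp = sketch_resolve_level("NLP")
explicit_level = sketch_resolve_level("Document", level="subsentence")
```

An explicit `level` always wins over the class default, which matches the parameter being optional.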

For more information on patterns look at the dedicated section: Patterns.

Parameters:

Name | Type | Description | Optional
patterns_json | dictionary | Token Pattern or Dependency Pattern. | False
level | string | Level on which matching is done, one of 'document', 'sentence', 'subsentence'. Defaults to None. | True
print_tree | bool | Prints dependency tree. Defaults to False. | True
skip_errors | bool | Whether to skip or raise errors during matching. Defaults to False. | True

Return:

Type | Description
list of tuple | List of tuples (TextChunk object, match dictionary).

For a demo of match_pattern, check out our tutorial 👨🏻‍💻