TextChunk
What is TextChunk?
TextChunk is the base class of NLP, Document, Sentence and Subsentence. It offers different methods that can be accessed through children classes.
Data analysis:
METHOD | DESCRIPTION |
---|---|
vocabulary() | Returns vocabulary from current data. |
word_count() | Returns word count from current data. |
word_frequency() | Returns word frequency of current data. |
list_entities() | Returns dictionaries of detected entities by type. |
statistics() | Returns statistics about the data contained in the object |
get_emotion() | Returns emotion results at the specified hierarchical level |
get_sentiment() | Returns sentiment results at the specified hierarchical level |
word_sentiment() | Returns average sentiment for each word of the whole vocabulary |
word_emotion() | Returns average emotion for each word of the whole vocabulary |
meaning_sentiment() | Returns average sentiment for each meaning |
meaning_emotion() | Returns average emotion for each meaning |
filter_polarity() | Filters Sentence or Subsentence of the specified polarity |
filter_emotion() | Filters Sentence or Subsentence of the specified emotions |
filter_type() | Filters Sentence of the specified types |
match_pattern() | Returns matches from given patterns. |
TextChunk methods
vocabulary
vocabulary(filter_pos = None, lemma=False)
Returns vocabulary from current data with their associated POS tag i.e. if a word appears both as a verb and a noun it will be in two tuples (word, 'V'), (word, 'N'). Allows filtering by POS tags.
Parameters:
Name | Type | Description | Optional |
---|---|---|---|
filter_pos | list of string | Tags to use for filtering. If the option to filter is added it will only include words in given list of POS tags. Defaults to None. | POS tag |
lemma | string | Whether to use lemma or plain words.If the option is added the lemma will be used instead of the base words. Defaults to False. | True |
Return:
Type | Description |
---|---|
list of tuple | List of unique tuples (token, POStag). |
For a demo of vocabulary check out our tutorial ๐ง๐ปโ๐ป
word_count
word_count(filter_pos = None, lemma=False):
Returns word count from current data with their associated POS tag i.e. if a word appears both as a verb and a noun it will be in two tuples (word, 'V'), (word, 'N'). Allows filtering by POS tags.
Parameters:
Name | Type | Description | Optional |
---|---|---|---|
filter_pos | list of string | If provided it will only include words in given list of postags. Defaults to None. | True |
lemma | string | Whether to use lemma instead of base words. Defaults to False. | True |
Return:
Type | Description |
---|---|
dictionary | dictionary of word counts { (token, POStag): occurences }. |
For a demo of word_count check out our tutorial ๐จ๐ปโ๐ป
word_frequency
word_frequency(filter_pos = None, lemma=False)
Returns words or lemma frequency, allows filtering by POS tag.
Parameters:
Name | Type | Description | Optional |
---|---|---|---|
filter_pos | list of string | If provided it will only include words in given list of postags. Defaults to None. | POS tag |
lemma | bool | Whether to use lemma instead of base words. Defaults to False. | True |
Return:
Type | Description |
---|---|
dictionary | Dictionary of word frequency |
For a demo of word_frequency check out our tutorial ๐จ๐ปโ๐ป
list_entities
list_entities()
Returns dictionaries of detected entities by type.
Return:
Type | Description |
---|---|
list of dictionary | List of dictionaries of different entities at the specified level. |
For a demo of list_entities check out our tutorial ๐จ๐ปโ๐ป
statistics
statisitics()
Returns statistics about the data contained in the object
Return:
Name | Type | Description |
---|---|---|
document | integer | Returns integer of the data contained in the document. |
sentence | integer | Returns integer of the data contained in the sentence. |
subsentence | integer | Returns integer of the data contained in the subsentence. |
token | integer | Returns integer of the tokens contained in the document. |
For a demo of statistics check out our tutorial ๐ง๐ปโ๐ป
get_emotion
get_emotion(granularity = 'sentence')
Returns emotion results, granularity defines whether to use emotion by sentence or by subsentence.
Parameters:
Name | Type | Description | Optional |
---|---|---|---|
granularity | string | Whether to use emotion by 'sentence' or 'subsentence'. Defaults to None. | sentence or subsentence |
Return:
Type | Description |
---|---|
list of dict | List of dictionaries with emotions as keys and dict {'occurences','sum','average'} as values. |
For a demo of get_emotion check out our tutorial ๐จ๐ปโ๐ป
get_sentiment
get_sentiment(granularity = 'sentence')
Returns sentiment results, granularity defines whether to use sentiment by sentence or by subsentence.
Parameters:
Name | Type | Description | Optional |
---|---|---|---|
granularity | string | Whether to use sentiment by 'sentence' or 'subsentence'. Defaults to None. | sentence or subsentence |
Return:
Type | Description |
---|---|
list of dict | List of dictionaries with polarity as keys and dict {'occurences','sum','average'} as values. |
For a demo of get_sentiment check out our tutorial ๐จ๐ปโ๐ป
word_sentiment
word_sentiment(granularity = 'sentence', lemma = False, filter_pos = None, average=True)
Returns an average sentiment score for each word or lemma. For each sentence or subsentence (granularity parameter), the sentiment score is added to each of the words present. The scores are divided by the number of sentences or subsentences to get an average.
Parameters:
Name | Type | Description | Optional |
---|---|---|---|
granularity | string | Whether to use sentiment by 'sentence' or 'subsentence' for scoring. | True |
lemma | bool | Whether to use lemma or plain words. | True |
filter_pos | list of string | POStags to use for filtering. | True |
average | bool | Whether to return average or list of values. | True |
Return:
Type | Description |
---|---|
dictionary | Dictionary with words as keys and sentiment as value |
For a demo of word_sentiment check out our tutorial ๐จ๐ปโ๐ป
word_emotion
word_emotion(granularity = 'sentence', lemma = False, filter_pos = None, average=True)
Returns the average score for each emotion for each word or lemma in the vocabulary. For each sentence or subsentence (granularity parameter), the emotion scores are added to each of the words present. The scores are divided by the number of sentences or subsentences to get an average (or list of values if 'average' == False).
Parameters:
Name | Type | Description | Optional |
---|---|---|---|
granularity | string | Whether to use emotion by 'sentence' or 'subsentence' for scoring. Defaults to None. | True |
lemma | bool | Whether to use lemma instead of base words. Defaults to False. | True |
filter_pos | list of string | If provided it will only include words in given list of postags. Defaults to None. | True |
average | bool | Whether to return average or list of values. Defaults to True. | True |
Return:
Type | Description |
---|---|
dictionary | Dictionary with (words, POS tag) as keys and a dictionary with emotion scores as value. |
Example return
{
('patients', 'N'): -0.4917,
('male', 'N'): -0.4275,
('age', 'N'): -0.5167,
('cure', 'N'): 0.6421
}
For a demo of word_emotion check out our tutorial ๐จ๐ปโ๐ป
meaning_sentiment
meaning_sentiment(granularity='sentence', filter_meaning=None, average=True)
Returns average sentiment score for each meaning For each sentence or subsentence(granularity parameter), the sentiment score is added to each of the meaning present. The scores are divided by the number of sentences or subsentences to get an average. This can be used with custom meaning to get the sentiment associated with a particular meaning, for example 'customer service' or 'pricing' when analyzing customer reviews.
Parameters:
Name | Type | Description | Optional |
---|---|---|---|
granularity | string | Whether to use sentiment by 'sentence' or 'subsentence'. Defaults to None. | True |
filter_meaning | list of string | If provided it will only include meanings in given list. Defaults to None. | True |
average | bool | Whether to return average or list of values. Defaults to True. | True |
Return:
Type | Description |
---|---|
dictionary | Dictionary with meanings as keys and sentiment as value |
Example return
{
('patients', 'N'): {'surprise': 0.753, 'neutral': 0.445},
('male', 'N'): {'neutral': 0.8},
('surgery', 'N'): {'sadness': 0.79}
}
For a demo of meaning_sentiment check out our tutorial ๐จ๐ปโ๐ป
meaning_emotion
meaning_emotion(granularity='sentence', filter_meaning=None, average=True)
Returns average emotion scores for each meaning. For each sentence or subsentence(granularity parameter), the score for each emotion is added to each of the meaning. The scores are divided by the number of sentences or subsentences to get an average. This can be used with custom meaning to get the emotion associated with a particular meaning, for example 'customer service' or 'pricing' when analyzing customer reviews.
Parameters:
Name | Type | Description | Optional |
---|---|---|---|
granularity | string | Whether to use emotion by 'sentence' or 'subsentence'. Defaults to None. | True |
filter_meaning | list of string | If provided it will only include meanings in given list. Defaults to None. | True |
average | bool | Whether to return average or a list of values. Defaults to True. | True |
Return:
Type | Description |
---|---|
dictionary | Dictionary with meanings as keys and sentiment as value |
For a demo of meaning_emotion check out our tutorial ๐จ๐ปโ๐ป
filter_polarity
filter_polarity(polarity, granularity='sentence')
Filters Sentence or Subsentence of the specified polarity.
Parameters:
Name | Type | Description | Optional |
---|---|---|---|
polarity | string or list of string | Polarity, 'neutral', 'positive', 'negative'. | False |
granularity | string | Whether to use sentiment by 'sentence' or 'subsentence'. Defaults to None. | True |
Return:
Type | Description |
---|---|
list of instances of Sentence or Subsentence | List of instances of objects with the specified polarity. |
For a demo of filter_polarity check out our tutorial ๐จ๐ปโ๐ป
filter_emotion
filter_emotion(emotions, granularity='sentence')
Filters Sentence of the specified emotions.
Parameters:
Name | Type | Description | Optional |
---|---|---|---|
emotions | string or list of string | Emotions to filter, one of 'joy', 'love', 'surprise', 'anger', 'sadness', 'fear' or 'neutral'. | False |
granularity | string | Whether to use sentiment by 'sentence' or 'subsentence' for scoring. | True |
Return:
Type | Description |
---|---|
list of instances of Sentence or Subsentence | List of instances of objects with the specified emotion. |
For a demo of filter_emotion check out our tutorial ๐จ๐ปโ๐ป
filter_type
filter_type(sentence_type)
Filters Sentence of the specified emotions.
Parameters:
Name | Type | Description | Optional |
---|---|---|---|
sentence_type | string or list of string | Types to filter, one of 'assert', 'command', 'question_open', 'question_closed'. | False |
Return:
Type | Description |
---|---|
list of instances of Sentence | List of instances of Sentence with the specified type. |
For a demo of filter_type check out our tutorial ๐จ๐ปโ๐ป
match_pattern
match_pattern(self, patterns_json, level = None, print_tree=False, skip_errors=False)
Match given pattern (either Token Pattern or Dependency Pattern) on the current TextChunk object.
The 'level' argument specifies on which level the matching should be done, i.e. on the document level (returns matches per document), on the sentence or subsentence level. The default level is one level below in the hierarchy, document for NLP class, sentence for Document class and subsentence for Sentence class.
For more information on patterns look at the dedicated section: Patterns.
Parameters:
Name | Type | Description | Optional |
---|---|---|---|
patterns_json | dictionary | Token Pattern or Dependency Pattern | False |
level | string | Level on which matching is done, one of 'document', 'sentence', 'subsentence'. Defqults to none. | True |
print_tree | bool | Prints dependency tree. Defaults to False. | True |
skip_errors | bool | Whether to skip or raise errors. Defaults to False. matching. | True |
Return:
Type | Description |
---|---|
list of tuple | List of tuple (TextChunk object, match dictionary) |
For a demo of match_pattern check out our tutorial ๐จ๐ปโ๐ป