
Lemmatizer

Introduction

The Lettria API allows you to perform lemmatization, which is the process of finding the base form of a word, called the lemma.

Lemmatization strips inflections such as tense, person, number, and gender from a word.

For example, the lemma of the word "jumps" is "jump," the lemma of the word "running" is "run," and the lemma of the word "better" is "good." This is useful in natural language processing because different inflected forms of the same word can then be compared and analyzed as a single item.
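As a toy illustration of the word-to-lemma mapping described above, the sketch below uses a small hand-written lookup table. The names `TOY_LEMMAS`, `toy_lemma`, and `same_lemma` are hypothetical and exist only for this example; this is not how the Lettria API computes lemmas.

```python
# Hypothetical lookup table built from the three examples in the text.
TOY_LEMMAS = {
    "jumps": "jump",
    "running": "run",
    "better": "good",
}


def toy_lemma(word: str) -> str:
    """Return the lemma of a known word, or the lowercased word if unknown."""
    key = word.lower()
    return TOY_LEMMAS.get(key, key)


def same_lemma(a: str, b: str) -> bool:
    """Compare two words by lemma rather than by surface form."""
    return toy_lemma(a) == toy_lemma(b)
```

Comparing by lemma lets "better" and "good" match even though their surface forms differ.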

Format

Lemmatizer objects can be received as either an Array() or an Object().
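Because the value can arrive in either shape, client code may find it convenient to normalize it to a list first. The following is a minimal sketch; the helper name `as_lemmatizer_list` is hypothetical, and it assumes that the array form contains lemmatizer objects and that a missing value may be represented as `None`.

```python
from typing import Any, Dict, List, Union

LemmatizerValue = Union[Dict[str, Any], List[Dict[str, Any]], None]


def as_lemmatizer_list(value: LemmatizerValue) -> List[Dict[str, Any]]:
    """Normalize a lemmatizer value to a list of objects."""
    if value is None:
        # Assumption: a missing lemmatizer is treated as an empty list.
        return []
    if isinstance(value, dict):
        # Single object: wrap it so callers can always iterate.
        return [value]
    # Array form: copy into a plain list.
    return list(value)
```

Callers can then loop over the result without checking the shape each time.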

| Key | Type | Description | Concerned Tags |
| --- | --- | --- | --- |
| conjugate | list of Conjugate Objects | List possible conjugations | V, VP, VINF |
| confidence | float | level of confidence in the results (higher is better) | * |
| gender | Gender | describes the gender and plurality | VP, JJ, N, D, PD |
| lemma | String | lemmatized version of the source | C, CC, CLO, CLS, D, JJ, N, NP, PUNCT, P, PD, PROREL, RB, RB_WH, SYM, UH |
| infinit | list of String | list of possible verb infinitives | V, VP, VINF |
| transitif | Boolean | whether the verb is transitive or not | V, VP, VINF |
| Number | float | value | CD |
| mode | String | mode of the verb | D, PD |
| possessing | int | see Possessive determiners | D, PD |
| pronom | int | see Pronouns | CLS |
| designation | list of String | see Categories | CLO |
| category | String | see Adverb Categories | RB |
| source | String | Source of the lemmatization | RB, P |
| sens | list of Preposition sens object | see Preposition sens | P |
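The Concerned Tags column can be inverted to answer the question "which keys should I expect for a given tag?". The sketch below transcribes that column into a dictionary; `FIELD_TAGS` and `keys_for_tag` are hypothetical names, and treating `*` as "applies to every tag" is an assumption about the table's notation.

```python
# Key -> tags, transcribed from the Concerned Tags column.
FIELD_TAGS = {
    "conjugate": {"V", "VP", "VINF"},
    "confidence": {"*"},  # assumption: "*" means every tag
    "gender": {"VP", "JJ", "N", "D", "PD"},
    "lemma": {"C", "CC", "CLO", "CLS", "D", "JJ", "N", "NP", "PUNCT", "P",
              "PD", "PROREL", "RB", "RB_WH", "SYM", "UH"},
    "infinit": {"V", "VP", "VINF"},
    "transitif": {"V", "VP", "VINF"},
    "Number": {"CD"},
    "mode": {"D", "PD"},
    "possessing": {"D", "PD"},
    "pronom": {"CLS"},
    "designation": {"CLO"},
    "category": {"RB"},
    "source": {"RB", "P"},
    "sens": {"P"},
}


def keys_for_tag(tag: str) -> list:
    """Return the keys the table lists for the given tag, sorted by name."""
    return sorted(
        key for key, tags in FIELD_TAGS.items()
        if "*" in tags or tag in tags
    )
```

For instance, `keys_for_tag("CLS")` collects `confidence`, `lemma`, and `pronom`, the three keys whose tag lists include `CLS` or `*`.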

Examples

V

{
	"infinit": "etre",
	"gender": { "female": false, "plural": false },
	"conjugate": [{ "mode": "indicative", "temps": "past", "pronom": 1, "modality": null }],
	"transitif": true
}
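As a rough sketch of how a client might read such an object, the snippet below parses the V example with Python's standard `json` module and summarizes each conjugation. The helper `describe_conjugations` is a hypothetical name, and the example payload is copied from above rather than fetched from the API.

```python
import json

# The V example from above, embedded as a string for illustration.
V_EXAMPLE = """
{
    "infinit": "etre",
    "gender": { "female": false, "plural": false },
    "conjugate": [{ "mode": "indicative", "temps": "past", "pronom": 1, "modality": null }],
    "transitif": true
}
"""


def describe_conjugations(lemmatizer: dict) -> list:
    """Summarize each entry of the "conjugate" list as a short string."""
    return [
        f'{c["mode"]} {c["temps"]}, pronom {c["pronom"]}'
        for c in lemmatizer.get("conjugate", [])
    ]


verb = json.loads(V_EXAMPLE)
summary = describe_conjugations(verb)
```

Note that JSON `null`, `true`, and `false` become `None`, `True`, and `False` once parsed.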

RB

{ "category": ["time and aspect"] }

CLS

{ "pronom": 1 }
