Glossary
Introduction
This glossary defines key terms used in NLP, knowledge graphs, ontology design, machine learning, and AI-powered document intelligence, with a focus on regulated industries such as insurance, finance, and legal. Each entry provides a precise definition, relevant examples, and references to standards or best practices where applicable.
A
accuracy (noun)
Metric that measures the proportion of correct predictions (true positives + true negatives) among all predictions (true positives + true negatives + false positives + false negatives). See also F1 score; F-measure; precision; recall.
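Illustration: the definition above can be sketched as a small Python function. The function name and the counts are hypothetical, chosen only for this example.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Proportion of correct predictions among all predictions."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    return (tp + tn) / total


# Hypothetical counts: 40 true positives, 50 true negatives,
# 5 false positives and 5 false negatives.
print(accuracy(tp=40, tn=50, fp=5, fn=5))  # 0.9
```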
B
blank node (noun)
RDF* node* that lets users leave a resource URI* unspecified when it is not known, while still expressing things about that resource within a knowledge* graph.
C
camel case (noun)
Naming convention which consists of writing phrases without spaces or punctuation and with capitalized words, with an initial capital for the upper camel case (or Pascal case) and with an initial lower case for the lower camel case (or dromedary case).
Note: In ontologies, the upper camel case is used for class names and the lower camel case for property names.
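Illustration: a minimal sketch of both conventions. The helper name to_camel_case is hypothetical, and the sketch assumes ASCII letters and digits only.

```python
import re


def to_camel_case(phrase: str, upper: bool = True) -> str:
    """Join the words of a phrase in upper or lower camel case."""
    # Split on anything that is not an ASCII letter or digit (spaces, hyphens, etc.).
    words = [w for w in re.split(r"[^A-Za-z0-9]+", phrase) if w]
    if not words:
        return ""
    capitalized = [w[0].upper() + w[1:].lower() for w in words]
    if not upper:
        # Lower camel case: only the very first word starts in lower case.
        capitalized[0] = capitalized[0].lower()
    return "".join(capitalized)


print(to_camel_case("rail vehicle"))                   # RailVehicle
print(to_camel_case("has capital city", upper=False))  # hasCapitalCity
```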
chunk (noun)
Part of a text divided according to size (number of words) or semantic criteria, small enough to be processed by a model.
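Illustration: a minimal sketch of size-based chunking, assuming that words are separated by whitespace. The function name and the max_words parameter are hypothetical; semantic chunking is not covered here.

```python
def chunk_by_words(text: str, max_words: int) -> list[str]:
    """Split a text into consecutive chunks of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


text = "This 9 cm needle is a useful complementary instrument to the device"
for chunk in chunk_by_words(text, max_words=5):
    print(chunk)
# This 9 cm needle is
# a useful complementary instrument to
# the device
```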
chunk retrieval (noun)
Retrieval* which processes information in the form of chunks*.
chunk retriever (noun)
Retriever* which processes information in the form of chunks*.
chunk threshold (noun)
RAG* parameter used to filter chunks* according to a relevance score, which is calculated using the similarity between the chunk and the query or any other weighting method.
chunk top k (noun)
RAG* parameter that limits the number of most relevant chunks* to be used.
Note: In mathematics and science, k is often used to represent a constant or an arbitrary positive integer. For example, in a list or set, k can represent the number of elements you wish to extract or consider.
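Illustration: a sketch of how chunk threshold and chunk top k could work together. The function name, the inclusive threshold and the order of the two steps are assumptions made for this example, not a description of an actual implementation; the scores are made up.

```python
def select_chunks(scored_chunks, threshold, top_k):
    """Keep chunks scoring at least `threshold`, then the `top_k` best ones."""
    # Step 1 (threshold): drop chunks whose relevance score is too low.
    kept = [(chunk, score) for chunk, score in scored_chunks if score >= threshold]
    # Step 2 (top k): sort by decreasing score and keep the first k.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Hypothetical (chunk, relevance score) pairs.
scored = [("chunk A", 0.91), ("chunk B", 0.42), ("chunk C", 0.77), ("chunk D", 0.68)]
print(select_chunks(scored, threshold=0.5, top_k=2))
# [('chunk A', 0.91), ('chunk C', 0.77)]
```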
class (noun)
Category of concrete or abstract things (object, person, animal, idea, etc.) represented in an ontology*, which can be divided into subclasses* and which groups together individuals*.
Example: A vehicle ontology contains classes such as Train, Car, Two-wheeler, Aircraft, Boat and so on.
Note: Class names are written in upper camel* case.
collection (noun)
Set of documents* used for a project in the Lettria platform.
Synonym: data collection.
Note: Not to be confused with dataset*.
community (noun)
Set of nodes* grouped by semantic proximity.
context (noun)
Element or set of elements of information provided by the retriever* in response to a query* and from which an answer is written by the LLM*.
D
data collection (noun)
See collection.
data model (noun)
Abstract framework used to describe the fundamental structure and organization of data in an information system.
data property or datatype property (noun)
Property* (1.) used to indicate a characteristic of a class* or an individual* by associating a value* (1.) with it.
Example: In a geographic ontology, the hasPopulation attribute is used to indicate the number of inhabitants of a geographic entity, with the class Place as the domain* (1.) and an integer as the value* (1.).
data source (noun)
See source.
dataset (noun)
Set of data for model training.
Note: Not to be confused with collection.
datatype (noun)
Type of literal value* (1.) that can be assigned to a data* property.
See also XSD datatype.
disambiguation (noun)
Process of distinguishing the actual meaning of a linguistic unit (word, phrase, sentence, etc.) from several possible meanings.
See also entity disambiguation; named entity disambiguation; word sense disambiguation.
document (noun)
User file until it is parsed* by Lettria.
See also source.
domain (noun)
1. Class* or group of classes to which a property* (1.) can be applied.
Example: In a geographic ontology, the hasCapitalCity property has the Country class as its domain.
See also range.
2. Sector of activity or knowledge field.
Example: Finance, healthcare and e-commerce are some domains.
dromedary case (noun)
See camel case.
E
edge (noun)
Element that connects two nodes* in a graph*, enriched with additional information such as properties* (2.).
Example: In the graphical representation of the sentence This 9 cm needle is a useful complementary instrument to the device, the edge length links the node needle to the node 9 cm, the edge hasQuality links the node needle to the node useful and the edge complementaryTo links the node needle to the node device.
Note: To talk about the type of element that can link two nodes, without any additional information, use relation* type.
to embed (transitive verb)
To represent (sthg) in vectors*, especially for the creation of a RAG*.
embedder (noun)
Algorithm for embedding*.
embedding (noun)
Representation in vectors*, especially of semantic data for the creation of a RAG*.
Examples:
- Written data, nodes, edges or even entire graphs can be transformed into vectors using embedding.
- A graph embedding is a representation of a graph in vectors.
end-to-end (adjective)
Designed to handle all stages of a system or operation, from start to finish, without requiring intermediary steps or additional external processes.
Examples: End-to-end solution. End-to-end tests. End-to-end machine learning model.
Abbreviation: E2E.
entity (noun)
Any distinct element of the world, concrete or abstract.
Example: In the sentence Angela Merkel always indulges in daydreaming, listens to ABBA and reads a book when she takes the train to London, there are six entities: daydreaming, book, train, Angela Merkel, ABBA and London. The last three are named* entities.
Note: In graphs*, entities are represented by nodes*.
See also named entity; tailed entity.
entity disambiguation (noun)
Disambiguation* of entities* in a text.
See also named entity disambiguation.
entity linking (noun)
Process of linking the entities* of a text to the corresponding elements of a knowledge* base.
Example: In an entity linking process that links entities to elements of the Wikipedia knowledge base, occurrences of bow that designate the foremost part of a ship or a boat will be linked to Bow (watercraft) while those that designate the rower seated in the bow of a racing shell will be linked to Bow (position).
See also named entity linking.
entity resolution (noun)
Task of identifying, linking and merging records that correspond to the same real-world entities* across different data sources to uniquely represent an entity across datasets.
expander (noun)
RAG* component used to enrich the data extracted by the vector* retriever with a certain number of relations* (2.).
F
F-measure (noun)
Weighted version of the F1* score that gives more or less weight to recall* or precision*, depending on the context: Fβ = (1 + β²) x ((precision x recall) / ((β² x precision) + recall)). β controls the relative importance of recall and precision:
- If β > 1, recall is favored (more emphasis is placed on detecting true positives).
- If β < 1, precision is favored (we prefer to avoid false positives).
- The F1 score is a special case of the F-measure with β = 1 (precision and recall are equally weighted).
Notes:
- An F2-measure (β = 2) would be useful in a task where recall is twice as important as precision (such as the detection of serious diseases).
- An F0.5-measure (β = 0.5) would favor precision (as in recommender systems where error is expensive).
See also accuracy.
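Illustration: a minimal sketch of the F-measure, computed as (1 + β²) x precision x recall / (β² x precision + recall). The function names and the counts are hypothetical. With these counts, recall is lower than precision, so the score drops when β favors recall and rises when β favors precision.

```python
def precision(tp: int, fp: int) -> float:
    """True positives among all positive predictions."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """True positives among all actual positive cases."""
    return tp / (tp + fn)


def f_beta(p: float, r: float, beta: float = 1.0) -> float:
    """F-measure; beta = 1 gives the F1 score."""
    b2 = beta ** 2
    denominator = b2 * p + r
    if denominator == 0:
        return 0.0  # convention chosen for this sketch when both metrics are 0
    return (1 + b2) * p * r / denominator


# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
p = precision(tp=8, fp=2)  # 0.8
r = recall(tp=8, fn=8)     # 0.5
print(round(f_beta(p, r, beta=1.0), 3))  # 0.615 (F1 score)
print(round(f_beta(p, r, beta=2.0), 3))  # 0.541 (recall favored)
print(round(f_beta(p, r, beta=0.5), 3))  # 0.714 (precision favored)
```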
F1 score (noun)
Metric that combines precision* and recall*: F1 = 2 x ((precision x recall) / (precision + recall)).
Note: F1 score is particularly useful when classes are unbalanced, as it balances the two metrics, precision and recall. If precision is low but recall is high, or vice versa, the F1 score reflects this balance. It is useful for evaluating a model when precision or recall cannot be given exclusive priority.
See also accuracy; F-measure.
G
GRAG or G-RAG (noun)
See graph-RAG.
graph (noun)
Structure composed of nodes* linked together through edges*.
See also knowledge graph; property graph; RDF; semantic graph.
graph retrieval (noun)
Retrieval* which processes information structured in graphs*.
graph retriever (noun)
Retriever* which processes information structured in graphs*.
graph threshold (noun)
RAG* parameter used to filter triples* (2.) according to a relevance score, which is calculated using the similarity between the triple and the query or any other weighting method.
graph top k (noun)
RAG* parameter indicating the number of triples* (2.) to be extracted by the vector* retriever.
graph-RAG (noun)
RAG* which processes information mainly structured in graphs*.
Abbreviations: GRAG; G-RAG.
ground truth
1. (noun) Data considered to be correct, used as a reference for evaluating other data.
2. (adjective) That is considered to be correct and is used as a reference.
Example: Ground truth answers.
H
-
I
identifier (noun)
String used to uniquely identify an item on a network, particularly on the Web.
See also URI.
individual (noun)
Member of a class*, in an ontology*.
Example: In a geographic ontology, Belgium is an individual of the class Country and Berlin is an individual of the class City.
IRI (Internationalized Resource Identifier) (noun)
1. Standard format for identifiers* on the Internet which generalizes and internationalizes URI* by accepting thousands of Unicode characters (UTF-8 encoding).
2. Any identifier in this format.
J
-
K
KG (noun)
See knowledge graph.
knowledge base (noun)
Organized resource used to collect, store and manage information and which may contain several knowledge* graphs.
knowledge graph (noun)
Graph* based on the structure and constraints of an ontology*.
Abbreviation: KG.
See also knowledge base.
L
label (noun)
1. Name used to designate class* members.
Note: The label must be distinguished from the class name.
Example: Two different classes, such as bat_(animal) and bat_(instrument), can have the same label, bat.
2. Metadata* associated with nodes* and edges*.
labeled property graph (noun)
Graph* with labels* (2.).
Abbreviation: LPG.
language model (noun)
Statistical or machine-learning model designed to understand and generate text in natural language.
See also LLM.
large language model (noun)
See LLM.
LLM (Large Language Model) (noun)
Language* model trained on very large text corpora and containing a very large number of parameters to understand and generate complex language data.
lower camel case (noun)
See camel case.
LPG (noun)
See labeled property graph.
M
max triplets (noun)
RAG* parameter indicating the maximum number of triples* (2.) to be used by the LLM* for final generation, including those returned by the vector* retriever (see graph* top k) and those added by the expander*.
metadata (noun)
Formal, standardized and structured data used to describe and process the content of digital data.
Example: In the metadata of a text, you can indicate the author's name or the year of publication.
N
n hops (noun)
Expander* parameter used to specify the number of steps (hops) to go through the graph* from a start node* to retrieve relations* (2.).
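Illustration: a sketch of an n-hop expansion as a breadth-first traversal. The function name, the adjacency-list representation, the choice to follow outgoing edges only and the sample graph are all assumptions made for this example.

```python
from collections import deque


def expand(graph, start, n_hops):
    """Collect the triples reachable within n_hops steps of the start node."""
    seen = {start}
    frontier = deque([(start, 0)])
    found = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == n_hops:
            continue  # hop budget used up: do not go past this node
        for relation_type, neighbor in graph.get(node, []):
            found.append((node, relation_type, neighbor))
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return found


# Hypothetical graph: node -> list of (relation type, neighbor).
graph = {
    "Berlin": [("capitalOf", "Germany")],
    "Germany": [("memberOf", "European Union")],
}
print(expand(graph, "Berlin", n_hops=1))
# [('Berlin', 'capitalOf', 'Germany')]
print(expand(graph, "Berlin", n_hops=2))
# [('Berlin', 'capitalOf', 'Germany'), ('Germany', 'memberOf', 'European Union')]
```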
named entity (noun)
Entity* designated by a proper name (such as persons, organizations, geographical units, artworks, etc.).
Example: In the sentence Angela Merkel always indulges in daydreaming, listens to ABBA and reads a book when she takes the train to London, there are three named entities: Angela Merkel, ABBA and London.
See also named entity disambiguation; named entity linking; named entity recognition.
named entity disambiguation (noun)
Disambiguation* of named* entities in a text.
named entity linking (noun)
Process of linking the named* entities of a text to the corresponding elements of a knowledge* base.
Example: In a named entity linking process that links named entities to elements of the Wikipedia knowledge base, occurrences of Orange that designate the French city will be linked to Orange, Vaucluse while those that designate the French company will be linked to Orange S.A.
Abbreviation: NEL.
named entity recognition (noun)
Process of recognizing and classifying the named* entities of a text.
Abbreviation: NER.
NEL (noun)
See named entity linking.
NER (noun)
See named entity recognition.
node (noun)
Element of a graph* representing an entity*.
Example: In the graphical representation of the sentence This 9 cm needle is a useful complementary instrument to the device, 9 cm, useful, needle, instrument, and device are represented by nodes.
See also blank node; edge.
O
object (noun)
Third element of a triple* (1.), the one that provides specific information on the subject* via the predicate*.
Example: In the triple Moby-Dick / hasAuthor / Herman Melville, Herman Melville is the object: it indicates precisely the information given about the subject, the type of which is specified by the predicate.
object property (noun)
Property* (1.) that provides information about a class* or an individual* by linking it to another class or individual.
Example: In a geographic ontology, the hasCapitalCity object property is used to indicate the capital city of a country, with the class Country as the domain* (1.) and the class City as the range*.
ontology (noun)
Knowledge representation system, more complex than a taxonomy*, used to organize hierarchically the concepts of a domain* (2.) represented by classes* and individuals*, and to give information about them through properties* (1.) and logical axioms.
ontology alignment (noun)
Task of determining the correspondences between classes*, individuals* and properties* of different ontologies*.
Synonym: ontology matching.
ontology matching (noun)
See ontology alignment.
ontology population (noun)
Task of identifying instances of concepts* and properties* of an ontology*.
origin (noun)
Text element or set of text elements from which a graph* element originates.
OWL (Web Ontology Language) (noun)
Knowledge representation language in RDF* format, designed to define complex ontologies*.
Example: OWL allows the definition of logical rules.
P
to parse (transitive verb)
To proceed to the parsing* of (a document*).
parser (noun)
Tool that performs a parsing*.
parsing (noun)
Process of analyzing, structuring and enriching a document* so that it becomes an exploitable source* for information processing.
Pascal case (noun)
See camel case.
payload (noun)
All the information we have on an element of a graph* (its name, type, properties, ID of the chunk, etc.).
precision (noun)
Metric that measures the proportion of correct positive predictions (true positives) among all positive predictions (true positives + false positives).
Note: A high-precision model makes few errors in classifying a sample as “positive”. This is particularly important in cases where false positives are costly (for example, in disease screening).
See also accuracy; recall; F1 score; F-measure.
predicate (noun)
Second element of a triple* (1.), indicating the type of information given about the first one (the subject*).
Example: In the triple Moby-Dick / hasAuthor / Herman Melville, hasAuthor is the predicate: it indicates the type of information given about the subject Moby-Dick, in this case its author.
See also property (1.).
prompt (noun)
Natural language command to an LLM*.
Note: Not to be confused with query or question.
property (noun)
1. Typed and named element of an ontology* used to provide specific information about a class* or an individual* by linking it to another class or individual, in the case of an object* property, or by assigning a value* (1.) to it, in the case of a data* property.
Note: Property names are written in lower camel* case.
See also predicate.
2. Information about an entity* or an edge* in a graph*.
Example: In a graph, the color or quantity of an entity is represented by a property.
See also value (2.).
property graph (noun)
Graph* whose edges* can have properties* (2.).
Q
query (noun)
What is formulated in natural language or in a specific language to obtain information from a knowledge* base.
Note: Not to be confused with prompt or question.
question (noun)
What is asked in natural language by a user of the Lettria platform to obtain information.
Note: Not to be confused with prompt or query.
R
RAG (Retrieval-Augmented Generation) (noun)
Technique that combines information retrieval* and text generation approaches to improve the quality and relevance of responses generated by a language* model.
See also graph-RAG; vector RAG.
range (noun)
Type of element that can be associated with a domain* (1.) by means of a property* (1.), either a class* or a set of classes in the case of an object* property, or a datatype* in the case of a data* property.
Example: In a geographic ontology, the hasCapitalCity relation has the Country class as its domain and the City class as its range. The hasArea attribute has the Place class as its domain and an integer as its range.
RDF (Resource Description Framework) (noun)
1. Model for representing data as triples* (1.), enabling this data to be processed automatically.
2. Any graph that respects this model.
See also RDF Schema.
RDF Schema (noun)
Knowledge representation language using the RDF* format and providing basic elements for defining ontologies.
Example: RDF Schema allows the definition of classes* and a hierarchy between them.
Abbreviation: RDFS.
RDFS (noun)
See RDF Schema.
recall (noun)
Metric that measures the proportion of correct positive predictions among all positive cases (true positives + false negatives).
Note: A model with high recall detects almost all true positives, even at the cost of a certain number of false positives. This is crucial in situations where false negatives are unacceptable (for example, a cancer detection model needs to identify as many cancer cases as possible, even if this means generating a few unnecessary alerts).
reification (noun)
Process of reifying* (sthg).
Example: Without reification, the information The machine weighs 5 kilos is formalized with a single individual, the machine, which is associated with a value, 5 kilos, via a data property. With reification, the information is formalized with two individuals: the machine and its weight, linked by an object property. The weight of the machine is associated on the one hand with a value (5), and on the other hand with a unit of measurement (kilograms) with data properties.
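Illustration: the example above can be sketched with plain Python tuples standing in for triples. The identifiers (machine, machineWeight, hasWeight, hasValue, hasUnit) and the helper describe are hypothetical.

```python
# Without reification: one individual and one data property.
plain = [
    ("machine", "hasWeight", "5 kilos"),
]

# With reification: the weight becomes an individual of its own.
reified = [
    ("machine", "hasWeight", "machineWeight"),  # object property
    ("machineWeight", "hasValue", 5),           # data property
    ("machineWeight", "hasUnit", "kilograms"),  # data property
]


def describe(individual, triples):
    """Return the predicates and objects attached to one individual."""
    return {p: o for s, p, o in triples if s == individual}


print(describe("machineWeight", reified))
# {'hasValue': 5, 'hasUnit': 'kilograms'}
```

Once the weight is an individual, its value and its unit can be read or compared separately, which the single string 5 kilos does not allow.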
to reify (transitive verb)
To transform (sthg) into an object; specifically, to transform (a complex value) into an individual*, thus being able to be involved in an object* property and be the subject of a data* property.
See also reification (example).
relation or relationship (noun)
1. See object property.
2. Set formed by two nodes* and the edge* that connects them, enriched with additional information, in particular properties* (2.).
Note: To describe a set of two nodes and the information that links them, but without the additional information, use the word triple* (2.).
3. See edge.
relation type (noun)
Type of element of a graph* that can link two nodes*, without any additional information.
Note: To talk about a particular element that connects two particular nodes with additional information like properties* (2.), use the word edge*.
reranker (noun)
Tool that performs a reranking*.
reranking (noun)
Process of re-classifying items found by a retriever* in order of relevance.
retrieval (noun)
Process of searching for and extracting relevant data from a vast set of resources, often before the generation of an answer to a specific query* in a RAG*.
See also chunk retrieval; graph retrieval.
retriever (noun)
Tool that performs a retrieval*.
See also chunk retriever; graph retriever; vector retriever.
S
section (noun)
Part of a source* with a logical unit.
Example: Paragraphs can be sections.
Note: A document is divided into sections during parsing*.
semantic graph (noun)
Graph* based on controlled properties* (1.) but without the structure of an ontology*.
source (noun)
Any element that generates information, whether it's a document* after parsing* or a graph* generated from the documents.
Synonym: data source.
See also section.
statement (noun)
See triple (1.).
subclass (noun)
Class* more specific than another.
Example: In a vehicle ontology, Train, Subway and Tramway are subclasses of the class RailVehicle.
subject (noun)
First element of a triple* (1.), the one to which the information relates.
Example: In the triple Moby-Dick / hasAuthor / Herman Melville, Moby-Dick is the subject: it is the thing the information is about.
T
T2G (noun)
See text-to-graph.
tailed entity (noun)
Entity* that is unknown to the knowledge* base in use.
taxonomy (noun)
Hierarchical classification of things or ideas.
Example: The classification of living beings is a taxonomy.
See also ontology.
text-to-graph (noun)
Solution that transforms natural language data into a knowledge* graph.
Abbreviation: T2G.
Text2KG Bench (proper noun)
Dataset* designed to evaluate the capabilities of models to generate knowledge* graphs from natural language text guided by an ontology*.
transformer (noun)
Embedding*-based deep-learning architecture used by LLMs* such as GPT or BERT.
triple (noun)
1. Ontology* information unit formed by a subject*, a predicate* and an object*.
Example: Moby-Dick / hasAuthor / Herman Melville is a triple in which we indicate information about the subject (Moby-Dick), this information being typed by the predicate (hasAuthor) and named by the object (Herman Melville).
Synonym: statement.
2. Part of a graph* made up of two nodes* and the relation* type that connects them, but without any additional information, unlike the relation* (2.).
triplestore (noun)
Database made up of triples* (1.).
Note: Triplestores are opposed, for example, to relational databases made up of data tables.
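Illustration: a toy in-memory sketch of the idea, in which data is stored and queried as triples rather than as table rows. The class TinyTripleStore, its methods and the predicate name type are hypothetical; real triplestores add persistence, indexing and query languages.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class TinyTripleStore:
    """Toy store: a set of triples queried by pattern matching."""

    def __init__(self) -> None:
        self._triples: set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add(Triple(subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> list[Triple]:
        """Return the triples matching the pattern; None matches anything."""
        return sorted(
            t for t in self._triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        )


store = TinyTripleStore()
store.add("Moby-Dick", "hasAuthor", "Herman Melville")
store.add("Belgium", "type", "Country")
store.add("Berlin", "type", "City")

for triple in store.match(predicate="type"):
    print(triple.subject, "/", triple.predicate, "/", triple.obj)
# Belgium / type / Country
# Berlin / type / City
```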
ttl (noun)
See turtle.
turtle (noun)
1. Syntax and file format for data representation in RDF* format (file extension: .ttl).
2. Any file in this format.
U
upper camel case (noun)
See camel case.
URI (Uniform Resource Identifier) (noun)
1. Standard format for identifiers* on the Internet.
2. Any identifier in this format.
See also IRI.
Notes:
- Like URLs, URIs correspond to unique resources, but unlike URLs, URIs do not necessarily correspond to resources available on the Web.
- A string beginning with http:// is a valid URI and, on the semantic Web, identifiers are formatted as http URIs.
use case (noun)
1. Concrete situation in which a specific consumer uses a solution to meet a need.
2. Sentence that describes this situation, which can be used as system input to define functional requirements and, specifically in knowledge technologies, to guide information processing.
V
value (noun)
1. Specific data assigned to an attribute* in an ontology* and subject to defined datatypes*.
2. Property* (2.) type, in a graph*.
vector (noun)
Ordered list of numbers where each number represents a dimension in a multidimensional space.
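Illustration: vectors can be compared numerically, which is what makes them useful for retrieval. The sketch below uses cosine similarity, one common choice assumed here for the example; the function name and the sample vectors are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of the same dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([3.0, 4.0], [6.0, 8.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```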
vector RAG (noun)
RAG* which processes information mainly structured in vectors*.
vector retriever (noun)
Retriever* which processes information in the form of vectors*.
W
word sense disambiguation (noun)
Disambiguation* of words in a text, whether they designate entities* or not.
See also entity disambiguation.
X
XSD datatype (noun)
Predefined datatype* used in the semantic web and ontologies*.
Note: Fundamental XSD datatypes are: string, boolean, decimal, integer, float, double, date, time, dateTime and duration. XSD datatypes also include derived numeric types like positiveInteger, negativeInteger, etc.
Y
-
Z
-