Glossary

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

Introduction


This glossary defines key terms used in NLP, knowledge graphs, ontology design, machine learning, and AI-powered document intelligence, with a focus on regulated industries such as insurance, finance, and legal. Each entry provides a precise definition, relevant examples, and references to standards or best practices where applicable.

A


accuracy (noun)

Metric that measures the proportion of correct predictions (true positives + true negatives) among all predictions (true positives + true negatives + false positives + false negatives). See also F1 score; F-measure; precision; recall.
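The definition above can be sketched as a small function. This is only an illustration: the function name and the confusion counts are hypothetical and do not come from any particular library.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Proportion of correct predictions among all predictions."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    return (tp + tn) / total


# Hypothetical confusion counts for a binary classifier.
print(accuracy(tp=40, tn=45, fp=5, fn=10))  # 0.85
```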

B


blank node (noun)

RDF* node* that has no URI*, used when the URI of a resource is not known, whilst still allowing statements to be made about that resource within a knowledge* graph.

C


camel case (noun)

Naming convention in which phrases are written without spaces or punctuation and each word is capitalized. Upper camel case (or Pascal case) starts with a capital letter; lower camel case (or dromedary case) starts with a lowercase letter.

Note: In ontologies, the upper camel case is used for class names and the lower camel case for property names.
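As a rough illustration of the two variants, the sketch below converts a phrase to upper and lower camel case. The helper names are hypothetical, and splitting on any non-alphanumeric character is an assumption rather than part of the convention itself.

```python
import re


def split_words(phrase: str) -> list[str]:
    # Assumption: any run of non-alphanumeric characters separates words.
    return [w for w in re.split(r"[^A-Za-z0-9]+", phrase) if w]


def upper_camel_case(phrase: str) -> str:
    # Capitalize every word, then join without separators (Pascal case).
    return "".join(w[:1].upper() + w[1:].lower() for w in split_words(phrase))


def lower_camel_case(phrase: str) -> str:
    # Same as upper camel case, but with a lowercase first letter.
    upper = upper_camel_case(phrase)
    return upper[:1].lower() + upper[1:]


print(upper_camel_case("geographic entity"))  # GeographicEntity (class name style)
print(lower_camel_case("has capital city"))   # hasCapitalCity (property name style)
```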


chunk (noun)

Part of a text divided according to size (number of words) or semantic criteria, small enough to be processed by a model.
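A minimal sketch of size-based chunking, assuming whitespace-separated words and a hypothetical `max_words` limit. Semantic chunking is not shown here, and real pipelines may count tokens rather than words.

```python
def chunk_by_words(text: str, max_words: int) -> list[str]:
    """Split a text into consecutive chunks of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


print(chunk_by_words("This 9 cm needle is a useful instrument", max_words=3))
# ['This 9 cm', 'needle is a', 'useful instrument']
```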


chunk retrieval (noun)

Retrieval* which processes information in the form of chunks*.


chunk retriever (noun)

Retriever* which processes information in the form of chunks*.


chunk threshold (noun)

RAG* parameter used to filter chunks* according to a relevance score, which is calculated using the similarity between the chunk and the query or any other weighting method.


chunk top k (noun)

RAG* parameter that sets the maximum number of most relevant chunks* to be used.

Note: In mathematics and science, k is often used to represent a constant or an arbitrary positive integer. For example, in a list or set, k can represent the number of elements you wish to extract or consider.
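The chunk threshold and chunk top k parameters can be pictured together as a two-step filter. The sketch below is an assumption about how such parameters are commonly applied, not a description of any specific platform: the function name, the precomputed relevance scores, and the choice of an inclusive threshold are all hypothetical.

```python
def select_chunks(scored_chunks, threshold, top_k):
    """Keep chunks scoring at least `threshold`, then the `top_k` best of those."""
    # Step 1 (threshold): drop chunks whose relevance score is too low.
    kept = [(chunk, score) for chunk, score in scored_chunks if score >= threshold]
    # Step 2 (top k): sort by descending score and keep at most top_k chunks.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Hypothetical chunks with precomputed relevance scores.
scored = [("chunk A", 0.9), ("chunk B", 0.4), ("chunk C", 0.7), ("chunk D", 0.8)]
print(select_chunks(scored, threshold=0.5, top_k=2))
# [('chunk A', 0.9), ('chunk D', 0.8)]
```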


class (noun)

Category of concrete or abstract things (object, person, animal, idea, etc.) represented in an ontology*, which can be divided into subclasses* and which groups together individuals*.

Example: A vehicle ontology contains classes such as Train, Car, Two-wheeler, Aircraft, Boat and so on.

Note: Class names are written in upper camel* case.


collection (noun)

Set of documents* used for a project in the Lettria platform.

Synonym: data collection.

Note: Not to be confused with dataset*.


community (noun)

Set of nodes* grouped by semantic proximity.


context (noun)

Element or set of elements of information provided by the retriever* in response to a query* and from which an answer is written by the LLM*.

D


data collection (noun)

See collection.


data model (noun)

Abstract framework used to describe the fundamental structure and organization of data in an information system.


data property or datatype property (noun)

Property* (1.) used to indicate a characteristic of a class* or an individual* by associating a value* (1.) with it.

Example: In a geographic ontology, the hasPopulation data property is used to indicate the number of inhabitants of a geographic entity, with the class Place as the domain* (1.) and an integer as the value* (1.).


data source (noun)

See source.


dataset (noun)

Set of data used for model training.

Note: Not to be confused with collection.


datatype (noun)

Type of literal value* (1.) that can be assigned to a data* property.

E


edge (noun)

Element that connects two nodes* in a graph*, enriched with additional information such as properties* (2.).

Example: In the graphical representation of the sentence This 9 cm needle is a useful complementary instrument to the device, the edge length links the node needle to the node 9 cm, the edge hasQuality links the node needle to the node useful and the edge complementaryTo links the node needle to the node device.
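The example above can be written down as a small list of edges. This is only a sketch: the `Edge` type and the `outgoing` helper are hypothetical names chosen for illustration, and real graph stores attach richer properties to edges.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    source: str  # node the edge starts from
    label: str   # name of the edge
    target: str  # node the edge points to


# Edges taken from the needle sentence in the example.
edges = [
    Edge("needle", "length", "9 cm"),
    Edge("needle", "hasQuality", "useful"),
    Edge("needle", "complementaryTo", "device"),
]


def outgoing(node: str, all_edges: list[Edge]) -> dict[str, str]:
    """Map each edge label leaving `node` to the node it reaches."""
    return {e.label: e.target for e in all_edges if e.source == node}


print(outgoing("needle", edges))
# {'length': '9 cm', 'hasQuality': 'useful', 'complementaryTo': 'device'}
```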


to embed (transitive verb)

To represent (something) as vectors*, especially for the creation of a RAG*.


embedder (noun)

Algorithm for embedding*.


embedding (noun)

Representation in vectors*, especially of semantic data for the creation of a RAG*.

Examples:

- Written data, nodes, edges or even entire graphs can be transformed into vectors using embedding.

- A graph embedding is a representation of a graph in vectors.


end-to-end (adjective)

Designed to handle all stages of a system or operation, from start to finish, without requiring intermediary steps or additional external processes.

Examples: End-to-end solution. End-to-end tests. End-to-end machine learning model.

Abbreviation: E2E.


entity (noun)

Any distinct element of the world, concrete or abstract.

Example: In the sentence Angela Merkel always indulges in daydreaming, listens to ABBA and reads a book when she takes the train to London, there are six entities: daydreaming, book, train, Angela Merkel, ABBA and London. The last three are named* entities.

Note: In graphs*, entities are represented by nodes*.

F


F-measure (noun)

Weighted version of the F1* score that gives more or less weight to recall* or precision*, depending on the context: $F_\beta = (1 + \beta^2) \times \dfrac{\text{precision} \times \text{recall}}{(\beta^2 \times \text{precision}) + \text{recall}}$. β controls the relative importance of recall and precision:

- If β > 1, recall is favored (more emphasis is placed on detecting true positives).

- If β < 1, precision is favored (we prefer to avoid false positives).

- The F1 score is a special case of the F-measure with β = 1 (precision and recall are equally weighted).

Notes:

- An F2-measure (β = 2) would be useful in a task where recall is twice as important as precision (such as the detection of serious diseases).

- An F0.5-measure (β = 0.5) would favor precision (as in recommender systems where error is expensive).
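The effect of β can be checked numerically with a short sketch. The function names and the confusion counts below are hypothetical; returning 0.0 when both precision and recall are zero is a common convention assumed here, not something stated in this glossary.

```python
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)


def f_beta(p: float, r: float, beta: float = 1.0) -> float:
    """F-measure: (1 + beta^2) * p * r / (beta^2 * p + r)."""
    b2 = beta ** 2
    denominator = b2 * p + r
    if denominator == 0:
        return 0.0  # assumed convention when precision and recall are both 0
    return (1 + b2) * p * r / denominator


# Hypothetical counts: precision = 0.8, recall = 0.5.
p, r = precision(tp=8, fp=2), recall(tp=8, fn=8)
for beta in (0.5, 1.0, 2.0):
    print(beta, round(f_beta(p, r, beta), 3))
```

Because precision is higher than recall in this example, the score decreases as β grows and recall is weighted more heavily.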

G


GRAG or G-RAG (noun)

See graph-RAG.


graph (noun)

Structure composed of nodes* linked together by edges*.

See also knowledge graph; property graph; RDF; semantic graph.


graph retrieval (noun)

Retrieval* which processes information structured in graphs*.


graph retriever (noun)

Retriever* which processes information structured in graphs*.


graph threshold (noun)

RAG* parameter used to filter triples* (2.) according to a relevance score, which is calculated using the similarity between the triple and the query or any other weighting method.


graph top k (noun)

RAG* parameter indicating the number of triples* (2.) to be extracted by the vector* retriever.


graph-RAG (noun)

RAG* which processes information mainly structured in graphs*.

Abbreviations: GRAG; G-RAG.


ground truth

1. (noun) Data considered to be correct, used as a reference for evaluating other data.

2. (adjective) That is considered to be correct and is used as a reference.

Example: Ground truth answers.

H


I


identifier (noun)

String used to uniquely identify an item on a network, particularly on the Web.

J


K


KG (noun)

See knowledge graph.


knowledge base (noun)

Organized resource used to collect, store and manage information and which may contain several knowledge* graphs.

knowledge graph (noun)

Graph* based on the structure and constraints of an ontology*.

Abbreviation: KG.

L


label (noun)

1. Name used to designate class* members.

Note: The label must be distinguished from the class name.

Example: Two different classes, such as bat_(animal) and bat_(instrument), can have the same label, bat.

2. Metadata* associated with nodes* and edges*.

labeled property graph (noun)

Graph* with labels* (2.).

Abbreviation: LPG.


language model (noun)

Statistical or machine-learning model designed to understand and generate text in natural language.

M


max triplets (noun)

RAG* parameter indicating the maximum number of triples* (2.) to be used by the LLM* for final generation, including those returned by the vector* retriever (see graph* top k) and those added by the expander*.

metadata (noun)

Formal, standardized and structured data used to describe and process the content of digital data.

Example: In the metadata of a text, you can indicate the author's name or the year of publication.

N


n hops (noun)

Expander* parameter used to specify the number of steps (hops) to traverse in the graph* from a start node* in order to retrieve relations* (2.).
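One way to picture this parameter is a breadth-first traversal that stops after a fixed number of hops. The sketch below is an assumption about how such an expansion could work, not the behavior of any specific expander: the function name, the adjacency-list layout, the outgoing-only direction, and the second triple in the sample graph are all hypothetical.

```python
from collections import deque


def expand(graph: dict, start: str, n_hops: int) -> list[tuple]:
    """Collect (subject, predicate, object) triples within n_hops of start."""
    seen = {start}
    frontier = deque([(start, 0)])
    collected = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == n_hops:
            continue  # hop budget used up: do not follow edges from this node
        for predicate, target in graph.get(node, []):
            collected.append((node, predicate, target))
            if target not in seen:
                seen.add(target)
                frontier.append((target, depth + 1))
    return collected


# Hypothetical graph stored as node -> list of (predicate, target) pairs.
graph = {
    "Moby-Dick": [("hasAuthor", "Herman Melville")],
    "Herman Melville": [("bornIn", "New York City")],
}
print(expand(graph, "Moby-Dick", n_hops=1))
# [('Moby-Dick', 'hasAuthor', 'Herman Melville')]
```

With `n_hops=2`, the same call would also return the triple that starts from Herman Melville.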


named entity (noun)

Entity* designated by a proper name (such as persons, organizations, geographical units, artworks, etc.).

Example: In the sentence Angela Merkel always indulges in daydreaming, listens to ABBA and reads a book when she takes the train to London, there are three named entities: Angela Merkel, ABBA and London.

See also named entity disambiguation; named entity linking; named entity recognition.


named entity disambiguation (noun)

Disambiguation* of named* entities in a text.


named entity linking (noun)

Process of linking the named* entities of a text to the corresponding elements of a knowledge* base.

Example: In a named entity linking process that links named entities to elements of the Wikipedia knowledge base, occurrences of Orange that designate the French city will be linked to Orange, Vaucluse while those that designate the French company will be linked to Orange S.A.

Abbreviation: NEL.


named entity recognition (noun)

Process of recognizing and classifying the named* entities of a text.

Abbreviation: NER.


NEL (noun)

See named entity linking.


NER (noun)

See named entity recognition.


node (noun)

Element of a graph* representing an entity*.

Example: In the graphical representation of the sentence This 9 cm needle is a useful complementary instrument to the device, 9 cm, useful, needle, instrument, and device are represented by nodes.

O


object (noun)

Third element of a triple* (1.), the one that provides specific information on the subject* via the predicate*.

Example: In the triple Moby-Dick / hasAuthor / Herman Melville, Herman Melville is the object: it gives the specific information about the subject, the type of which is specified by the predicate.


object property (noun)

Property* (1.) that provides information about a class* or an individual* by linking it to another class or individual.

Example: In a geographic ontology, the hasCapitalCity object property is used to indicate the capital city of a country, with the class Country as the domain* (1.) and the class City as the range*.


ontology (noun)

Knowledge representation system, more complex than a taxonomy*, used to hierarchically organize the concepts of a domain* (2.), represented by classes* and individuals*, and to give information about them through properties* (1.) and logical axioms.


ontology alignment (noun)

Task of determining the correspondences between classes*, individuals* and properties* of different ontologies*.

Synonym: ontology matching.


ontology matching (noun)

See ontology alignment.


ontology population (noun)

Task of identifying instances of concepts* and properties* of an ontology*.


origin (noun)

Text element or set of text elements from which a graph* element originates.


OWL (Web Ontology Language) (noun)

Knowledge representation language in RDF* format, designed to define complex ontologies*.

Example: OWL allows the definition of logical rules.

P


to parse (transitive verb)

To perform the parsing* of (a document*).


parser (noun)

Tool that performs parsing*.


parsing (noun)

Process of analyzing, structuring and enriching a document* so that it becomes an exploitable source* for information processing.


Pascal case (noun)

See camel case.


payload (noun)

All the information we have on an element of a graph* (its name, type, properties, ID of the chunk, etc.).


precision (noun)

Metric that measures the proportion of correct positive predictions (true positives) among all positive predictions (true positives + false positives).

Note: A high-precision model makes few errors in classifying a sample as “positive”. This is particularly important in cases where false positives are costly (for example, in disease screening).

See also accuracy; recall; F1 score; F-measure.


predicate (noun)

Second element of a triple* (1.), indicating the type of information given about the first one (the subject*).

Q


query (noun)

What is formulated in natural language or in a specific language to obtain information from a knowledge* base.

Note: Not to be confused with prompt or question.


question (noun)

What is asked in natural language by a user of the Lettria platform to obtain information.

Note: Not to be confused with prompt or query.

R


RAG (Retrieval-Augmented Generation) (noun)

Technique that combines information retrieval* and text generation approaches to improve the quality and relevance of responses generated by a language* model.

S


section (noun)

Part of a source* that forms a logical unit.

Example: Paragraphs can be sections.

Note: A document is divided into sections during parsing*.


semantic graph (noun)

Graph* based on controlled properties* (1.) but without the structure of an ontology*.


source (noun)

Any element that generates information, whether it's a document* after parsing* or a graph* generated from the documents.

Synonym: data source.

T


T2G (noun)

See text-to-graph.


tailed entity (noun)

Entity* that is unknown to the knowledge* base in use.


taxonomy (noun)

Hierarchical classification of things or ideas.

Example: The classification of living beings is a taxonomy.

U


upper camel case (noun)

See camel case.


URI (Uniform Resource Identifier) (noun)

1. Standard format for identifiers* on the Internet.

2. Any identifier in this format.

V


value (noun)

1. Specific data assigned to an attribute* in an ontology* and subject to defined datatypes*.

2. Property* (2.) type, in a graph*.


vector (noun)

Ordered list of numbers where each number represents a dimension in a multidimensional space.
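Vectors are typically compared with a similarity measure. The sketch below uses cosine similarity, which is one common choice and an assumption here, since this glossary does not prescribe a particular measure. The function name and the sample vectors are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of the same dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional vectors pointing in the same direction.
print(round(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]), 6))  # 1.0
```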


vector RAG (noun)

RAG* which processes information mainly structured in vectors*.


vector retriever (noun)

Retriever* which processes information in the form of vectors*.

W


word sense disambiguation (noun)

Disambiguation* of words in a text, whether they designate entities* or not.

X


XSD datatype (noun)

Predefined datatype* used in the semantic web and ontologies*.

Note: Fundamental XSD datatypes are: string, boolean, decimal, integer, float, double, date, time, dateTime and duration. XSD datatypes also include derived numeric types like positiveInteger, negativeInteger, etc.

Y


Z
