Benchmark

Empowering Graph-Based Agents with our new model, Lettria Perseus

  • 40 models compared
  • 19 ontologies refined
  • 14,882 triples annotated

Introduction

Large Language Models (LLMs) have transformed text interaction but fall short on reasoning, memory, and explainability. The breakthrough comes with Knowledge Graphs: structured maps of entities and relationships that let agents reason, retrieve, and act with context.

Yet closed-source language models perform poorly on the critical Text2Graph task, which consists of transforming free text (often from documents) into a relational knowledge graph. This transformation can be done by following a schema defined in advance by ontologies, or without a schema.
At Lettria, we strongly recommend using ontologies: they enforce consistency of meaning and prevent nonsensical graphs. They also improve disambiguation and interoperability with external data and other systems.
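Concretely, a schema-guided Text2Graph call turns a sentence into JSON triples that conform to a predeclared ontology. A minimal sketch, assuming a hypothetical mini-ontology and triple format (not Lettria's internal representation):

```python
import json

# Hypothetical mini-ontology: the allowed entity classes and relations.
schema = {
    "classes": ["Person", "Organization"],
    "relations": ["works_at"],
}

# Expected model output for "Jean Dupont works at AXA": JSON triples whose
# subject/object types and relations all come from the schema above.
raw_output = json.dumps([
    {"subject": "Jean Dupont", "subject_type": "Person",
     "relation": "works_at",
     "object": "AXA", "object_type": "Organization"},
])

triples = json.loads(raw_output)
# Every emitted relation must belong to the schema, or the graph is rejected.
assert all(t["relation"] in schema["relations"] for t in triples)
print(triples[0]["object"])  # AXA
```

Without a schema, the same sentence could yield arbitrary relation names that vary from call to call; the ontology pins the vocabulary down.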

After reviewing dozens of closed-source LLMs on the Text2Graph task, our team identified several recurring issues, notably:

  • Output reliability issues;
  • Extraction and classification issues (for entities, attributes, properties and relations);

Our Text2Graph fine-tuning methodology

Our R&D team has worked tirelessly to create a new standard for Text2Graph and thus improve the quality of knowledge graphs produced from unstructured text. We believe this will have a significant impact on the next generation of graph-based agents.

1/ Improve data quality

Current benchmarks like Text2KGBench fall short due to data quality, ontology gaps, and structural issues.

After a year of work, the team has finalized Text2KGBench-LettrIA, a benchmark designed to fix the flaws in existing datasets. They refined 19 ontologies into clear, hierarchical, domain-specific structures with precise typing of entities and attributes. Annotation rules enforced consistency, grounding all triples strictly in the source text. Values such as dates, durations, and numbers were normalized into machine-readable formats, while entity names were cleaned and standardized; corporate suffixes were handled consistently and pronouns resolved so as to preserve textual fidelity. Additional structural improvements added explicit typing (e.g., integer, string), corrected grammar, and removed dataset artifacts.

The result is a corpus of 4,860 sentences yielding 14,882 high-quality triples. The benchmark is fully schema-aligned, auditable, and optimized for both research and enterprise-grade AI deployment.
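The value-normalization step described above can be sketched as follows; the rules below are illustrative placeholders, not the actual pipeline:

```python
import re

MONTHS = {"January": 1, "February": 2, "March": 3, "April": 4,
          "May": 5, "June": 6, "July": 7, "August": 8,
          "September": 9, "October": 10, "November": 11, "December": 12}

def normalize_value(raw: str):
    """Normalize a raw textual value into a machine-readable form
    (illustrative rules only):
    - '1,500'        -> 1500 (int)
    - '5 March 2020' -> '2020-03-05' (ISO 8601 date string)
    - anything else  -> stripped text, unchanged
    """
    text = raw.strip()
    digits = text.replace(",", "")
    if re.fullmatch(r"\d+", digits):
        return int(digits)
    m = re.fullmatch(r"(\d{1,2}) (\w+) (\d{4})", text)
    if m and m.group(2) in MONTHS:
        day, month, year = int(m.group(1)), MONTHS[m.group(2)], int(m.group(3))
        return f"{year:04d}-{month:02d}-{day:02d}"
    return text

print(normalize_value("1,500"))         # 1500
print(normalize_value("5 March 2020"))  # 2020-03-05
print(normalize_value("Paris"))         # Paris
```

Typed values like these are what allow the resulting triples to be queried and compared reliably downstream.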

This new dataset will be used to train our series of fine-tuned models, and also to benchmark proprietary and open-source LLMs in both zero-shot and fine-tuned settings.

2/ Fine-tune a series of models

Models were adapted through Supervised Fine-Tuning (SFT) using the Unsloth framework on NVIDIA H100 GPUs. Each input combined a sentence with a compact ontology, while the target output was a set of JSON triples. Three strategies were tested: Classic (baseline), Extended (augmented with synthetic data for broader coverage), and Generalization (leave-one-out training to test adaptability to unseen domains). Extended fine-tuning expanded training to at least 500 examples per ontology, improving robustness.
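The input/output pairing used for SFT can be sketched as below; the prompt template and field names are hypothetical, stand-ins for the actual training format:

```python
import json

def build_sft_example(sentence: str, ontology: dict, triples: list) -> dict:
    """Build one supervised fine-tuning pair: the input combines the
    sentence with a compact ontology, and the target is JSON triples.
    (Prompt wording and field names are illustrative.)"""
    prompt = (
        "Extract triples that follow this ontology.\n"
        f"Ontology: {json.dumps(ontology)}\n"
        f"Sentence: {sentence}"
    )
    return {"input": prompt, "output": json.dumps(triples)}

example = build_sft_example(
    "Jean Dupont works at AXA.",
    {"classes": ["Person", "Organization"], "relations": ["works_at"]},
    [{"subject": "Jean Dupont", "relation": "works_at", "object": "AXA"}],
)
print(example["input"].splitlines()[-1])  # Sentence: Jean Dupont works at AXA.
```

Keeping the ontology inside each input is what lets a single fine-tuned model switch schemas at inference time without retraining.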

Our results reveal a key finding: smaller, fine-tuned open-source models can achieve superior F1 accuracy compared to their larger, proprietary counterparts, underscoring the critical role of high-quality, schema-aligned training data.

3/ Benchmark output reliability

Output reliability is critical. A large share of errors comes from this stage: when the graph does not follow the defined ontology, the results cannot be parsed correctly or reused at scale. This makes downstream automation unreliable and drives up the cost of manual corrections.
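A validity check of the kind measured below can be sketched as two conditions: the raw output must parse as a JSON list of triples, and every relation must belong to the ontology. The relation set and field names here are hypothetical:

```python
import json

RELATIONS = {"works_at"}  # hypothetical ontology relations

def is_valid_output(raw: str) -> bool:
    """A model output counts as valid only if it parses as a JSON list
    of triple dicts and every relation belongs to the ontology."""
    try:
        triples = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(triples, list):
        return False
    return all(isinstance(t, dict) and t.get("relation") in RELATIONS
               for t in triples)

good = '[{"subject": "Jean Dupont", "relation": "works_at", "object": "AXA"}]'
bad = "Sure! Here are the triples you asked for: ..."
print(is_valid_output(good), is_valid_output(bad))  # True False
```

Conversational preambles like the second example are a common failure mode: the content may be correct, but the output cannot be parsed or reused at scale.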

Output reliability among models

Model                       Valid outputs
gemini-2.0-flash            57%
gpt-4.1-nano-2025-04-14     83%
claude-3-haiku              91%
claude-3-sonnet             97%
Lettria's Model             100%

4/ Benchmark extraction and classification

When working with Knowledge Graphs, two core tasks matter most: extraction and classification. To make them concrete, it helps to identify and classify the building blocks of a graph:

Entities
The nodes, representing people, organizations, places, or concepts.
Examples:

  • “AXA” (Organization)
  • “Jean Dupont” (Person)

Attributes
Qualitative descriptors attached to an entity. They often describe what an entity is.
Examples:

  • A person: “Job title: Director”
  • A company: “Industry: Insurance”

Properties
Quantitative or factual details that further define an entity. They usually describe when, where, or how much.
Examples:

  • A person: “Date of birth: 1980”
  • A company: “Headquarters: Paris”

Relations
The edges linking entities to each other.
Example:

  • “Jean Dupont works at AXA”

From a text like “Jean Dupont joined AXA as Director in 2020”, the pipeline produces:

  • Entities → “Jean Dupont”, “AXA”
  • Attributes → “Director”
  • Properties → “2020”
  • Relation → “works at”

Once extracted, classification tasks ensure consistency and accuracy:

  • Classifying an entity into the right type (Person, Organization, Location, etc.)
  • Assigning attributes and properties to the correct schema fields
  • Validating relations against the ontology (e.g. only a Person can “work at” an Organization)
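The ontology check in the last step can be sketched as a domain/range constraint on each relation; the constraint table below is a hypothetical example:

```python
# Hypothetical domain/range constraints: relation -> (subject class, object class).
CONSTRAINTS = {"works_at": ("Person", "Organization")}

def relation_is_valid(subject_type: str, relation: str, object_type: str) -> bool:
    """Accept a relation only when its subject and object types match
    the domain and range declared for it in the ontology."""
    spec = CONSTRAINTS.get(relation)
    return spec == (subject_type, object_type)

print(relation_is_valid("Person", "works_at", "Organization"))  # True
print(relation_is_valid("Organization", "works_at", "Person"))  # False
```

This is the check that rejects, for instance, a graph claiming an Organization "works at" a Person.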

This alignment is critical for building reliable Knowledge Graphs. Without it, extracted data remains noisy and hard to use.

We have published detailed resources on this topic on the Lettria blog, illustrating both the conceptual foundation and the practical workflows of extraction and classification.

The charts below present F1 attribute scores comparing Lettria Perseus with state-of-the-art LLMs. Results highlight the advantage of a model trained specifically for complex document understanding, where precision and recall directly impact compliance and efficiency. You'll find all the detailed results comparing over 40 models at the bottom of this page.

5/ Benchmark Latency

Fine-tuned open-source models achieved ultra-low latency, delivering structured outputs in under 20 milliseconds, orders of magnitude faster than proprietary API models, which average 2 to 37 seconds per query. This speed makes fine-tuned models ideal for high-volume, real-time applications where response time directly impacts user experience and business performance.

Latency Benchmark

Model                     Latency (seconds)
claude-opus-4             10
mistral-medium-2505       6
gemini-2.5-pro            4
gpt-4.1-2025-04-14        4
Lettria's Model           0.01

In conclusion

With LettRAGraph, Lettria delivers the best fine-tuned Text-to-Graph model built for regulated industries.

  • Accuracy – Entity F1 up to 0.88, outperforming GPT-4, Claude Sonnet 4, and Gemini 2.5 Pro
  • Privacy – Self-hosted, your sensitive data never leaves your environment
  • Speed – Inference under 20ms
  • Reliability – 99–100% schema-valid outputs
  • Future-Proof – 19 ontologies, 14,882 curated triples

Why Business Leaders Care

Knowledge Graphs enable GraphRAG for precise, structured queries beyond vector search; explainable AI with decisions grounded in transparent facts; regulation and compliance to ensure outputs follow business rules; and next-gen enterprise AI:

  • Build AI you can trust: Make decisions in transparent, verifiable knowledge graphs instead of black-box predictions.
  • Unlock enterprise knowledge: Connect silos of unstructured text into a single, searchable knowledge fabric.
  • Ensure regulatory compliance: Simplify audits and reporting with traceable, explainable data flows.
  • Drive measurable ROI: Lower reliance on costly APIs and manual processes with scalable in-house intelligence.

Why Technical Leaders Care

Technical teams gain a robust, reliable stack for enterprise-grade knowledge graph generation and agent building:

  • Performance edge over proprietary APIs: Deliver higher accuracy with fine-tuned, domain-specific training.
  • Schema-guided extraction for reliability: Minimize error propagation in downstream analytics and workflows.
  • Scalable deployment with predictable costs: Easily adapt to growing data volumes without unpredictable pricing spikes.
  • Reduced hallucinations & guaranteed valid outputs: Every answer is backed by verifiable entities and relationships.

Experience the Future of Agents

Lettria is building the bridge from today’s LLMs to tomorrow’s trustworthy enterprise agents. With LettRAGraph, organizations can combine the entire Lettria technology stack into a seamless pipeline.

By linking these components, Lettria delivers a complete solution for extracting, structuring, and operationalizing enterprise knowledge, a foundation that makes AI both trustworthy and future-proof.

Detailed results
