Lettria's GraphRAG

Document Parsing

Effective document parsing is crucial for transforming unstructured data into structured formats, enabling efficient information retrieval and analysis.

GraphRAG enhances this process by integrating graph-based data structures with advanced AI-driven language models, offering more precise and contextually relevant insights.


See how Lettria outperforms competing tools

Read a detailed analysis comparing the performance of Lettria's Text Cleaning API with that of unstructured.io.

Connect with all your sources

GraphRAG's document parser efficiently extracts text from various formats, including PDFs and images, ensuring that all relevant information is captured accurately.

To protect your privacy, your data is stored in a secure environment that only you can access. You can reuse this data across all of your current and future text processing projects.

Create and manage your datasets

Our dataset manager empowers you to oversee your sources, facilitating seamless collaboration between teams and repurposing data for additional GraphRAG projects across your organization. Transform your datasets into valuable data assets with ease.

Automated Data Structuring

Manually structuring data can be tedious. Lettria automates this task, offering precise parsing tailored to each data type, making it easy to manage complex documents, such as legal contracts or financial reports.

Do this again and again, at scale

Many GraphRAG projects stall before they even get started, often because the source data is still unstructured. With Lettria's Document Parsing API, you can convert that unstructured data into structured formats, paving the way for deeper analysis and informed decision-making.

Frequently Asked Questions

How can I start using Lettria’s Document Parsing?

To explore the platform, you can request a personalized demo through the website. A team member will guide you through the features and help determine how the parser can support your specific use case.

Are there tools for cleaning and refining the text?

Yes, the parser includes advanced cleaning features, such as:

  • Removing HTML tags and headers/footers
  • Fixing speech-to-text and formatting errors
  • Reconstructing tables and lists
  • Replacing special characters and trimming whitespace

These features help ensure your data is as clean and consistent as possible for downstream use.
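To make a few of these cleaning steps concrete, here is a minimal sketch in plain Python. It is not Lettria's implementation or API: the function name `clean_text` and the choice of which special characters to replace are assumptions made for illustration, and it only covers tag removal, special-character replacement, and whitespace trimming.

```python
import re
import unicodedata
from html.parser import HTMLParser

# Hypothetical mapping of typographic quotes to plain ASCII quotes.
_QUOTES = str.maketrans({
    "\u201c": '"', "\u201d": '"',
    "\u2018": "'", "\u2019": "'",
})


class _TagStripper(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self) -> None:
        # convert_charrefs=True decodes entities such as &nbsp; for us.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def clean_text(raw: str) -> str:
    """Strip HTML tags, replace special characters, trim whitespace."""
    stripper = _TagStripper()
    stripper.feed(raw)
    stripper.close()
    # Joining with a space keeps adjacent blocks from fusing together.
    text = " ".join(stripper.parts)
    # NFKC folds compatibility characters (non-breaking spaces, ligatures).
    text = unicodedata.normalize("NFKC", text)
    text = text.translate(_QUOTES)
    # Collapse any run of whitespace into a single space.
    return re.sub(r"\s+", " ", text).strip()
```

Joining text nodes with a space is a deliberate simplification: it avoids gluing separate paragraphs together, at the cost of occasionally adding a space around inline tags.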

How does it integrate with other Lettria tools, like GraphRAG?

Lettria’s parser works seamlessly with tools like GraphRAG by transforming raw documents into clean, structured knowledge that can be queried, visualized, or enriched with AI. It's a foundational step in the full document intelligence pipeline.

Can I manage and reuse my parsed datasets?

Yes. Lettria provides a dataset manager that allows you to organize parsed documents, collaborate across teams, and reuse data for different use cases. It turns isolated documents into a consistent, centralized knowledge base.

What kind of output does the parser generate?

After processing, the parser returns:

  • The full extracted text
  • A list of document sections or "chunks"
  • Metadata depending on the type of file

This structured output is optimized for follow-up tasks like classification, tagging, or feeding into AI models.
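As a rough picture of what such an output could look like in code, the sketch below models the three parts as a Python dataclass. The names `ParsedDocument` and `build_parsed_document`, the field names, and the blank-line chunking rule are all hypothetical; they are not Lettria's response schema or chunking method.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ParsedDocument:
    """Hypothetical container mirroring the three parts listed above."""
    text: str                      # the full extracted text
    chunks: List[str]              # document sections, in reading order
    metadata: Dict[str, Any] = field(default_factory=dict)  # file-dependent


def build_parsed_document(
    text: str, metadata: Optional[Dict[str, Any]] = None
) -> ParsedDocument:
    """Split text into chunks on blank lines (an assumed, naive rule)."""
    pieces = re.split(r"\n\s*\n", text)
    chunks = [p.strip() for p in pieces if p.strip()]
    return ParsedDocument(text=text, chunks=chunks, metadata=dict(metadata or {}))
```

A downstream step such as classification or tagging would then iterate over `chunks` and use `metadata` to decide how to treat each file type.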

Can it handle complex documents like legal contracts or financial reports?

Yes. The parser is built to manage detailed, information-rich content. It automatically extracts key elements from dense documents, making it ideal for use cases like compliance reviews, contract analysis, or financial data extraction.

How is my data protected during the parsing process?

Your documents are processed in a secure, private environment that’s only accessible to your team. Lettria doesn’t retain or repurpose your data without your consent.

Which types of files can be parsed?

Lettria supports a broad range of formats:

  • Text: .txt, .docx, .odt, .html
  • Spreadsheets: .csv, .xls, .xlsx
  • PDF documents
  • Images: .jpg, .png, .webp
  • Audio: .mp3
  • Structured data: .json

This flexibility ensures teams can work with data from nearly any source.
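If you want to check files against this list before sending them for parsing, a small lookup like the following could help. It is a sketch only: the `SUPPORTED` table is transcribed from the list above (with `.pdf` assumed as the extension for PDF documents), and `file_category` is a hypothetical helper, not part of any Lettria SDK.

```python
from pathlib import Path
from typing import Optional

# Extensions taken from the list above; ".pdf" is an assumption.
SUPPORTED = {
    "text": {".txt", ".docx", ".odt", ".html"},
    "spreadsheet": {".csv", ".xls", ".xlsx"},
    "pdf": {".pdf"},
    "image": {".jpg", ".png", ".webp"},
    "audio": {".mp3"},
    "structured": {".json"},
}


def file_category(filename: str) -> Optional[str]:
    """Return the category for a filename, or None if it is not listed."""
    suffix = Path(filename).suffix.lower()
    for category, extensions in SUPPORTED.items():
        if suffix in extensions:
            return category
    return None
```

The check is based on the file extension alone, so it cannot detect a mislabeled file; it is meant as a quick pre-filter.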

What does Lettria's Document Parsing tool do?

Lettria's parser extracts and structures content from various document formats using a combination of linguistic intelligence and knowledge graph technology. This makes it easier to analyze and repurpose textual data, no matter how unstructured the source may be.

The key to any successful data science project is the data collection phase.

Patrick Duvaut

Head of Innovation
