Reconstructing Logical Reading Order: A Deep Dive into Advanced Parsing Techniques with Lettria’s Docparsing API


Introduction

In the field of document parsing, one of the key challenges is designing a Reading Order Algorithm that can handle the wide range of document layouts encountered in practical applications. From invoices and scientific papers to newspapers, forms, and historical manuscripts, each document type can follow a completely different visual and logical structure. This diversity makes it difficult to rely on a single algorithm that performs well across all formats.

Our approach builds on layout predictions from a YOLOv10 model, which in our benchmarks identified document elements more accurately than Detectron2. However, these predictions are inherently unordered: the model detects regions such as text blocks, tables, and figures without assigning them a sequence that reflects the natural reading flow.

As a result, an additional step is required: reconstructing a logical, human-readable order from spatially scattered elements.

This step is particularly critical for GraphRAG applications, where document content is used to build the structured knowledge graphs that power retrieval-augmented generation. An incorrect reading sequence can produce fragmented or misleading relationships in the graph, which directly degrades the quality and accuracy of generated answers. A coherent reading order is therefore not just a formatting concern; it is a foundational requirement for trustworthy, explainable GenAI systems in domains where precision and traceability are non-negotiable.

In this article, we tackle that challenge by presenting our approach to building a flexible reading order algorithm: one that starts from unordered layout predictions and adapts to diverse, often irregular document structures.

Understanding Reading Flow Through Layout Examples

To illustrate the challenge of reading order, we start with a few examples of layout predictions.

The first example shows a straightforward case: a single-column document where blocks can be read from top to bottom with minimal ambiguity.

The next example introduces a common variation, a two-column layout, where a naive top-to-bottom strategy fails. Here, the correct reading order requires reading the entire left column first, then moving back up to the top of the right column, and finally reading the page number at the end.
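To make the failure concrete, here is a minimal sketch of the naive strategy: sort boxes by their top edge, breaking ties by their left edge. The `Box` type and all coordinates are hypothetical, chosen only to mimic the two layouts described above (y grows downward, as in image coordinates).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A detected layout region (hypothetical schema); y grows downward."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float


def naive_order(boxes: list[Box]) -> list[str]:
    """Sort by top edge, breaking ties by left edge."""
    return [b.name for b in sorted(boxes, key=lambda b: (b.y0, b.x0))]


# Single column: top-to-bottom sorting is enough.
single = [
    Box("para2", 50, 300, 550, 500),
    Box("title", 50, 40, 550, 90),
    Box("para1", 50, 100, 550, 290),
]

# Two columns plus a centered page number. The right-column paragraphs
# start at slightly different heights than their left-column neighbours.
two_cols = [
    Box("L1", 50, 100, 290, 400),
    Box("R1", 310, 110, 550, 380),
    Box("L2", 50, 410, 290, 700),
    Box("R2", 310, 390, 550, 700),
    Box("page_no", 280, 760, 320, 780),
]

print(naive_order(single))    # ['title', 'para1', 'para2']
print(naive_order(two_cols))  # ['L1', 'R1', 'R2', 'L2', 'page_no']
```

The second result jumps from the left column to the right one and back, instead of the expected `['L1', 'L2', 'R1', 'R2', 'page_no']`.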

The third example adds another layer of complexity: background elements or visual anchors. These might include headers, shaded sections, watermarks, or logos that signal structure or hierarchy but are not explicitly marked as layout regions. Such elements often carry essential information or guide the reader's attention, even though they are invisible to the layout detector. The reading order algorithm therefore has to interpret visual context beyond bounding boxes alone.
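One way to feed such cues into the ordering step is to locate background regions (for example, shaded boxes) separately from the layout model and treat each region as a group boundary. The sketch below assumes those regions are already available as rectangles; how they are detected is out of scope and is an assumption here. It simply buckets layout boxes by the region that contains their center. All names and coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """A layout box or a background region (hypothetical schema)."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float

    def center(self) -> tuple[float, float]:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)

    def contains(self, point: tuple[float, float]) -> bool:
        x, y = point
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def group_by_background(boxes: list[Rect], regions: list[Rect]) -> dict[str, list[str]]:
    """Assign each box to the first background region containing its center.

    Boxes outside every region go to an "outside" group. Within each group,
    boxes are listed top to bottom, then left to right.
    """
    groups: dict[str, list[str]] = {r.name: [] for r in regions}
    groups["outside"] = []
    for box in sorted(boxes, key=lambda b: (b.y0, b.x0)):
        target = next((r.name for r in regions if r.contains(box.center())), "outside")
        groups[target].append(box.name)
    return groups


shaded = Rect("shaded_box", 350, 100, 550, 500)  # e.g. a grey sidebar
boxes = [
    Rect("intro", 50, 100, 330, 200),
    Rect("side_title", 360, 110, 540, 140),
    Rect("side_text", 360, 150, 540, 480),
    Rect("body", 50, 210, 330, 480),
]
print(group_by_background(boxes, [shaded]))
# {'shaded_box': ['side_title', 'side_text'], 'outside': ['intro', 'body']}
```

Keeping the shaded box together prevents its text from being interleaved with the main column, which a plain geometric sort would otherwise do (`intro`, `side_title`, `side_text`, `body`).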

Through these examples, we show that layout prediction alone is not enough. While all elements might be correctly detected, humans rely on visual cues — such as alignment, whitespace, headers, and background elements — to intuitively determine the correct reading order. These cues are not explicitly labeled in the layout, yet they play a critical role in guiding the reader’s eye. For an algorithm, however, this implicit understanding must be made explicit: it has to infer structure and intent from spatial and visual context to reconstruct a logical reading sequence.

Our Algorithmic Approach to Reading Order

To address this challenge and reconstruct a logical reading sequence, our core algorithm first recursively forms groups of content. These groups then determine the order in which all elements are sorted, as the following examples illustrate.
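The grouping algorithm itself is not published here. To illustrate the general idea of "group recursively, then sort", here is a minimal sketch of recursive XY-cut, a classic layout-analysis technique used as a stand-in: at each level it looks for empty bands of whitespace between boxes, cuts there, and recurses into each band. Horizontal cuts produce bands read top to bottom; vertical cuts produce columns read left to right. The `Box` type, the `min_gap` threshold, and the preference for horizontal cuts are assumptions of this sketch.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A detected layout region (hypothetical schema); y grows downward."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float


def split_on_gaps(
    boxes: list[Box],
    start: Callable[[Box], float],
    end: Callable[[Box], float],
    min_gap: float,
) -> list[list[Box]]:
    """Partition boxes into bands separated by whitespace along one axis."""
    bands: list[list[Box]] = []
    reach = float("-inf")  # furthest end coordinate seen so far
    for box in sorted(boxes, key=start):
        if not bands or start(box) - reach >= min_gap:
            bands.append([])  # whitespace of at least min_gap: cut here
        bands[-1].append(box)
        reach = max(reach, end(box))
    return bands


def xy_cut(boxes: list[Box], min_gap: float = 10.0) -> list[str]:
    """Return box names in reading order using recursive XY-cut."""
    if len(boxes) <= 1:
        return [b.name for b in boxes]
    # Prefer a horizontal cut: the resulting bands are read top to bottom.
    bands = split_on_gaps(boxes, lambda b: b.y0, lambda b: b.y1, min_gap)
    if len(bands) == 1:
        # Otherwise try a vertical cut: bands become columns, read left to right.
        bands = split_on_gaps(boxes, lambda b: b.x0, lambda b: b.x1, min_gap)
    if len(bands) == 1:
        # No whitespace gap on either axis: fall back to a plain geometric sort.
        return [b.name for b in sorted(boxes, key=lambda b: (b.y0, b.x0))]
    return [name for band in bands for name in xy_cut(band, min_gap)]


page = [
    Box("title", 50, 40, 550, 90),
    Box("L1", 50, 100, 290, 400),
    Box("R1", 310, 110, 550, 380),
    Box("L2", 50, 410, 290, 700),
    Box("R2", 310, 390, 550, 700),
    Box("page_no", 280, 760, 320, 780),
]
print(xy_cut(page))  # ['title', 'L1', 'L2', 'R1', 'R2', 'page_no']
```

The top-level cut isolates the title and the page number; the middle band has no horizontal gap, so it is split into two columns. Pure whitespace cuts break down when boxes overlap or no clean gutter exists, and the fallback above is deliberately simplistic.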

Group elements:

For this layout, content is read column by column from left to right, each column from top to bottom, then the table, and finally the footer. To achieve this, we define groups (shown in red) that guide the ordering of the layout's elements:

1. Column 1: All elements from top to bottom.
2. Column 2: All elements from top to bottom.
3. Column 3: All elements from top to bottom (including title, image, and text).
4. Table (Section 4).
5. Footer (Section 5).
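Once every element carries a group, ordering reduces to a single sort on the key (group rank, top edge, left edge). The minimal sketch below assumes group membership has already been decided; the group names, element names, and coordinates are hypothetical and simply mirror the five groups listed above.

```python
from dataclasses import dataclass

# Reading rank of each group, mirroring the five groups listed above.
GROUP_RANK = {"column1": 0, "column2": 1, "column3": 2, "table": 3, "footer": 4}


@dataclass(frozen=True)
class Element:
    """A layout element already assigned to a group (hypothetical schema)."""
    name: str
    group: str
    x0: float
    y0: float


def reading_key(e: Element) -> tuple[int, float, float]:
    """Group rank first, then top edge, then left edge."""
    return (GROUP_RANK[e.group], e.y0, e.x0)


def order_by_groups(elements: list[Element]) -> list[str]:
    return [e.name for e in sorted(elements, key=reading_key)]


elements = [
    Element("footer", "footer", 40, 780),
    Element("c2_text", "column2", 220, 80),
    Element("c3_title", "column3", 400, 60),
    Element("table", "table", 40, 520),
    Element("c1_text", "column1", 40, 80),
    Element("c3_image", "column3", 400, 120),
    Element("c3_text", "column3", 400, 300),
    Element("c1_more", "column1", 40, 260),
]
print(order_by_groups(elements))
# ['c1_text', 'c1_more', 'c2_text', 'c3_title', 'c3_image', 'c3_text', 'table', 'footer']
```

`c3_title` sits higher on the page than `c1_more` yet comes after it: the group rank dominates vertical position, which is exactly what the column-by-column order requires.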

Group elements recursively:

In the initial analysis of the document's structure, we identify two primary areas (outlined in red) that encapsulate the content. The larger red area, on the left, represents the main body of the page. The smaller red area, on the right, represents a dedicated sidebar or footer region. In a second iteration, within the first (main content) part, we distinguish two blue sub-groups.

We begin by reading the first red group on the left, following a top-to-bottom flow. The one exception is the pair of blue sub-groups, which sit side by side and are read from left to right.

After completing these two blue sub-groups, the reading continues within the main left red group, resuming its top-to-bottom flow for any remaining elements below them.

After completing the left red group, we move to the red group on the right, which contains an image and the page footer.
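The nested reading described above maps naturally onto a small tree in which each group records the direction its children are read in. The sketch below hand-builds the tree for this example (red and blue groups, with hypothetical element names and coordinates) and flattens it depth-first. In practice the tree would come from the grouping step rather than being written by hand; the `Leaf`/`Group` types are assumptions of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Leaf:
    """A single layout element; (x0, y0) is its top-left corner."""
    name: str
    x0: float
    y0: float


@dataclass
class Group:
    """A group read "vertical" (top to bottom) or "horizontal" (left to right)."""
    direction: str
    children: list[Leaf | Group]

    @property
    def x0(self) -> float:
        return min(c.x0 for c in self.children)

    @property
    def y0(self) -> float:
        return min(c.y0 for c in self.children)


def flatten(node: Leaf | Group) -> list[str]:
    """Depth-first traversal that orders each group's children by its direction."""
    if isinstance(node, Leaf):
        return [node.name]
    if node.direction == "vertical":
        ordered = sorted(node.children, key=lambda c: (c.y0, c.x0))
    else:
        ordered = sorted(node.children, key=lambda c: (c.x0, c.y0))
    return [name for child in ordered for name in flatten(child)]


# Children are deliberately listed out of order.
blue1 = Group("vertical", [Leaf("blue1_text", 40, 200), Leaf("blue1_heading", 40, 150)])
blue2 = Group("vertical", [Leaf("blue2_heading", 300, 150), Leaf("blue2_text", 300, 200)])
blue_row = Group("horizontal", [blue2, blue1])
red_left = Group("vertical", [Leaf("closing_text", 40, 600), blue_row, Leaf("title", 40, 40)])
red_right = Group("vertical", [Leaf("footer", 600, 700), Leaf("image", 600, 40)])
page = Group("horizontal", [red_right, red_left])

print(flatten(page))
# ['title', 'blue1_heading', 'blue1_text', 'blue2_heading', 'blue2_text',
#  'closing_text', 'image', 'footer']
```

Because each group carries its own direction, the same traversal handles any nesting depth: the blue pair is read left to right, while everything around it keeps the top-to-bottom flow.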

So far, we have shown how our algorithm groups elements recursively to reconstruct a logical reading flow, adapting to nested structures, column layouts, and spatial patterns.
This lets the algorithm adapt dynamically and handle a wide variety of document types and layouts, including complex and unconventional ones.

In the next part, we go a step further and explore how we use prediction labels, leverage visual cues, and draw on our familiarity with the layout model itself (including its recurring patterns and typical mistakes) to refine and correct the reading order even further.

Refining Reading Order with Labels, Visual Cues and Model Awareness

Accurately determining the reading order across varied document layouts is crucial for effective content processing. To improve accuracy further, we add features that ensure groups are formed correctly before sorting and that handle specific edge cases. These capabilities include:

- leveraging Title elements to logically segment content
- utilizing header and footer labels to secure their respective positions
- interpreting visual elements for clear group separation
- applying specific rules to handle text outside standard layout bounding boxes
- integrating OCR results to compensate for layout inaccuracies.
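As an illustration of the second item, here is a minimal sketch that pins elements labelled as page headers to the start and page footers to the end, whatever order the body ends up in. The label strings `"page-header"`, `"page-footer"`, `"text"`, and `"picture"` are hypothetical; the layout model's actual label set is not given here.

```python
from dataclasses import dataclass

HEADER, FOOTER = "page-header", "page-footer"  # hypothetical label names


@dataclass(frozen=True)
class Element:
    name: str
    label: str


def pin_headers_and_footers(ordered: list[Element]) -> list[str]:
    """Move headers to the front and footers to the back, keeping relative order."""
    headers = [e for e in ordered if e.label == HEADER]
    footers = [e for e in ordered if e.label == FOOTER]
    body = [e for e in ordered if e.label not in (HEADER, FOOTER)]
    return [e.name for e in headers + body + footers]


# An order produced by an upstream step that misplaced the header and footer.
ordered = [
    Element("intro", "text"),
    Element("footer_note", FOOTER),
    Element("body", "text"),
    Element("running_header", HEADER),
    Element("figure", "picture"),
]
print(pin_headers_and_footers(ordered))
# ['running_header', 'intro', 'body', 'figure', 'footer_note']
```

Running this as a final pass means a header or footer that was grouped with the wrong column cannot end up in the middle of the body text.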

Visual Elements in a Newspaper:

The visual elements indicate that the top section of the page, highlighted in red, functions as a distinct segment: possibly a continuation from a previous page or a separate header. Consequently, we first divide the entire page content into two primary groups, delineated by the red boundaries.
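A minimal sketch of that first split, assuming the visual separator has already been located as a horizontal rule at some y coordinate (for example by a separate line-detection step, which is not shown and is an assumption here): boxes whose vertical center lies above the rule form the first primary group, the rest form the second. Names and coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A detected layout region (hypothetical schema); y grows downward."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float


def split_at_rule(boxes: list[Box], rule_y: float) -> tuple[list[str], list[str]]:
    """Split boxes into the groups above and below a horizontal separator."""
    above: list[str] = []
    below: list[str] = []
    for box in sorted(boxes, key=lambda b: (b.y0, b.x0)):
        center_y = (box.y0 + box.y1) / 2
        if center_y < rule_y:
            above.append(box.name)
        else:
            below.append(box.name)
    return above, below


newspaper = [
    Box("continued_story", 40, 40, 700, 120),
    Box("continued_tail", 40, 125, 700, 180),
    Box("headline", 40, 220, 700, 280),
    Box("col1", 40, 300, 360, 900),
    Box("col2", 380, 300, 700, 900),
]
print(split_at_rule(newspaper, rule_y=200))
# (['continued_story', 'continued_tail'], ['headline', 'col1', 'col2'])
```

The two lists only fix group membership; the order inside each group is settled by the recursive splitting step.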

Then, to establish the detailed reading order, we apply a recursive splitting process:

- Each red group is further analyzed. If it contains subsections, these are delineated as blue groups.
- Similarly, if a blue group contains further nested content, these are then delineated as yellow groups, and so on.

Inside each of these groups (red, blue, yellow, etc.), we read the content from top to bottom. When a group contains columns or side-by-side elements, we read them from left to right before moving down to the content below.
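The rule for side-by-side elements within a group can be sketched as row clustering: boxes whose vertical extents overlap are treated as one row, rows are read top to bottom, and the boxes in each row left to right. This is a minimal illustration under that overlap assumption, not the production rule; a side-by-side block that itself spans several elements would be a sub-group handled by the recursion rather than by this flat pass. Names and coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A detected layout region (hypothetical schema); y grows downward."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float


def read_rows(boxes: list[Box]) -> list[str]:
    """Cluster vertically overlapping boxes into rows.

    Rows are read top to bottom; boxes within a row are read left to right.
    """
    rows: list[list[Box]] = []
    row_bottom = float("-inf")
    for box in sorted(boxes, key=lambda b: b.y0):
        if rows and box.y0 < row_bottom:
            rows[-1].append(box)  # overlaps the current row vertically
            row_bottom = max(row_bottom, box.y1)
        else:
            rows.append([box])  # starts a new row
            row_bottom = box.y1
    return [b.name for row in rows for b in sorted(row, key=lambda b: b.x0)]


group = [
    Box("heading", 40, 40, 700, 80),
    Box("photo", 40, 100, 360, 300),
    Box("caption_side", 380, 95, 700, 290),  # starts 5 px higher than the photo
    Box("body", 40, 320, 700, 600),
]
print(read_rows(group))  # ['heading', 'photo', 'caption_side', 'body']
```

A plain top-edge sort would put `caption_side` before `photo` because it starts slightly higher; grouping them into one row first restores the left-to-right reading.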

Martin Leroy

Martin Leroy is an experienced NLP developer currently working at Lettria.
