Benchmark

Best sentiment analysis
Lettria vs. unstructured.io

10

Models compared

32

Hours of work

17,374

Generated samples

Summary

This research aims to identify the most effective sentiment analysis algorithms while emphasizing neutrality and impartiality in our evaluation. Using the F1 score as the core evaluation metric, our study rigorously examines the performance of various sentiment analysis models through comparative analysis. Our main objective is to identify algorithms that excel at unbiased discernment of sentiment in natural-language text. Through this analysis, we aim to identify the best-performing solutions for practical applications while ensuring a fair and impartial evaluation.
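As a minimal sketch of how an F1 score can be computed for a three-class sentiment task, the snippet below derives per-label precision, recall, and F1, then averages them. The macro average, the function names, and the toy labels are assumptions made for illustration; the text above does not say which F1 averaging was used.

```python
LABELS = ("POS", "NEU", "NEG")

def f1_per_label(y_true, y_pred, labels=LABELS):
    """Return {label: F1} from parallel lists of true and predicted labels."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores

def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of the per-label F1 scores (assumed averaging)."""
    scores = f1_per_label(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)

# Toy data, not taken from the benchmark.
y_true = ["POS", "POS", "NEU", "NEU", "NEG", "NEG"]
y_pred = ["POS", "NEU", "NEU", "NEU", "NEG", "POS"]
print(f1_per_label(y_true, y_pred))
print(round(macro_f1(y_true, y_pred), 4))
```

Macro averaging weights each sentiment equally, which suits a dataset that is balanced across classes.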

In total, 10 varied documents were analyzed.

The core methodology involved using a script for calling the APIs and systematically recording the outcomes, with a keen focus on the quality and efficiency of text cleaning.
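The script-based approach described here could look roughly like the following sketch. It is not the actual benchmark script: the two tools are stand-in functions, and every name in it (`run_benchmark`, `fake_tool_a`, `fake_tool_b`, the record fields) is hypothetical. A real run would replace the stand-ins with the API calls under test.

```python
import json
import time

def run_benchmark(tools, documents):
    """Call every tool on every document; record output, error, and duration."""
    records = []
    for doc_name, text in documents.items():
        for tool_name, call in tools.items():
            start = time.perf_counter()
            try:
                output, error = call(text), None
            except Exception as exc:  # keep going so one failure does not stop the run
                output, error = None, repr(exc)
            elapsed = time.perf_counter() - start
            records.append({
                "document": doc_name,
                "tool": tool_name,
                "output": output,
                "error": error,
                "seconds": elapsed,
            })
    return records

# Stand-ins for the real API calls (hypothetical).
def fake_tool_a(text):
    return text.strip()

def fake_tool_b(text):
    return " ".join(text.split())

records = run_benchmark(
    {"tool_a": fake_tool_a, "tool_b": fake_tool_b},
    {"doc1": "  Hello   world  "},
)
print(json.dumps(records, indent=2))
```

Recording errors alongside outputs keeps failed calls visible in the results instead of silently dropping them.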

What are the main results?

Model                                                   Performance
j-hartmann/sentiment-roberta-large-english-3-classes    62%
Lettria's model                                         94%
A total of 28 benchmark tests were performed to check the outputs of both tools and see how they performed on basic tasks, depending on the file and input type.

Lettria passed 20 of the 28 tests; unstructured.io passed 7.
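Expressed as pass rates, these counts work out as follows. The snippet only restates the arithmetic on the figures given above; the variable names are illustrative.

```python
TOTAL_TESTS = 28
passed = {"Lettria": 20, "unstructured.io": 7}

# Pass rate = tests passed / total tests.
pass_rates = {tool: count / TOTAL_TESTS for tool, count in passed.items()}

for tool, rate in pass_rates.items():
    print(f"{tool}: {passed[tool]}/{TOTAL_TESTS} = {rate:.1%}")
# Lettria: 20/28 = 71.4%
# unstructured.io: 7/28 = 25.0%
```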

Methodology

A key factor in text cleaning is the processing time, which can significantly impact the efficiency of data projects.

The analysis revealed that Lettria occasionally outperformed unstructured.io in terms of local processing speed. However, this wasn't consistently observed across all tests.

unstructured.io, benefiting from local processing, eliminated internet latency, but its processing time included file open/read durations.
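Because a local measurement can fold file access into the reported time, it can help to time the read and the processing separately. The sketch below shows one way to do that, assuming a generic `process` callable; it is not how either tool was instrumented in the benchmark, and the names are hypothetical.

```python
import tempfile
import time
from pathlib import Path

def timed_processing(path, process):
    """Return (result, timings) with read time and processing time split out."""
    t0 = time.perf_counter()
    raw = Path(path).read_bytes()   # file open/read cost
    t1 = time.perf_counter()
    result = process(raw)           # actual cleaning/processing cost
    t2 = time.perf_counter()
    return result, {"read_s": t1 - t0, "process_s": t2 - t1, "total_s": t2 - t0}

# Usage with a throwaway file and a trivial stand-in "cleaner".
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("  some   text  ", encoding="utf-8")
    cleaned, timings = timed_processing(
        sample, lambda raw: raw.decode("utf-8").strip()
    )
print(cleaned, timings)
```

Reporting both components makes it clearer whether a difference between a local tool and a remote API comes from disk access, network latency, or the processing itself.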

Dependencies and Preparation

The 17,374 reviews generated by ChatGPT are evenly balanced across three sentiments, positive (POS), neutral (NEU), and negative (NEG), according to the following distribution:

These dependencies are crucial for tasks like OCR (Optical Character Recognition), handling different file formats, and ensuring the APIs' functionality in text extraction and cleaning.

Read Lettria's Text Parsing documentation here.

Evaluation of results

The benchmark test covered various documents to comprehensively evaluate each API's text cleaning prowess. Here are some examples.

You can also see the documents used to run the benchmark below, and download the results in JSON format at the end of the article.

PDF Documents (Text and Columns):

Lettria demonstrated excellence in removing irrelevant text like page numbers and adeptly separating titles from contents, showing superior text cleaning.

unstructured.io, while efficient, struggled with language detection and separating text elements cleanly.

OCR Accuracy on Images (JPG, PNG):

Lettria's robust OCR capabilities were evident, as it successfully minimized common character recognition errors, a critical aspect of text cleaning in image-based documents.

Conversely, unstructured.io showed weaknesses in OCR, impacting its text cleaning accuracy.

Handling CID Errors in PDFs:

CID errors are a complex text cleaning challenge. unstructured.io returned these errors in its output, whereas Lettria chose to return empty output in such cases.
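CID errors commonly appear in extracted text as literal tokens such as `(cid:123)`, the form emitted by extractors like pdfminer when a glyph cannot be mapped to Unicode. Assuming that token form, the sketch below illustrates the two behaviors described: passing the text through with the tokens, or returning an empty string when the text is dominated by them. The 20% threshold and the function names are hypothetical, and neither tool's real logic is shown.

```python
import re

# Matches tokens like "(cid:123)" (assumed form of the CID errors).
CID_TOKEN = re.compile(r"\(cid:\d+\)")

def cid_ratio(text):
    """Fraction of the text's characters that belong to (cid:N) tokens."""
    if not text:
        return 0.0
    cid_chars = sum(len(m.group(0)) for m in CID_TOKEN.finditer(text))
    return cid_chars / len(text)

def clean_or_empty(text, max_ratio=0.2):
    """Return '' if CID tokens dominate the text, else the text without them."""
    if cid_ratio(text) > max_ratio:
        return ""
    return CID_TOKEN.sub("", text)

print(repr(clean_or_empty("(cid:12)(cid:34)(cid:56)")))
print(repr(clean_or_empty("Hello (cid:7)world, this is a long enough sentence.")))
```

Returning an empty result avoids passing unreadable tokens downstream, at the cost of hiding that the page contained text at all.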

Large Document Processing:

In processing large documents, unstructured.io demonstrated a significant edge in speed, suggesting better efficiency for text cleaning in voluminous documents.

Handling HTML Content in TXT Files:

Lettria outshone unstructured.io by effectively removing HTML tags and logically organizing text, a crucial aspect of text cleaning.

unstructured.io did not remove HTML tags and struggled with logical text segmentation.
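For readers who need to strip HTML tags from plain-text files themselves, a minimal sketch using Python's standard `html.parser` module follows. It is an illustration of the task, not Lettria's implementation; the `BLOCK_TAGS` set and the helper names are assumptions.

```python
from html.parser import HTMLParser

# Tags treated as paragraph boundaries (illustrative choice).
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

class TextExtractor(HTMLParser):
    """Collect visible text, inserting line breaks around block-level tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)

def strip_html(raw):
    """Remove tags and return one cleaned line per block of text."""
    parser = TextExtractor()
    parser.feed(raw)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)

print(strip_html("<h1>Title</h1><p>Hello <b>world</b>!</p><script>x=1</script>"))
```

Splitting on block-level tags keeps headings and paragraphs on separate lines, which is a simple stand-in for the logical segmentation discussed above.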

DOCX File Handling:

Both APIs showed comparable performance in handling DOCX files, with minor differences in dealing with specific elements.

Download Results

Caveats

Lettria

A standout for its precision in OCR, efficient text segmentation, and handling complex elements like tables and HTML.

It's ideal for projects requiring detailed and accurate text cleaning.

unstructured.io

Shines in its processing speed and lighter JSON output, making it suitable for projects where quick text cleaning is essential.

Conclusion

Open data initiatives like this one act as catalysts for scientific progress, allowing researchers to build on one another's work and ultimately advancing the field of sentiment analysis and natural language processing as a whole.

Lettria is preferable for intricate text cleaning and accuracy, while unstructured.io offers advantages in speed and efficiency.

Each API has distinct capabilities, which makes each one suitable for different text cleaning scenarios in data analysis and machine learning projects.
