Why Schema.org can’t always be used to create your ontology

Data management is a key element in your company's success. Whether it's contracts, financial reports, customer data or product information, the knowledge you hold is extensive, varied and constantly evolving. You need an appropriate representation system to manage it and extract the relevant information. An ontology is a common solution.

Édouard Trouillez

May 30, 2024

Introduction

Ontology reveals your key knowledge

An ontology is a data model used to describe knowledge in a particular domain. It consists of a structured set of interrelated concepts (classes) with associated properties. A factory ontology, for example, will contain concepts representing suppliers, customers, carriers, products and so on. The Product class will be assigned properties covering technical characteristics, quantities, compatibility with other products, etc.
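To make the idea concrete, here is a minimal sketch of such a factory ontology in Python. The OntologyClass type and the property names are illustrative assumptions, not a standard ontology format or an existing library.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class OntologyClass:
    """A concept with a name, an optional parent concept and its own properties."""
    name: str
    parent: Optional["OntologyClass"] = None
    properties: Set[str] = field(default_factory=set)


# Hypothetical factory ontology: every concept hangs under a generic root.
thing = OntologyClass("Thing")
supplier = OntologyClass("Supplier", parent=thing, properties={"name", "country"})
carrier = OntologyClass("Carrier", parent=thing, properties={"name", "transportMode"})
product = OntologyClass(
    "Product",
    parent=thing,
    properties={"technicalCharacteristics", "quantity", "compatibleWith"},
)

for concept in (supplier, carrier, product):
    print(concept.name, "->", sorted(concept.properties))
```

In practice an ontology is usually expressed in a dedicated language such as OWL or RDFS; the Python structure above only illustrates the shape of the model.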

Ontologies also have the advantage of automatically revealing information that is crucial to your business. Thanks to inheritance, a characteristic defined for a product range, for instance, is automatically passed on to each new product in the range. And thanks to inference, implicit information such as an incompatibility between a delivery date and a lead time can be made explicit automatically.
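The two mechanisms can be sketched as follows. This is a simplified illustration under stated assumptions: the Concept class, the product names and the delivery_is_feasible rule are hypothetical, and a real ontology would rely on a reasoner rather than hand-written Python.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Dict, Optional


@dataclass
class Concept:
    name: str
    parent: Optional["Concept"] = None
    own_properties: Dict[str, object] = field(default_factory=dict)

    def all_properties(self) -> Dict[str, object]:
        """Inheritance: merge the parent's properties with this concept's own."""
        inherited = self.parent.all_properties() if self.parent else {}
        return {**inherited, **self.own_properties}


# A characteristic set once on the range is passed on to each product in it.
power_tools = Concept("PowerToolRange", own_properties={"voltage": "18V"})
drill = Concept("Drill X200", parent=power_tools, own_properties={"weight_kg": 1.4})


def delivery_is_feasible(order_date: date, requested: date, lead_time_days: int) -> bool:
    """Inference: derive a fact that no document states explicitly."""
    return order_date + timedelta(days=lead_time_days) <= requested


print(drill.all_properties())
print(delivery_is_feasible(date(2024, 5, 1), date(2024, 5, 10), lead_time_days=14))
```

Here the drill picks up the voltage of its range without it being entered again, and the rule flags that a 14-day lead time cannot meet a delivery requested 9 days after the order.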

The downside is that creating your ontology from scratch is time-consuming and tedious. Not only can it contain hundreds of concepts and properties, but it also requires special conceptualization skills. So it's tempting to try and create your own ontology from an existing model. Schema.org is sometimes used for that. However, this is often not an appropriate solution.

What is Schema.org?

Schema.org is the result of collaboration between four major search engine companies: Google, Microsoft (Bing), Yahoo and Yandex. It is not itself a W3C standard, but its development is discussed in the open through a community group hosted by the W3C, the consortium dedicated to standardizing web technologies. Like an ontology, it contains interrelated classes with associated properties. These elements are used to describe the information contained in a web page's metadata to facilitate indexing. For example, if the telephone number and opening hours shown on a store's website are identified using Schema.org metadata, search engines can display them directly in their results, and you won't even have to visit the site.
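As an illustration, the store example could be marked up as JSON-LD, one of the formats in which Schema.org metadata can be embedded in a page. The sketch below builds it in Python; Store, telephone and openingHours are existing Schema.org terms, while the store name and values are made up.

```python
import json

# Schema.org description of a store; the name and values are hypothetical.
store_markup = {
    "@context": "https://schema.org",
    "@type": "Store",
    "name": "Example Hardware",
    "telephone": "+33 1 23 45 67 89",
    "openingHours": "Mo-Sa 09:00-19:00",
}

# The JSON-LD is embedded in the page inside a script tag.
json_ld = json.dumps(store_markup, indent=2)
script_tag = f'<script type="application/ld+json">\n{json_ld}\n</script>'
print(script_tag)
```

Whether a search engine actually shows this information in its results remains its own decision; the markup only makes the data identifiable.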

Schema.org’s main advantage is the large number of properties attached to its classes. The Article class is associated with over a hundred properties that identify, among other things, the author, the target audience and any citations. In addition, it offers extensions for certain types of web data, such as GoodRelations, which specializes in e-commerce and can be used to display a company's products in response to a Google search.

Schema.org doesn't match your business reality

Given its open Creative Commons license as well, the temptation to use Schema.org to create your own ontology is great. However, it is not a good solution. To be adapted to the realities of your business, an existing ontology must be either very general, i.e. offering a basic, non-specialized representation of the world, or already specialized in your field of activity. Schema.org, on the other hand, is an ontology specialized in a domain that probably isn't yours: web page content.

As a result, its classes and properties are mainly focused on this type of data, with very specific elements that you are unlikely to need, such as the HowToSection class or the publishingPrinciples property. Conversely, very general concepts are absent. The Product class has only a few subclasses, such as Drug or Vehicle, but nothing for other concepts at the same level, such as clothing or tools. The same goes for extensions. GoodRelations allows good web indexing of the product information displayed in an e-shop, but it does nothing to manage the internal data relating to those products.

Another disadvantage of Schema.org is its lack of structure. It presents very broad classes with numerous, very precise direct subclasses, which are simply listed side by side. For instance, the very vague Intangible class has subclasses such as GameServer, HealthPlanNetwork or MerchantReturnPolicy, without any intermediate classes such as Server, Network or Policy, which could have been useful. To adapt it, you have to restructure everything and review the inheritance of properties, which is even more time-consuming than creating something from scratch.
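The difference can be sketched with two child-to-parent maps. The flat one mirrors the three Schema.org subclasses cited above; the restructured one adds the intermediate classes Server, Network and Policy, which are suggestions from this article and do not exist in Schema.org. The ancestors helper is a hypothetical name.

```python
from typing import Dict, List

# Flat structure: specific classes attached directly to Intangible.
flat: Dict[str, str] = {
    "GameServer": "Intangible",
    "HealthPlanNetwork": "Intangible",
    "MerchantReturnPolicy": "Intangible",
}

# Restructured version with intermediate classes (not part of Schema.org).
restructured: Dict[str, str] = {
    "Server": "Intangible",
    "Network": "Intangible",
    "Policy": "Intangible",
    "GameServer": "Server",
    "HealthPlanNetwork": "Network",
    "MerchantReturnPolicy": "Policy",
}


def ancestors(cls: str, parents: Dict[str, str]) -> List[str]:
    """Walk up the hierarchy and list every class that cls inherits from."""
    chain = []
    while cls in parents:
        cls = parents[cls]
        chain.append(cls)
    return chain


print(ancestors("GameServer", flat))
print(ancestors("GameServer", restructured))
```

In the flat version, GameServer can only inherit from Intangible. With an intermediate Server class, properties common to all servers could be defined once and inherited, which is exactly the restructuring work that adapting Schema.org would require.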

The risks involved in using such a resource can be numerous and serious. An ontology that doesn't perfectly represent your use case is bound to be a source of errors. A class that doesn't accurately represent reality, an erroneous relationship or an irrelevant inherited property are all elements that can produce false information. In the end, your ontology may turn out to be totally unusable.

Create your ontology with your unstructured data

To be frank, whether we're talking about Schema.org or other resources, all these risks are inherent in using an existing model to create your own. It's always a good idea to take advantage of certain standards. But if you want the information processed by your model to be accurate, it's essential to base it first and foremost on the realities of your business. As we saw earlier, creating an ontology from scratch is a mammoth task. But there is a solution: the automatic creation of your ontology based on your documentation.

Be they product sheets, customer requests, financial reports or medical records, it's at the heart of your documents that the relevant information lies. The aim is to transform the content of these documents into ontological data. There is a major obstacle: most of these resources are unstructured. In concrete terms, this means that your documents contain no metadata or other elements to identify the type of information contained in a given part of the text. Without that structure, the information in your documents cannot be processed automatically. Fortunately, advances in research into the processing of unstructured data now make it possible to overcome this obstacle. This is what Lettria does: turn chaos into knowledge.

Lettria provides you with a reference model, representing a general view that doesn't focus more on one domain than another. You can easily create your own ontology within the Lettria platform, or use all or part of our generic data as a starting point. Thanks to the fine-grained structure, you can delete an entire class and its subclasses with a single click, at any level, if they are too specific or irrelevant to your business. And you can just as easily add new subclasses by quickly locating the generic class to which they belong.
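Conceptually, pruning and extending a class hierarchy in this way amounts to simple tree operations. The sketch below is only an illustration of the idea: it is not Lettria's implementation or API, and the function names and sample classes are hypothetical.

```python
from typing import Dict


def remove_subtree(parents: Dict[str, str], root: str) -> Dict[str, str]:
    """Return a copy of the child-to-parent map without root and its descendants."""
    doomed = {root}
    changed = True
    while changed:  # keep sweeping until no new descendant is found
        changed = False
        for child, parent in parents.items():
            if parent in doomed and child not in doomed:
                doomed.add(child)
                changed = True
    return {c: p for c, p in parents.items() if c not in doomed}


def add_subclass(parents: Dict[str, str], name: str, parent: str) -> Dict[str, str]:
    """Return a copy of the map with a new class attached under parent."""
    return {**parents, name: parent}


# Hypothetical generic model.
model = {"Product": "Thing", "Vehicle": "Product", "Car": "Vehicle", "Supplier": "Thing"}

pruned = remove_subtree(model, "Vehicle")      # drops Vehicle and Car
extended = add_subclass(pruned, "Tool", "Product")
print(extended)
```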

But Lettria isn't just for creating a structure. Much more than that, the platform automatically generates or enriches your ontology from your unstructured data. We have created tools that combine the best of LLMs and symbolic AI. They process information from raw documents, whatever their content and format. Each piece of information automatically finds its place in the ontology, and the most you’ll have to do is validate the suggestions. Lettria has also developed models other than ontologies: Private GPT, Text to graph and GraphRAG are available to suit different needs.

Data processing has always been a major business issue. Today, technological advances combining text analysis and artificial intelligence offer an entirely new perspective. Finding reliable information among gigantic quantities of data is no longer an overwhelming challenge.

Édouard Trouillez

Senior Linguist