
Ademe: how to use NLP to clarify terms, expressions and concepts



Language & statistical analyses to better communicate about the circular economy

The circular economy spans so many topics that communicating about it is difficult, even for one of its main advocates: ADEME. To support ADEME's mission, Lettria contributed its language expertise and NLP know-how to measure people's understanding of the issues surrounding the circular economy.

A few words about ADEME

The Agency for Ecological Transition (ADEME, formerly the Agency for the Environment and Energy Management) participates in the implementation of public policies in the fields of environment, energy and sustainable development. To help them advance their environmental efforts, the agency provides companies, local authorities, public authorities and the general public with expertise and advice. It also helps finance projects, from research to implementation, in the following areas: waste management, soil conservation, energy efficiency and renewable energies, air quality and noise abatement.

ADEME is a public institution under the supervision of the Ministry of Ecology, Sustainable Development and Energy and the Ministry of Higher Education and Research.

Where and how to measure citizens' knowledge of the circular economy?

As a key player in the circular economy, ADEME has real influence over the many stakeholders involved, including the general public. To ensure that the messages it disseminates are properly understood, the agency wanted to improve the clarity of terms, expressions and concepts related to the circular economy. The mission involved three steps:

  • Evaluating the reasons why certain terms, expressions and concepts may be a source of confusion and/or misunderstanding for the targets of ADEME's communication actions.
  • Exploring possible alternatives by drawing inspiration from terms, expressions and concepts used abroad, by consulting citizens, and by inviting companies and associations to take part in a joint reflection.
  • Recommending semantic changes that improve how the messages are perceived, through a new lexicon that helps target audiences buy into the actions and objectives set out by ADEME.

Collection and analysis of a corpus of text to measure the understanding of the issues

Thanks to open data collection technology, our teams were able to conduct their research on a varied corpus of texts drawn from the specialized press, general-interest articles and social networks. Once the data had been collected from these sources over several years, we preprocessed it extensively before conducting a thorough semantic analysis. The aim of this work was to highlight the words and expressions related to the circular economy that needed clarification.

Language sciences and linguistics at the service of semantic analysis

The goal of the project was to measure the degree of complexity of 58 terms, both to evaluate how specialized they are as terminology and to better understand how they are used.

Following consultation with the project team, we began by creating a corpus of texts on which to conduct our research. Among the sources covered by the study, we made sure to select specialized and general public media outlets (Journal de l'Environnement, 20 Minutes) as well as social networks (Twitter) and documentary databases (Wikipedia).

More than 300,000 articles, totaling more than 36 million words, were then analyzed by machine.

Several steps were then taken to study these corpora:

  • Cleaning the raw text, in order to apply certain Natural Language Processing (NLP) methods and make the corpus easier to exploit, in particular for studying the occurrence of lemmas (the canonical form of a word). The result is a text that is not very readable for a human, but that language models can process much more efficiently.
  • Counting the occurrences of the keywords within the corpora to measure their adoption by the population.
  • Word vectorization and similarity analysis to highlight synonyms for each keyword and determine whether the context of use is indeed that of the circular economy. In some sources, for example, the word "sobriety" (listed among the terms to be studied) was generally not related to the theme of the circular economy, so its occurrences have to be interpreted with care.
  • A contextual study to explain the various uses of the keywords (the semantic distance from one corpus to another, within the same corpus, and between one keyword and one or more others), complemented by an analysis of the contextual distance of the keywords and their semantic variability.
  • A similar study on the social network Twitter (word counts, sentiment analysis, etc.), which Lettria added to these analyses in order to gain a more in-depth view of the adoption of circular economy terms by the general public over the years.
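The first steps above (cleaning, keyword counting and similarity analysis) can be illustrated with a minimal Python sketch. It is not Lettria's actual pipeline: the stop-word list, the function names and the co-occurrence vectors are assumptions chosen for illustration, and it skips real lemmatization and trained word embeddings, which a production system would rely on.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}


def clean(text):
    """Lowercase, keep letter-only tokens, drop stop words (no lemmatization)."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def keyword_counts(docs, keywords):
    """Count how often each keyword occurs across the cleaned documents."""
    counts = Counter()
    for doc in docs:
        counts.update(t for t in clean(doc) if t in keywords)
    return counts


def context_vector(docs, keyword, window=2):
    """Count the words found within `window` tokens of each keyword occurrence."""
    vec = Counter()
    for doc in docs:
        tokens = clean(doc)
        for i, tok in enumerate(tokens):
            if tok == keyword:
                lo, hi = max(0, i - window), i + window + 1
                vec.update(t for t in tokens[lo:hi] if t != keyword)
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Invented example sentences, used only to exercise the functions.
docs = [
    "Recycling is the core of the circular economy.",
    "Sobriety in energy use supports recycling.",
    "He praised the sobriety of the ceremony.",
]
counts = keyword_counts(docs, {"recycling", "sobriety"})
similarity = cosine(
    context_vector(docs, "recycling"),
    context_vector(docs, "sobriety"),
)
```

A low similarity between a keyword's contexts and those of core circular economy terms is the kind of signal that would flag a word such as "sobriety" as being used mostly outside the theme.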

This series of analyses allowed us to assign an adoption score to each keyword in the list, providing a metric of understanding with which to prioritize the words to be studied. The lower the score, the more important it was to clarify the definition of the term and/or to use different words that better convey the underlying issues, and thus to communicate better with the general public about the circular economy.
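The article does not give the formula behind the adoption score. Purely as an illustration of how such a metric can drive prioritization, the hypothetical sketch below combines a keyword's frequency per million words with the share of its occurrences found in a circular economy context, then lists the lowest-scoring terms first. The function names, the formula and the sample figures are all invented for this example.

```python
def adoption_score(occurrences, in_context_share, corpus_size):
    """Hypothetical score: frequency per million words, weighted by the
    share of occurrences that appear in a circular-economy context."""
    per_million = occurrences * 1_000_000 / corpus_size
    return per_million * in_context_share


def prioritize(stats, corpus_size):
    """Return terms sorted from lowest to highest score, so the terms that
    most need clarification come first."""
    scores = {
        term: adoption_score(occ, share, corpus_size)
        for term, (occ, share) in stats.items()
    }
    return sorted(scores, key=scores.get)


# Made-up figures: (occurrences, share of in-context uses) per term.
toy_stats = {"recycling": (500, 0.9), "sobriety": (120, 0.2)}
ranking = prioritize(toy_stats, corpus_size=1_000_000)
```

With these toy figures, "sobriety" ranks first: it is both rarer and less often used in the relevant context, so it would be the first candidate for clarification.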


After this initial analysis phase carried out by Lettria, ADEME was able to direct its work towards an international comparison of best practices and a field survey of citizens, before formalizing a lexicon of all the recommended new terms and integrating them into its communication strategy.
