NLP vs. NLU for GDPR: Which tools are most suitable

In this article we'll walk you through the major concepts of language processing, and how it's being used to help companies comply with new EU regulations.

Maxime Jaillet

Feb 17, 2023


Did you know that NLP can help keep your organization GDPR-compliant by automatically moderating your comment sections?

If you didn't understand that statement, don't worry! This article will walk you through the major concepts of language processing, and how it's being used to help companies comply with new EU regulations.

NLP vs. NLU and GDPR compliance, what's the big deal?

NLP: Natural Language Processing
NLU: Natural Language Understanding

The two terms are sometimes confused, but they cover different processes. An NLP processing chain performs the morphological, syntactic and semantic analysis of a document in order to obtain a literal understanding of it: it separates the words, labels them grammatically and detects the key linguistic markers. To build on this first level of understanding, NLP is enriched with complementary building blocks. It then becomes NLU (Natural Language Understanding), a term that covers all the efforts made to interpret the data entered in its context of use and to give meaning to sentences.

Morphological, syntactic and semantic analysis of data

Let's imagine that a human resources manager is filling in the personnel file of one of your company's employees. To do this, they enter information in a free-text comment field provided in the HRIS (Human Resources Information System).

For the machine to process this text, each sentence must be segmented into elementary units whose characteristics are then described. The NLP approach therefore relies on three levels of analysis:

morphological: each sentence is broken down into elementary units, or tokens, consisting of one word or a group of two or three words. Each token is labeled with its grammatical class (preposition, verb, common noun, etc.) through part-of-speech (POS) tagging;
syntactic: the analysis highlights the dependency links between the components of a sentence (for example, between a verb and its subject or its direct object);
semantic: once the morpho-syntactic analysis is complete, the focus shifts to understanding the meaning of the sentence.
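To make the morphological level more concrete, here is a minimal Python sketch of tokenization followed by part-of-speech tagging. It assumes a tiny hypothetical lexicon (`LEXICON`) and Universal Dependencies-style tag names; a real tagger learns its labels from annotated corpora, and this toy does not attempt the syntactic or semantic levels (libraries such as spaCy ship trained taggers and dependency parsers for that).

```python
import re

# Hypothetical toy lexicon: a real tagger learns these labels from annotated text.
LEXICON = {
    "the": "DET", "a": "DET", "an": "DET",
    "manager": "NOUN", "employee": "NOUN", "file": "NOUN", "contract": "NOUN",
    "updated": "VERB", "signed": "VERB", "is": "AUX",
    "on": "ADP", "in": "ADP", "of": "ADP",
    "new": "ADJ",
}

# A token is either a run of word characters or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")
WORD_RE = re.compile(r"\w+")


def tokenize(sentence):
    """Split a sentence into word and punctuation tokens."""
    return TOKEN_RE.findall(sentence)


def pos_tag(tokens):
    """Attach a part-of-speech label to each token by lexicon lookup."""
    tagged = []
    for tok in tokens:
        if tok.isdigit():
            tag = "NUM"
        elif not WORD_RE.fullmatch(tok):
            tag = "PUNCT"
        else:
            tag = LEXICON.get(tok.lower(), "X")  # "X" marks an unknown word
        tagged.append((tok, tag))
    return tagged


print(pos_tag(tokenize("The manager updated the employee file.")))
```

Unknown words fall back to `X`, which is precisely where a statistical model does better than a lookup table.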

This analysis is essential for processing data that is not structured to begin with (e-mails, social network posts, etc.). It is the preliminary step for automatically checking the GDPR compliance of free comments, which are unstructured data par excellence: their syntax is often poor, and there is usually no internal standard for how they should be written. NLP is therefore well suited to approaching such complex content, normalizing it and breaking it down into interpretable tags.
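Normalization typically happens before tokenization. The sketch below uses only Python's standard library and assumes three illustrative rules (Unicode NFKC normalization, case folding, whitespace collapsing); the right set of rules depends on the language and the use case.

```python
import re
import unicodedata


def normalize_comment(text):
    """Normalize a raw free-text comment before further analysis (illustrative rules)."""
    # NFKC folds compatibility characters, e.g. non-breaking spaces and full-width digits.
    text = unicodedata.normalize("NFKC", text)
    # Case-fold so that "GDPR", "Gdpr" and "gdpr" compare equal.
    text = text.casefold()
    # Collapse runs of whitespace (tabs, newlines, repeated spaces) into one space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


print(normalize_comment("  Client  called\tabout   GDPR\n request "))
```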


Automatic text classification - Categorize comments for GDPR compliance

In many cases, an NLP-based approach is not enough. For example, the meaning of an idiomatic expression cannot be captured by a simple syntactic and semantic analysis. This is where NLU comes in. An automatic text and document classification model can build on the previous analyses to assign a category to each free comment.

Unstructured data can be categorized according to the nature of the document: contract, e-mail, product specs;
The category can also reflect other factors: the processing priority of an e-mail, the GDPR compliance status of a free comment, etc.

Classification is an important step because the assigned category can trigger an automated action, for example:

reject spam;
block the entry of an insulting free comment;
raise an alert when a free comment is entered that does not comply with the GDPR or contains sensitive data.
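As an illustration of the classification step, the following sketch trains a multinomial Naive Bayes classifier (a classic, simple text-classification technique, not necessarily the model Lettria uses) on a handful of hypothetical labeled comments, then maps each predicted category to an action. The category names, the training sentences and the `ACTIONS` mapping are all assumptions; eight examples are far too few for real use.

```python
import math
import re
from collections import Counter, defaultdict


def tokens(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class NaiveBayes:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokens(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior of the class
            for w in tokens(text):
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data and category-to-action mapping.
TRAIN = [
    ("Win a free prize, click this link now", "spam"),
    ("Cheap offer, click here to win money", "spam"),
    ("You are an idiot and a liar", "insulting"),
    ("This stupid idiot never answers", "insulting"),
    ("Employee is on sick leave for depression", "sensitive"),
    ("Customer mentioned a health condition and diagnosis", "sensitive"),
    ("Meeting moved to Tuesday afternoon", "ok"),
    ("Customer asked for a copy of the invoice", "ok"),
]
ACTIONS = {"spam": "reject", "insulting": "block", "sensitive": "alert", "ok": "accept"}

model = NaiveBayes().fit([t for t, _ in TRAIN], [label for _, label in TRAIN])


def moderate(comment):
    """Classify a comment and return (category, action)."""
    label = model.predict(comment)
    return label, ACTIONS[label]


print(moderate("Click this link to win a free prize"))  # ('spam', 'reject')
```

Keeping the mapping from category to action in a separate table means the moderation policy can change without retraining the model.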

Named Entity Recognition - Identify data subject to the GDPR

Named Entity Recognition (NER) consists of extracting information from unstructured data and classifying it into predefined categories. Lettria does this by combining a list of regular expressions (regexes) with machine learning. For example, the machine can detect that a given comment mentions:

volumes;
dates;
first and last names;
e-mail addresses or telephone numbers;
social security numbers or an IP address.
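The regex part of NER can be sketched in a few lines of Python. The patterns below are simplified assumptions (a basic e-mail shape, IPv4 addresses, DD/MM/YYYY dates, French-style 10-digit phone numbers) and will both miss and over-match real data; first and last names in particular generally require a trained model rather than a regex.

```python
import re

# Illustrative patterns only; production detectors need stricter validation.
OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "IPV4": re.compile(rf"\b(?:{OCTET}\.){{3}}{OCTET}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "PHONE_FR": re.compile(r"\b0[1-9](?:[ .]?\d{2}){4}\b"),
}


def find_entities(text):
    """Return (label, matched_text, start_offset) tuples sorted by position."""
    found = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append((label, m.group(), m.start()))
    return sorted(found, key=lambda entity: entity[2])


sample = "Contact jane.doe@example.com or 06 12 34 56 78 before 12/03/2023 (login from 192.168.1.10)."
for label, value, _ in find_entities(sample):
    print(label, value)
```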

Detecting a named entity can trigger an automatic action. For example:

You can limit the data collected via free-form fields to what is strictly necessary, in line with the GDPR's data minimization principle. An advisor handling a claim does not need to enter the customer's social security number, and a popup can alert them to this at the time of data entry.
Your CRM teams can also take advantage of useful information contained in free-form comments. For example, if a customer advisor records a new postal address in a comment, they could be prompted to enter it in the customer record.
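An entry-time check like the two described above could look like the following sketch. The function `review_before_save`, both patterns and the messages are hypothetical: the social security pattern is a simplified French NIR shape (15 digits starting with 1 or 2) that ignores Corsican department codes and the control key, and the address pattern is a rough heuristic (street number, street keyword, 5-digit postcode).

```python
import re

# Simplified NIR-like shape: a leading 1 or 2, then 14 digits with optional single spaces.
SSN_RE = re.compile(r"\b[12](?:\s?\d){14}\b")
# Heuristic address cue: street number, street keyword, then a 5-digit postcode.
ADDRESS_RE = re.compile(
    r"\b\d{1,4},?\s+(?:rue|avenue|boulevard|street|road)\b[^\n]*?\b\d{5}\b",
    re.IGNORECASE,
)


def review_before_save(comment):
    """Return the UI messages to display before a free-text comment is saved."""
    messages = []
    if SSN_RE.search(comment):
        messages.append("alert: a social security number is not needed here, please remove it")
    if ADDRESS_RE.search(comment):
        messages.append("suggest: copy the new postal address into the customer record")
    return messages


print(review_before_save("Customer moved to 12 rue de la Paix, 75002 Paris"))
```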

Sentiment Analysis

Sentiment analysis consists of analyzing language data and classifying it according to its tone: positive, negative or neutral. Detecting this polarity then makes it easier to classify the captured comment according to a known typology: opinion, feeling, emotion, information. Sentiment analysis is also used to monitor online reputation. For example, an e-merchant may want to know how customers perceive its brand or its products through the reviews they post online.
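A minimal lexicon-based scorer illustrates the idea of polarity detection. The word list and the single negation rule are assumptions made for illustration; production systems typically rely on much larger lexicons or on trained models.

```python
import re

# Hypothetical mini-lexicon: +1 for positive words, -1 for negative ones.
SENTIMENT_LEXICON = {
    "great": 1, "love": 1, "excellent": 1, "fast": 1,
    "bad": -1, "slow": -1, "broken": -1, "terrible": -1,
}
NEGATORS = {"not", "never", "no"}


def polarity(text):
    """Classify a text as 'positive', 'negative' or 'neutral'."""
    words = re.findall(r"\w+", text.lower())
    score = 0
    for i, word in enumerate(words):
        value = SENTIMENT_LEXICON.get(word, 0)
        # Flip the polarity when the previous word is a negator ("not great").
        if value and i > 0 and words[i - 1] in NEGATORS:
            value = -value
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity("Great product, fast delivery"))  # positive
```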

Conclusion: NLP and NLU automate the analysis of your free comment areas

NLP and NLU work together. They allow you to delegate to the machine the tedious task of examining all the free comments in a given database to identify those that pose a problem. Part of your GDPR compliance can thus be automated.

The machine carries out this process with machine learning models, a branch of artificial intelligence. This greatly increases its ability to identify personal data in a comment, whether it is a simple contact detail or sensitive information. These analysis and categorization steps then naturally lead to blocking non-compliant comments and to an awareness campaign that you will need to conduct within your organization.
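To illustrate the batch review of a comment database, the sketch below scans a hypothetical `comments` table in an in-memory SQLite database (Python's built-in `sqlite3` module) and reports which detectors fire on each comment. The detectors are simplified regexes chosen for illustration, and the health-term pattern is only a crude proxy for the special categories of data that the GDPR protects more strictly.

```python
import re
import sqlite3

# Hypothetical detectors; real audits would combine regexes with trained models.
DETECTORS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "phone": re.compile(r"\b0[1-9](?:[ .]?\d{2}){4}\b"),
    "health": re.compile(r"\b(?:diagnosis|sick leave|depression|pregnan\w*)\b", re.IGNORECASE),
}


def audit_comments(conn):
    """Return {comment_id: [detector names]} for every comment that needs review."""
    flagged = {}
    for comment_id, body in conn.execute("SELECT id, body FROM comments ORDER BY id"):
        hits = [name for name, pattern in DETECTORS.items() if pattern.search(body)]
        if hits:
            flagged[comment_id] = hits
    return flagged


def build_demo_db():
    """Create an in-memory database holding a few hypothetical comments."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO comments (body) VALUES (?)",
        [
            ("Invoice sent on time",),
            ("Reach her at anna@example.org",),
            ("On sick leave until Monday, call 06 12 34 56 78",),
        ],
    )
    return conn


print(audit_comments(build_demo_db()))  # {2: ['email'], 3: ['phone', 'health']}
```

The flagged IDs could then feed a blocking rule or a manual review queue.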

Maxime Jaillet

GDPR Expert