Did you know: NLP can help your organization stay GDPR-compliant by automatically moderating your comment sections?
If you didn't understand that statement, don't worry! This article walks you through the major concepts of natural language processing and how they are being used to help companies comply with the EU's General Data Protection Regulation (GDPR).
NLP vs. NLU and GDPR compliance: what's the big deal?
- NLP: Natural Language Processing
- NLU: Natural Language Understanding
The two terms are sometimes confused, but they cover different processes. An NLP chain performs the morphological, syntactic and semantic analysis of a document in order to obtain a literal understanding of it: it separates the words, labels them grammatically and identifies the key linguistic markers. To build on this first level of understanding, NLP is enriched with complementary components. It then becomes NLU (Natural Language Understanding), a term that covers all the efforts made to interpret data in the context in which your users entered it and to give meaning to your sentences.
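The split between the two layers can be sketched in a few lines of Python. This is a toy illustration, not a real NLP library: `literal_tags` stands in for the literal output of an NLP chain (here a lookup in a tiny hand-written part-of-speech lexicon), and `interpret` stands in for an NLU layer that adds meaning depending on the context in which the text was entered. The lexicons, the `"hr"` context name and the category labels are hypothetical.

```python
# Toy illustration: an NLP layer (literal tags) enriched by an NLU layer
# (context-dependent meaning). All lexicons below are hypothetical examples.

# Hypothetical part-of-speech lexicon standing in for a real tagger.
POS_LEXICON = {
    "the": "DET", "employee": "NOUN", "joined": "VERB",
    "a": "DET", "union": "NOUN",
}

# Hypothetical context lexicons: the same word can carry a different
# meaning depending on where the text was entered.
CONTEXT_LEXICONS = {
    "hr": {"union": "trade union membership", "sick": "health information"},
    "generic": {},
}


def literal_tags(sentence: str) -> list[tuple[str, str]]:
    """NLP layer: split words and label them grammatically ("X" = unknown)."""
    words = sentence.lower().rstrip(".").split()
    return [(w, POS_LEXICON.get(w, "X")) for w in words]


def interpret(tags: list[tuple[str, str]], context: str) -> list[tuple[str, str]]:
    """NLU layer: attach a domain meaning to words relevant in this context."""
    lexicon = CONTEXT_LEXICONS.get(context, {})
    return [(word, lexicon[word]) for word, _pos in tags if word in lexicon]


tags = literal_tags("The employee joined a union.")
print(tags)
print(interpret(tags, "hr"))
print(interpret(tags, "generic"))
```

With the `"hr"` context, the literal noun "union" is interpreted as trade union membership, which the GDPR (Article 9) treats as a special category of personal data; with a generic context, the same literal tags carry no additional meaning.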
Morphological, syntactic and semantic analysis of data
Let's imagine that a human resources manager decides to fill in the personnel file of one of your company's employees. To do this, they enter information in a free-text comment field provided in the HRIS (human resources information system).
For a machine to process this text, each sentence must be segmented into elementary units whose characteristics are then described. The NLP approach is therefore based on three levels of analysis:
- morphological: each sentence is broken down into elementary units, or tokens, each consisting of a single word or a group of two or three words. Each token is labeled with its grammatical class (preposition, verb, common noun, etc.) through a process called part-of-speech (POS) tagging;
- syntactic: the analysis brings out the dependency links between the components of a sentence (for example, the links that connect the verb to its subject and to its direct object);
- semantic: once the morpho-syntactic analysis is complete, the focus shifts to understanding the meaning of the sentence.
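The three levels can be illustrated with a deliberately naive, self-contained Python sketch. Real taggers and parsers are trained statistical models; the hand-written lexicon, the multiword-expression list and the "first noun before/after the verb" rules below are simplifying assumptions made only to show what each level produces. The tag and relation names (`NOUN`, `VERB`, `nsubj`, `obj`, ...) follow Universal Dependencies conventions.

```python
# Naive illustration of morphological, syntactic and semantic analysis.
# The lexicon, multiword list and rules are hypothetical simplifications.

# Multiword expressions treated as a single token (morphological level).
MULTIWORD = {("sick", "leave"), ("human", "resources")}

POS = {
    "the": "DET", "manager": "NOUN", "approved": "VERB",
    "a": "DET", "sick leave": "NOUN", "for": "ADP", "employee": "NOUN",
}


def tokenize(sentence: str) -> list[str]:
    """Morphological level: split into tokens, merging known two-word units."""
    words = sentence.lower().rstrip(".").split()
    tokens, i = [], 0
    while i < len(words):
        if i + 1 < len(words) and (words[i], words[i + 1]) in MULTIWORD:
            tokens.append(words[i] + " " + words[i + 1])
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return tokens


def pos_tag(tokens: list[str]) -> list[tuple[str, str]]:
    """Morphological level: label each token with its grammatical class."""
    return [(t, POS.get(t, "X")) for t in tokens]


def dependencies(tagged: list[tuple[str, str]]) -> dict[str, str]:
    """Syntactic level (toy rule): the first noun before the verb is its
    subject, the first noun after the verb is its direct object."""
    verb_idx = next(i for i, (_, p) in enumerate(tagged) if p == "VERB")
    subject = next(t for t, p in tagged[:verb_idx] if p == "NOUN")
    obj = next(t for t, p in tagged[verb_idx + 1:] if p == "NOUN")
    return {"verb": tagged[verb_idx][0], "nsubj": subject, "obj": obj}


def semantics(deps: dict[str, str]) -> tuple[str, str, str]:
    """Semantic level: reduce the sentence to a predicate-argument triple."""
    return (deps["nsubj"], deps["verb"], deps["obj"])


tagged = pos_tag(tokenize("The manager approved a sick leave for the employee."))
deps = dependencies(tagged)
print(tagged)
print(deps)
print(semantics(deps))
```

The point is only the shape of each output: the morphological level yields tagged tokens (note that "sick leave" is kept as one unit), the syntactic level yields relations between them, and the semantic level yields a representation of who did what.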
This analysis is essential for processing data that is not initially structured (e-mails, social network posts, etc.). It is the preliminary step for automatically checking the GDPR compliance of free comments, which are unstructured data par excellence: their syntax can be poor, and there is often no internal standard for how such comments should be written. NLP is therefore a natural way to approach such complex content, normalize it and break it down into interpretable tags.
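As an illustration of that normalization step, the sketch below cleans up a free comment before it is analyzed: Unicode normalization, case folding and whitespace collapsing, followed by a split into word and punctuation tokens. The function names and the specific rules are assumptions chosen for the example; a real pipeline would tune them to its own data.

```python
import re
import unicodedata

# Word tokens may contain inner hyphens or apostrophes (e.g. "e-mail");
# any other non-space character becomes its own punctuation token.
TOKEN_RE = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def normalize_comment(text: str) -> str:
    """Normalize a free-text comment before analysis (hypothetical rules)."""
    # NFKC folds compatibility characters, such as non-breaking spaces
    # and full-width letters, into their standard forms.
    text = unicodedata.normalize("NFKC", text)
    # Case folding makes later comparisons case-insensitive.
    text = text.casefold()
    # Collapse runs of spaces, tabs and newlines into a single space.
    return " ".join(text.split())


def tokenize_comment(text: str) -> list[str]:
    """Split a normalized comment into word and punctuation tokens."""
    return TOKEN_RE.findall(normalize_comment(text))


raw = "Sent an  E-mail\u00a0to HR;\n  will follow up!!"
print(normalize_comment(raw))
print(tokenize_comment(raw))
```

Case folding discards capitalization cues that a later named-entity step might rely on, so a pipeline may prefer to keep the original text alongside the normalized version.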