Image courtesy: http://www.expertsystem.com/natural-language-processing/
Language is both rational and emotional: we use words for mathematics as well as for poetry. Human language is rarely precise. To understand it is to understand not only the words, but also the concepts behind them and how they relate to one another to make sense.
Natural Language Processing (NLP) is what allows a computer program to interpret human language and act on the instructions it contains. NLP is a branch of Artificial Intelligence (AI) that bridges the gap between human communication and computer understanding.
History of NLP
NLP was first implemented in the 1950s to translate between Russian and English. The results were impractical, because a Russian translator had to pre-edit the input and an English translator had to post-edit the output.
In the 1960s, natural language processing systems began to analyze sentence structure in an ad hoc manner. These systems relied on pattern matching and derived little representation of meaning. ‘Eliza’ is one of the best-known NLP systems of that time.
In the early and mid 1970s, NLP progressed significantly with systems such as ‘LADDER’, a natural language interface to a database of information about US Navy ships. In the 1980s, NLP flourished with chatterbots, programs that simulated natural human conversation. In 2006, IBM began work on ‘Watson’, artificial intelligence software intended to answer questions posed in natural language. In February 2011, Watson won the ‘Jeopardy!’ contest by defeating top human champions.
Recently, research in computational linguistics has advanced our understanding of how grammar is formed, and Artificial Intelligence (AI) researchers have delivered effective systems for parsing natural language and representing meaning. Today, linguistic study and highly developed semantic representations serve as the foundation for NLP systems.
What Does NLP Do?
Image courtesy: https://in.pinterest.com/pin/541417186441551449/
NLP helps computers parse text, recognize speech, interpret it, gauge sentiment and determine which parts are important. With NLP, today's machines can analyze more language-based data than humans can, without fatigue and in a consistent way. Given the staggering volume of unstructured data generated every day, automation will be essential to analyze it thoroughly and efficiently.
We express ourselves in countless ways. There are hundreds of languages and dialects, and each language has its own set of grammar and syntax rules, terminology, and slang. When we write, we often misspell or abbreviate words, or omit punctuation. When we speak, we have regional accents, and we mumble, stutter and borrow phrases from other languages.
One of the goals of Natural Language Processing (NLP) is to let people use natural instructions and speech in place of programming languages and commands.
NLP resolves ambiguity in language and adds useful numeric structure to the data for downstream applications such as text analytics or speech recognition. Popular services that use NLP include Cortana, Google Now and Siri.
For example, Siri is Apple's voice-operated digital assistant. It receives commands through the user's voice, tries to interpret them, and then performs the requested task accordingly. The name is often expanded as ‘Speech Interpretation and Recognition Interface’. The combination of Natural Language Processing (NLP) and Artificial Intelligence (AI) is what made Siri successful.
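The "numeric structure" that NLP adds to text for downstream applications can be pictured with a bag-of-words count vector. The following is only a minimal sketch: the function name bag_of_words and the simple regular-expression tokenizer are assumptions made for illustration, not the method used by any of the services named above.

```python
import re
from collections import Counter

def bag_of_words(documents):
    """Turn raw strings into count vectors over a shared, sorted vocabulary."""
    # Lowercase each document and keep only runs of letters/apostrophes as tokens.
    tokenized = [re.findall(r"[a-z']+", doc.lower()) for doc in documents]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # One integer per vocabulary term: how often it occurs in this document.
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors

vocabulary, vectors = bag_of_words(["Call me now", "call Me, call me!"])
print(vocabulary)
print(vectors)
```

Each document becomes a row of numbers that a text analytics model can consume, regardless of punctuation or capitalization in the original.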
Major Stages Involved in NLP
Speech and text serve as the input and output of an NLP system.
NLP consists of tasks that are carried out in sequence. The stages are as follows:
1. Morphological Processing
2. Syntax and Semantics
3. Semantics and Pragmatics
Morphological Processing:
At this stage, a string is broken down into a collection of tokens, each token is mapped to a discrete word, and each word is broken down into sub-words.
For example, the word “unsuccessfully” can be broken down into four sub-words: “un-success-ful-ly”.
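The two steps of this stage, tokenizing a string and splitting a word into sub-words, can be sketched roughly as follows. The affix lists and the minimum stem length are hypothetical and chosen only for illustration; real morphological analyzers rely on far larger lexicons and rules.

```python
import re

# Hypothetical affix lists, for illustration only.
PREFIXES = ("un", "re", "dis")
SUFFIXES = ("ly", "ful", "ness", "ing")
MIN_STEM = 4  # assumed lower bound so that "under" is not split into "un" + "der"

def tokenize(text):
    """Split a string into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def split_morphemes(word):
    """Peel at most one known prefix, then any known suffixes, off a word."""
    prefix = []
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= MIN_STEM:
            prefix.append(p)
            word = word[len(p):]
            break
    suffixes = []
    stripped = True
    while stripped:
        stripped = False
        for s in SUFFIXES:
            if word.endswith(s) and len(word) - len(s) >= MIN_STEM:
                suffixes.insert(0, s)  # keep suffixes in their original order
                word = word[:-len(s)]
                stripped = True
                break
    return prefix + [word] + suffixes

for token in tokenize("He tried unsuccessfully."):
    print(token, split_morphemes(token))
```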
Syntax and Semantics:
This stage checks that, once the string has been broken down into tokens, the tokens are still well-formed and meaningful.
For example,
“un” - prefix
“success” - word
“ful” - suffix
“ly” - suffix
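The labelling above can be sketched as a small check that every sub-word is recognized and that the parts appear in a sensible order (an optional prefix, then a word, then any suffixes). The three tiny lexicons below are hypothetical stand-ins for a real dictionary.

```python
import re

# Hypothetical lexicons, for illustration only.
PREFIXES = {"un", "re", "dis"}
ROOTS = {"success", "happy", "kind"}
SUFFIXES = {"ful", "ly", "ness"}
CODES = {"prefix": "P", "word": "W", "suffix": "S", "unknown": "U"}

def label_parts(parts):
    """Attach a label (prefix, word, suffix or unknown) to each sub-word."""
    labelled = []
    for position, part in enumerate(parts):
        if position == 0 and part in PREFIXES:
            kind = "prefix"
        elif part in ROOTS:
            kind = "word"
        elif part in SUFFIXES:
            kind = "suffix"
        else:
            kind = "unknown"
        labelled.append((part, kind))
    return labelled

def is_well_formed(parts):
    """Accept only the shape: optional prefix, one word, zero or more suffixes."""
    shape = "".join(CODES[kind] for _, kind in label_parts(parts))
    return re.fullmatch(r"P?WS*", shape) is not None

print(label_parts(["un", "success", "ful", "ly"]))
print(is_well_formed(["un", "success", "ful", "ly"]))
print(is_well_formed(["ly", "success"]))
```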
Semantics and Pragmatics:
In this stage, the sub-words are interpreted in the given context.
For example,
The word “success” means victory.
The word “success” is prefixed with “un”, giving the meaning “not success”.
Unsuccessful = Failure
Semantics focuses on the meaning of words independent of context, whereas pragmatics focuses on how language is used and is therefore context dependent.
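The difference can be sketched in a toy form: a semantic step that composes a fixed meaning from sub-words, and a pragmatic step that may change the reading depending on the situation. The lexicon, the antonym table and the "speaker_is_sarcastic" context flag are all hypothetical and serve only to illustrate the distinction.

```python
# Hypothetical, context-independent word meanings and their opposites.
LEXICON = {"success": "victory"}
ANTONYMS = {"victory": "failure", "failure": "victory"}

def negate(meaning):
    """Return the opposite meaning, falling back to a plain 'not ...'."""
    return ANTONYMS.get(meaning, "not " + meaning)

def compose_meaning(parts):
    """Semantics: build a meaning from sub-words, ignoring any context."""
    meaning = None
    negated = False
    for part in parts:
        if part == "un":
            negated = True
        elif part in LEXICON:
            meaning = LEXICON[part]
    if meaning is None:
        return None
    return negate(meaning) if negated else meaning

def interpret_in_context(meaning, context):
    """Pragmatics: the same meaning can be intended ironically in some contexts."""
    if context.get("speaker_is_sarcastic"):
        return negate(meaning)
    return meaning

literal = compose_meaning(["un", "success", "ful"])
print(literal)
print(interpret_in_context("victory", {"speaker_is_sarcastic": True}))
```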
Open Source NLP Libraries
Stanford's CoreNLP suite, the Natural Language Toolkit (NLTK) and Apache OpenNLP are a few open source NLP libraries that can be incorporated into applications to extract insights and help developers make sense of textual content. Summarization, sentiment analysis and LDA are a few common NLP algorithms.
A summarizer algorithm generates an abstract of a document, retaining the significant points and discarding extraneous material. It takes a substantial block of text as a string and extracts the key points, determined by how often words and phrases recur.
Image courtesy: https://algorithmia.com
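A frequency-based extractive summarizer of the kind described above can be sketched in a few lines. This is an assumption-laden illustration, not the Algorithmia implementation: the stopword list, the sentence splitter and the average-frequency score are all simplifications chosen here.

```python
import re
from collections import Counter

# Small hypothetical stopword list; real systems use much longer ones.
STOPWORDS = {"a", "an", "and", "the", "is", "was", "of", "to", "in", "it"}

def content_words(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def summarize(text, max_sentences=2):
    """Keep the sentences whose words recur most often across the whole text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    frequency = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        # Average word frequency, so long sentences are not favoured automatically.
        return sum(frequency[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in chosen)

text = ("NLP helps computers read text. "
        "NLP helps computers read speech and text. "
        "The weather was pleasant.")
print(summarize(text, max_sentences=1))
```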
A sentiment analysis algorithm determines the emotion, feeling, or tone of a text document and assigns a rating from 0 to 4, indicating very negative, negative, neutral, positive and very positive, respectively.
Image courtesy: https://algorithmia.com
Image courtesy: https://algorithmia.com
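The 0 to 4 rating scale can be illustrated with a toy lexicon-based scorer. Production sentiment models are typically trained classifiers; the word lists and the clamping rule below are assumptions made purely to show how a score maps onto the five labels.

```python
import re

# Hypothetical word lists, for illustration only.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}
LABELS = ["very negative", "negative", "neutral", "positive", "very positive"]

def rate_sentiment(text):
    """Return (rating, label), where rating runs from 0 to 4."""
    tokens = re.findall(r"[a-z']+", text.lower())
    net = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    # Clamp the net count to [-2, 2], then shift it onto the 0..4 scale.
    rating = max(-2, min(2, net)) + 2
    return rating, LABELS[rating]

for sentence in ["I love this great phone", "The service was bad", "It arrived on Tuesday"]:
    print(sentence, rate_sentiment(sentence))
```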
The Latent Dirichlet Allocation (LDA) algorithm is used to assign the most appropriate topic tags to a document. LDA extracts the key topics and tags from text. Many web developers use LDA to generate tags for posts and articles efficiently.
Image courtesy: https://algorithmia.com
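To give a feel for how LDA arrives at topic tags, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling in plain Python. It is not the Algorithmia implementation; the function name lda_topics, the hyperparameter defaults (alpha, beta) and the tiny example corpus are assumptions, and libraries such as gensim or scikit-learn would normally be used instead.

```python
import random
from collections import Counter

def lda_topics(docs, num_topics=2, iterations=200, alpha=0.1, beta=0.01, top_n=3, seed=0):
    """Fit LDA to tokenized docs; return top words per topic and each doc's main topic."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # words in doc d assigned to topic k
    topic_word = [Counter() for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                       # total words in topic k
    assignments = []
    for d, doc in enumerate(docs):                       # random initial topic per word
        zs = [rng.randrange(num_topics) for _ in doc]
        for w, z in zip(doc, zs):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]                    # remove the current assignment
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [                              # conditional topic probabilities
                    (doc_topic[d][k] + alpha) * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z                    # record the resampled topic
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    topics = [[w for w, c in topic_word[k].most_common() if c > 0][:top_n]
              for k in range(num_topics)]
    dominant = [max(range(num_topics), key=lambda k: doc_topic[d][k])
                for d in range(len(docs))]
    return topics, dominant

docs = [["cat", "dog", "pet"], ["dog", "pet", "cat"],
        ["stock", "market", "trade"], ["market", "trade", "stock"]]
topics, dominant = lda_topics(docs)
for d, k in enumerate(dominant):
    print("document", d, "tags:", topics[k])
```

The tags for a document are simply the top words of its dominant topic. Because the sampler is randomized, results depend on the seed and on the number of iterations.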
The evolution of NLP has made it possible to understand human language in different contexts and to restructure the information into valuable data. Clearer, less ambiguous understanding between humans and technology will only strengthen the capabilities of both. NLP will be crucial in capturing the true voice of the user and in enabling more coherent and consistent interaction on any platform where language and human communication are used.