Natural Language Processing: An Introduction

What Is Natural Language Processing?
Natural Language Processing (NLP) is the technology that makes software capable of understanding natural human language.

It is a branch of artificial intelligence that deals with the interaction between computers and humans through natural language.

NLP tasks can be divided into four main areas, each with its own subtasks:

Syntax
1. Grammar induction
2. Lemmatization
3. Morphological segmentation
4. Part-of-speech tagging
5. Parsing
6. Sentence breaking
7. Stemming
8. Terminology extraction
Semantics
1. Lexical semantics
2. Distributional semantics
3. Machine translation
4. Named entity recognition (NER)
5. Natural language generation
6. Natural language understanding
7. Optical character recognition (OCR)
8. Question answering
9. Recognizing Textual entailment
10. Relationship extraction
11. Sentiment analysis
12. Word sense disambiguation
13. Topic segmentation and recognition
Discourse
1. Automatic summarization
2. Coreference resolution
3. Discourse analysis
Speech
1. Speech recognition
2. Speech segmentation
3. Text-to-speech

Syntax
Syntax is the part of NLP that studies how words are arranged in a sentence so that they make grammatical sense.
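
Two of the syntax tasks listed above, sentence breaking and splitting a sentence into words, can be sketched with nothing but Python's standard library. This is only a naive illustration: the function names `split_sentences` and `tokenize` are made up for this example, and the regular expressions will fail on abbreviations such as "Dr." that real NLP libraries handle.

```python
import re

def split_sentences(text):
    """Naive sentence breaking: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Naive word tokenization: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)

text = "NLP is fun. It deals with human language!"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]

print(sentences)  # ['NLP is fun.', 'It deals with human language!']
print(tokens)     # [['NLP', 'is', 'fun', '.'], ['It', 'deals', 'with', 'human', 'language', '!']]
```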

Semantics
Semantics studies the meaning conveyed by a text. This is one of the more difficult parts of Natural Language Processing.
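
To see why meaning is hard to capture, consider a toy version of sentiment analysis, one of the semantic tasks listed above. The sketch below simply counts words from two tiny hand-made word lists. The lists and the function names `sentiment_score` and `classify` are assumptions made up for this illustration, not part of any library.

```python
import re

# Tiny hand-made word lists, for illustration only.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}

def sentiment_score(text):
    """Count positive words minus negative words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)

def classify(text):
    """Map the score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify("I love this great library"))  # positive
print(classify("The docs are terrible"))      # negative
print(classify("This is not good"))           # positive (wrong: negation is ignored)
```

The last line shows the limitation: a word count has no notion of negation or context, which is the kind of problem that semantic analysis has to deal with.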

Discourse
Discourse deals with written or spoken communication as a whole, including the summarization and analysis of longer exchanges such as debates.

Speech
Speech deals with spoken language, for example speech recognition, which converts speech into written text.

Various NLP Libraries written in Python programming language

Library	Details
spaCy	Highly optimized NLP library that is meant to be used together with deep learning frameworks such as TensorFlow or PyTorch. spaCy comes with pre-trained statistical models and word vectors.
Gensim	Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora.
Pattern	Web (data) mining / crawling and common NLP tasks.
NLTK	NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, etc.
TextBlob	TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, WordNet integration, parsing, and word inflection. New models or languages can be added through extensions.
Polyglot	Polyglot is a natural language pipeline which supports massive multilingual applications. The features include tokenisation, language detection, named entity recognition, part of speech tagging, sentiment analysis, word embeddings, etc.
Vocabulary	Vocabulary is a Python library for natural language processing which is basically a dictionary in the form of a Python module. Using this library, for a given word you can get its meaning, synonyms, antonyms, part of speech, and translations.
PyNLPl	Extensive functionality regarding FoLiA XML and many other common NLP formats (CQL, Giza, Moses, ARPA, Timbl, etc.). It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build a simple language model.
Stanford CoreNLP Python	Reliable, robust and accurate NLP platform based on a client-server architecture. Written in Java, and accessible through multiple Python wrapper libraries.
Quepy	Quepy is a python framework to transform natural language questions into queries in a database query language.

What is NLTK?

The Natural Language Toolkit (NLTK) is a Python platform for building programs that analyze text.
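
As a first taste, here is a minimal sketch that tokenizes a sentence and stems each token. It assumes NLTK is installed (for example with `pip install nltk`). It uses `wordpunct_tokenize` and `PorterStemmer` because, unlike some other NLTK components, they work without downloading extra data; the sample sentence is made up for this example.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

sentence = "Stemming reduces words."

# Split the sentence into word and punctuation tokens using a regular expression.
tokens = wordpunct_tokenize(sentence)
print(tokens)  # ['Stemming', 'reduces', 'words', '.']

# Reduce each token to its stem with the Porter stemming algorithm.
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]
print(stems)

print(stemmer.stem("running"))  # run
```

Note that a stem is not always a dictionary word. Stemming only strips suffixes by rule, whereas lemmatization (also listed under Syntax) maps a word to its dictionary form.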

We will look at NLTK in detail in the next article.

http://mycloudplace.com/tokenize-of-words-and-sentences-using-nltk/

Tokenization of Words and Sentences using NLTK
