What Is Natural Language Processing?
Natural Language Processing (NLP) is the technology that makes software capable of understanding natural human language.
It is a branch of artificial intelligence that deals with the interaction between computers and humans through natural language.
Natural language processing tasks can be divided into four main areas, each with its own sub-tasks:
- Syntax
  - Grammar induction
  - Lemmatization
  - Morphological segmentation
  - Part-of-speech tagging
  - Parsing
  - Sentence breaking
  - Stemming
  - Terminology extraction
- Semantics
  - Lexical semantics
  - Distributional semantics
  - Machine translation
  - Named entity recognition (NER)
  - Natural language generation
  - Natural language understanding
  - Optical character recognition (OCR)
  - Question answering
  - Recognizing textual entailment
  - Relationship extraction
  - Sentiment analysis
  - Word sense disambiguation
  - Topic segmentation and recognition
- Discourse
  - Automatic summarization
  - Coreference resolution
  - Discourse analysis
- Speech
  - Speech recognition
  - Speech segmentation
  - Text-to-speech
- Syntax
Syntax is the part of natural language processing that studies how words are arranged in a sentence so that they make grammatical sense.
- Semantics
Semantics is the study of the meaning conveyed by a text. This is the difficult part of natural language processing, because meaning often depends on context.
- Discourse
Discourse covers connected written or spoken communication, such as conversations and debates, including its summarization and analysis.
- Speech
Speech covers spoken language, for example recognizing speech and converting it into written text.
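To make the syntax tasks a little more concrete, here is a minimal sketch of sentence breaking, tokenization and stemming using only the Python standard library. The function names (`split_sentences`, `tokenize`, `naive_stem`) and the rules they apply are assumptions made for illustration. They are deliberately naive and are not how any of the libraries below implement these tasks.

```python
import re


def split_sentences(text):
    """Naive sentence breaking: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Naive tokenization: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def naive_stem(word):
    """Toy stemming: strip a few common English suffixes (not the Porter algorithm)."""
    if word.endswith("ss"):
        return word  # leave words like "process" untouched
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


text = "Computers process text. They split it into words!"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]
stems = [naive_stem(t.lower()) for t in tokens[0]]

print(sentences)
print(tokens)
print(stems)
```

Real text breaks these rules quickly (abbreviations such as "Dr." end with a period, and many English words are irregular), which is one reason to use a dedicated library instead.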
NLP Libraries Written in Python
Library | Details |
spaCy | Highly optimized NLP library designed to work together with deep learning frameworks such as TensorFlow or PyTorch. spaCy comes with pre-trained statistical models and word vectors. |
Gensim | Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. |
Pattern | Web (data) mining / crawling and common NLP tasks. |
NLTK | NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, etc. |
TextBlob | TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, WordNet integration, parsing, word inflection, and more. New models or languages can be added through extensions. |
Polyglot | Polyglot is a natural language pipeline which supports massive multilingual applications. The features include tokenisation, language detection, named entity recognition, part of speech tagging, sentiment analysis, word embeddings, etc. |
Vocabulary | Vocabulary is a Python library for natural language processing that is essentially a dictionary in the form of a Python module. For a given word, it can return its meaning, synonyms, antonyms, part of speech and translations. |
PyNLPl | Extensive functionality for FoLiA XML and many other common NLP formats (CQL, Giza, Moses, ARPA, Timbl, etc.). It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build a simple language model. |
Stanford CoreNLP Python | Reliable, robust and accurate NLP platform based on a client-server architecture. Written in Java, and accessible through multiple Python wrapper libraries. |
Quepy | Quepy is a Python framework to transform natural language questions into queries in a database query language. |
What is NLTK?
The Natural Language Toolkit (NLTK) is a Python platform for building programs that analyze text.
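As a first taste, the sketch below tokenizes a short text, stems the words and counts word frequencies with NLTK. It assumes NLTK is installed (for example with `pip install nltk`) and uses only `wordpunct_tokenize`, `PorterStemmer` and `FreqDist`, which should not need any extra corpus downloads. The sample sentence is made up for illustration.

```python
from nltk import FreqDist
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

# Made-up sample text for illustration.
text = "NLTK helps programs analyze text. Programs analyze text with NLTK."

# Regex-based tokenizer: splits into alphanumeric runs and punctuation runs.
tokens = wordpunct_tokenize(text.lower())

# Keep only alphabetic tokens (drops the punctuation).
words = [t for t in tokens if t.isalpha()]

# Reduce each word to its stem with the Porter stemming algorithm.
stemmer = PorterStemmer()
stems = [stemmer.stem(w) for w in words]

# Count how often each word occurs.
freq = FreqDist(words)

print(tokens)
print(stems)
print(freq.most_common(3))
```

Other NLTK tools, such as its sentence tokenizer and part-of-speech tagger, rely on data packages that have to be downloaded separately with `nltk.download()`.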
We will look at NLTK in detail in the next article:
http://mycloudplace.com/tokenize-of-words-and-sentences-using-nltk/
http://mycloudplace.com/an-introduction-to-machine-learning/
https://en.wikipedia.org/wiki/Natural_language_processing