Tokenization of Words and Sentences using NLTK

Tokenization is the process of dividing a string into smaller parts called tokens. It is the first step toward solving problems such as text classification, sentiment analysis, and smart chatbots with the Natural Language Toolkit (NLTK). NLTK provides a tokenizer interface, and its tokenize module is further divided into sub-parts: word tokenize, sentence …
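As a rough illustration of the idea, here is a minimal sketch of sentence and word splitting using only Python's standard `re` module. It is a stand-in, not NLTK's implementation: the function names `split_sentences` and `split_words` and the regular expressions are assumptions made for this example. In NLTK itself the corresponding entry points are `nltk.tokenize.sent_tokenize` and `nltk.tokenize.word_tokenize`, which handle abbreviations, contractions, and other cases that this sketch ignores.

```python
import re

def split_sentences(text):
    # Hypothetical helper: split after '.', '!' or '?' when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def split_words(text):
    # Hypothetical helper: runs of word characters are tokens,
    # and each punctuation mark is a token of its own.
    return re.findall(r"\w+|[^\w\s]", text)

sample = "Tokenization is the first step. It splits text into tokens!"
sentences = split_sentences(sample)
words = split_words(sentences[0])
print(sentences)
print(words)
```

The sentence splitter treats every period followed by a space as a boundary, so text such as "Dr. Smith" would be split incorrectly; a trained tokenizer is the usual way around that.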

Natural Language Processing: An Introduction

What Is Natural Language Processing? Natural Language Processing (NLP) is the technology that makes software capable of understanding human natural language. It is a branch of artificial intelligence that deals with the interaction between computers and humans using natural language. Natural language processing tasks …

Calculating Executor Memory, Number of Executors, and Cores per Executor for a Spark Application

For better performance of a Spark application, it is important to understand resource allocation and the Spark tuning process. This article helps you understand how to calculate the number of executors, the executor memory, and the number of cores per executor required for better performance of your application. Below is a sample spark-submit command: ./bin/spark-submit --class <class_name> …
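The article's own calculation is not visible in this excerpt, so the following is only a sketch of one commonly cited rule of thumb, not the author's method. It assumes one core and 1 GB per node are reserved for the operating system and daemons, about five cores per executor, one executor slot given up for the application master, and roughly 10% of each executor's memory set aside as overhead. The function name `size_executors` and all default values are assumptions for illustration.

```python
def size_executors(nodes, cores_per_node, mem_per_node_gb,
                   cores_per_executor=5, overhead_fraction=0.10):
    """Rule-of-thumb executor sizing (all defaults are assumptions)."""
    usable_cores = cores_per_node - 1      # reserve 1 core per node for OS/daemons
    usable_mem_gb = mem_per_node_gb - 1    # reserve 1 GB per node for OS/daemons
    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("not enough cores per node for one executor")
    total_executors = executors_per_node * nodes - 1  # leave one slot for the application master
    mem_per_executor_gb = usable_mem_gb / executors_per_node
    # Keep part of the memory aside as off-heap overhead; the rest is the heap size.
    heap_gb = int(mem_per_executor_gb * (1 - overhead_fraction))
    return {
        "num_executors": total_executors,
        "executor_cores": cores_per_executor,
        "executor_memory_gb": heap_gb,
    }

# Hypothetical cluster: 10 nodes, 16 cores and 64 GB of RAM each.
plan = size_executors(nodes=10, cores_per_node=16, mem_per_node_gb=64)
print(plan)
```

The three values map onto the spark-submit options `--num-executors`, `--executor-cores`, and `--executor-memory`. Treat them as a starting point for tuning rather than a final answer.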

What Is a Spark Executor?

An executor is a distributed agent that is responsible for executing tasks. Executors are managed by an executor backend (ExecutorBackend is a pluggable interface that TaskRunners use to send task status updates to a scheduler). Executors report heartbeats to the HeartbeatReceiver RPC endpoint on the driver. Executors provide in-memory storage for RDDs via the BlockManager. BlockManager is …

Deep Understanding of SparkContext & Application’s Driver Process

DRIVER: A Spark driver is a JVM process that hosts the SparkContext for a Spark application. It is the master node in a Spark application. The driver (an application’s driver process) splits a Spark application into tasks and schedules them to run on executors. It is the driver’s responsibility to coordinate with workers and also manage …
