P.S.Verma

Tokenization of Words and Sentences using NLTK

Tokenization is the process of dividing a string into smaller parts called tokens. It is the first step toward solving problems such as text classification, sentiment analysis, and smart chatbots using the Natural Language Toolkit (NLTK). NLTK has a ‘Tokenizer Interface’; its tokenize module is further divided into sub-parts: word tokenize, sentence …

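As a rough illustration of the idea, the sketch below splits text into sentence tokens and word tokens using only Python's standard `re` module. It is not NLTK's tokenizer: the function names `sentence_tokens` and `word_tokens` are hypothetical, and the regular expressions are a simplification that will mishandle abbreviations such as "Dr.".

```python
import re


def sentence_tokens(text):
    """Naively split text into sentences at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def word_tokens(text):
    """Split text into word tokens and single-character punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


sample = "Tokenization is the first step. It splits text into tokens!"
print(sentence_tokens(sample))
print(word_tokens("It splits text into tokens!"))
```

A real tokenizer handles many more edge cases (abbreviations, contractions, URLs), which is the reason for using a library such as NLTK rather than hand-written patterns.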

Natural Language Processing An Introduction

What Is Natural Language Processing? Natural Language Processing (NLP) is the technology that makes software capable of understanding natural human language. It is a branch of artificial intelligence that deals with the interaction between computers and humans using natural language. Natural language processing tasks …

Calculating executor memory, number of Executors & Cores per executor for a Spark Application

For better performance of a Spark application, it is important to understand resource allocation and the Spark tuning process. This article helps you understand how to calculate the number of executors, the executor memory, and the number of cores required for better performance of your application. Below is a sample spark-submit command: ./bin/spark-submit --class <class_name> …

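The kind of calculation described above can be sketched in a few lines of Python. This is a minimal sketch of one widely used rule of thumb, not necessarily the method the article itself uses. The function name `plan_executors`, the 5 cores per executor, the 1 core and 1 GB reserved per node, the single executor slot left for the application master, and the 10% memory overhead are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ExecutorPlan:
    num_executors: int
    cores_per_executor: int
    executor_memory_gb: int


def plan_executors(nodes, cores_per_node, mem_per_node_gb,
                   cores_per_executor=5, overhead_fraction=0.10):
    # Assumption: leave 1 core and 1 GB on each node for the OS and daemons.
    usable_cores = cores_per_node - 1
    usable_mem_gb = mem_per_node_gb - 1

    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("not enough cores per node for one executor")

    # Assumption: one executor slot in the cluster is left for the application master.
    num_executors = nodes * executors_per_node - 1

    # Split usable node memory across executors, then subtract off-heap overhead
    # (assumed here as the larger of 384 MB and overhead_fraction of the slot).
    mem_per_slot_gb = usable_mem_gb / executors_per_node
    overhead_gb = max(0.384, mem_per_slot_gb * overhead_fraction)
    executor_memory_gb = int(mem_per_slot_gb - overhead_gb)

    return ExecutorPlan(num_executors, cores_per_executor, executor_memory_gb)


plan = plan_executors(nodes=10, cores_per_node=16, mem_per_node_gb=64)
print(plan)
```

The three resulting values correspond to the `--num-executors`, `--executor-cores`, and `--executor-memory` options of spark-submit. The right numbers for a real cluster depend on the workload and the cluster manager, so treat the output as a starting point for tuning.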

What Is a Spark Executor?

An executor is a distributed agent that is responsible for executing tasks. Executors are managed by an executor backend (ExecutorBackend is a pluggable interface that TaskRunners use to send task status updates to a scheduler). Executors report heartbeats to the HeartbeatReceiver RPC endpoint on the driver. Executors provide in-memory storage for RDDs via the BlockManager. BlockManager is …

Deep Understanding of SparkContext & Application’s Driver Process

DRIVER: A Spark driver is a JVM process that hosts the SparkContext for a Spark application. It is the master node in a Spark application. The driver (an application’s driver process) splits a Spark application into tasks and schedules them to run on executors. It is the driver’s responsibility to coordinate with workers and also manage …

Apache Spark Architecture

Apache Spark is an open-source distributed general-purpose cluster-computing framework. Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. Spark can run up to 100 times faster in memory and 10 times faster on disk than Hadoop MapReduce. It is a general-purpose distributed computing engine used for processing and …
