DRIVER :
A Spark driver is the JVM process that hosts the SparkContext for a Spark application. It acts as the master of the Spark application.
The driver (an application’s driver process) splits a Spark application into tasks and schedules them to run on executors. It is the driver’s responsibility to coordinate with the workers and to manage the execution of tasks.
Driver’s Memory :
In client deploy mode, the driver’s memory is the memory of the JVM process the Spark application runs in. The driver memory can be set using spark-submit’s --driver-memory command-line option or by setting the spark.driver.memory configuration property.
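As a minimal sketch of both forms (the app name, class name, jar name, memory size and master URL are assumptions for the example, not values from this article); note that in client mode the driver JVM is already running when application code executes, so the command-line option is the reliable choice:

import org.apache.spark.{SparkConf, SparkContext}

// Command-line form (jar and class names are hypothetical):
//   spark-submit --driver-memory 4g --class com.example.MyApp myapp.jar

// Programmatic form. In client mode spark.driver.memory set here may not
// take effect, because the driver JVM has already started.
val conf = new SparkConf()
  .setAppName("DriverMemoryExample") // assumed app name
  .setMaster("local[*]")             // assumed master, for a local run
  .set("spark.driver.memory", "4g")  // assumed size

val sc = new SparkContext(conf)
sc.stop()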
SparkContext :
Main entry point for Spark functionality. A SparkContext represents the connection to a Spark cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster.
Only one SparkContext may be active per JVM. You must stop() the active SparkContext before creating a new one.
Once a SparkContext is created you can use it to create RDDs, accumulators and broadcast variables, access Spark services and run jobs (until SparkContext is stopped).
A Spark context is essentially a client of Spark’s execution environment and acts as the master of your Spark application.
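As a minimal sketch of creating, using and stopping a SparkContext (the app name and master URL are assumptions for the example):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("SparkContextExample") // assumed app name
  .setMaster("local[*]")             // assumed master, for a local run
val sc = new SparkContext(conf)

// Use the context to create an RDD and run a job.
val rdd = sc.parallelize(1 to 100)
println(rdd.sum()) // 5050.0

// Only one SparkContext may be active per JVM:
// stop this one before creating a new one.
sc.stop()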
Functions of Spark Context (a few are illustrated in the sketch after this list) :
- Running jobs synchronously
- Creating distributed RDDs, accumulators, and broadcast variables
- Setting up configuration
- Accessing different Spark services
- Getting the current status of the application
- Cancelling a job
- Cancelling a stage
- Programmable dynamic allocation of executors
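A minimal sketch of a few of these functions (names and values are assumptions for the example; requestExecutors only has an effect on a cluster manager that supports dynamic allocation):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setAppName("ContextFunctions").setMaster("local[*]"))

// Creating a distributed RDD, an accumulator and a broadcast variable.
val rdd     = sc.parallelize(1 to 10)
val counter = sc.longAccumulator("counter")
val lookup  = sc.broadcast(Map(1 -> "one"))

// Running a job synchronously.
rdd.foreach(n => counter.add(n))
println(counter.value)   // 55
println(lookup.value(1)) // one

// Getting the current status of the application.
println(sc.applicationId)
println(sc.statusTracker.getActiveJobIds().mkString(","))

// Cancelling jobs: tag them with a group id, then cancel the group.
sc.setJobGroup("my-group", "cancellable jobs")
sc.cancelJobGroup("my-group")

// Programmable dynamic allocation: request extra executors
// (a no-op in local mode; needs a supporting cluster manager).
sc.requestExecutors(2)

sc.stop()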
For details of the Spark architecture, please read my previous article, Spark Architecture.
Executor :
An executor is a distributed agent responsible for executing tasks. Executors are managed by an executor backend and report heartbeats to the driver. For details, please go through the link below:
http://www.mycloudplace.com/what-is-spark-executor/