Lazy evaluation means that execution does not start until an action is triggered. Transformations are lazy in nature: when we call a transformation on an RDD, it does not execute immediately. Spark only records the operation in the RDD's lineage, and the whole chain is evaluated when an action (such as collect or count) is called.
Some advantages of lazy evaluation in Spark:
- Increases manageability: with lazy evaluation, users can freely organize their Apache Spark program into many small transformations. Spark groups these operations together, which reduces the number of passes over the data.
- Saves computation and increases speed: lazy evaluation plays a key role in avoiding calculation overhead. A value is not computed if it is never used; only the necessary values are computed. It also saves round trips between the driver and the cluster, which speeds up the job.
- Reduces complexity: the two main costs of any operation are time and space. Lazy evaluation helps with both. Since Spark does not eagerly execute every operation, redundant work is skipped and time is saved. It also lets us describe pipelines over data structures too large to materialize at once, because the action triggers computation only when the data is actually required, reducing overhead.
- Optimization: because Spark sees the whole chain of transformations before executing anything, it can optimize the execution plan and reduce the number of passes and queries over the data.
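The savings described above can be illustrated outside Spark with a tiny pure-Python sketch. This is an analogy for Spark's behaviour, not Spark's actual implementation: transformations only record a plan, and work happens when an action runs, so unneeded values are never computed.

```python
# Toy lazy "RDD": transformations append to a recorded plan;
# nothing executes until an action is called.
class LazyRDD:
    def __init__(self, data, plan=None):
        self.data = data
        self.plan = plan or []          # recorded transformations

    def map(self, fn):                  # transformation: lazy
        return LazyRDD(self.data, self.plan + [("map", fn)])

    def filter(self, pred):             # transformation: lazy
        return LazyRDD(self.data, self.plan + [("filter", pred)])

    def take(self, n):                  # action: triggers execution
        out = []
        for item in self.data:          # a single pass over the data
            keep = True
            for kind, fn in self.plan:
                if kind == "map":
                    item = fn(item)
                elif kind == "filter" and not fn(item):
                    keep = False
                    break
            if keep:
                out.append(item)
                if len(out) == n:       # stop early: values that are
                    break               # not needed are never computed
        return out

calls = []                              # track how many squarings ran
pipeline = (LazyRDD(range(1_000_000))
            .map(lambda x: (calls.append(x), x * x)[1])
            .filter(lambda x: x % 2 == 0))

first_two = pipeline.take(2)
print(first_two)   # [0, 4]
print(len(calls))  # only 3 of 1,000,000 elements were ever squared
```

Even though the pipeline is defined over a million elements, the action only touches the first three; an eager version would square all of them before filtering.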
Related posts:
- http://mycloudplace.com/spark-rdd-transformations-actions/
- http://mycloudplace.com/apache-spark-architecture/
- Calculating executor memory, number of executors & cores per executor for a Spark application: http://mycloudplace.com/what-is-spark-executor/
- Deep Understanding of SparkContext & Application’s Driver Process: http://mycloudplace.com/deep-understanding-of-sparkcontext-applications-driver-process/
External Links:
https://en.wikipedia.org/wiki/Apache_Spark
https://data-flair.training/blogs/apache-spark-lazy-evaluation/