Jun 22
Apache Spark Components CheatSheet
Confused by Spark concepts such as executor, node, RDD, and task? Invest just 2 minutes of your time to bring some order to this mess!
I'll clean up these Apache Spark concepts for you!
Spark building blocks: executor, tasks, cache, SparkContext, cluster manager
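Executor => Multiple Tasks: An executor is a JVM process running on a worker node. Executors receive serialized tasks from the driver, deserialize them, and run them; the jars with your code are shipped to the executors so those tasks can run.
Executors cache data in memory so that later tasks can reuse it and run faster.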
Node => Multiple Executors: Each node can host multiple executors.
RDD => Big Data Structure: Its main strength is that it can represent data too large to be stored on a single machine, so the data is split into partitions that are distributed across the computers in the cluster.
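To make the relationship between partitions, tasks, and executors more concrete, here is a toy model in plain Python. It is not the Spark API: the names split_into_partitions, run_task, and run_job are made up for illustration, and a thread pool merely stands in for the executors. The idea it sketches is that a dataset is split into partitions and one task is run per partition.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Split a list into num_partitions contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        # The first `extra` partitions each take one additional record.
        end = start + size + (1 if i < extra else 0)
        partitions.append(data[start:end])
        start = end
    return partitions


def run_task(partition):
    """One 'task': apply the same function to every record of one partition."""
    return [x * x for x in partition]


def run_job(data, num_partitions=4, num_executors=2):
    """Run one task per partition on a pool that stands in for the executors."""
    partitions = split_into_partitions(data, num_partitions)
    with ThreadPoolExecutor(max_workers=num_executors) as pool:
        # pool.map keeps results in the same order as the partitions.
        results = list(pool.map(run_task, partitions))
    # Flatten the per-partition results, similar to collecting them on the driver.
    return [x for part in results for x in part]


squares = run_job(list(range(10)))
```

In real Spark the partitions live on different machines and the tasks run inside executor JVMs; the sketch only mirrors the shape of the work: many partitions, one task each, and a limited number of workers running them.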