Why/How is Spark faster than Hadoop?

What is Hadoop?

Hadoop is a framework for processing large data sets using a programming paradigm called MapReduce. Hadoop uses HDFS (Hadoop Distributed File System) as its underlying storage. HDFS is a file system designed specifically to store large files, and it uses much larger blocks than a normal file system (128 MB by default in current versions).
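To make the MapReduce paradigm concrete, here is a minimal word-count sketch in plain Python. It is a toy model and does not use the Hadoop API: the function names (map_phase, shuffle_phase, reduce_phase, word_count) are made up for illustration, and everything runs in a single process rather than across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in order over a list of text lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["spark is fast", "hadoop is reliable"]))
```

In a real Hadoop job the map and reduce steps run on different machines, and the output of the map step is written to disk before the reduce step reads it.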

 

What is Spark?

Spark is also a framework for processing large data sets. Unlike Hadoop, it does not come with its own storage layer; instead it supports various storage systems, including HDFS.

 

How is Spark better/faster than Hadoop?

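Spark uses in-memory computing: it keeps the intermediate data generated during processing in memory wherever it can, instead of writing it to disk between steps as Hadoop MapReduce does. Disk I/O is an expensive operation. Spark does not avoid it entirely (it still reads its input from storage, writes shuffle data to disk, and spills to disk when the data does not fit in memory), but it performs far less of it. As a result, Spark is often cited as being 10 to 100 times faster than Hadoop MapReduce, with the largest gains on workloads that reuse the same data, such as iterative algorithms. Apart from this, Spark offers efficient data abstractions: RDD, DataFrame and Dataset.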
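The difference in how intermediate data is handled can be sketched in plain Python. This is only a toy model under stated assumptions: it uses neither the Spark nor the Hadoop API, the two stages and all function names are hypothetical, and a JSON file in a temporary directory stands in for the disk handoff between MapReduce-style jobs.

```python
import json
import os
import tempfile


def stage_square(values):
    """First processing stage: square every value."""
    return [v * v for v in values]


def stage_keep_even(values):
    """Second processing stage: keep only the even values."""
    return [v for v in values if v % 2 == 0]


def run_with_disk_handoff(values, workdir):
    """MapReduce-style chaining: the first stage writes its output to disk
    and the second stage reads it back before continuing."""
    path = os.path.join(workdir, "stage1.json")
    with open(path, "w") as f:
        json.dump(stage_square(values), f)
    with open(path) as f:
        intermediate = json.load(f)
    return sum(stage_keep_even(intermediate))


def run_in_memory(values):
    """Spark-style chaining: the intermediate list stays in memory."""
    intermediate = stage_square(values)
    return sum(stage_keep_even(intermediate))


def compare(values):
    """Run both variants and return their results as a pair."""
    with tempfile.TemporaryDirectory() as workdir:
        disk_result = run_with_disk_handoff(values, workdir)
    return disk_result, run_in_memory(values)


if __name__ == "__main__":
    print(compare(list(range(10))))
```

Both variants compute the same answer; the only difference is the extra write and read between the stages. On a real cluster that handoff is repeated for every job in a chain and on every node, which is where most of the time difference comes from.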
