What is the need of large block size in Hadoop?

Why the block is large in Hadoop?

What is the use of having large block size in Hadoop?


These question look like simple but they have small logic. Here is the answer for it.


Normally block size in HDFS is 64 MB or 128 MB. Hadoop is used to process large data sets which will be of size terabytes and petabytes. This large size data will be processed in the cluster of computers. A Cluster a group of computers connected by network and each computer is called a node. which processes some amount of data of total data. Even then each node processes large data. So the data from the disk needs to accessed. But the disk I/O is very costly operation and consumes good amount CPU and processing time. So the block size will be commonly 64 MB or more.


As soon as you answer like this or similar to this, interviewer will shoot another question.

What happens if the block is larger?


How the block size increases the disk I/O performance?

Block size in normal file systems will 4kb. So when disk need to allocate memory for a file it looks for the blocks, where they are available and places the block there and this continues for the entire file and blocks are allocated randomly. If the block size is more then 64 MB data of total file will be stored in one block by aggregating small blocks together and these small block will be allocated contiguously. So, while accessing data it doesn't need to search for small blocks but big blocks are available on the disk. Overall the answer is memory will be allocated contiguously on the disk.

Leave a Reply

Your email address will not be published. Required fields are marked *