Why is the block size large in Hadoop?
What is the use of having a large block size in Hadoop?
These questions look simple, but there is a small piece of logic behind them. Here is the answer.
Normally the block size in HDFS is 64 MB or 128 MB. Hadoop is used to process large data sets, often terabytes or petabytes in size, and this data is processed on a cluster of computers. A cluster is a group of computers connected by a network; each computer is called a node, and each node processes some portion of the total data. Even so, each node still has to process a large amount of data, which means the data on disk must be accessed frequently. But disk I/O is a costly operation and consumes a good amount of CPU and processing time. That is why the block size is commonly 64 MB or more.
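The block size is configurable per cluster. As a sketch, this is roughly how it would be set in `hdfs-site.xml` using the `dfs.blocksize` property (128 MB here is just the common choice; tune it for your workload):

```xml
<!-- hdfs-site.xml: default block size for newly created files -->
<property>
  <name>dfs.blocksize</name>
  <value>134217728</value> <!-- 128 MB, expressed in bytes -->
</property>
```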
As soon as you give this answer, or something similar, the interviewer will shoot another question.
What happens if the block is larger?
How does a huge block size increase disk I/O performance?
The block size in ordinary file systems is typically 4 KB. When the file system needs to allocate space for a file, it looks for free blocks wherever they happen to be available and places the file's data there; this continues for the entire file, so the blocks end up scattered across the disk. If the block size is 64 MB, then 64 MB of the file's data is stored in one block, effectively aggregating many small blocks together, and that data is laid out contiguously. So while reading the data, the system does not have to seek to many scattered small blocks; it reads a few large blocks instead. In short, the data is allocated contiguously on the disk, which greatly reduces the number of seeks.
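To make the arithmetic concrete, here is a small sketch comparing how many blocks must be located to read a 1 GB file at the two block sizes (the file size is an illustrative assumption):

```python
# Rough illustration of why fewer, larger blocks mean fewer lookups/seeks.
FILE_SIZE   = 1 * 1024**3      # 1 GB file (illustrative assumption)
SMALL_BLOCK = 4 * 1024         # 4 KB, typical local file system block
LARGE_BLOCK = 128 * 1024**2    # 128 MB, common HDFS block size

small_blocks = FILE_SIZE // SMALL_BLOCK   # blocks at 4 KB
large_blocks = FILE_SIZE // LARGE_BLOCK   # blocks at 128 MB

print(f"4 KB blocks  : {small_blocks:,} blocks to locate")   # 262,144
print(f"128 MB blocks: {large_blocks:,} blocks to locate")   # 8
```

A 1 GB file needs 262,144 separate 4 KB blocks but only 8 blocks of 128 MB, so the per-block lookup and seek overhead shrinks by several orders of magnitude.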
Please share your thoughts about this post in the comments.