What happens if we add or delete partitions manually in Hive? (or) What is the MSCK REPAIR command in Hive?

We rarely think about this kind of scenario when we work with Hive tables in our projects. But in interviews we are often asked this kind of question to test our knowledge of Hive failure cases. So in this blog we will see how to let the metastore know about partitions that were added or deleted.

Hive stores table details such as column definitions, partitions, and partition locations in the metastore. Whenever we add a partition directory to HDFS or delete one from HDFS directly, the metastore is not aware of these background operations. So to make the metastore aware of these partitions, we need to use the command below:

MSCK REPAIR TABLE <table name>
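
As a minimal sketch of the scenario, assume a hypothetical external table named sales, partitioned by dt and stored under /data/sales (the table name, columns, path, and date are all made up for illustration). A partition directory is created directly in HDFS, outside Hive, and the repair command then registers it. Note that the directory has to follow the key=value naming pattern (here dt=2024-01-01) for the command to recognise it as a partition.

```sql
-- Hypothetical table; names, columns, and path are assumptions for illustration.
CREATE EXTERNAL TABLE IF NOT EXISTS sales (
  order_id BIGINT,
  amount   DOUBLE
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/sales';

-- Suppose a partition directory was added outside Hive, for example:
--   hdfs dfs -mkdir -p /data/sales/dt=2024-01-01
--   hdfs dfs -put orders.csv /data/sales/dt=2024-01-01/

-- The metastore does not know about the new directory yet,
-- so the partition is not listed here.
SHOW PARTITIONS sales;

-- Scan the table location and register partition directories
-- that are missing from the metastore.
MSCK REPAIR TABLE sales;

-- The partition dt=2024-01-01 should now be listed and queryable.
SHOW PARTITIONS sales;
SELECT COUNT(*) FROM sales WHERE dt = '2024-01-01';
```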

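This command is useful when you have scheduled ingestion of data into HDFS as partitions. In that case you must run this command to refresh the metadata, or else the Hive engine will not pick up the newly added partitions for processing. Keep in mind that by default the command only adds partitions that are missing from the metastore. To also remove metastore entries for partitions deleted from HDFS, newer Hive releases (3.0 onward, as far as I am aware) accept a DROP PARTITIONS or SYNC PARTITIONS option at the end of the command.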
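
For a scheduled ingestion job, the refresh step could look roughly like the sketch below. It assumes a hypothetical table named sales partitioned by dt under /data/sales (illustrative names only). When the job already knows which partition it just loaded or removed, registering or dropping that one partition explicitly is an alternative to scanning the whole table location.

```sql
-- Hypothetical table "sales" partitioned by dt; names and paths are assumptions.

-- Option 1: after each load, let Hive discover any new partition directories.
MSCK REPAIR TABLE sales;

-- Option 2: the job knows the partition it just wrote, so register only that one.
ALTER TABLE sales ADD IF NOT EXISTS
  PARTITION (dt = '2024-01-02') LOCATION '/data/sales/dt=2024-01-02';

-- If a partition directory was deleted from HDFS, drop its metastore entry.
ALTER TABLE sales DROP IF EXISTS PARTITION (dt = '2023-12-01');

-- Assumes a Hive version that supports this option (3.0 or later, to my knowledge):
-- add missing partitions and remove stale ones in a single pass.
MSCK REPAIR TABLE sales SYNC PARTITIONS;
```

Option 2 avoids listing every directory under the table location, which may matter for tables with a very large number of partitions.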
