How to get files of equivalent size while importing data using Sqoop?

I don't think this question has one particular answer that is guaranteed to give the required result, because every data set is different and you can't predict what it will look like. To get this task done we need a thorough understanding of the data: whether it has a primary key, how large it is, what kinds of columns it has, and how many of them contain nulls or invalid values.

The simplest answer is this: if the data doesn't have a primary key column, it is better to go with the --direct-split-size argument, which applies to direct-mode imports (--direct). The complete syntax of the argument is below.

--direct-split-size 570000000

You can include this line in your sqoop import command. In direct mode it splits the imported data into files of roughly 570 MB each (the value is given in bytes). This is one way to get files of equal or nearly equal size.
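
To show where the argument sits in a full command, here is a minimal sketch. The JDBC URL, user name, table name and target directory are hypothetical placeholders, not values from any real setup, and it assumes a database whose connector supports direct mode (for example PostgreSQL). Because the table is assumed to have no primary key, the sketch uses a single mapper (-m 1).

```shell
# Hypothetical placeholders: adjust all of these to your environment.
JDBC_URL="jdbc:postgresql://dbhost:5432/salesdb"
DB_USER="etl_user"
TABLE="orders"
TARGET_DIR="/data/raw/orders"
SPLIT_BYTES=570000000   # 570,000,000 bytes, about 570 MB per output file

# Assemble the command as a string so it can be reviewed before running.
# --direct enables direct mode, which --direct-split-size depends on.
# -m 1 is used because the table is assumed to have no primary key.
IMPORT_CMD="sqoop import --connect $JDBC_URL --username $DB_USER -P --table $TABLE --target-dir $TARGET_DIR --direct --direct-split-size $SPLIT_BYTES -m 1"

echo "$IMPORT_CMD"
# To execute it (the placeholder values contain no spaces): eval "$IMPORT_CMD"
```

The script only prints the command; run it yourself once the placeholders match your database. The last file will usually be smaller than the others, since it holds whatever is left over.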

 

Another way is to use the --split-by argument. To use it, we need to know a column that is suitable for splitting. Its values should be distributed evenly between the smallest and largest values in the column, because Sqoop divides that range among the mappers.
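
The idea can be sketched as follows. The column name order_id, its minimum and maximum, and the connection details are all hypothetical, and the loop is only a rough illustration of range-based splitting, not Sqoop's exact boundary logic.

```shell
# Hypothetical placeholders: adjust all of these to your environment.
JDBC_URL="jdbc:postgresql://dbhost:5432/salesdb"
DB_USER="etl_user"
TABLE="orders"
TARGET_DIR="/data/raw/orders_split"
SPLIT_COLUMN="order_id"   # assumed to be numeric and evenly distributed
NUM_MAPPERS=4
MIN_ID=1                  # assumed smallest value in the split column
MAX_ID=1000               # assumed largest value in the split column

# Rough illustration: each mapper gets an equal slice of the [min, max] range.
# If the values are evenly spread, each slice holds a similar number of rows.
STEP=$(( (MAX_ID - MIN_ID + 1) / NUM_MAPPERS ))
i=0
while [ "$i" -lt "$NUM_MAPPERS" ]; do
  lo=$(( MIN_ID + i * STEP ))
  hi=$(( lo + STEP - 1 ))
  echo "mapper $i: $SPLIT_COLUMN from $lo to $hi"
  i=$(( i + 1 ))
done

# The matching import command, assembled as a string for review.
SPLIT_CMD="sqoop import --connect $JDBC_URL --username $DB_USER -P --table $TABLE --target-dir $TARGET_DIR --split-by $SPLIT_COLUMN --num-mappers $NUM_MAPPERS"
echo "$SPLIT_CMD"
```

If most rows fall into one part of the range (for example, many recent IDs and few old ones), one mapper ends up with most of the data and the output files will differ a lot in size, so it is worth checking the distribution of the column first.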

 

Please share your thoughts about this post in the comments.
