How to set the number of reducers for a Sqoop job?

How can you set the number of reducers for a Sqoop job?

How many reducers did you use for your Sqoop job?

What is the default number of reducers a Sqoop job uses?

 

When you face this question in an interview and don't know the answer, two questions come to mind: "Why didn't I come across this type of question while preparing for the interview?" and "Did I ever set the number of reducers for a Sqoop job?" I had the same thoughts, and I faced this question in almost every interview. But don't worry. I'm here to help you with the answer and the reasoning behind it.

 

Why doesn't Sqoop require reducers or reduce jobs?

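Reducers are needed for jobs that aggregate data. Sqoop import jobs never use reducers, because an import is a map-only job: each mapper copies its own slice of the data from the RDBMS into Hadoop in parallel, so no aggregation is involved. The number of reducers is therefore zero and there is nothing to set. What you can set is the number of mappers, with -m (or --num-mappers); the default is four.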
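To make the map-only idea concrete, here is a small Python sketch of how an import can be divided among mappers. This is not Sqoop's actual code; it only mimics the idea of splitting a numeric column's range into one slice per mapper. The function names `split_ranges` and `mapper_queries`, the table `employees`, and the column `id` are all invented for illustration.

```python
def split_ranges(min_id, max_id, num_mappers=4):
    """Divide [min_id, max_id] into contiguous half-open ranges, one per mapper."""
    total = max_id - min_id + 1
    size, extra = divmod(total, num_mappers)
    ranges = []
    lo = min_id
    for i in range(num_mappers):
        # The first `extra` mappers take one additional row each.
        hi = lo + size + (1 if i < extra else 0)
        if hi > lo:  # skip empty slices when there are fewer rows than mappers
            ranges.append((lo, hi))
        lo = hi
    return ranges


def mapper_queries(table, column, ranges):
    """Build the bounded SELECT that each (hypothetical) mapper would run."""
    return [
        f"SELECT * FROM {table} WHERE {column} >= {lo} AND {column} < {hi}"
        for lo, hi in ranges
    ]


# Hypothetical table whose id column runs from 1 to 1000, split four ways.
ranges = split_ranges(1, 1000, num_mappers=4)
for query in mapper_queries("employees", "id", ranges):
    print(query)
```

Each slice is read and written by one mapper on its own, so no step ever has to combine the mappers' outputs. That combining step is exactly what a reducer would be for.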

When you give this answer, the interviewer will be ready to shoot back with another question: "What if I specify a query in the --query option of the Sqoop job, and that query involves an aggregation?" That Sqoop job doesn't require any reducers either, because whatever aggregation the query contains is performed in the RDBMS itself. After that, the result of the query is imported into Hadoop in parallel.
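The same point can be sketched in Python, using an in-memory SQLite database as a stand-in for the RDBMS. That stand-in is an assumption for illustration only, and the table `orders` and its columns are invented. The database evaluates the GROUP BY, and the "import" step only copies the rows that come back.

```python
import sqlite3

# In-memory SQLite plays the role of the source RDBMS (illustrative stand-in).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (dept TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("a", 10), ("a", 5), ("b", 7)],
)

# The aggregation runs inside the database engine, not in the import step.
aggregated = conn.execute(
    "SELECT dept, SUM(amount) FROM orders GROUP BY dept ORDER BY dept"
).fetchall()

# The "import" just writes the already-aggregated rows as delimited text.
imported_lines = [",".join(str(value) for value in row) for row in aggregated]
print(imported_lines)

conn.close()
```

In a real Sqoop command, the free-form query passed to --query must contain the $CONDITIONS token and be given a --target-dir, and you either name a --split-by column or run with -m 1.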
