Sqoop Interview Questions and Answers (Part-2)

1). How to import large objects such as BLOBs and CLOBs using Sqoop?

Sqoop's direct mode does not support importing large objects such as BLOBs and CLOBs. So to import large objects, a regular JDBC-based import has to be used, without the --direct argument.
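
Eg: a minimal sketch of a JDBC-based import of a table that contains LOB columns (the connection string, credentials, table, and directory names here are illustrative):

sqoop import \
  --connect jdbc:mysql://dbhost/salesdb \
  --username dbuser -P \
  --table documents \
  --target-dir /data/documents \
  --inline-lob-limit 16777216

The --inline-lob-limit argument sets the size (in bytes) up to which a large object is stored inline with the rest of the record; larger objects are written to separate files in the _lobs subdirectory of the import target.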

 

2). What is a Sqoop job?

To perform incremental updates we need to run a Sqoop import repeatedly with updated values for the incremental column. To accomplish this we can create a saved Sqoop job, which is stored in the metastore along with the last imported value, and we can run it any time we want. This makes it easy to import data incrementally on a continuous basis.
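
Eg: a sketch of creating and running a saved job (the job, connection, table, and column names are illustrative; note the space after the standalone -- that separates the job arguments from the import arguments):

sqoop job --create daily_orders_import \
  -- import \
  --connect jdbc:mysql://dbhost/salesdb \
  --username dbuser \
  --table orders \
  --incremental append \
  --check-column order_id \
  --last-value 0

sqoop job --list                      # list the saved jobs
sqoop job --exec daily_orders_import  # run the job; the metastore tracks the last value between runs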

 

3). How can we update the rows which are already exported to an RDBMS?

To update the already exported rows we can use the --update-key argument, which puts the Sqoop export into update mode. In --update-key we mention the column, or comma-separated list of columns, that uniquely identifies a row. Those columns will be used in the WHERE clause of the UPDATE query that is generated when we run the job, and the remaining columns will be used in the SET clause to update the values.
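
Eg: a sketch of an export in update mode (connection details, table, directory, and key column are illustrative):

sqoop export \
  --connect jdbc:mysql://dbhost/salesdb \
  --username dbuser -P \
  --table orders \
  --export-dir /data/orders \
  --update-key order_id

For a composite key, --update-key takes a comma-separated list, e.g. --update-key "order_id,region".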

 

4). How to compress the output file of a Sqoop job?

There are arguments available to compress the output. The --compress argument enables compression, and the --compression-codec argument lets the user specify which Hadoop compression codec to use with it (gzip is the default when only --compress is given).
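
Eg: a sketch that writes Snappy-compressed output (connection details and table name are illustrative; the codec class is the standard Hadoop Snappy codec):

sqoop import \
  --connect jdbc:mysql://dbhost/salesdb \
  --username dbuser -P \
  --table orders \
  --compress \
  --compression-codec org.apache.hadoop.io.compress.SnappyCodec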

 

5). What is the Sqoop merge tool?

The Sqoop merge tool is for combining two datasets. We can use it to merge two datasets where, for matching keys, the older dataset's records are replaced with the newer dataset's records, and records that exist only in the newer dataset are added.
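
Eg: a sketch of merging a newer import onto an older one (all paths and names are illustrative; the jar and class are the record class generated by codegen for the table):

sqoop merge \
  --new-data /data/orders_new \
  --onto /data/orders_old \
  --target-dir /data/orders_merged \
  --jar-file orders.jar \
  --class-name orders \
  --merge-key order_id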

 

6). What is codegen in Sqoop?

Codegen is a tool available in Sqoop to generate the Java class that encapsulates and interprets the records of a database table.
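
Eg: a sketch that generates the record class for a table without importing any data (connection details and table name are illustrative):

sqoop codegen \
  --connect jdbc:mysql://dbhost/salesdb \
  --username dbuser -P \
  --table orders

This produces orders.java (along with the compiled class and jar), which can be reused by tools such as merge.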

 

7). How can you import a subset of rows based on a condition from a database?

We can do this with the --where argument of the import command.

Eg: --where "id_num = '12345'"
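
A fuller sketch of the same import (connection details and table name are illustrative):

sqoop import \
  --connect jdbc:mysql://dbhost/salesdb \
  --username dbuser -P \
  --table orders \
  --where "id_num = '12345'"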

 

8). How to perform incremental data load in Sqoop?

Sometimes the business might need you to process the data that is generated daily, in addition to the data that is already being processed every day. In such cases you need to fetch only the latest records, i.e. those after the last imported record. This type of extraction is called an incremental load.

To perform an incremental load we need to use three arguments, described below and shown together in the example after them.

--incremental: This sets the mode for incremental loads. It takes one of two values, append or lastmodified. In append mode Sqoop imports the newly appended records based on the value of the --check-column argument; in lastmodified mode it imports rows whose timestamp column is newer than the last import.

--check-column: This argument specifies the column that is examined to identify the latest records, for example an auto-increment id or a last-updated timestamp.

--last-value: This argument specifies the maximum value imported so far. Based on this value, Sqoop imports records from the next value onwards.
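
Eg: a sketch of an append-mode incremental import (connection details, table, and column names are illustrative):

sqoop import \
  --connect jdbc:mysql://dbhost/salesdb \
  --username dbuser -P \
  --table orders \
  --incremental append \
  --check-column order_id \
  --last-value 1000

This imports only the rows where order_id is greater than 1000; at the end of the run Sqoop prints the new --last-value to use for the next run.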

 

9). What is the default database that Apache Sqoop has?

The default database in Sqoop is MySQL.

 

10). What is eval tool in Sqoop?

The eval tool is used to test database connections and to execute sample SQL queries to check whether they return the desired results.
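
Eg: a sketch that checks a connection by running a sample query (connection details and query are illustrative):

sqoop eval \
  --connect jdbc:mysql://dbhost/salesdb \
  --username dbuser -P \
  --query "SELECT COUNT(*) FROM orders"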

 

11). How to set number of reducers for a Sqoop job?

Sqoop never uses reducers; its jobs are map-only, so there is no reducer count to set. Only the number of mappers can be controlled, using the -m / --num-mappers argument.
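
Eg: a sketch tuning the map-side parallelism instead (connection details and table name are illustrative):

sqoop import \
  --connect jdbc:mysql://dbhost/salesdb \
  --username dbuser -P \
  --table orders \
  --num-mappers 8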

 

12). You have set the number of mappers to 4 while importing data, but you are not sure whether the table has a primary key column or not. How can you handle this situation?

If we set the number of mappers to more than 1 while importing and the database table has no primary key column (and no --split-by column is given), the Sqoop job fails. To avoid this we can use the --autoreset-to-one-mapper argument, which automatically resets the number of mappers to 1 for such tables.
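
Eg: a sketch of a 4-mapper import that falls back to 1 mapper if the table has no primary key (connection details and table name are illustrative):

sqoop import \
  --connect jdbc:mysql://dbhost/salesdb \
  --username dbuser -P \
  --table order_notes \
  --num-mappers 4 \
  --autoreset-to-one-mapper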

 

13). Difference between Sqoop and Flume (or) Sqoop vs Flume.

  • Sqoop is used to move data from structured data sources like RDBMSs, whereas Flume is used for bulk data streaming operations.
  • Sqoop has a connector-based architecture, whereas Flume has an agent-based architecture.
  • Data ingestion is event-driven in Flume but not in Sqoop.
  • In Sqoop, data goes directly to HDFS, whereas in Flume, data has to flow through channels.

 

 
