Hive – Miscellaneous interview questions.

1). How to convert an external table to internal table?

We can achieve this by using below command:

ALTER TABLE <table_name> SET TBLPROPERTIES('EXTERNAL'='FALSE');

If we have an external table called employee, to convert it from external to internal we can use the above command as:

ALTER TABLE employee SET TBLPROPERTIES('EXTERNAL'='FALSE');

To switch from internal to external we can use the same command with the property EXTERNAL changed to TRUE as shown below:

ALTER TABLE employee SET TBLPROPERTIES('EXTERNAL'='TRUE');

 

2). How to add a new column between two existing columns in Hive?
We can achieve this using alter table command.

Firstly, we need to add the new column to the table by using below command:

alter table <table_name> add columns(<new_column> <data_type>);

Now we can change the column position to wherever we want. For this we can use after command, which takes an extra column name after which we want place our new column. The command will be as shown below:

alter table <table_name> change column <newly_added_column> <data_type> after <target_colum>;

Here I have created employee with only two columns id and name. Then I have added a new column salary of data type int using alter command.
The column salary added after the last column name. After that I have changed the position of salary command usin alter command next to id column. The series of commands that I have used are below:

create table employee( id int, name string) stored as textfile;
alter table employee add columns(salary int);
alter table employee change column salary salary bigint after id;

 

 

3). How to rename a table in Hive?

To rename a table we can use the following command:

ALTER TABLE <table_name> RENAME TO <new_table_name>;

But there is small small problem involved with this command while rename table if it is an external table. If we rename an internal or managed table. If we rename an external table it doesn't change its location that is pointing to the data in HDFS, but if it is an internal or managed table then the location pointing to the data changes. So before renaming, we have to switch the table from internal to external and then we can rename. This switching concept was discussed in the first question of this post. Please refer to it.

 

4). When does HIVE launch only Map task but not Reducer task?

You might have heard something like map only task in MapReduce concept. Sometimes Hive does that. One of them is Map side join. Whenever we do a Broadcast join or Map side join then Hive launches only Mapper task but it won't start any Reducer task.

 

5). What happens if a partitioned column has null values in it? 

(or)

In which partition the null values of partitioned column will be stored in hive?

It is a good practice to use partitioned tables while working with Hive. Hive engine will create a separate folder for each value of a column and stores the remaining columns into files and puts them in the respective partition folder. So if the partitioned column has null value then Hive will create a default partition with the name __HIVE_DEFAULT_PARTITION__ and stores the data for all the null valued rows.

 

7). How to remove qualifier or table name while printing column names while printing the header in Hive query results?

Sometime we observe that while printing query results, Hive will assign table concatenated with a dot(.) as prefix for all the column names. To avoid this we set below configuration to false.

set hive.resultset.use.unique.column.names=false;

This configuration is related to hive server. So it can be set via CLI or configuration file.

 

If you liked this post or if you think any thing to be modified on this post please comment..

Leave a Reply

Your email address will not be published. Required fields are marked *