How to perform minus operation in Hive using joins?

What is Minus operation?

Below is a picture that shows Venn diagram of result of minus operation between two tables A & B upon join.

So it says that when we perform minus operation on two tables resultant data contains only rows from table A, excluding rows from table B and excluding rows that are common between the two tables A & B.

 

How to perform minus operation in Hive?

Most of the RDBMS softwares will have a built-in operation called "minus". As of latest version of Hive, we don't have any in-built operation for this. So we need to achieve it using simple joins.

Firstly, let's try to understand the result of a minus operation, which is the only records from left table excluding records from right table and common row as well. So to get this result either we can use a left join and eliminate common records or we can use "not exists" operator to get the expected result.

 

Here we have two tables called foo and foo1 with two columns in each of the tables id and name.

select * from foo;
+---------+-----------+
| foo.id  | foo.name  |
+---------+-----------+
| 12      | xyz       |
| 13      | xyz       |
| 14      | xyz       |
| 15      | xyz       |
+---------+-----------+

select * from foo1;
+----------+------------+
| foo1.id  | foo1.name  |
+----------+------------+
| 12       | xyz        |
| 13       | xyz        |
+----------+------------+

Below is tone of the ways to get a result that will be same as minus operation on foo and foo1 tables by using not exists operator:

select * from foo f where not exists(select * from foo1 f1 where f.id=f1.id);
+-------+---------+
| f.id  | f.name  |
+-------+---------+
| 14    | xyz     |
| 15    | xyz     |
+-------+---------+

And here is another solution to get the same result by using left join on foo and foo1 tables:

select f.* from foo f left join foo1 f1 on f.id=f1.id where f1.id is null;
+-------+---------+
| f.id  | f.name  |
+-------+---------+
| 14    | xyz     |
| 15    | xyz     |
+-------+---------+

In the result you can see the only rows with id 14 & 15 are displayed, which are neither present in foo1 table nor common between the two tables.

Leave a Reply

Your email address will not be published. Required fields are marked *