Difference between IN operator and EXISTS operator in HIVE or SQL.

EXISTS

EXISTS operator will be used when we need to check if there is any row exists with a condition. A sample query with EXISTS operator will look like below:

SELECT t1.*
FROM t1 
WHERE EXISTS (SELECT * FROM t2 WHERE t1.id = t2.t1id)

 

IN

IN operator can be used when we want to check if the value of some column is equal to any of the values provided to it or any values returned from inner query. Below is the sample query with IN operator.

Eg 1:
SELECT t1.* 
FROM   t1 
WHERE  t1.id IN ('id1','id2','id3')

(or)

Eg 2:
SELECT t1.* 
FROM   t1 
WHERE  t1.id IN (SELECT t2.t1id FROM t2)

 

Although these two operators can be used interchangeably there are some differences between them. Below are the category wise differences between them:

 

Functionality :

EXISTS operator will be used when we want to check if any values are returned or not from the sub-query.

IN operator will be used as multiple OR conditions, i.e., if the value of a column is equal to any of the values supplied to IN operator or values return from the sub-query provided you are using sub-query inside the IN operator.

 

Comparison :

EXISTS will never compare the values between the outer query result and the inner query result.

Whereas IN operator will compare the values returned from the inner query with outer query result.

 

Inner Query Execution :

In case of EXISTS operator, outer query will be executed first. Then each row will be taken from the outer query result and inner query executed for that row. The outer query will use inner query result to check the condition. So if the outer query results 100 rows then inner query will be executed for 100 times.

In case of IN operator, the inner query will be executed only once and the outer query will use this result as input for IN operator for condition checking.

 

Stop condition :

Once a positive result comes from the EXISTS operator then execution will be stopped immediately.

But for IN operator all the values will be compared.

 

Performance :

We always need better performance queries to avoid the delay. So it is very important to select one of these two operators based on the requirement. Because the performance of these two operators will change based on the inner query result.

If the inner query results in large output then then EXISTS operator should be used. Because in this case once a positive result comes from the inner query execution will be stopped.

If the inner query result is lesser then we can go for IN operator because it checks for all the values returned.

Leave a Reply

Your email address will not be published. Required fields are marked *