What is the Minus operation? The picture below shows a Venn diagram of the result of a minus operation between two tables…
In Big Data applications we rarely need to pivot data, because transposing billions of rows…
EXISTS: The EXISTS operator is used when we need to check whether any row exists that satisfies a condition.…
To remove duplicate records from a DataFrame we can use either the distinct or the dropDuplicates method. But the dropDuplicates method comes with…
While working with Big Data applications we have probably used the methods withColumn() and select(). Both are some of…
Spark has a lot of performance enhancement techniques. Two of them are cache() and broadcast variables. Although they are both used to…
In this post we will see how we can extract unique records from a Hive table. This can be achieved…
This is one of the most frequent questions in Data Engineering interviews. These are called ranking functions in Hive. These are the…
This post will focus on calculating a moving average or sum using Hive queries. We might have come across this question…
Hive CLI and Beeline can both be used to interact with the Hive execution engine. But there are a few differences between…