Map join in Hive goes by several names, such as Auto Map join, Map-side join, and Broadcast join. It is…
Adaptive Query Execution (AQE). Spark is one of the most widely used frameworks in data engineering for processing large volumes of data. As…
What is the use of the GROUPING SETS clause in Hive queries? This is a rarely used clause, but it…
How can you handle skewed data in Hive or Spark? How can we get rid of data skew in Spark…
What is Skewness in Data? Data skew means data is distributed unevenly or asymmetrically. Let’s try to understand this in…
Big data developers need some knowledge of the internal workings of all the components. That’s where we get to…
How to create a DataFrame using a custom schema in Spark? This is one of the most common interview questions.…
What is a Scala Monad? A monad is neither a data type nor a class/trait; it is a concept. There are a lot of ways…
What are SCDs, or Slowly Changing Dimensions? Slowly changing dimensions are a concept from data warehousing. They track the…
While executing Hive queries, you might have observed that no MapReduce task starts when you perform a Select…