Spark Archives - Page 3 of 5 - Big Data Interview

Map join in Hive (or) Map side join in Hive (or) Auto Map join in Hive (or) Broadcast join in Hive

July 13, 2020 admin

Map join in Hive goes by several different names, such as Auto Map join, Map side join and Broadcast join. It is…
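As a rough illustration of the idea behind a map-side (broadcast) join, assuming the usual description in which the smaller table is held in memory by every mapper and the larger table is streamed past it, the plain-Python sketch below builds a hash table from the small side and probes it row by row, so no shuffle or reduce phase is needed. The `map_side_join` helper and the sample tables are hypothetical; this is not Hive's implementation.

```python
from collections import defaultdict

def map_side_join(large_rows, small_rows, key):
    """Inner-join two lists of dicts on `key` the way a map-side join does:
    the small table becomes an in-memory hash table (the "broadcast" copy),
    and each row of the large table is probed against it in a single pass."""
    # Build phase: hash the small table once.
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)
    # Probe phase: stream the large table; no shuffle/sort by key is required.
    joined = []
    for row in large_rows:
        for match in lookup.get(row[key], []):
            joined.append({**row, **match})
    return joined

# Hypothetical sample data: a large fact table and a small dimension table.
orders = [
    {"order_id": 1, "cust_id": 10, "amount": 25.0},
    {"order_id": 2, "cust_id": 20, "amount": 40.0},
    {"order_id": 3, "cust_id": 10, "amount": 15.0},
    {"order_id": 4, "cust_id": 30, "amount": 5.0},  # no matching customer
]
customers = [
    {"cust_id": 10, "name": "Asha"},
    {"cust_id": 20, "name": "Ben"},
]

for r in map_side_join(orders, customers, "cust_id"):
    print(r)
```

The trade-off is memory: the approach only pays off when the small side fits comfortably in each task's memory. In Hive, automatic conversion of a common join into a map join is governed by the `hive.auto.convert.join` setting.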

July 5, 2020 admin

Adaptive Query Execution (AQE): Spark is one of the most widely used frameworks in data engineering for processing huge volumes of data. As…

May 14, 2020 admin

What is the use of the GROUPING SETS clause in Hive queries? This is a rarely used clause, but it…
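GROUPING SETS lets one query aggregate over several groupings at once; the result is the union of the individual GROUP BY results, with columns that are not part of a given set returned as NULL. The plain-Python sketch below mimics that behaviour on a hypothetical `sales` list; the `grouping_sets` helper is illustrative only and not part of Hive.

```python
from collections import defaultdict

def grouping_sets(rows, columns, sets, measure):
    """Sum `measure` over each grouping set, mimicking
    GROUP BY ... GROUPING SETS (...): each set is aggregated separately and
    the results are unioned, with columns outside the set reported as None
    (Hive reports them as NULL)."""
    result = []
    for gset in sets:
        totals = defaultdict(int)
        for row in rows:
            # Columns not in this grouping set collapse to None.
            key = tuple(row[c] if c in gset else None for c in columns)
            totals[key] += row[measure]
        result.extend(
            {**dict(zip(columns, key)), measure: total}
            for key, total in totals.items()
        )
    return result

sales = [
    {"region": "east", "product": "a", "amount": 10},
    {"region": "east", "product": "b", "amount": 5},
    {"region": "west", "product": "a", "amount": 7},
]

# Roughly equivalent to:
#   SELECT region, product, SUM(amount) FROM sales
#   GROUP BY region, product GROUPING SETS ((region, product), region, ())
for r in grouping_sets(sales, ["region", "product"],
                       [("region", "product"), ("region",), ()], "amount"):
    print(r)
```

The empty set `()` yields the grand total, which is why a single GROUPING SETS query can replace several UNION ALL branches.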

April 13, 2020 admin

How can you handle skewed data in Hive or Spark? How can we get rid of data skewness in Spark…
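One commonly cited remedy for skewed keys in Spark or Hive aggregations is salting: attach a random suffix to each record's key so that one hot key is spread over several shuffle keys (and therefore several partitions), aggregate on the salted key, then aggregate again on the original key. The plain-Python sketch below simulates this with a deterministic stand-in for a hash partitioner; the partition count, salt range, data and helper names are all hypothetical, and a real job would express the same two stages with DataFrame or SQL operations.

```python
import random
import zlib
from collections import Counter, defaultdict

NUM_PARTITIONS = 4  # hypothetical number of reduce tasks / shuffle partitions
SALT_BUCKETS = 4    # hypothetical number of salt values per key

def partition_of(key):
    """Deterministic stand-in for a hash partitioner (same key -> same partition)."""
    return zlib.crc32(repr(key).encode()) % NUM_PARTITIONS

def partition_loads(keys):
    """Count how many records each partition would receive."""
    return Counter(partition_of(k) for k in keys)

def salted_count(keys, rng):
    """Two-stage count: aggregate on (key, salt) first, then on key alone."""
    # Stage 1: each record gets a random salt, so one hot key becomes
    # up to SALT_BUCKETS different shuffle keys.
    salted = [(k, rng.randrange(SALT_BUCKETS)) for k in keys]
    partial = Counter(salted)
    # Stage 2: strip the salt and combine the (much smaller) partial results.
    final = defaultdict(int)
    for (key, _salt), n in partial.items():
        final[key] += n
    return dict(final), salted

# Hypothetical skewed input: one "hot" key dominates.
keys = ["hot"] * 1000 + [f"k{i}" for i in range(5) for _ in range(10)]

rng = random.Random(42)
counts, salted_keys = salted_count(keys, rng)
print("loads without salt:", dict(partition_loads(keys)))
print("loads with salt:   ", dict(partition_loads(salted_keys)))
print("final counts:", counts)
```

Without salting, every "hot" record lands in the same partition; with salting, those records are divided among the salted keys, at the cost of a second aggregation step. How evenly they spread depends on how the salted keys hash.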

admin

What is Skewness in Data? Data skew means data is distributed unevenly or asymmetrically. Let’s try to understand this in…

April 12, 2020 admin

Big data developers need some knowledge of the internal workings of all the components. That’s where we get to…

March 30, 2020 admin

How do you create a DataFrame using a custom schema in Spark? This is one of the most common interview questions.…

July 21, 2019 admin

What is a Scala monad? A monad is neither a data type nor a class/trait; it is a concept. There are a lot of ways…
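Because a monad is a pattern rather than a specific type, it can be sketched in any language. The loose Python analogue below (Python is used for every sketch on this page) defines a tiny Option-like wrapper with a "unit" constructor and a `flat_map` that short-circuits once a value is missing, roughly what Scala's `Option.flatMap` does inside a for-comprehension. The `Option` class, `parse_int`, `safe_div` and `compute` are hypothetical illustrations, not Scala's API.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Option:
    """Tiny Option/Maybe: holds a value, or nothing when `present` is False."""
    value: Any = None
    present: bool = False

    @staticmethod
    def some(value):
        # "unit"/"pure": wrap a plain value in the monadic context.
        return Option(value, True)

    @staticmethod
    def none():
        return Option()

    def flat_map(self, f: Callable[[Any], "Option"]) -> "Option":
        # "bind": apply f only when a value is present; otherwise propagate emptiness.
        return f(self.value) if self.present else self

    def map(self, f):
        # map is derivable from flat_map plus unit.
        return self.flat_map(lambda v: Option.some(f(v)))

def parse_int(s):
    try:
        return Option.some(int(s))
    except ValueError:
        return Option.none()

def safe_div(a, b):
    return Option.none() if b == 0 else Option.some(a // b)

def compute(x, y):
    # Chain steps without checking for failure at each stage,
    # similar to a Scala for-comprehension over Option values.
    return parse_int(x).flat_map(lambda a: parse_int(y).flat_map(lambda b: safe_div(a, b)))

print(compute("10", "3"))   # Option(value=3, present=True)
print(compute("10", "0"))   # Option(value=None, present=False)
print(compute("ten", "3"))  # Option(value=None, present=False)
```

The same "wrap a value, then bind functions that return wrapped values" shape is what Scala's `Option`, `List` and `Future` share, which is why they all work in for-comprehensions.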

July 13, 2019 admin

What are SCDs, or Slowly Changing Dimensions? Slowly changing dimensions are a data warehousing concept. They track the…
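There are several SCD types; assuming Type 2 (keep history by expiring the old row and inserting a new current row, rather than overwriting it as Type 1 does), the plain-Python sketch below applies a batch of city changes to a hypothetical customer dimension. The `DimRow` fields and the `apply_scd2` helper are illustrative names, not a specific warehouse's schema.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional

@dataclass(frozen=True)
class DimRow:
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date]  # None means "still current"
    is_current: bool

def apply_scd2(dim: List[DimRow], updates: dict, as_of: date) -> List[DimRow]:
    """Apply {customer_id: new_city} changes as SCD Type 2:
    expire the current row and append a new current row, keeping history."""
    current = {r.customer_id: r for r in dim if r.is_current}
    out = []
    for row in dim:
        new_city = updates.get(row.customer_id)
        if row.is_current and new_city is not None and new_city != row.city:
            # Close the old version instead of overwriting it (that would be Type 1).
            out.append(replace(row, valid_to=as_of, is_current=False))
        else:
            out.append(row)
    for cid, new_city in updates.items():
        old = current.get(cid)
        if old is None or old.city != new_city:
            # New version (or brand-new customer) becomes the current row.
            out.append(DimRow(cid, new_city, as_of, None, True))
    return out

dim = [
    DimRow(1, "Pune", date(2019, 1, 1), None, True),
    DimRow(2, "Delhi", date(2019, 1, 1), None, True),
]
result = apply_scd2(dim, {1: "Mumbai", 2: "Delhi", 3: "Chennai"}, date(2020, 1, 1))
for row in result:
    print(row)
```

Customer 1 ends up with two rows (the expired Pune row and the current Mumbai row), customer 2 is unchanged because the value did not change, and customer 3 is inserted as a new current row.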

July 5, 2019 admin

While executing Hive queries, you might have observed that no MapReduce task starts when you perform a Select…