How can you handle skewed data in Hive or Spark? How can we get rid of data skewness in Spark…
What is Skewness in Data? Data skew means data is distributed unevenly or asymmetrically. Let’s try to understand this in…
Big data developers always need some knowledge of the internal workings of all the components. That’s where we get to…
How to create a dataframe using a custom schema in Spark? This is one of the most common interview questions.…
What is a Scala Monad? A Monad is neither a data type nor a class/trait. A Monad is a concept. There are a lot of ways…
What are SCDs, or Slowly Changing Dimensions? Slowly changing dimensions are a concept related to data warehousing. They track the…
While executing Hive queries, you might have observed that the MapReduce task won’t start when you perform a Select…
I don’t think this question has one particular answer that is certain to give us the required result, because data is peculiar.…
All the functions mentioned below are more or less the same functionally, but there are very minor differences among them. createOrReplaceTempView createTempView…
mapValues – This function works with PairRDDs only. So this function always requires an RDD of type RDD[(a,b)]. mapValues functions…