Big Data Interview
The Interview Hacker and Technical guide

Category: Spark

How to handle skewed data in Big Data applications?

April 13, 2020 admin

How can you handle skewed data in Hive or Spark? How can we get rid of data skewness in Spark…

Posted in: Hive, Spark

Explain skew join in Hive?

admin

What is Skewness in Data? Data skew means data is distributed unevenly or asymmetrically. Let’s try to understand this in…

Posted in: Hive Filed under: Skew join, Skew join Hive

Explain Hive architecture.

April 12, 2020 admin

Big data developers always need some knowledge of the internal workings of all the components. That’s where we get to…

Posted in: Hive, Spark Filed under: Hive

How to create a DataFrame with a custom schema in Spark?

March 30, 2020 admin 1 Comment

How to create a dataframe using a custom schema in Spark? This is one of the most common interview questions.…

Posted in: Spark, Spark SQL

What is a Monad in Scala?

July 21, 2019 admin 2 Comments

What is a Scala Monad? A Monad is neither a data type nor a class/trait. A Monad is a concept. There are a lot of ways…

Posted in: Scala, Spark Filed under: Monad in Scala, Scala Monad

Slowly Changing Dimensions in Hive

July 13, 2019 admin

What are SCDs, or Slowly Changing Dimensions? Slowly changing dimensions are a concept from data warehousing. They track the…

Posted in: Hive Filed under: Slowly Changing Dimensions

Why doesn’t a MapReduce task run when we perform select * from table in Hive?

July 5, 2019 admin

While executing Hive queries, you might have observed that no MapReduce task starts when you perform a Select…

Posted in: Hadoop, Hive, MapReduce

How to get files of equivalent size while importing data using Sqoop?

admin

I don’t think this question has a single answer that is guaranteed to give the required result, because data is peculiar.…

Posted in: Hadoop, Sqoop Filed under: Equivalent size files, Sqoop, Uneven Distribution

Difference between createOrReplaceTempView and registerTempTable

June 18, 2019 admin

All the functions mentioned below are more or less the same functionally, but there are very minor differences among them. createOrReplaceTempView createTempView…

Posted in: Spark, Spark SQL

map vs mapValues

June 16, 2019 admin

mapValues – This function works with PairRDDs only. So this function always requires an RDD of type RDD[(a,b)]. mapValues functions…

Posted in: Scala, Spark

Copyright © 2021 Big Data Interview