Big Data Interview
The Interview Hacker and Technical guide

Category: Spark

Difference between dropDuplicates() and distinct.

October 10, 2020 admin

To delete duplicate records in a DataFrame, we can use either the distinct or the dropDuplicates method. But the dropDuplicates method comes with…

Posted in: Spark Filed under: Distinct, dropDuplicates

Difference between withColumn and select() on DataFrame in Spark.

October 5, 2020 admin

While working with Big Data applications, we might have used the withColumn and select() methods. They both are few of…

Posted in: Big Data, Spark Filed under: select(), withColumn, withColumnHiddenCost

What is the difference between cache and broadcast variable?

admin

Spark has a lot of performance-enhancement techniques. Two of them are cache() and broadcast variables. Although they are both used to…

Posted in: Big Data, Spark Filed under: Broadcastvariable, Cache

How to delete duplicate records in Hive (or) How to extract unique records in Hive using analytical functions.

July 14, 2020 admin

In this post we will see how we can extract unique records from a Hive table. This can be achieved…

Posted in: Hive

RANK() vs DENSE_RANK() vs ROW_NUMBER() in Hive (or) Differences between RANK(), DENSE_RANK() and ROW_NUMBER() (or) Ranking window functions in Hive

admin

This is one of the most frequent questions in Data Engineering interviews. These are called ranking functions in Hive. These are the…

Posted in: Hive Filed under: DENSE_RANK() in Hive(), RANK() in Hive, RANK() vs DENSE_RANK() vs ROW_NUMBER() in Hive, ROW_NUMBER() in Hive

How to calculate moving sum or moving average in Hive?

admin

This post will focus on calculating a moving average or moving sum using Hive queries. We might have come across this question…

Posted in: Hive Filed under: Moving Average in Hive, Moving sum in Hive

Hive CLI vs Beeline (or) Difference between Hive CLI and Beeline

July 13, 2020 admin

Hive CLI and Beeline can both be used to interact with the Hive execution engine. But there are a few differences between…

Posted in: Hive Filed under: Beeline, Hive CLI, Hive CLI vs Beeline

Map join in Hive (or) Map side join in Hive (or) Auto Map join in Hive (or) Broadcast join in Hive

admin

Map join in Hive has several different names, such as Auto Map join, Map side join, and Broadcast join. It is…

Posted in: Hive Filed under: Joins in Hive, Map join in Hive

What is Adaptive Query Execution in Spark?

July 5, 2020 admin

Adaptive Query Execution (AQE): Spark is one of the most widely used frameworks in Data Engineering for processing huge data. As…

Posted in: Spark

Explain about Grouping Sets in Hive (or) Grouping Sets in SQL?

May 14, 2020 admin

What is the use of the GROUPING SETS clause in Hive queries? This is a rarely used clause, but it…

Posted in: Hive Filed under: CUBE, GROUPING SETS, GROUPING__ID(), HIVE QL, ROLLUP, SQL

Copyright © 2021 Big Data Interview