Big Data Interview
The Interview Hacker and Technical guide

Month: July 2020

How to delete duplicate records in Hive (or) How to extract unique records in Hive using analytical functions.

July 14, 2020 admin

In this post we will see how we can extract unique records from a Hive table. This can be achieved…

Posted in: Hive

RANK() vs DENSE_RANK() vs ROW_NUMBER() in Hive (or) Differences between RANK(), DENSE_RANK() and ROW_NUMBER() (or) Ranking window functions in Hive

admin

This is one of the most frequent questions in Data Engineering interviews. These are called ranking functions in Hive. These are the…

Posted in: Hive Filed under: DENSE_RANK() in Hive, RANK() in Hive, RANK() vs DENSE_RANK() vs ROW_NUMBER() in Hive, ROW_NUMBER() in Hive

How to calculate moving sum or moving average in Hive?

admin

This post focuses on calculating a moving average or moving sum using Hive queries. We might have come across this question…

Posted in: Hive Filed under: Moving Average in Hive, Moving sum in Hive

Hive CLI vs Beeline (or) Difference between Hive CLI and Beeline

July 13, 2020 admin

Hive CLI and Beeline can both be used to interact with the Hive execution engine, but there are a few differences between…

Posted in: Hive Filed under: Beeline, Hive CLI, Hive CLI vs Beeline

Data Engineer interview preparation/Bigdata Interview Questions/Data Engineer Interview Questions

admin

If you are an aspiring Data Engineer reading this article, you have landed on a wonderful website.…

Posted in: Big Data, Data Engineering Filed under: BigData Interview Questions, Data Engineer Interview Question, Scenario based Bigdata interview questions

Map join in Hive (or) Map side join in Hive (or) Auto Map join in Hive (or) Broadcast join in Hive

admin

Map join in Hive goes by several names, such as Auto Map join, Map side join and Broadcast join. It is…

Posted in: Hive Filed under: Joins in Hive, Map join in Hive

What is Adaptive Query Execution in Spark?

July 5, 2020 admin

Adaptive Query Execution (AQE): Spark is one of the most widely used frameworks in Data Engineering for processing huge volumes of data. As…

Posted in: Spark

Recent Posts

  • Save action in Spark takes too long time/Save operation spills huge data on to disk and fails with the error “No space left on device”
  • How to set configuration to start Reduce jobs after completion of certain proportion of the Map jobs in Hive or Hadoop?
  • HDFS commands

Contact Us

  • Email
    sparkandbigdatainterview@gmail.com
Copyright © 2023 Big Data Interview