Big Data Interview
The Interview Hacker and Technical guide

Category: Big Data

How to configure Reduce jobs to start after a certain proportion of the Map jobs has completed in Hive or Hadoop?

June 8, 2021 admin

Within the MapReduce framework in Platform Symphony, you can specify the proportion of the total number of map tasks in a…

Posted in: Big Data, Hive, MapReduce

HDFS commands

May 31, 2021 admin

HDFS commands interview questions: 1) What is the difference between the commands hadoop dfs and hadoop fs? hadoop dfs – This is…

Posted in: HDFSCommands Filed under: HDFSCommands, HDFSInterviewQuestions

How to process JSON data or file in HIVE without using JsonSerDe?

December 20, 2020 admin

Hive is rarely used with JSON. But sometimes business requirements might force the developers to…

Posted in: Big Data, Hive Filed under: HIVEJSON

How to add a unique index or unique row number to each row of a DataFrame?

December 7, 2020 admin

There are multiple ways to do this in Spark. Here we discuss two approaches to accomplish this task.…

Posted in: Big Data, Spark

Advanced performance enhancement techniques in Spark.

December 6, 2020 admin

Design choices: Language choice. This is impossible to answer in general and depends heavily on your requirements. If you want to perform some…

Posted in: Big Data, Spark Filed under: Spark, SparkPerformanceEnhancementTechniques

zip, zipWithIndex and zipWithUniqueId in Spark

admin

These functions are rarely used in Spark, as they are confined to RDDs, and RDDs are…

Posted in: Big Data, Spark Filed under: zip, zipWithIndex, zipWithUniqueId

Which is the best programming language to use with Spark?

December 5, 2020 admin

Spark supports multiple programming languages. Of these, the most used are Scala, Python and Java. But which is best…

Posted in: Big Data, Spark Filed under: Java, Python, Scala, Spark

When does Spark spill the cached RDD or DataFrame onto disk? (or) What is the threshold to spill the cached data onto disk in Spark?

admin

Caching is one of the best optimization techniques available in Spark. When we cache any RDD or DataFrame, Spark will…

Posted in: Big Data, Spark

How to perform a minus operation in Hive using joins?

December 4, 2020 admin

What is a minus operation? Below is a picture that shows a Venn diagram of the result of a minus operation between two tables…

Posted in: Big Data, Hive Filed under: MinuOperationusingjoins

Miscellaneous interview questions on UNIX scripting.

admin

In this post I have given miscellaneous interview questions related to UNIX. For Unix interview questions we should try to…

Posted in: Big Data Filed under: Unix

Copyright © 2023 Big Data Interview