Big Data Interview
The Interview Hacker and Technical guide

Category: Uncategorized

What is DAG scheduler in Spark?

May 12, 2019 admin

DAG – Directed Acyclic Graph. A DAG consists of vertices and edges, in which vertices represent RDDs and edges represent…

What is RDD in Spark?

admin

Why is an RDD immutable? Why does Spark need RDDs? RDD – Resilient Distributed Dataset. What is…

What do you know about ORC file format?

May 11, 2019 admin

What is the ORC file format? How is ORC better than the RC file format? ORC stands for Optimized Row Columnar…

What is an Avro file in Spark/Hadoop?

admin

Tell me something about the Avro file format. What do you know about Avro files? These are common questions…

What is a Sequence file in Spark/Hadoop?

May 10, 2019 admin

You may have come across questions like the ones below in a Spark interview. To get a full understanding of…

Is Java needed for Big Data/Spark interview?

May 8, 2019 admin

Many of us may wonder whether Java is really required for a Big Data/Spark/data engineer interview. If so, what…

What is the difference between cache() and persist() in Spark?

May 7, 2019 admin

Similar and related questions: How do you cache a dataset in Spark? In how many ways can you cache data in Spark?…

How to set the number of reducers for a Sqoop job?

admin

How can you set the number of reducers for a Sqoop job? How many reducers did you use for your Sqoop job?…

What is meant by a shared variable? What shared variables are available in Spark?

May 6, 2019 admin

What is a shared variable? A variable that is available on all of the executors or nodes that work on…

Copyright © 2023 Big Data Interview