Big Data Interview

The Interview Hacker and Technical guide

Blog

Which is the best programming language to use with Spark?

December 5, 2020 admin

Spark supports multiple programming languages. Of these, the most widely used are Scala, Python and Java. But which is best…

Posted in: Big Data, Spark Filed under: Java, Python, Scala, Spark

When does Spark spill the cached RDD or DataFrame onto disk? (or) What is the threshold for spilling cached data onto disk in Spark?

admin

Caching is one of the best optimization techniques available in Spark. When we cache an RDD or DataFrame, Spark will…

Posted in: Big Data, Spark

Miscellaneous Spark interview questions – Part I

admin

1) Spark jargon – Job: A Spark application will have a number of sub-processes, and each of them can be…

Posted in: Spark

How to perform a minus operation in Hive using joins?

December 4, 2020 admin

What is the minus operation? Below is a picture that shows a Venn diagram of the result of a minus operation between two tables…
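
A minimal sketch of the minus pattern, assuming two small hypothetical tables `table_a` and `table_b` keyed on `id`. It uses Python's built-in `sqlite3` as a stand-in for Hive so it runs anywhere; the LEFT OUTER JOIN plus IS NULL filter is plain SQL, but check the exact syntax against your Hive version before reusing it.

```python
import sqlite3

# In-memory database standing in for Hive; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, name TEXT);
    CREATE TABLE table_b (id INTEGER, name TEXT);
    INSERT INTO table_a VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO table_b VALUES (2, 'bob'), (4, 'dave');
""")

# A MINUS B: keep the rows of A that have no matching row in B.
# The LEFT OUTER JOIN fills B's columns with NULL when nothing matches,
# so filtering on b.id IS NULL leaves only the unmatched rows of A.
minus_query = """
    SELECT a.id, a.name
    FROM table_a a
    LEFT OUTER JOIN table_b b
      ON a.id = b.id
    WHERE b.id IS NULL
    ORDER BY a.id
"""
result = conn.execute(minus_query).fetchall()
print(result)  # [(1, 'alice'), (3, 'carol')]
```

Joining on `id` alone treats rows as equal when their keys match; to compare whole rows, join on every column. Filter on a join-key column (as here) or another column that is never NULL in `table_b`; otherwise a matched row whose filter column happens to be NULL would be kept by mistake.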

Posted in: Big Data, Hive Filed under: MinuOperationusingjoins

Miscellaneous interview questions on UNIX scripting.

admin

In this post I have listed miscellaneous interview questions related to UNIX. For Unix interview questions we should try to…

Posted in: Big Data Filed under: Unix

How can we pivot and unpivot data in Spark? (or) What is pivot in Spark?

admin

In Big Data applications we rarely get the requirement to pivot data, because transposing billions of rows…
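
Spark DataFrames offer `groupBy(...).pivot(...)` for this, but the underlying idea can be sketched in plain SQL. The example below is a minimal sketch with a hypothetical `sales` table, run in Python's built-in `sqlite3`: pivoting is conditional aggregation (one `CASE WHEN` per output column) and unpivoting is a `UNION ALL` of one `SELECT` per column. This is a swapped-in SQL technique for illustration, not necessarily the approach the post takes.

```python
import sqlite3

# Hypothetical long-format sales data: one row per (product, quarter, amount).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (product TEXT, quarter TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('laptop', 'Q1', 100), ('laptop', 'Q2', 150),
        ('phone',  'Q1', 80),  ('phone',  'Q1', 20), ('phone', 'Q2', 60);
""")

# Pivot: one output row per product, one column per quarter.
# Each CASE WHEN sums only the rows that belong to its quarter.
pivot_rows = conn.execute("""
    SELECT product,
           SUM(CASE WHEN quarter = 'Q1' THEN amount ELSE 0 END) AS q1,
           SUM(CASE WHEN quarter = 'Q2' THEN amount ELSE 0 END) AS q2
    FROM sales
    GROUP BY product
    ORDER BY product
""").fetchall()
print(pivot_rows)  # [('laptop', 100, 150), ('phone', 100, 60)]

# Unpivot: turn each quarter column back into (product, quarter, amount) rows.
conn.execute("CREATE TABLE sales_wide (product TEXT, q1 INTEGER, q2 INTEGER)")
conn.executemany("INSERT INTO sales_wide VALUES (?, ?, ?)", pivot_rows)
unpivot_rows = conn.execute("""
    SELECT product, 'Q1' AS quarter, q1 AS amount FROM sales_wide
    UNION ALL
    SELECT product, 'Q2' AS quarter, q2 AS amount FROM sales_wide
    ORDER BY product, quarter
""").fetchall()
print(unpivot_rows)
# [('laptop', 'Q1', 100), ('laptop', 'Q2', 150), ('phone', 'Q1', 100), ('phone', 'Q2', 60)]
```

The pivot query needs the list of quarter values written into it, and the aggregation collapses the two `phone`/`Q1` rows into one, so unpivoting does not restore the original rows exactly.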

Posted in: Spark Filed under: Pivot, Unpivot

Difference between the IN operator and the EXISTS operator in Hive or SQL.

October 26, 2020 admin

EXISTS: The EXISTS operator is used when we need to check whether any row exists that satisfies a condition.…
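
A minimal sketch of the two operators in standard SQL, using Python's built-in `sqlite3` and hypothetical `customers` and `orders` tables. For these inputs `IN` and `EXISTS` return the same rows; the negated forms diverge when the subquery produces a NULL. Hive and Spark SQL may restrict which subquery shapes they accept, so treat this as an illustration of SQL semantics rather than Hive-specific behavior.

```python
import sqlite3

# Hypothetical tables; order 13 has a NULL customer_id on purpose.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3), (13, NULL);
""")

# IN: compare the column against the set of values the subquery returns.
in_rows = conn.execute("""
    SELECT name FROM customers
    WHERE id IN (SELECT customer_id FROM orders)
    ORDER BY id
""").fetchall()

# EXISTS: correlated subquery that is true as soon as one matching row exists.
exists_rows = conn.execute("""
    SELECT name FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
    ORDER BY c.id
""").fetchall()

# The negated forms differ when the subquery returns a NULL:
# "id NOT IN (1, 1, 3, NULL)" is never TRUE, so NOT IN returns no rows,
# while NOT EXISTS ignores the NULL order and still finds bob.
not_in_rows = conn.execute("""
    SELECT name FROM customers
    WHERE id NOT IN (SELECT customer_id FROM orders)
""").fetchall()
not_exists_rows = conn.execute("""
    SELECT name FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
""").fetchall()

print(in_rows, exists_rows)          # [('alice',), ('carol',)] [('alice',), ('carol',)]
print(not_in_rows, not_exists_rows)  # [] [('bob',)]
```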

Posted in: Hive, Spark SQL Filed under: EXISTS operator, IN and EXISTS in SQL, IN Operator, SQL

Difference between dropDuplicates() and distinct().

October 10, 2020 admin

To remove duplicate records from a DataFrame we can use either the distinct or the dropDuplicates method. But the dropDuplicates method comes with…
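
In Spark, `distinct()` removes rows that are identical in every column, while `dropDuplicates()` can also take a subset of column names and keep one row per combination of those columns (with no arguments it behaves like `distinct()`). The sketch below mimics both behaviors in plain SQL with Python's built-in `sqlite3` and a hypothetical `employees` table; it is an analogy, not Spark code, and the window-function query assumes SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical data: one exact duplicate (id 1) and one key-only duplicate (id 2).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER, name TEXT, dept TEXT);
    INSERT INTO employees VALUES
        (1, 'alice', 'sales'), (1, 'alice', 'sales'),
        (2, 'bob', 'hr'), (2, 'bob', 'finance');
""")

# Like distinct(): a row is dropped only if ALL of its columns match another row.
distinct_rows = conn.execute(
    "SELECT DISTINCT id, name, dept FROM employees ORDER BY id, dept"
).fetchall()
print(distinct_rows)
# [(1, 'alice', 'sales'), (2, 'bob', 'finance'), (2, 'bob', 'hr')]

# Like dropDuplicates(["id"]): keep one row per id, ignoring the other columns.
# ROW_NUMBER picks the surviving row explicitly (here: first dept alphabetically).
dedup_rows = conn.execute("""
    SELECT id, name, dept FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY dept) AS rn
        FROM employees
    )
    WHERE rn = 1
    ORDER BY id
""").fetchall()
print(dedup_rows)
# [(1, 'alice', 'sales'), (2, 'bob', 'finance')]
```

The `ROW_NUMBER()` version lets you decide which row survives through its `ORDER BY`; Spark's `dropDuplicates` does not let you choose which of the duplicate rows is kept.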

Posted in: Spark Filed under: Distinct, dropDuplicates

Difference between withColumn() and select() on a DataFrame in Spark.

October 5, 2020 admin

While working with Big Data applications we have probably used the methods withColumn and select(). Both of them are among…

Posted in: Big Data, Spark Filed under: select(), withColumn, withColumnHiddenCost

What is the difference between cache() and broadcast variables?

admin

Spark has a lot of performance enhancement techniques. Two of them are cache() and broadcast variables. Although both are used to…

Posted in: Big Data, Spark Filed under: Broadcastvariable, Cache

Page 2 of 9

Contact Us

  • Email: sparkandbigdatainterview@gmail.com
Copyright © 2023 Big Data Interview