How to prepare for a Spark interview?

How to prepare a resume for a Big Data interview?

A Spark interview usually covers several other Big Data technologies such as Hadoop, Hive, Sqoop, and Flume, because these are the combinations commonly used in Big Data projects. Depending on the skills you mention on your resume and have worked with, this list may grow. So be careful about which skills you list, and be thorough with every concept in every skill that appears on your resume.


Where should I start my preparation?

First, start with the basic Hadoop concepts such as MapReduce and HDFS; deeper Hadoop knowledge is a bonus, even though many companies are moving away from it. Next comes Spark. Spark has a lot of concepts, from the motivation behind creating the framework all the way to tuning job performance. The main topics asked in interviews are: RDDs, DataFrames, and Datasets; RDD operations such as map vs. flatMap and groupByKey vs. reduceByKey vs. combineByKey; the file formats Spark supports, with a little depth on each; persistence and caching; accumulators and broadcast variables; Spark SQL; how to submit jobs; how to improve the performance of Spark jobs; how to allocate resources; how to debug; how to recover from failures; and so on.
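The map vs. flatMap and groupByKey vs. reduceByKey questions come up so often that it is worth internalizing the semantics. The sketch below illustrates them with plain Python lists rather than real RDDs, so no cluster is needed; the data is made up, but the behavior mirrors what the Spark operations do.

```python
from collections import defaultdict

lines = ["spark is fast", "spark is simple"]

# map: exactly one output element per input element (here, a list per line)
mapped = [line.split() for line in lines]

# flatMap: each input can produce many outputs, flattened into one collection
flat_mapped = [word for line in lines for word in line.split()]

pairs = [(word, 1) for word in flat_mapped]

# groupByKey: every value for a key is collected first, then you aggregate.
# In Spark this ships all values across the network before reducing.
grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)
group_counts = {key: sum(values) for key, values in grouped.items()}

# reduceByKey: values are combined per key as they are seen. Spark does this
# map-side too (a combiner), so far less data is shuffled.
reduced = defaultdict(int)
for key, value in pairs:
    reduced[key] += value
```

Both approaches give the same counts; the interview answer is that reduceByKey (and combineByKey, its general form) is preferred because of the map-side combine, while groupByKey can overwhelm executors with shuffled data.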


Most Big Data resumes include Hive, so be prepared for it as well. In Hive you have to cover topics from creating tables to maintaining partitions; Hive is an ocean in itself when you appear for a Data Engineer opening. The most commonly asked concepts are: external vs. managed tables and their uses; partitioning and bucketing; dynamic vs. static partitioning; how to recover partitions; what happens if you add or delete partitions manually using HDFS commands; and how to optimize Hive queries.
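A good way to reason about the partitioning questions is to remember that Hive stores each partition in its own HDFS directory, so a filter on the partition column only reads the matching directories (partition pruning). The toy model below sketches that idea in plain Python; the table and column names are made up for illustration.

```python
# Toy model of a Hive table partitioned by event_date: each partition key
# maps to a "directory" of rows, mirroring .../event_date=2024-01-02/ on HDFS.
table = {
    "event_date=2024-01-01": [("u1", 10), ("u2", 5)],
    "event_date=2024-01-02": [("u1", 7)],
}

def query(table, partition_filter):
    """Partition pruning: only 'directories' matching the filter are scanned."""
    scanned = [part for part in table if part == partition_filter]
    rows = [row for part in scanned for row in table[part]]
    return scanned, rows

scanned, rows = query(table, "event_date=2024-01-02")
```

This also explains the manual-partition question: if you create a directory with HDFS commands, the metastore does not know about it until you run a repair (MSCK REPAIR TABLE) or add the partition explicitly, because pruning works off the metastore, not the filesystem.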


Sqoop is another technology in the Big Data stack, used to migrate data between relational databases and the cluster. The commands look simple, but never underestimate it. There is, however, a simple way to cover all of Sqoop's concepts: the official Sqoop documentation. Nothing else is needed. On this topic you will almost always get questions such as: what a basic Sqoop command looks like (interviewers often ask you to write one down), how to perform incremental imports, how to export to RDBMS tables, and how to work with Hive tables in Sqoop.
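Since interviewers often ask you to write the commands out, here is a hedged sketch of the four most-asked variants. The host, database, user, and table names are placeholders; the flags themselves are standard Sqoop options.

```shell
# Basic import from an RDBMS into HDFS (connection details are placeholders)
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/orders

# Incremental append: only rows whose id is beyond the last imported value
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user -P \
  --table orders \
  --incremental append \
  --check-column id \
  --last-value 100

# Export HDFS data back into an RDBMS table
sqoop export \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user -P \
  --table orders_backup \
  --export-dir /data/orders

# Import straight into a Hive table
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user -P \
  --table orders \
  --hive-import \
  --hive-table sales.orders
```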


Spark supports Java, Python, Scala, and R, but mostly the first three are used in real projects; R is used mainly for statistical analysis. In interviews you will be asked questions on Java too: it is a slightly tricky language with a lot of concepts, and a weak showing in the language round alone can get you disqualified. Put down the language you love the most, meaning the one you have dived into deeply. Language-specific questions depend entirely on your resume, and the interviewer will often ask which language you prefer. So gain complete knowledge of one programming language and put it on your resume.


Last but not least, Unix scripting. Of course we need it to run our Spark applications, and Unix scripting turns up in almost all projects, even ones that are not Big Data projects. And remember, knowing only the cat and grep commands will never help you in interviews. Interviewers are sharp and will test your knowledge of these things thoroughly. Make sure you have good scripting knowledge: you may be asked questions on any part of UNIX, so cover the main topics or you will lose out in this round.
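"More than cat and grep" mostly means being comfortable chaining the standard text tools. A small sketch of the kind of one-liners interviewers ask for; the file path and log contents are made up for illustration.

```shell
# Sample log file (contents are illustrative)
printf 'ERROR disk full\nINFO started\nERROR net down\nWARN slow\n' > /tmp/app.log

# Count the ERROR lines
grep -c '^ERROR' /tmp/app.log

# Frequency of each log level, most common first
awk '{print $1}' /tmp/app.log | sort | uniq -c | sort -rn

# Print lines 2 through 3 of the file (a classic sed question)
sed -n '2,3p' /tmp/app.log
```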
