In most interviews you have probably come across the question of writing a word count program in MapReduce or with Spark RDDs. A few interviewers, however, will test how well you can write Hive queries and may ask you to write the word count as a Hive query instead. So let's see how we can do it.
The question goes something like this: "Assume we have a Hive table with a single String column, where each record holds one line of a file. Write a query to get the word count across the records of this table."
Because each record holds a whole line of the file, we first need to split each line into words and then turn each word into its own row so that it can be counted. The built-in split function breaks a line into an array of words, and the explode function, applied through a LATERAL VIEW, produces one row per element of that array. Grouping those rows by word and counting them gives the word count. The solution is given below.
SELECT word, COUNT(*) FROM input_table
LATERAL VIEW explode(split(text, ' ')) temp_table as word
GROUP BY word;
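To try the query out, here is a minimal sketch with a tiny sample table. The two sample rows are made up for illustration, and the table and column names (input_table, text) simply follow the query above. The second query is an optional variant, not part of the original answer: it assumes you want to ignore case and repeated spaces, so it lowercases the text, splits on runs of whitespace (split takes a regular expression) and drops empty strings.

```sql
-- Hypothetical sample data; INSERT ... VALUES assumes Hive 0.14 or later.
CREATE TABLE IF NOT EXISTS input_table (text STRING);

INSERT INTO TABLE input_table VALUES
  ('hello world'),
  ('hello hive');

-- Intermediate step: one output row per word, before any counting.
SELECT word
FROM input_table
LATERAL VIEW explode(split(text, ' ')) temp_table AS word;

-- Variant: case-insensitive, tolerant of repeated whitespace, sorted by count.
SELECT word, COUNT(*) AS cnt
FROM input_table
LATERAL VIEW explode(split(lower(text), '\\s+')) temp_table AS word
WHERE word <> ''
GROUP BY word
ORDER BY cnt DESC, word;
```

With the two sample rows above, the last query should return hello with a count of 2, followed by hive and world with a count of 1 each.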
Please share your thoughts on this post in the comments.