How to write a word count program in Hive?

In most interviews you have probably come across the question of writing a word count program in MapReduce or with Spark RDDs. A few interviewers, however, will test your Hive query skills and ask you to write the word count as a Hive query. So let's see how we can do it.

 

The question goes like this: "Assume we have a Hive table with a single String column, where each record holds one line of a file. Write a query to get the word count across the records of this table."
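To make the setup concrete, here is a minimal sketch of how such a table might be created and loaded. The table name input_table and column name text match the query used in this post; the file path is only a placeholder, and the table is assumed to use Hive's default text storage so that each line of the file lands in one record.

```sql
-- Hypothetical setup for the interview question.
-- One STRING column, so every line of the file becomes one record
-- (assumes the lines do not contain Hive's default field delimiter).
CREATE TABLE input_table (text STRING);

-- '/tmp/input.txt' is a placeholder path, not part of the original question.
LOAD DATA LOCAL INPATH '/tmp/input.txt' INTO TABLE input_table;
```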

 

Since each record holds a whole line, we first need to break the line into words and then turn each word into its own row before counting. The built-in split function divides a line into an array of words, and the explode function, used with LATERAL VIEW, emits one row per element of that array. Grouping the resulting rows by word and counting them gives the word count. So the solution is as given below.

SELECT word, COUNT(*) FROM input_table 
LATERAL VIEW explode(split(text, ' ')) temp_table as word 
GROUP BY word;
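If the interviewer asks follow-up questions, the two queries below may help. The first only shows the intermediate rows that LATERAL VIEW produces, which makes it easier to explain what explode does. The second is one possible variation, not the only correct answer: it assumes that words should be compared case-insensitively, that words may be separated by any run of whitespace, and that the most frequent words should come first. The alias word_count is just an illustrative name.

```sql
-- Intermediate step: one output row per (line, word) pair, no counting yet.
SELECT text, word
FROM input_table
LATERAL VIEW explode(split(text, ' ')) temp_table AS word;

-- Variation under the assumptions stated above.
SELECT word, COUNT(*) AS word_count
FROM input_table
-- lower() folds case, trim() drops leading/trailing spaces,
-- and the regex '\\s+' splits on any run of whitespace.
LATERAL VIEW explode(split(lower(trim(text)), '\\s+')) temp_table AS word
WHERE word != ''          -- empty lines would otherwise yield an empty word
GROUP BY word
ORDER BY word_count DESC;
```

The second argument of split is a regular expression, which is why a pattern such as '\\s+' can be used in place of a single space.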

