How to the update records in Hive?

Most of us might have worked on Hive in projects. But we never get a requirement to update records in Hive because business never requires to update records in Hive. So most people don't know the updating records procedure in Hive. In this we will get know what are all the preconditions required to update records in Hive and how to do it.

 

Precondition to update records in Hive:

  • Below properties must be set as specified:

set hive.support.concurrency=true

set hive.enforce.bucketing=true (Not required as from Hive version 2.0)

set hive.exec.dynamic.partition.mode=nonstrict

set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

set hive.compactor.initiator.on=true

set hive.compactor.worker.threads=2

These properties can be set on hive command line using above commands or we can set in hive-site.xml file as below:

<property> 
<name>hive.compactor.initiator.on</name> 
<value>true</value> 
</property> 
<property> 
<name>hive.enforce.bucketing</name> 
<value>true</value> 
</property> 
<property> 
<name>hive.exec.dynamic.partition.mode</name> 
<value>nonstrict</value> 
</property> 
<property> 
<name>hive.txn.manager</name> 
<value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value> 
</property>
<property> 
<name>hive.compactor.worker.threads</name> 
<value>2</value> 
</property>
<property> 
<name>hive.support.concurrency</name> 
<value>true</value> 
</property>
  • Table must be stored as ORC file format.
  • Table must be bucketed
  • The property transactional¬†of the table must be set to true.

 

Let's see how to create a table in Hive and updating the rows.

To create employee table in Hive use below command:

CREATE TABLE employee (
emp_id int,
name string
) CLUSTERED BY (id) INTO 2 BUCKETS STORED AS ORC
TBLPROPERTIES ("transactional"="true");

 

To update records use below command:

UPDATE employee SET name = "Mark Berg" WHERE emp_id = 1;

 

Leave a Reply

Your email address will not be published. Required fields are marked *