How do I configure reduce tasks to start only after a certain proportion of the map tasks have completed in Hive or Hadoop?

Within the MapReduce framework in Platform Symphony, you can specify the proportion of the total number of map tasks in a job that must be completed before any reduce tasks are scheduled.

Specify this ratio using the parameter specific to your Hadoop version:

  • 2.4.x: mapreduce.job.reduce.slowstart.completedmaps
  • 1.1.1: mapred.reduce.slowstart.completed.maps

You can set this value either on the command line at job submission or in a configuration file. The default value is 0.05, so reduce tasks start once 5% of the map tasks are complete. Any value from 0 to 1 is valid. For example, at 0, reduce tasks start as soon as the map tasks start; at 0.75, they start once 75% of the map tasks are complete; at 1, they wait until every map task has finished.
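To see what a given ratio means for a concrete job, the threshold can be worked out by hand. The sketch below assumes the product of the ratio and the map task count is rounded up to a whole task; the exact rounding used by Platform Symphony is not stated here, and `maps_needed` is a hypothetical helper name.

```shell
#!/bin/sh
# maps_needed TOTAL_MAPS RATIO
# Hypothetical helper: prints how many map tasks must finish before
# reduce tasks become eligible, ASSUMING the threshold is
# ceil(TOTAL_MAPS * RATIO).
maps_needed() {
  awk -v n="$1" -v r="$2" 'BEGIN {
    x = n * r
    c = int(x)
    if (c < x) c++      # round up any fractional task
    print c
  }'
}

maps_needed 200 0.75   # 75% of 200 map tasks
maps_needed 9 0.5      # 4.5 map tasks rounds up to a whole task
```

Under that rounding assumption, the two calls print 150 and 5.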

 

To set the value on the command line at job submission, where n is a proportion between 0 and 1:

  • Hadoop 2.4.x:
    $ mrsh jar jarfile [classname] -Dmapreduce.job.reduce.slowstart.completedmaps=n [args]
  • Hadoop 1.1.1:
    $ mrsh jar jarfile [classname] -Dmapred.reduce.slowstart.completed.maps=n [args]
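Because an out-of-range value is easy to mistype, it can help to check the ratio before submitting. The following is a minimal sketch of such a check for the Hadoop 2.4.x property; `slowstart_cmd`, the jar name, the class name, and the paths are all hypothetical, and the function only prints the command rather than running it.

```shell
#!/bin/sh
# slowstart_cmd RATIO JARFILE CLASSNAME [ARGS...]
# Hypothetical helper: checks that RATIO is a number from 0 to 1, then
# prints (does not run) the Hadoop 2.4.x form of the mrsh command.
slowstart_cmd() {
  if [ $# -lt 3 ]; then
    echo "usage: slowstart_cmd RATIO JARFILE CLASSNAME [ARGS...]" >&2
    return 1
  fi
  ratio=$1; jar=$2; class=$3
  shift 3
  # awk exits 0 only when ratio looks numeric and lies in [0, 1]
  if ! awk -v r="$ratio" 'BEGIN {
         exit !((r ~ /^[0-9]*\.?[0-9]+$/) && (r >= 0) && (r <= 1))
       }'; then
    echo "ratio must be between 0 and 1: $ratio" >&2
    return 1
  fi
  cmd="mrsh jar $jar $class -Dmapreduce.job.reduce.slowstart.completedmaps=$ratio"
  # Append the job's own arguments, if any, after the -D option
  if [ $# -gt 0 ]; then cmd="$cmd $*"; fi
  echo "$cmd"
}

# Placeholder jar, class, and paths, for illustration only
slowstart_cmd 0.75 wordcount.jar WordCount /input /output
```

For Hadoop 1.1.1, the same check applies with mapred.reduce.slowstart.completed.maps substituted as the property name.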

 

To set the value in a configuration file:

  1. Open the mapred-site.xml configuration file from the $HADOOP_HOME/conf directory.
  2. Add the property for your Hadoop version. For example, to start reduce tasks once 50% of the map tasks are complete:
    • Hadoop version 2.4.x:
      <property>
        <name>mapreduce.job.reduce.slowstart.completedmaps</name>
        <value>0.5</value>
      </property>
      

    • Hadoop version 1.1.1:
      <property>
        <name>mapred.reduce.slowstart.completed.maps</name>
        <value>0.5</value>
      </property>
      

  3. If you did not set HADOOP_HOME to your Hadoop configuration before installing Platform Symphony, or did not set PMR_EXTERNAL_CONFIG_PATH to your Hadoop configuration after installing it, copy the mapred-site.xml file to the $PMR_HOME/conf directory.
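After editing or copying the file, a quick check confirms that the value was saved as intended. This is a rough sketch, not a full XML parser: `slowstart_value` is a hypothetical helper, and it assumes the <name> and <value> elements sit on consecutive lines, as in the property examples above.

```shell
#!/bin/sh
# slowstart_value FILE
# Hypothetical helper: prints the <value> found on the line that
# follows a slowstart <name> line. ASSUMES <name> and <value> are on
# consecutive lines; other layouts need a real XML tool.
slowstart_value() {
  awk '/<name>.*slowstart.*<\/name>/ {
    getline                             # move to the <value> line
    gsub(/.*<value>|<\/value>.*/, "")   # strip tags and indentation
    print
  }' "$1"
}
```

For example, `slowstart_value "$PMR_HOME/conf/mapred-site.xml"` prints the configured ratio when the file contains one of the properties shown in step 2.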
