Cloud Computing: Hive
Hive is a data warehouse system for Hadoop.
Hive provides a SQL-like language called HiveQL. Due to its SQL-like interface, Hive is increasingly becoming the technology of choice for using Hadoop.
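For illustration, a HiveQL query reads much like standard SQL; the table and column names below are placeholders, not part of this setup:
SELECT dept, COUNT(*) FROM employees GROUP BY dept;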
Prerequisites
The following are the prerequisites for setting up Hive and running Hive queries:
- You should have the latest stable build of Hadoop installed (to install Hadoop, refer to the Single-Node Hadoop Setup Guide)
- Your machine should have Java 1.6 installed (a quick version check for Java and Hadoop is shown after this list)
- It is assumed you have some knowledge of Java programming and are familiar with concepts such as classes and objects, inheritance, and interfaces/abstract classes.
- Basic knowledge of Linux will help you understand many of the Linux commands used in this tutorial
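Before moving on, you can confirm that both tools are on your path; the exact version strings will vary with your installation:
$ java -version
$ hadoop version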
Setting up Hive
Platform
This tutorial assumes Linux. If you are using Windows, please install Cygwin; it is required for shell support in addition to the software listed above.
Procedure
Download the most recent stable release of Hive as a tarball from one of the Apache download mirrors. For this tutorial, we are going to use hive-0.9.0.tar.gz.
Unpack the tarball in the directory of your choice, using the following command:
$ tar -xzvf hive-x.y.z.tar.gz
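You can optionally list the unpacked directory to confirm the extraction; the exact contents vary by release, but you should see directories such as bin, conf, and lib:
$ ls hive-x.y.z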
Set the environment variable HIVE_HOME to point to the installation directory:
You can either do
$ cd hive-x.y.z
$ export HIVE_HOME=$(pwd)
or set HIVE_HOME in $HOME/.profile so it will be set every time you login.
Add the following line to it.
export HIVE_HOME=<path_to_hive_home_directory>
e.g.
export HIVE_HOME='/Users/Work/hive-0.9.0'
export PATH=$HADOOP_HOME/bin:$HIVE_HOME/bin:$PATH
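As a quick sanity check (assuming you added the lines to $HOME/.profile as above), reload the profile and confirm that the variables are visible in your shell:
$ source $HOME/.profile
$ echo $HIVE_HOME
$ which hive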
Start Hadoop (refer to the Single-Node Hadoop Setup Guide for more information). It should show the processes being started. You can check the running processes using the jps command:
$ start-all.sh
<< Starting various hadoop processes >>
$ jps
3097 Jps
2355 RunJar
2984 JobTracker
2919 SecondaryNameNode
2831 DataNode
2743 NameNode
3075 TaskTracker
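If all of the daemons above are running, you can also confirm that HDFS is reachable before moving on; on a fresh installation the listing may be empty:
$ hadoop fs -ls /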
In addition, you must create /tmp and /user/hive/warehouse (aka hive.metastore.warehouse.dir) and set appropriate permissions in HDFS before a table can be created in Hive, as shown below:
$ hadoop fs -mkdir /tmp
$ hadoop fs -mkdir /user/hive/warehouse
$ hadoop fs -chmod g+w /tmp
$ hadoop fs -chmod g+w /user/hive/warehouse
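With HDFS prepared, you can launch the Hive command-line interface and create a first table to verify the installation. The table name and columns below are only an example:
$ $HIVE_HOME/bin/hive
hive> CREATE TABLE pokes (foo INT, bar STRING);
hive> SHOW TABLES;
hive> quit;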