Cloud Computing: Hive
Hive is a data warehouse system for Hadoop.
Hive provides a SQL-like language called HiveQL. Due to its SQL-like interface, Hive is increasingly becoming the technology of choice for using Hadoop.
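For illustration, a HiveQL query reads much like standard SQL; the table and column names below are placeholders, not part of this setup:
SELECT dept, COUNT(*) FROM employees GROUP BY dept;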
Prerequisites
The following are the prerequisites for setting up Hive and running Hive queries:
- You should have the latest stable build of Hadoop installed (to install Hadoop, refer to the Single-Node Hadoop Setup Guide)
- Your machine should have Java 1.6 installed (a quick version check for Java and Hadoop is shown after this list)
- It is assumed you have some knowledge of Java programming and are familiar with concepts such as classes and objects, inheritance, and interfaces/abstract classes.
- Basic knowledge of Linux will help you understand many of the Linux commands used in this tutorial
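Before moving on, you can confirm that both tools are on your path; the exact version strings will vary with your installation:
$ java -version
$ hadoop version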
Setting up Hive
Platform
This tutorial assumes Linux. If you are using Windows, please install Cygwin; it is required for shell support in addition to the software listed above.
Procedure
Download the most recent stable release of Hive as a tarball from one of the Apache download mirrors. For this tutorial, we are going to use hive-0.9.0.tar.gz.
Unpack the tarball in the directory of your choice, using the following command:
$ tar -xzvf hive-x.y.z.tar.gz
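You can optionally list the unpacked directory to confirm the extraction; the exact contents vary by release, but you should see directories such as bin, conf, and lib:
$ ls hive-x.y.z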
Set the environment variable HIVE_HOME to point to the installation directory:
You can either do
$ cd hive-x.y.z
$ export HIVE_HOME=$(pwd)
or set HIVE_HOME in $HOME/.profile so it will be set every time you login.
Add the following line to it.
export HIVE_HOME=<path_to_hive_home_directory>
e.g.
export HIVE_HOME='/Users/Work/hive-0.9.0'
export PATH=$HADOOP_HOME/bin:$HIVE_HOME/bin:$PATH
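As a quick sanity check (assuming you added the lines to $HOME/.profile as above), reload the profile and confirm that the variables are visible in your shell:
$ source $HOME/.profile
$ echo $HIVE_HOME
$ which hive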
Start Hadoop (refer to the Single-Node Hadoop Setup Guide for more information). It should show the processes being started. You can check the running processes using the jps command:
$ start-all.sh
<< Starting various hadoop processes >>
$ jps
3097 Jps
2355 RunJar
2984 JobTracker
2919 SecondaryNameNode
2831 DataNode
2743 NameNode
3075 TaskTracker
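If all of the daemons above are running, you can also confirm that HDFS is reachable before moving on; on a fresh installation the listing may be empty:
$ hadoop fs -ls /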
In addition, you must create /tmp and /user/hive/warehouse (aka hive.metastore.warehouse.dir) and set appropriate permissions in HDFS before a table can be created in Hive, as shown below:
$ hadoop fs -mkdir /tmp
$ hadoop fs -mkdir /user/hive/warehouse
$ hadoop fs -chmod g+w /tmp
$ hadoop fs -chmod g+w /user/hive/warehouse
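With HDFS prepared, you can launch the Hive command-line interface and create a first table to verify the installation. The table name and columns below are only an example:
$ $HIVE_HOME/bin/hive
hive> CREATE TABLE pokes (foo INT, bar STRING);
hive> SHOW TABLES;
hive> quit;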