Using Hive on the Big Data Cluster

What is Hive?

Hive is a SQL language that processes and analyzes data in Hadoop. Hive QL query can be performed from the terminal.
The purpose of this documentation is to show users how to access Hive on the big data cluster CLI.

Steps to access Hive on the Big data cluster

Requirement:

  1. User has the appropriate permission to access the cluster
  2. Open two terminal sessions

Step 1: Start an SSH Login to the cluster on one of the open terminals

    • Mac and Linux users:
    • Perform ssh UMBC_ID@login.hadoop.umbc.edu #Note: UMBC_ID is your myUMBC username login
  • Windows users will need to use putty to access the cluster.

NOTE: To connect to the cluster remotely please use your VPN connection.

Then you’ll be prompted to input your password. As shown below.

YOUR PC ~ % ssh UMBC_ID@login.hadoop.umbc.edu
WARNING: UNAUTHORIZED ACCESS to this computer is in violation of Criminal Law
         Article section 8-606 and 7-302 of the Annotated Code of MD.

NOTICE:  This system is for the use of authorized users only. Individuals using
         this computer system without authority, or in excess of their authority
         , are subject to having all of their activities on this system
         monitored and recorded by system personnel.

UMBC_ID@login.hadoop.umbc.edu's password: 
Last login: Tue Jun  9 11:44:12 2020 from 130.85.46.245

UMBC Division of Information Technology                    http://doit.umbc.edu/
--------------------------------------------------------------------------------
If you have any questions or problems regarding these systems, please call the
DoIT Technology Support Center at 410-455-3838, or submit your request on the
web by visiting http://my.umbc.edu/help/request

Remember that the Division of Information Technology will never ask for your
password. Do NOT give out this information under any circumstances.
--------------------------------------------------------------------------------

-bash-4.2$ 

This means you are in the Hadoop cluster and you have been granted the necessary permission.

Step 2: input “hive” to use Hive QL on the Big data cluster CLI

Note: The first time the commands run on the cluster, it produced more output messages than is displayed below.

-bash-4.2$ hive
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
20/08/17 16:23:22 INFO conf.HiveConf: Found configuration file file:/etc/hive/conf.cloudera.hive/hive-site.xml

Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/jars/hive-common-2.1.1-cdh6.2.0.jar!/hive-log4j2.properties Async: false
20/08/17 16:23:24 INFO SessionState: 
Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/jars/hive-common-2.1.1-cdh6.2.0.jar!/hive-log4j2.properties Async: false
20/08/17 16:23:24 INFO session.SessionState: Created HDFS directory: /tmp/hive/tobi/bdd25d87-de63-4162-9597-76f942bfb6bc
20/08/17 16:23:24 INFO session.SessionState: Created local directory: /tmp/tobi/bdd25d87-de63-4162-9597-76f942bfb6bc
20/08/17 16:23:24 INFO session.SessionState: Created HDFS directory: /tmp/hive/tobi/bdd25d87-de63-4162-9597-76f942bfb6bc/_tmp_space.db
20/08/17 16:23:24 INFO conf.HiveConf: Using the default value passed in for log id: bdd25d87-de63-4162-9597-76f942bfb6bc
20/08/17 16:23:24 INFO session.SessionState: Updating thread name to bdd25d87-de63-4162-9597-76f942bfb6bc main

20/08/17 16:23:24 INFO CliDriver: 
WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive> 

From here a user can perform all the Hive QL desired. For more reference on specific Hive QL please refer to the Hive Documentation page

Step 3: Perform a show table command to see all the tables created using the Hive QL

hive> show tables;
20/08/17 16:24:14 INFO conf.HiveConf: Using the default value passed in for log id: bdd25d87-de63-4162-9597-76f942bfb6bc
20/08/17 16:24:14 INFO session.SessionState: Updating thread name to bdd25d87-de63-4162-9597-76f942bfb6bc main
20/08/17 16:24:14 INFO ql.Driver: Compiling command(queryId=tobi_20200817162414_93756116-203f-4b51-8a6b-abadbba16872): show tables
20/08/17 16:24:14 INFO ql.Driver: Semantic Analysis Completed
20/08/17 16:24:14 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
20/08/17 16:24:14 INFO exec.ListSinkOperator: Initializing operator LIST_SINK[0]
20/08/17 16:24:14 INFO ql.Driver: Completed compiling command(queryId=tobi_20200817162414_93756116-203f-4b51-8a6b-abadbba16872); Time taken: 0.04 seconds
20/08/17 16:24:14 INFO ql.Driver: Executing command(queryId=tobi_20200817162414_93756116-203f-4b51-8a6b-abadbba16872): show tables
20/08/17 16:24:14 INFO ql.Driver: Starting task [Stage-0:DDL] in serial mode
20/08/17 16:24:14 INFO ql.Driver: Completed executing command(queryId=tobi_20200817162414_93756116-203f-4b51-8a6b-abadbba16872); Time taken: 0.01 seconds
OK
20/08/17 16:24:14 INFO ql.Driver: OK
20/08/17 16:24:14 INFO mapred.FileInputFormat: Total input files to process : 1
bucketed_users
complex
data_2010
data_2011
dec_cons
demo
doc
flight_summary2010
flight_summary2011
flight_summary_2010
flight_summary_2011
flights
fpm_data
g4_flight_summary_2010
g4_flight_summary_2011
g4_temp_2010
g4_temp_2011
game_txt
games
gender_id
gender_id222
group2_complex
group2_int
group2_sales
group2_summary2011
group2_things
group2_x
group3_summary2011
group3_table_summary2010
group3gender_id
group3sales
group3spark_summary2010
group3spark_table_summary2010
group3sparksql
group3things
group4_sales
group4_students
group4_things
hw4
logs
max1
n_games
n_player_games
n_players
n_word_search
names_text
names_text1
newstudents
newtable
newtable2
player_games_txt
playergames
q2
q2test
q3_player
qtest
question_two
sales22
sales222
src
src2
students
students2
summary
t4_sales
t4_things
t66_dataset
team1_kv
team1_lab4_sales
team1_max1
team1_sales
team1_students
team1_things
team2_students
test3
things
things22
things222
turbine
turbine2
users
view_data_2010
view_data_2011
word_count
Time taken: 0.058 seconds, Fetched: 84 row(s)
20/08/17 16:24:14 INFO CliDriver: Time taken: 0.058 seconds, Fetched: 84 row(s)
20/08/17 16:24:14 INFO conf.HiveConf: Using the default value passed in for log id: bdd25d87-de63-4162-9597-76f942bfb6bc
20/08/17 16:24:14 INFO session.SessionState: Resetting thread name to  main
hive> 

For more functionalities on how to use Hive see the official Hive Documentation page.

Reference.

Apache Hive TM