Learnerslesson
   JAVA   
  SPRING  
  SPRINGBOOT  
 HIBERNATE 
  HADOOP  
   HIVE   
   ALGORITHMS   
   PYTHON   
   GO   
   KOTLIN   
   C#   
   RUBY   
   C++   




HIVE Data Types

Whenever we define a variable in HIVE, we have to tell HIVE what kind of value is it going to store inside it. i.e. the value inside the variable can be a word or a sentence or a number.

Data Types are the way to tell a variable, what type of value is it going to store.

Below are the classification of data types in HIVE :



PRIMITIVE DATA TYPES IN HIVE
  1. Numeric Data Types :

The data type which holds numbers.

Tinyint : 1 byte, range -128 to 128

Smallint : 2 bytes, range -2^15 to 2^15-1

Int : 4 bytes, range -2^31 to 2^31-1

Bigint : 8 bytes, range -2^63 to 2^63-1

Float : 4 bytes

Double : 8 bytes

Decimal : 17 Bytes precision upto 38 digits.

- dec(10,2) :- It says a decimal value of 10 digits followed by 2 digits after the decimal point.

  1. String :
String : Can contain any variable length sentence.

Char : Contains fixed length string

Varchar : Can contain variable length string.
  1. Timestamp :
Integers : It uses Unix timestamp in nanoseconds.

Floats : Uses the same Unix timestamp in seconds with decimal precision.

String : Should be of the exact format : "YYYY-MM-DD HH:MM:SS.ffffffff"
  1. Boolean :
Can only be 'true' or 'false'.
  1. Binary :
It contains BLOB or a large data.

COMPLEX DATA TYPES

The complex data types supported in HIVE are Array, Map and Struct. These complex data types can combine primitive data types and provide a collection of data.

Say for example a primitive data type (ex. string) can hold a name but a complex data type is capable of holding a group of strings. i.e. it can hold a group of names.


Array

An Array in HIVE is used to hold a collection of primitive data types.


Example :

Say we are going to create a table which contains a group of students belonging to a particular city. Below is the example :


CREATE TABLE student_location (
city string,
names array <string>
);

In the above example we have created a table 'student_location' containing two variables 'city' of string type and 'names' of array type. But if you notice the array variable has a string variable attached to it (i.e. array<string> ). Which means the 'names' variable will contain not just contain one name but a group of names.

So, the above table will contain a city and a group of students belonging to that city.

After the data is inserted if we run the select statement, the below output is produced.

select * from student_location;


city names
Hydrabad ["Vinay","Ajay","Darshil"]
Delhi ["Sahil","Peter"]

So, as we can see above each row contains a 'city' and a group of 'names' associated to that city.

Note : An array can only hold data of a single data type. i.e. An array can hold either a group of 'string' or a group of 'int'. A mix of 'string' and 'int' is not possible.

Map

A Map in HIVE is used to hold a collection of data in the form of key and value pair. Where the 'key' should be of a primitive data type and the 'value' can be either a primitive or a complex data type.

Think of the key and value pair as a locker in a bank. There are so many lockers and each person owning the locker is given a key. Same concept applies for a map in HIVE. While inserting the value in a table, you insert both the key and value (same way while opening a locker you are given a key and the locker). And while retrieving using 'select' statement you just specify the key and the value will be retrieved for you.

Let's look at the below example to get a clear picture.


Example :

Now we are going to change the 'student_location' table in the 'array' example and redefine such that the 'names' column along with the name also contains the roll number of the students.


CREATE TABLE student_location (
city string,
names map <int,string>
)
row format delimited fields terminated by ','
collection items terminated by '*'
map keys terminated by ':';

In the above table we have 'names map <int,string> '. If you see closely, the 'map <int,string> ' has two variables associated to it. Where 'int' is the key (Which is the roll number of a student) and the 'string' is the value (Which is the name of the student). We will be explaining in a little while what the last three line means.

Now, we will be taking the data from a file(say csv file) and load the data to the 'student_location' table.


  1. student.csv

Hydrabad,1:Vinay*3:Ajay*5:Darshil

Delhi,2:Sahil*4:Peter

As we can see in the above student.csv file, every field is separated by comma(,). Where 'Hydrabad' is the first field and '1:Vinay*3:Ajay*5:Darshil' is the second field which is actually a map. The map actually contains three fields

1:Vinay

3:Ajay

5:Darshil

each separated by an asterix(*).
Where '1' is the key and 'Vinay' is the value. And the key and value is separated by colon(:).


Now match with the last three lines of the 'CREATE TABLE' statement. It's self explanatory.


How the data is loaded into the table from the file?

The 'load data' command is used to load data from the csv file.

LOAD DATA LOCAL INPATH 'student.csv' INTO TABLE student_location;

And all the data is loaded into the 'student_location' table.


How the data is retrieved from the table?

SELECT * FROM student_location;


city names
Hydrabad {1:"Vinay",3:"Ajay",5:"Darshil"}
Delhi {2:"Sahil",4:"Peter"}

As we can see the 'names' column returns the map containing key and value pair.

We can also specify the 'key' in the 'select statement' to retrieve it's value.

SELECT city, names['3'] FROM student_location;


city c1
Hydrabad Ajay

In the 'select statement' we have specified 'names['3']' where '3' is the key which has the value 'Ajay' associated to it.

So we have seen, how can we retrieve the value along with the key.


Struct Data Type

A struct data type in HIVE is used to hold one or more data types inside it. It is like a capsule holding various elements inside it.


Example :

Now we are going to change the 'student_location' completely where we will be having two fields 'name' i.e. name of the student and 'address' which will have three fields i.e. city, state, pin_code.


CREATE TABLE student_location (
name string,
address struct <city:string,state:string,pin_code >
)
row format delimited fields terminated by ','
collection items terminated by '*';

As we can see above the 'address' field is of 'struct <city:string,state:string,pin_code string>

Now, we will be taking the data from a file(say csv file) and load the data to the 'student_location' table.


  1. student.csv

Vinay,Hydrabad*Telengana*500013

Sahil,Delhi*Delhi*110005

As we can see in the above student.csv file, every field is separated by comma(,). Where 'Vinay' is the first field and 'Hydrabad*Telengana*500013' is the second field which is actually a struct data type. The struct actually contains three variables:

Hydrabad

Telengana

500013

each separated by an asterix(*).
Where each one is a city, state and pin_code clubbed inside the struct data type just like the medicine put inside a capsule.

Now match it with the last two lines of the 'CREATE' statement. It says each element in the row will be separated by a comma(,) and each element of the struct field will be separated by an asterisk(*).


How the data is loaded into the table from the file?

The 'load data' command is used to load data from the csv file.

LOAD DATA LOCAL INPATH 'student.csv' INTO TABLE student_location;

And all the data is loaded into the 'student_location' table.


How the data is retrieved from the table?

SELECT * FROM student_location;


name address
Vinay {"city":"Hydrabad","state":"Telengana","pin_code":"500013"}
Sahil {"city":"Delhi","state":"Delhi","pin_code":"110005"}

As we can see above the address column returns the contents of struct data type in the form of variable name along with it's values.

We can also specify the variable name of the struct data type (ex. city, state, pin_code) in the 'select statement' to retrieve it's value.

SELECT name, address.city FROM student_location;


name city
Vinay Hydrabad
Sahil Delhi

In the 'select statement' we have specified 'address.city' where city is the variable of the struct datatype. It can be accessed by a dot(.).