❮❮ PREV NEXT ❯❯

HIVE SELECT GROUP BY Statement

The GROUP BY clause is used to group the resulting data based on a particular key column. Usually it is used with Aggregate functions like count, sum, avg, min, max.

Syntax:

SELECT column1,column2, ...,columnN from <tablename> where <condition> GROUP BY column1,
column2,...,columnN;

column1,column2 are the columns of the table.

Example:

Employee table

emp_id	name	city	salary
101	John	California	50000
102	Tom	Mumbai	40000
103	Harry	Delhi	80000
104	Rahul	Bangalore	60000
105	Sakil	Mumbai	90000
106	Samir	Delhi	70000
107	Diljeet	Mumbai	30000
108	Manohar	Bangalore	20000

GROUP BY with single column

Say we need to get the number of employees belonging to a particular city.

SELECT city, count(city) from Employee GROUP BY city;

city	_c0
California	1
Mumbai	3
Delhi	2
Bangalore	2

In the above example we have grouped all the cities together using GROUP BY, then used the count(city) function to count the number of cities. As a result the city followed by their count is displayed.

GROUP BY with multiple column

This goes a little complex. Say we want to get the names of each employees in the city getting the highest salary.

SELECT name, max(salary) AS max_sal FROM Employee GROUP BY city, salary;

name	max_sal
John	50000
Sakil	90000
Harry	80000
Rahul	60000

In the above example we have grouped the cities at first, then grouped the salary and used a max(salary) function. As a result the maximum salary of the person from an individual city is selected. The statement 'max(salary) AS max_sal' gives an understandable name to max(salary), i.e. max_sal.

❮❮ PREV NEXT ❯❯