Learnerslesson
   JAVA   
  SPRING  
  SPRINGBOOT  
 HIBERNATE 
  HADOOP  
   HIVE   
   ALGORITHMS   
   PYTHON   
   GO   
   KOTLIN   
   C#   
   RUBY   
   C++   




HIVE SELECT GROUP BY Statement

The GROUP BY clause is used to group the resulting data based on a particular key column. Usually it is used with Aggregate functions like count, sum, avg, min, max.


Syntax:


SELECT column1,column2, ...,columnN from <tablename> where <condition> GROUP BY column1,
column2,...,columnN;

column1,column2 are the columns of the table.



Example:

Employee table

emp_id name city salary
101 John California 50000
102 Tom Mumbai 40000
103 Harry Delhi 80000
104 Rahul Bangalore 60000
105 Sakil Mumbai 90000
106 Samir Delhi 70000
107 Diljeet Mumbai 30000
108 Manohar Bangalore 20000


GROUP BY with single column

Say we need to get the number of employees belonging to a particular city.


SELECT city, count(city) from Employee GROUP BY city;


city _c0
California 1
Mumbai 3
Delhi 2
Bangalore 2

In the above example we have grouped all the cities together using GROUP BY, then used the count(city) function to count the number of cities. As a result the city followed by their count is displayed.


GROUP BY with multiple column

This goes a little complex. Say we want to get the names of each employees in the city getting the highest salary.


SELECT name, max(salary) AS max_sal FROM Employee GROUP BY city, salary;

name max_sal
John 50000
Sakil 90000
Harry 80000
Rahul 60000

In the above example we have grouped the cities at first, then grouped the salary and used a max(salary) function. As a result the maximum salary of the person from an individual city is selected. The statement 'max(salary) AS max_sal' gives an understandable name to max(salary), i.e. max_sal.