How to use Group By in SQL Server query?


In this post, we will talk about how to group rows using the GROUP BY clause, filtering groups, and the difference between the WHERE and HAVING clauses.

What is group by clause?

Now, let's delve into the definition of the GROUP BY clause. The GROUP BY clause is utilized to group a selected set of rows into a set of summary rows based on the values of one or more columns or expressions. It is always employed in conjunction with one or more aggregate functions.

There are various aggregate functions available in SQL Server, such as SUM(), AVERAGE(), COUNT(), MIN(), MAX(), etc.

Let's understand them through an example. We have the TblOrganizationEmployee table, which includes columns like EmployeeName, EmployeeSalary, Country, Address, and DepartmentId. Additionally, there's the Department table containing columns for Id and DepartmentName.

The relationship between TblOrganizationEmployee and Department is established using the foreign key DepartmentId. If you require the SQL script for these tables, it can be found at the end of the post.

what is the use of group by in sql server

If you examine the TblOrganizationEmployee table, you'll notice the EmployeeSalary column.

Now, suppose I want to calculate the total salary being paid within my organization. Naturally, if you were to do this manually, you would take each salary value and add them together. However, in SQL Server, we have an aggregate function called SUM(), which we can utilize. Here's an example query:

select SUM(EmployeeSalary) from [dbo].[TblOrganizationEmployee]

that gives us the total salary.

sql aggregate functions

Similarly, if you want to say let’s say, for example, I want to find out total employees in my organization.

select count(*) from [dbo].[TblOrganizationEmployee]

I want the total salary that I’m paying to my employees by country, how do we achieve this? Using group by.
sql aggregate functions group by

If you were to do it manually, you would group the records by country, then sum the salaries within each group. Essentially, you'd take each set of records for Australia, for example, and add up their salaries. The same process would apply for records from India, and so on.

This is precisely why we use the GROUP BY clause. Let's delve into how to use it.

Select Country, SUM(EmployeeSalary) as TotalSalary
from TblOrganizationEmployee
Group by Country

Query Explantion

If you examine the query we've written, we're utilizing the SUM() aggregate function to calculate the total salary paid to our employees.

In our SELECT list, we require both the country and the total salary. Since there isn't a predefined column name for the total salary, we assign one using 'AS', naming it 'totalsalary'.

Now, let's inspect the query: SELECT country, the sum of salary FROM TblOrganizationEmployee.

What we're essentially specifying is that we want the country and the total salary, but we also need to indicate that we want to group the results by country. This is accomplished with the 'GROUP BY' clause. So, we GROUP BY country and then calculate the total salaries for each country. When we execute the query, we should observe the same output.

Let’s see what’s going to happen if I remove this group by clause and then execute this query.

Select Country, SUM(EmployeeSalary) as TotalSalary
from TblOrganizationEmployee

This query will produce an error because you're selecting the country column and the sum of salaries without specifying how to group them by country and then total the salaries. If I exclude the country column, it would make sense because I'd be asking for the total salaries across the entire table. However, including the country column requires specifying that the results should be grouped by country. Therefore, if I execute this query as it is, I will receive an error.

“Msg 8120, Level 16, State 1, Line 4
Column ‘TblOrganizationEmployee.Country’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.”

Now let’s remove the sum aggregate function in the select list and execute the query.

Select Country, EmployeeSalary as TotalSalary
from TblOrganizationEmployee
Group by Country

“Msg 8120, Level 16, State 1, Line 8you will get below error
Column ‘TblOrganizationEmployee.EmployeeSalary’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.”

The columns you include in the select list must adhere to one of two conditions: either they should have an aggregate function applied to them, or they should be included in the group by clause. If neither of these conditions is met, you will encounter the error mentioned above.

Group by based on multiple columns

We want to further group the data by DepartmentId within each country. For instance, in Australia, we aim to ascertain the total salary for employees in each department: Marketing, Research and Development, Accounting, and Finance. Hence, we are employing multiple column groupings, initially by country and subsequently by DepartmentId.

sql sum group by multiple columns

It’s straightforward. In the select list, we should include three columns: country, TotalSalary, and departmentId. Additionally, the departmentId column should be specified in the group by clause. This instructs the SQL server engine to first aggregate the salaries by country and then further group them by departmentId, providing the desired breakdown in that sequence.

Select Country, DepartmentId, SUM(EmployeeSalary) as TotalSalary
from TblOrganizationEmployee
group by Country, DepartmentId

When we execute this query, you will see the output. However, if you observe our output, it's not sorted. Therefore, we used the ORDER BY clause to arrange the result.

Select Country, DepartmentId, SUM(EmployeeSalary) as TotalSalary
from TblOrganizationEmployee
group by Country, DepartmentId
order by TotalSalary

Now we have seen we can group based on multiple columns.

How to use multiple aggregate functions?

If it's possible to use multiple aggregate functions, let's understand with an example. We are showing the total salary by country and DepartmentId. Let's say I want the total number of employees as well.

Now, if you remember, if you want to find out the total number of employees we can use COUNT().

If you look at the TblOrganizationEmployee table, we have 11 rows there.

And if I want to find the total number of employees within my organization, I can say COUNT(*) from TblOrganizationEmployee and we get the total count, which is 11.

For performance reasons, instead of using *, you can specify a specific column name, such as EmployeeName, to get the same result.

select count(EmployeeName) from [dbo].[TblOrganizationEmployee]

So what do we want? We not only want total salaries by country and DepartmentId, but we also want the total number of employees. So, if you want the total number of employees, we use the COUNT() aggregate function.
sql count group by multiple columns

Select Country, DepartmentId, SUM(EmployeeSalary) as TotalSalary,COUNT(EmployeeName) as TotalEmployees
from TblOrganizationEmployee
group by Country, DepartmentId
order by TotalSalary

All right, so we have seen how to use multiple aggregate functions as well.

How to filter the groups?

But let’s say I want to see the Marketing (i.e., DepartmentId=1) department salaries. Now, usually to filter the rows, we can use the WHERE clause, where you can say DepartmentId is equal to 1.
sql group by multiple columns having count

Select Country, DepartmentId, SUM(EmployeeSalary) as TotalSalary,COUNT(EmployeeName) as TotalEmployees
from TblOrganizationEmployee
where DepartmentId=1
group by Country, DepartmentId
order by TotalSalary

The other way to get the same result is instead of using the where clause, we can use the having clause, but the having clause should come after the group by.

Select Country, DepartmentId, SUM(EmployeeSalary) as TotalSalary,COUNT(EmployeeName) as TotalEmployees
from TblOrganizationEmployee
group by Country, DepartmentId
having DepartmentId=1
order by TotalSalary

Difference between having and where clause

Now let’s see what the difference between these two queries is. When you use the WHERE clause, only DepartmentId=1 records are retrieved, and then they are grouped.

But when you use the HAVING clause, all the rows from the table TblOrganizationEmployee are retrieved, they are grouped by Country, DepartmentId, and then only DepartmentId=1 groups are shown.

The WHERE clause filters rows before aggregations, before groupings are performed, whereas the HAVING clause filters groups after the aggregations are performed.

Another difference is that you can use the HAVING clause only with the SELECT statement, whereas the WHERE clause can be used with other statements like INSERT and UPDATE as well.

Aggregate functions cannot be used in the WHERE clause. What do we mean by aggregate functions cannot be used in the WHERE clause? Let’s understand with an example.

select * from [dbo].[TblOrganizationEmployee] where count(id)>5

Excute above query you will get the error
“Msg 147, Level 15, State 1, Line 11
An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference.”

You cannot use aggregate functions with the WHERE clause, whereas aggregate functions can be used in the HAVING clause.

Select Country, DepartmentId, SUM(EmployeeSalary) as TotalSalary,COUNT(EmployeeName) as TotalEmployees
from TblOrganizationEmployee
group by Country, DepartmentId
having COUNT(EmployeeName)=1
order by TotalSalary

result

Table Script

/****** Object:  Table [dbo].[Department]    Script Date: 09/22/2020 6:06:10 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Department](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [Department] [nvarchar](max) NULL,
 CONSTRAINT [PK_Department] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)
)
GO
/****** Object:  Table [dbo].[TblOrganizationInfo]    Script Date: 09/22/2020 6:06:11 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[TblOrganizationInfo](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [Location] [nvarchar](50) NULL,
    [TravelAgent] [nvarchar](50) NULL,
    [BookingAmount] [float] NULL,
 CONSTRAINT [PK_TblOrganizationInfo] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)
)
GO
SET IDENTITY_INSERT [dbo].[Department] ON 
GO
INSERT [dbo].[Department] ([Id], [Department]) VALUES (1, 'Marketing')
GO
INSERT [dbo].[Department] ([Id], [Department]) VALUES (2, 'Research and Development')
GO
INSERT [dbo].[Department] ([Id], [Department]) VALUES (3, 'Accounting and Finance')
GO
SET IDENTITY_INSERT [dbo].[Department] OFF
GO
SET IDENTITY_INSERT [dbo].[TblOrganizationInfo] ON 
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (1, 'France', 'Michael', 2000)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (2, 'USA', 'Jackson', 2200)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (3, 'UK', 'Jacob', 3000)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (4, 'India', 'Luke', 2500)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (5, 'USA', 'Michael', 1500)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (6, 'France', 'Jacob', 1000)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (7, 'India', 'Luke', 1200)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (8, 'USA', 'Jacob', 1700)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (9, 'UK', 'Michael', 1900)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (10, 'France', 'Jacob', 2100)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (11, 'USA', 'Luke', 2500)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (12, 'India', 'Michael', 3000)
GO
SET IDENTITY_INSERT [dbo].[TblOrganizationInfo] OFF
GO