In this post, we will discuss the difference between 'where' and 'having' clauses in SQL Server, along with examples.
We will be using the 'sales' table, which I have already created. Here is the SQL script to create and populate it with test data:
The 'TblProducts' table contains product information, while the 'TblProductSale' table contains sales-related data. So, whenever a customer purchases a product, an entry will be made in the sales table.
If you look at the data we have here, notice that we have sold books twice, mobile phones twice, and watches & computer accessories once."
/****** Object: Table [dbo].[TblProducts] Script Date: 12/07/2020 8:00:33 PM ******/ CREATE TABLE [dbo].[TblProducts]( [Id] [int] NOT NULL, [ProductName] [nvarchar](100) NOT NULL, [Quantity] [int] NOT NULL, [Price] [float] NULL, CONSTRAINT [PK_TblProducts] PRIMARY KEY CLUSTERED ( [Id] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO /****** Object: Table [dbo].[TblProductSale] Script Date: 12/07/2020 8:00:34 PM ******/ CREATE TABLE [dbo].[TblProductSale]( [Id] [int] NOT NULL, [ProductId] [int] NULL, [Datetime] [datetime] NULL, CONSTRAINT [PK_TblProductSale] PRIMARY KEY CLUSTERED ( [Id] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO INSERT [dbo].[TblProducts] ([Id], [ProductName], [Quantity], [Price]) VALUES (1, N'Books', 10, 100) GO INSERT [dbo].[TblProducts] ([Id], [ProductName], [Quantity], [Price]) VALUES (2, N'Mobile Phone', 100, 15000) GO INSERT [dbo].[TblProducts] ([Id], [ProductName], [Quantity], [Price]) VALUES (3, N'Watches', 50, 1000) GO INSERT [dbo].[TblProducts] ([Id], [ProductName], [Quantity], [Price]) VALUES (4, N'Cameras ', 30, 10000) GO INSERT [dbo].[TblProducts] ([Id], [ProductName], [Quantity], [Price]) VALUES (5, N'Computer Accessories.', 40, 2000) GO INSERT [dbo].[TblProductSale] ([Id], [ProductId], [Datetime]) VALUES (1, 1, CAST(N'2020-10-16T17:16:57.953' AS DateTime)) GO INSERT [dbo].[TblProductSale] ([Id], [ProductId], [Datetime]) VALUES (2, 2, CAST(N'2020-10-16T17:16:57.953' AS DateTime)) GO INSERT [dbo].[TblProductSale] ([Id], [ProductId], [Datetime]) VALUES (3, 1, CAST(N'2020-10-16T17:32:44.040' AS DateTime)) GO INSERT [dbo].[TblProductSale] ([Id], [ProductId], [Datetime]) VALUES (4, 3, CAST(N'2020-10-16T17:32:44.040' AS DateTime)) GO INSERT [dbo].[TblProductSale] ([Id], [ProductId], [Datetime]) VALUES (5, 5, CAST(N'2020-10-16T17:32:44.040' AS DateTime)) GO INSERT [dbo].[TblProductSale] ([Id], [ProductId], [Datetime]) VALUES (6, 2, CAST(N'2020-10-16T17:32:44.040' AS DateTime)) GO
Now, we want to write a query that will calculate the total sales by product. The result of that query should be as shown in the image below.
SELECT ProductName,SUM(Price) AS TotalSales FROM TblProductSale Inner JOIN TblProducts ON TblProductSale.ProductId = TblProducts.Id GROUP BY ProductName
We want to calculate the total sales by product. So we are using 'GROUP BY' on the product column since we want the total sale amount by using the 'SUM()' aggregate function on the sale amount column.
We can easily achieve this using the 'having' clause. Let's see it in action. I am going to use the 'having' clause, and we want to filter based on the total sales column.
SELECT ProductName,SUM(Price) AS TotalSales FROM TblProductSale Inner JOIN TblProducts ON TblProductSale.ProductId = TblProducts.Id GROUP BY ProductName HAVING SUM(Price) > 200
When we execute this, we should only get those products where the total sales are greater than 2000. Here, we are using the 'having' clause with the 'Sum()' aggregate function to filter the groups.
Now, what do you think will happen if we use the 'where' clause instead of 'having'? When we execute it, we get a syntax error straightaway.
Let's clarify what we mean by this. Now, we want to calculate the total sales of books and mobile phones since we already know the products.
We aim to compute the total sales for these two products, and there are two ways we can achieve this: using either the 'where' or 'having' clause.
Let's see this in action. I'm going to use the 'where' clause here, and we want to compute the total sales for books and mobile phones.
SELECT ProductName,SUM(Price) AS TotalSales FROM TblProductSale Inner JOIN TblProducts ON TblProductSale.ProductId = TblProducts.Id WHERE ProductName in ('Books', 'Mobile Phone') GROUP BY ProductName
So when we execute the above query, we get the total sales for Books and Mobile Phone.
Now we can achieve the same thing using having as well.
SELECT ProductName,SUM(Price) AS TotalSales FROM TblProductSale Inner JOIN TblProducts ON TblProductSale.ProductId = TblProducts.Id GROUP BY ProductName HAVING ProductName in ('Books', 'Mobile Phone')
The first query filters all the rows to retrieve only books and mobile phones, and then performs the aggregate calculation, i.e., sum.
In contrast, with the 'having' clause, it retrieves all products, performs the aggregate calculation on all products, and then filters the groups.
Clearly, we can see a performance impact when using the 'having' clause. From a performance standpoint, 'having' is slower than 'where' and should be avoided when possible.
This is because 'having' only filters the groups, whereas 'where' filters all the rows. It only fetches books and mobile phones and then performs aggregation on those two products. In the case of 'having', it performs aggregation for all products and then filters books and mobile phone groups.
Therefore, 'having' is definitely slower than 'where' in this case and should be avoided.