What is SELF JOIN ? With Realtime Example


In this session, we'll discuss self join, which involves joining a table with itself. We'll also explore the classification of self join and its purpose.

Many people perceive self join as a distinct type of join, but in reality, it can be categorized as inner, outer, or cross join based on how it's implemented.

Before proceeding further, I highly recommend reading the SQL join tutorial. In previous posts, we've covered how to join different tables.

What is Self Join?

If we examine the table provided, it consists of three columns: EmpId, EmployeeName, and Manager_Id.

Now, suppose I task you with writing a query to produce an output like this: I want the names of the employees along with the names of their respective managers.

TblEmployeeManager

If you require the SQL script to create these two tables, you can find it at the end of the post.

Now, examining the table on the left-hand side, if we consider Michael, his manager's ID is 5. Taking this 5 and looking it up in the employee ID column, we find that it corresponds to Grayson.

This implies that Michael's manager is Grayson. Similarly, for Jackson, his manager's ID is 6, and looking up this 6 in the employee ID column reveals Gabriel as his manager.

So in this scenario, we are utilizing the Manager_Id column to look up values in the Employee ID column within the same table, essentially referencing the same table.

If we desire an output consisting of employee names and their respective manager's names, we need to join this employee table with itself.

Now, you might be wondering how we can achieve this. If you examine the query below, you'll see that we've accomplished exactly that.

How to get the employees with their managers from the same table?

Select Emp.EmployeeName as Employee, Mng.EmployeeName as Manager
from TblEmployeeManager Emp
Left Join TblEmployeeManager Mng
On Emp.Manager_Id = Mng.EmpId

Self join to get employee manager name

So if you examine the output, we are interested in both the employee name and the manager's name. Therefore, we include these in the column list within the SELECT clause.

Query Explanation:

If you examine the FROM clause, we're referencing the table TblEmployeeManager twice. We provide an alias 'Emp' for the first instance and 'Mng' for the second instance. Then, we perform a left join between these two instances of the table.

By doing this, we essentially create two copies of the same table, one representing employees and the other representing managers. Both instances have identical structures.

Next, in the ON clause, we specify the condition for the join. We match the Manager_Id column from the employee table (represented by 'Emp') with the Emp_Id column from the manager's table (represented by 'Mng').

Finally, in the SELECT clause, we retrieve the employee name from the 'Emp' table using the alias 'Employee', and the manager's name from the 'Mng' table using the alias 'Manager'.

This approach allows us to retrieve the desired output, where each employee's name is paired with their corresponding manager's name.

This type of join, where a table is joined with itself, is known as a self-join. When we execute this query, we obtain the expected output.

Get List all employee’s names and their managers by manager name using an inner join

As we know, an inner join retrieves only the matching rows between the two tables. When we used a left join previously, we also obtained Anthony's record, resulting in 9 rows in total.

Self join to get employee manager name

but when I change it to an inner join.

Select Emp.EmployeeName as Employee, Mng.EmployeeName as Manager
from TblEmployeeManager Emp
Inner Join TblEmployeeManager Mng
On Emp.Manager_Id = Mng.EmpId

Now, when I execute this query, observe what happens. I only retrieve 8 rows instead of 9. What happened to Anthony's record? It's no longer retrieved. Why? Because if you examine Anthony's record, the managerId does not match with any of the employee Ids.

self Inner Join

There is no matching record. And as you may recall, the inner join only retrieves matching records between the two tables.

Now, you might be wondering, where are the two tables here? But remember, in the join, we are joining this table with itself. So we are treating a single table as if it were two tables. You can refer to this as an inner self join.

Why is it called a self join? Because you are joining the same table with itself, and you are performing an inner join—hence, inner self join.

Similarly, you can perform a right outer self join and a full outer self join. So a self-join is not an entirely different type of join. It can be classified under any type of join: inner, outer, left, right, and even cross join.

Cross join in SQL example employee manager

Select Emp.EmployeeName as Employee, Mng.EmployeeName as Manager
from TblEmployeeManager Emp
Cross Join TblEmployeeManager Mng

So if you want to convert this into a cross join, the first thing you have to do is get rid of the ON clause, and then convert the inner join into a cross join. That's all there is to it. Since we are joining the same table with itself, we call it a self join. But since we are using a cross join, it's called a Cross Self Join.

Now, remember, in the TblEmployeeManager, we have 9 rows. And if you recall from the definition of a cross join, when you do a cross join between two tables, you get the Cartesian product, i.e., 9*9=81 rows, as shown in the image.

self cross join

Let’s say we want to write a SQL query to display all employees under each manager. Let’s say we need to get all employees starting from ManagerId=7 i.e from the super manager.

SELGJOIN

;WITH QueryResult (Manager_Id, EmpId, LEVEL)
AS 
(
    SELECT DISTINCT Manager_Id, EmpId AS EmployeeId,  0 AS LEVEL
    FROM TblEmployeeManager  

    UNION ALL

    SELECT tbl.Manager_Id, tbl.EmpId, LEVEL + 1 AS LEVEL
    FROM TblEmployeeManager tbl
      INNER JOIN QueryResult T 
        ON tbl.Manager_Id = T.EmpId
    WHERE tbl.Manager_Id <> tbl.EmpId 
)  
SELECT DISTINCT EmpId, Manager_Id, LEVEL FROM QueryResult ORDER BY Manager_Id

Sql Table Script

--Self join Table
CREATE TABLE [dbo].[TblEmployeeManager](
    [EmpId] [int] IDENTITY(1,1) NOT NULL,
    [EmployeeName] [nvarchar](max) NULL,
    [Manager_Id] [int] NULL,
 CONSTRAINT [PK_TblEmployeeManager] PRIMARY KEY CLUSTERED 
(
    [EmpId] ASC
)
) 
GO
SET IDENTITY_INSERT [dbo].[TblEmployeeManager] ON 
GO
INSERT [dbo].[TblEmployeeManager] ([EmpId], [EmployeeName], [Manager_Id]) VALUES (1, 'Michael', 5)
GO
INSERT [dbo].[TblEmployeeManager] ([EmpId], [EmployeeName], [Manager_Id]) VALUES (2, 'Jackson', 6)
GO
INSERT [dbo].[TblEmployeeManager] ([EmpId], [EmployeeName], [Manager_Id]) VALUES (3, 'Jacob', 6)
GO
INSERT [dbo].[TblEmployeeManager] ([EmpId], [EmployeeName], [Manager_Id]) VALUES (4, 'Luke', 5)
GO
INSERT [dbo].[TblEmployeeManager] ([EmpId], [EmployeeName], [Manager_Id]) VALUES (5, 'Grayson', 8)
GO
INSERT [dbo].[TblEmployeeManager] ([EmpId], [EmployeeName], [Manager_Id]) VALUES (6, 'Gabriel', 9)
GO
INSERT [dbo].[TblEmployeeManager] ([EmpId], [EmployeeName], [Manager_Id]) VALUES (7, 'Anthony', NULL)
GO
INSERT [dbo].[TblEmployeeManager] ([EmpId], [EmployeeName], [Manager_Id]) VALUES (8, 'Lincoln', 7)
GO
INSERT [dbo].[TblEmployeeManager] ([EmpId], [EmployeeName], [Manager_Id]) VALUES (9, 'Ryan', 7)
GO