Hierarchical data in MS SQL

The usual solution how to handle hierarchy in a database was to use two columns. The first column contains the ID of the hierarchical member (in much cases also the primary, unique ID), and the other, the ID of its parent hierarchical member.

From SQL Server 2008 we can use the HierarchyID data type that is a data type written in .NET and exposed in SQL Server. HierarchyID offers us some performance benefits and simplified code. There are also available many methods (functions) that can be used for example to determine the ancestors and descendants of a hierarchical member, as well as determine other useful information, such as the level of a specific hierarchical member within the hierarchy. You can read about this methods here. Performance benefits and tips & tricks you can find also here.

Hierarchical data are widely used and there are many real-world scenarios where such data can be used. We can use the example of the family tree or an organizational chart (employee reports to another employee), etc.

For this post, we will create and use a test table with the following structure:

In the example table, the ID column would be the primary key of the table, and the ReportToID column would be a foreign key, which references the ID column, creating a hierarchy or in terms of database relation a self-join. In this example is visible that "Senior Tool Designer" (ReportToID = 3) directly reports to "Engineering Manager" (ID = 3). The root node (or the top of the hierarchy) is the "Chief Executive Officer" because he does not report to anyone (value for ReportToID is NULL).

Let’s create a test table and fill some test values.

CREATE TABLE [dbo].[Hierarchy_Std](
	[ID] [int] NOT NULL,
	[ReportToID] [int] NULL,
	[Role] [varchar](256) NOT NULL,
	[BirthDate] [date] NOT NULL,
	CONSTRAINT [PK_Hierarchy_Std] PRIMARY KEY CLUSTERED ([ID] ASC)
) ON [PRIMARY];
GO

INSERT INTO [dbo].[Hierarchy_Std] ([ID], [ReportToID], [Role], [BirthDate]) 
VALUES 
	(1, NULL, 'Chief Executive Officer', '1969-01-29')
	, (2, 1, 'Vice President of Engineering', '1971-08-01')
	, (3, 2, 'Engineering Manager', '1974-11-12')
	, (4, 3, 'Senior Tool Designer', '1974-12-23')
	...
	, (25, 21, 'Production Technician 3', '1983-01-07');
GO

SELECT * FROM [dbo].[Hierarchy_Std];
GO

Now, imagine we have two tasks. First, we need to find the level of the hierarchy for every employee. And secondly, we want to get the average age for all employees in the “Production department”.

The best way to solve both tasks is to use the recursive common table expression (CTE). A CTE is a temporary result set defined in the context of a SELECT, CREATE VIEW, UPDATE, INSERT or DELETE statements and can only be referenced within that query itself. CTE is like a subquery but with the difference that can be used multiple times, can be used in other CTE’s or even can reference itself and in that case, it become a recursive CTE. More about CTE’s I will talk in my SQL 4 beginners blog posts.

Now let's see how to solve our problems:

-- Get levels of hierarchy
;WITH CTE ([ID], [ReportToID], [Role], [BirthDate], [Level])
AS (
	SELECT
		A.[ID]
		, A.[ReportToID]
		, A.[Role]
		, A.[BirthDate]
		, 0 AS [Level]
	FROM
		[dbo].[Hierarchy_Std] A
	WHERE
		[ReportToID] IS NULL
	UNION ALL
	SELECT
		A.[ID]
		, A.[ReportToID]
		, A.[Role]
		, A.[BirthDate]
		, CTE.Level + 1 AS [Level]
	FROM
		[dbo].[Hierarchy_Std] A
		INNER JOIN CTE ON A.ReportToID = CTE.ID	
)
SELECT * FROM CTE;
GO

-- Get avarage age for all employees in "Production department"
;WITH CTE ([ID], [ReportToID], [Role], [BirthDate])
AS (
	SELECT
		A.[ID]
		, A.[ReportToID]
		, A.[Role]
		, A.[BirthDate]
	FROM
		[dbo].[Hierarchy_Std] A
	WHERE
		ID = 19
	UNION ALL
	SELECT
		A.[ID]
		, A.[ReportToID]
		, A.[Role]
		, A.[BirthDate]
	FROM
		[dbo].[Hierarchy_Std] A
		INNER JOIN CTE ON A.ReportToID = CTE.ID	
)
SELECT 
	AVG(DATEDIFF(YEAR, C.[BirthDate], GETDATE())) AS AvarageAge 
FROM 
	CTE C;
GO

We accomplished both tasks and notice how many code was used. It’s not a simple task because we must recursively find all node’s sub or parent nodes, pass the whole table and get the desired results.

Now, we will create a similar table but adjusted for the HierarchyID data type will look like this:

In this table you will notice that the hierarchy is represented by the format "/[Node]/[Child Node]/[Child Child Node]/...", in which a row containing the value "/" is the root of the hierarchy. In this example is visible that "Senior Tool Designer" (Hierarchy = "/1/1/1/") directly reports to "Engineering Manager" (Hierarchy = "/1/1/"). The root node (or the top of the hierarchy) is the "Chief Executive Officer" with the value of "/".

Let’s create a test table and fill some test values.

CREATE TABLE [dbo].[Hierarchy_HID](
	[ID] [int] NOT NULL,
	[Hierarchy] [hierarchyid] NULL,
	[Role] [varchar](256) NOT NULL,
	[BirthDate] [date] NOT NULL,
	CONSTRAINT [PK_Hierarchy_HID] PRIMARY KEY CLUSTERED ([ID] ASC)
) ON [PRIMARY]
GO

INSERT INTO [dbo].[Hierarchy_HID] ([ID], [Hierarchy], [Role], [BirthDate]) 
VALUES 
	(1, '/', 'Chief Executive Officer', '1969-01-29')
	, (2, '/1/', 'Vice President of Engineering', '1971-08-01')
	, (3, '/1/1/', 'Engineering Manager', '1974-11-12')
	, (4, '/1/1/1/', 'Senior Tool Designer', '1974-12-23')
	...
	, (25, '/5/1/1/4/', 'Production Technician 3', '1983-01-07')
GO

SELECT * FROM [dbo].[Hierarchy_HID];
GO

Notice that we have inserted human-readable strings into the Hierarchy column, but SQL Engine converted them to hexadecimal values and store them like that.

Now, let’s solve the same tasks as before. Now we will use the new table with HierarchyID data type.

-- Get levels of hierarchy
SELECT 
	A.[ID]
	, A.[Hierarchy]
	, A.[Role]
	, A.[BirthDate]
	, A.Hierarchy.GetLevel() AS [Level]
FROM
	[dbo].[Hierarchy_HID] A;
GO

-- Get avarage age for all employees in "Production department"
SELECT 
	AVG(DATEDIFF(YEAR, C.[BirthDate], GETDATE())) AS AvarageAge 
FROM 
	[dbo].[Hierarchy_HID] C
WHERE
	C.Hierarchy.IsDescendantOf(0x8C) = 1;
GO

The result is the same as the previous, but the code needed to accomplish the tasks are significantly smaller and more readable.

That’s enough for our intro about Hierarchical data support in MS SQL Server. If you like to read more about HierarchyID data type you can find more details about the supported methods (functions) here and also about some performance benefits and tips & tricks here.

*You can download the complete SQL Script with all examples from the post here: SQL Script.sql (5.54 kb)