MS SQL GENERATE_SERIES

In this post we will talk about MS SQL GENERATE_SERIES, table-valued function introduced in MS SQL version 2022 for generating a series of numbers within a given interval. The interval and the step between series values are defined by the user. We will see also how we did it before and test its performance compared to the old way.

Syntax:

GENERATE_SERIES (start, stop, [step])

  • Start – the first value in the interval. start is specified as a variable, a literal, or a scalar expression of type tinyint, smallint, int, bigint, decimal, or numeric
  • Stop – the last value in the interval. stop is specified as a variable, a literal, or a scalar expression of type tinyint, smallint, int, bigint, decimal, or numeric. The series stops once the last generated step value exceeds the stop value. The data type for stop must match the data type for start.a single character of any character type (nvarchar(1), varchar(1), nchar(1) or char(1)) that is used as separator for concatenated strings
  • Step - Indicates the number of values to increment or decrement between steps in the series. step is an expression of type tinyint, smallint, int, bigint, decimal, or numeric. step can be either negative or positive, but can't be zero (0). This argument is optional. The default value for step is 1 if start is less than stop, otherwise, the default value is -1 if start is greater than stop.If start is less than stop and a negative value is specified for step, or if start is greater than stop and a positive value is specified for step, an empty result set is returned.

The function returns a single-column table containing a sequence of values in which each differs from the preceding by step. The name of the column is [value]. The output is the same type as start and stop.

Old approaches

Let's first look at the ways we could generate numbers in a sequence before this function was introduced.

In this example, we want to generate numbers between 1,000 and 1,000,000.

Recursive CTE

Example:

-- From 1 to 100
;WITH cte(n) AS 
(
  SELECT 1 UNION ALL 
  SELECT n + 1 FROM cte n WHERE n < 100
)
SELECT value = n FROM cte;

-- From 1 000 to 1 000 000
;WITH cte(n) AS 
(
  SELECT 1000 UNION ALL 
  SELECT n + 1 FROM cte n WHERE n <= 1000000
)
SELECT value = n FROM cte;

In the first example, we got what we expected. But the second example failed because the recursive CTE is limited to a maximum of 100 recursions.

"Msg 530, Level 16, State 1, Line 22
The statement terminated. The maximum recursion 100 has been exhausted before statement completion."

Let's try something else.

Retrieving numbers from some system tables

Example:

;WITH Cte AS (SELECT ROW_NUMBER() OVER (ORDER BY C.name) Rn FROM sys.columns C, sys.objects)
SELECT C.Rn FROM Cte C WHERE C.Rn BETWEEN 1000 AND 1000000

So this worked out, but let's look at some statistics.

Table 'sysschobjs'. Scan count 1, logical reads 116720...
Table 'sysobjvalues'. Scan count 1516, logical reads 3238...
Table 'Worktable'. Scan count 0, logical reads 0...
Table 'syscolpars'. Scan count 1, logical reads 22...

SQL Server Execution Times: CPU time = 172 ms

We have a significant number of logical reads, we needed 172 ms of CPU time, and we need to find and have rights to system tables in which we have a sufficient number of rows for the desired interval.

Custom function

Example:

CREATE OR ALTER FUNCTION [dbo].[NumberRange]
(	
	@start BIGINT
	, @end BIGINT
)
RETURNS TABLE 
AS
RETURN 
(
	WITH CTE(n) AS(
		SELECT 1 AS Number UNION ALL SELECT 1
	),
	CTE2(n) AS (SELECT 1 AS Number  FROM CTE x, CTE y),
	CTE3(n) AS (SELECT 1 AS Number  FROM CTE2 x, CTE2 y),
	CTE4(n) AS (SELECT 1 AS Number  FROM CTE3 x, CTE3 y),
	CTE5(n) AS (SELECT 1 AS Number  FROM CTE4 x, CTE4 y),
	CTE6(n) AS (SELECT 0 AS Number  UNION ALL 
				SELECT TOP (@end-@start)
				ROW_NUMBER() OVER (ORDER BY (SELECT NULL))  AS Number
				FROM CTE5 x, CTE5 y)
	SELECT @start+n  AS Number
	FROM CTE6
	WHERE @start+n <= @end
)

SELECT Number FROM  [dbo].[NumberRange] (1000, 1000000) ORDER BY Number;

Of course this example also works. Let's focus on performance.

SQL Server Execution Times: CPU time = 263 ms

We spent 263 ms. Of course, sometimes we don't have the right to create an object/function on the environment we're working on, so that could be a problem with this approach.

Table with numbers

Example:

CREATE TABLE [dbo].Numbers (
Number INT NOT NULL, 
CONSTRAINT [PK_Number] PRIMARY KEY CLUSTERED (
	[Number] ASC
));
-- Fill table with numbers
SELECT Number FROM  [dbo].[Numbers] WHERE NUmber BETWEEN 1000 AND 1000000;
Table 'Numbers'. Scan count 1, logical reads 1615...
SQL Server Execution Times: CPU time = 15 ms

We used 15 ms which is the best so far. Also, we had some logical reads. And as before, the question is whether we have the right to make a table with numbers on the target enviroment.

GENERATE_SERIES

And finally, let's use the new GENERATE_SERIES function and see how it works.

Example:

SELECT value FROM GENERATE_SERIES(1000, 1000000, 1);
SQL Server Execution Times: CPU time = 62 ms

We spent 62 ms, which is not as fast as in the previous example, but we did not need any special rights to create additional objects.
Things get a little different in other examples. Let's start by retrieving only even numbers in the range from 1,000 to 1,000,000.

Example:

SELECT Number FROM  [dbo].[Numbers] WHERE Number BETWEEN 1000 AND 1000000 AND Number%2 = 0;
SELECT value FROM GENERATE_SERIES(1000, 1000000, 2);
-- Table Numbers
Table 'Numbers'. Scan count 1, logical reads 1615...
SQL Server Execution Times: CPU time = 93 ms
-- GENERATE_SERIES
SQL Server Execution Times: CPU time = 32 ms


We can see that GENERATE_SERIES behaves much better than the previous solution.
Let's see how the execution plans differ depending on whether we want numbers in descending or ascending order.

Example:

SELECT value FROM GENERATE_SERIES(1, 100, 1) ORDER BY value ASC;
SELECT value FROM GENERATE_SERIES(1, 100, 1) ORDER BY value DESC;
SELECT value FROM GENERATE_SERIES(100, 1, -1) ORDER BY value DESC;

As you can see, the sort operator is used only in the case when we wanted to display numbers in descending order and we retrieved them with a positive step. When step was negative, SQL assumed that the output of the function would already be in descending order and did not use the expensive sort operator.

And finally, who says a step can't be a decimal number?

Example:

DECLARE @start decimal(3,1) = 0.0, @stop decimal(3,1) = 10.0, @step decimal(3,1) = 0.1;
SELECT value FROM GENERATE_SERIES(@start, @stop, @step);

*You can download the complete SQL Script with all examples from the post here: SQL Script.sql (2.63 kb)