Approximate Percentile Functions

MS SQL 2005 introduced PERCENTILE_CONT (calculates a percentile based on a continuous distribution) and PERCENTILE_DISC (computes a specific percentile for sorted values).

Starting SQL 2022 approximate percentile functions APPROX_PERCENTILE_CONT and APPROX_PERCENTILE_DISC are introduced.

If you’re okay with a performance versus accuracy tradeoff, the new functions are faster and have a smaller memory footprint than the older ones. Another benefit is that these functions are implemented as grouped ordered set functions.

Syntax:

APPROX_PERCENTILE_CONT (numeric_literal) WITHIN GROUP (ORDER BY order_by_expression [ASC|DESC])
APPROX_PERCENTILE_DISC (numeric_literal) WITHIN GROUP (ORDER BY order_by_expression [ASC|DESC])

Let's see them in action and compare them with the old function's.

Example:

SELECT DISTINCT 
	[SalesOrderID]
	, PERCENTILE_CONT (0.5) WITHIN GROUP (ORDER BY [OrderQty]) OVER (PARTITION BY [SalesOrderID]) AS medianscore_cont
	, PERCENTILE_DISC (0.5) WITHIN GROUP (ORDER BY [OrderQty]) OVER (PARTITION BY [SalesOrderID]) AS medianscore_disc
FROM [Sales].[SalesOrderDetail]
ORDER BY SalesOrderID;

SELECT 
	[SalesOrderID]
	, APPROX_PERCENTILE_CONT (0.5) WITHIN GROUP (ORDER BY [OrderQty]) AS medianscore_cont
	, APPROX_PERCENTILE_DISC (0.5) WITHIN GROUP (ORDER BY [OrderQty]) AS medianscore_disc
FROM [Sales].[SalesOrderDetail]
GROUP BY [SalesOrderID]
ORDER BY SalesOrderID;

First, let's see statistics:

1)

Table 'SalesOrderDetail'. Scan count 21, logical reads 1313...
Table 'Worktable'. Scan count 180, logical reads 1105542...
SQL Server Execution Times:
   CPU time = 4703 ms,  elapsed time = 1266 ms.

2)

Table 'SalesOrderDetail'. Scan count 1, logical reads 1248...
SQL Server Execution Times:
   CPU time = 656 ms,  elapsed time = 928 ms.

And, the execution plan:

Let's summarize in a few words.

The difference in statistics is huge. Over a million logical reads were reduced to just over a thousand. CPU time dropped from almost 5 seconds to just 656 ms.

If you are developing on large datasets, just want to get a quick insight into the data, or simply want to embed some preview, you should definitely consider these new functions.

*You can download the complete SQL Script with all examples from the post here: SQL Script.sql (856.00 bytes)