MS SQL COMPRESS AND DECOMPRESS

MS SQL COMPRESS and DECOMPRESS are introduced into MS SQL version 2016. In this post we will read the overview of both functions, limitations and benefits. We will also test the performance compared to ROW and PAGE compression methods.

MS SQL COMPRESS

As you know MS SQL already supports ROW and PAGE compression of data and COMPRESS function isn’t the replacement for them.

Syntax:

COMPRESS (expression)

  • Expression - A nvarchar(n), nvarchar(max), varchar(n), varchar(max), varbinary(n), varbinary(max), char(n), nchar(n), or binary(n) expression.

The function returns the data type of varbinary(max) that represents the compressed content of input.

The function uses GZIP algorithm for data compression.

Compressed data cannot be indexed! It is recommended to use it on rarely used data (maybe some log data or XML) because the data is not simple readable or usable if first not decompressed.

On short text values the compressed data could use more space than the original data. This is a normal behavior like the file compression of really small files.

Let’s see one example of usage.

Example:

DECLARE @Input NVARCHAR(MAX) = N'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam mollis maximus quam, quis malesuada felis sollicitudin eget. Nunc feugiat nisi et elit blandit, eget vulputate quam faucibus. Nullam vitae commodo nisi. Cras consequat sapien et urna malesuada rhoncus. Sed feugiat ornare ultricies. Nulla neque velit, tristique pretium erat ut, fermentum consequat nulla. Fusce pellentesque ornare lacus, tempor molestie libero tincidunt nec. Pellentesque ac purus mattis, semper sapien id, rhoncus elit. Morbi sagittis sapien sit amet condimentum mollis. Maecenas in mollis eros.'
SELECT COMPRESS(@Input) AS Compressed

Let’s see how efficient this compress operation was

Exampe:

SELECT
	DATALENGTH(@Input) AS "Original size"
	, DATALENGTH(COMPRESS(@Input)) AS "Compressed size"

It went well. The compression rate is approximately 60%.

Now we will compress a relatively small string value and see what happens.

Example:

DECLARE @Input NVARCHAR(MAX) = N'Damir like Gin and tonic!'
SELECT
	DATALENGTH(@Input) AS "Original size"
	, DATALENGTH(COMPRESS(@Input)) AS "Compressed size"

As I announced, short text value used more space compressed than the original one.

Now, we will compare COMPRESS with ROW and PAGE compression.

For that purpose we will use a slightly modified [Sales].[OrderLines] from the sample database WideWorldImporters. We will create 2 new tables with only 2 columns ([OrderLineID] and [Description]). [Description] column will be modified from nvarchar(100) to nvarchar(256) and it will contain concatenated descriptions from 2 consecutive rows from the same table. Let’s start!

Example:

USE [WideWorldImporters];

-- Test table
CREATE TABLE [Sales].[OrderLines_Copy](
	[OrderLineID] [int] NOT NULL,
	[Description] [nvarchar](256) NOT NULL
 CONSTRAINT [PK_Sales_OrderLines_Copy] PRIMARY KEY CLUSTERED 
(
	[OrderLineID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [USERDATA]
) ON [USERDATA]
GO

-- Test table COMPRESS
CREATE TABLE [Sales].[OrderLines_Compress](
	[OrderLineID] [int] NOT NULL,
	[Description] [varbinary](max) NOT NULL
 CONSTRAINT [PK_Sales_OrderLines_Compress] PRIMARY KEY CLUSTERED 
(
	[OrderLineID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [USERDATA]
) ON [USERDATA]
GO


-- Insert test records
INSERT INTO [Sales].[OrderLines_Copy]([OrderLineID], [Description])
SELECT O1.[OrderLineID], FORMATMESSAGE('%s %s', O1.[Description], O2.[Description]) FROM [Sales].[OrderLines] O1 LEFT JOIN [Sales].[OrderLines] O2 ON O1.OrderLineID = O2.OrderLineID + 1;

INSERT INTO [Sales].[OrderLines_Compress]([OrderLineID], [Description])
SELECT O1.[OrderLineID], COMPRESS(FORMATMESSAGE('%s %s', O1.[Description], O2.[Description])) FROM [Sales].[OrderLines] O1 LEFT JOIN [Sales].[OrderLines] O2 ON O1.OrderLineID = O2.OrderLineID + 1;


-- NO Compression 
ALTER TABLE [Sales].[OrderLines_Copy] REBUILD PARTITION = ALL
WITH 
(DATA_COMPRESSION = NONE
)

EXEC sp_spaceused N'[Sales].[OrderLines_Copy]'; 

-- ROW Compression 
ALTER TABLE [Sales].[OrderLines_Copy] REBUILD PARTITION = ALL
WITH 
(DATA_COMPRESSION = ROW
)

EXEC sp_spaceused N'[Sales].[OrderLines_Copy]'; 

-- PAGE Compression 
ALTER TABLE [Sales].[OrderLines_Copy] REBUILD PARTITION = ALL
WITH 
(DATA_COMPRESSION = PAGE
)

EXEC sp_spaceused N'[Sales].[OrderLines_Copy]'; 

-- COMPRESS 
EXEC sp_spaceused N'[Sales].[OrderLines_Compress]';

Results:

  • NO compression -> 43 200 KB
  • ROW compression -> data 22 600 KB (47,69% compression ratio)
  • PAGE compression -> data 22 248 KB (48,50% compression ratio)
  • COMPRESS -> data 32 656 KB (24,41% compression ratio)

As we can see in results, the COMPRESS is not actually bad at all, but ROW and PAGE compressions are better performing.

Next, we will test how COMPRESS will handle some XML data. We will create modified tables based on [Sales].[OrderLines]. Both tables will contain only 2 columns ([OrderLineID] and [Details]). The first table will be [Sales].[OrderLines_XML] and in [Details] column will contain XML serialized data from the original table per row. The second one ([Sales].[OrderLines_XML_Compress]) will contain compressed XML data.

Example:

USE [WideWorldImporters];

DROP TABLE IF EXISTS [Sales].[OrderLines_XML]
DROP TABLE IF EXISTS [Sales].[OrderLines_XML_Compress]

SELECT
A.[OrderLineID]
, CAST((SELECT D.* FROM [Sales].[OrderLines] D WHERE D.OrderLineID = A.OrderLineID FOR XML AUTO, ELEMENTS) AS XML) AS Details
INTO [Sales].[OrderLines_XML]
FROM
[Sales].[OrderLines] A

SELECT
A.[OrderLineID]
, COMPRESS(CAST((SELECT D.* FROM [Sales].[OrderLines] D WHERE D.OrderLineID = A.OrderLineID FOR XML AUTO, ELEMENTS) AS NVARCHAR(MAX))) AS Details
INTO [Sales].[OrderLines_XML_Compress]
FROM
[Sales].[OrderLines] A

-- NO Compression 
ALTER TABLE [Sales].[OrderLines_XML] REBUILD PARTITION = ALL
WITH 
(DATA_COMPRESSION = NONE
)

EXEC sp_spaceused N'[Sales].[OrderLines_XML]'; 

-- ROW Compression 
ALTER TABLE [Sales].[OrderLines_XML] REBUILD PARTITION = ALL
WITH 
(DATA_COMPRESSION = ROW
)

EXEC sp_spaceused N'[Sales].[OrderLines_XML]'; 

-- PAGE Compression 
ALTER TABLE [Sales].[OrderLines_XML] REBUILD PARTITION = ALL
WITH 
(DATA_COMPRESSION = PAGE
)

EXEC sp_spaceused N'[Sales].[OrderLines_XML]'; 

-- COMPRESS 
EXEC sp_spaceused N'[Sales].[OrderLines_XML_Compress]'; 

Results:

  • NO compression -> 156 488 KB
  • ROW compression -> data 155 752 KB (0,47% compression ratio)
  • PAGE compression -> data 155 720 KB (0,49% compression ratio)
  • COMPRESS -> data 79 776 KB (49,02% compression ratio)

As you can see, there are certainly cases where COMPRESS could be very useful (XML, LOG, etc.).

You could also use it for storing compressed data delivered from external applications. You must just implement the GZIP algorithm.

MS SQL DECOMPRESS

Now, when we know all about the COMPRESS function, let’s talk a little more about the DECOMPRESS function. Decompress should be the exact opposite of the compress mechanism. Or not?

Syntax:

DECOMPRESS (expression) 

  • Expression - Is a varbinary(n), varbinary(max), or binary(n)

The function returns the data type of varbinary(max) type.

Let’s start with an example and try to decompress the same compressed value from the previous example.

Example:

DECLARE @Input NVARCHAR(MAX) = N'Damir like Gin and tonic!'
SELECT
	DECOMPRESS(COMPRESS(@Input)) AS "Decompressed value"

In case you missed it, the return data type is of varbinary(max)! If you’d like to get the original value a you must cast it in the original data type.

Example:

DECLARE @Input NVARCHAR(MAX) = N'Damir like Gin and tonic!'
SELECT
	CAST(DECOMPRESS(COMPRESS(@Input)) AS nvarchar(max)) AS "Decompressed value"

This is much better. So, you must cast the data in the same data type before compression, otherwise you will not be able to read the result (if you don’t read so good the varbinary format 😊).

To finish, let’s see 2 quick examples of the chaos when changing start and finish data types. Hard to explain but so much fun.

Example:

-- Example with change nvarchar to varchar
DECLARE @Input NVARCHAR(MAX) = N'Damir like Gin and tonic!'
SELECT
	CAST(DECOMPRESS(COMPRESS(@Input)) AS varchar(max)) AS "Decompressed value"

Result to grid:

Result to text:

-- Example with change from varchar to nvarchar
DECLARE @Input VARCHAR(MAX) = N'Damir like Gin and tonic!'
SELECT
	CAST(DECOMPRESS(COMPRESS(@Input)) AS nvarchar(max)) AS "Decompressed value"

*MS SQL Server sample database "WideWorldImporters" can be downloaded at: https://github.com/Microsoft/sql-server-samples/releases/tag/wide-world-importers-v1.0

*You can download the complete SQL Script with all examples from the post here: SQL Script.sql (6.00 kb)