SQL 4 beginners - UNION

In most online articles about databases you can easily find a lot of articles explaining cool, complex and new features. But what’s about some basics? I decided to release some articles about the basics. The articles will mostly cover topics usable in most relational databases, but the examples and the focus will be on MS SQL Server database engine.

Let’s find out what is a UNION and what’s the difference between the UNION and the UNION ALL operators.

The main difference between UNION and JOIN operator is that union combines results vertically. In simple words, the UNION operator returns results from two or more queries in one result set.

There are two syntax for the UNION operator:

  • UNION - combines the result of two queries into one result set and removes duplicates
  • UNION ALL - combines the result of two queries into one result set but does not remove the duplicates

Let's see how it works on one example.

Example:

CREATE TABLE #SampleTable ( 
	Name VARCHAR(256)
	, Age SMALLINT
);

INSERT INTO #SampleTable (Name, Age) VALUES ('Ivan', 18);
INSERT INTO #SampleTable (Name, Age) VALUES ('Carl', 25);
INSERT INTO #SampleTable (Name, Age) VALUES ('Bill', 33);
INSERT INTO #SampleTable (Name, Age) VALUES ('Jack', 45);
INSERT INTO #SampleTable (Name, Age) VALUES ('Jill', 45);
INSERT INTO #SampleTable (Name, Age) VALUES ('Jack', 18);

SELECT * FROM #SampleTable WHERE Age = 18 UNION     SELECT * FROM #SampleTable WHERE Name = 'Ivan';
SELECT * FROM #SampleTable WHERE Age = 18 UNION ALL SELECT * FROM #SampleTable WHERE Name = 'Ivan';

DROP TABLE #SampleTable;

The results show us that in the first query using UNION there is no duplicate row returned (Ivan, age 18). UNION ALL returned us the duplicate row because the record was returned by both queries.

The truth is that we could accomplish this task using only one query with the OR operator in the WHERE statement. There was no need to use the UNION operator at all.

Example:

SELECT * FROM #SampleTable WHERE Age = 18 OR Name = 'Ivan';

The result is same as using the UNION operator. What is faster depends of many factors, primary based on indexes and query execution plans. Since this are the SQL basics, we will not discuss in this article about details.

The UNION operator is slower than the UNION ALL because the database engine must take care that no duplicates are returned by the consequent queries. If you are sure that the duplicates rows don’t exist in the result set, I advise you to always use the UNION ALL operator.

Let's see the performance difference between both operators returning the same data.

Example:

SELECT * FROM #SampleTable WHERE Age = 18 UNION     SELECT * FROM #SampleTable WHERE Age = 45;
SELECT * FROM #SampleTable WHERE Age = 18 UNION ALL SELECT * FROM #SampleTable WHERE Age = 45;

Look at the numbers. The first query with UNION takes almost 3 times more time than the second one even if they are returning the same result set.

When using UNION or UNION ALL you must be aware of some things:

  • the resulting data set column names will be taken from the first query
  • you even don’t need to specify the column names for the consequent queries
  • the data types of the columns must be same or similar in all queries combined

Examples:

-- Column names will be taken from the first query
SELECT Name, Age FROM #SampleTable WHERE Age = 18
UNION ALL
SELECT Name AS FirstName, Age AS UserAge FROM #SampleTable WHERE Name = 'Ivan';

-- you don’t need to specify the column names for the consequent queries
SELECT 'Ivan' AS Name, 18 AS Age
UNION ALL
SELECT 'Jack', 18 

-- the data types of the columns must be same or similar in all queries combined
SELECT 'Ivan' AS Name, 18 AS Age
UNION ALL
SELECT 18, 'Ivan' 

SQL Server was unable to convert "Ivan" to data type int.

"Conversion failed when converting the {DataType} value '{Value}' to data type {DataType}."

To conclude, the UNION operator is useful if you want to combine results from two or more tables with similar structure. In some cases, the UNION operator is even faster than writing one query with two or more conditions (using OR) in the where statement. If you are sure that you don't have redundant (duplicated) data that can be returned in queries use the UNION ALL operator.

 *You can download the complete SQL Script with all examples from the post here: SQL Script.sql (2.89 kb)