Sql Server - Using Ranking functions to perform paging

Performing database side paging has always been a headache in Sql Server (as opposed to MySql which provides a built-in LIMIT clause for easy paging). However, I recently found an easy, fast and intuitive way of paging the data on the database side in Sql Server 2005 and later.

Sql Server 2005 introduced 4 Ranking functions which basically number rows in a result-set starting from one (these functions optionally partition the result-set into sub-groups and assign sequential numbers to each sub-group independently, but we should not use partitioning for paging the data correctly). These functions differ in how they assign the numbers to rows.

One of these functions, ROW_NUMBER sorts the data and assigns sequentially increasing numbers to each row. We can easily leverage this function to perform paging on the database side and return only the required rows to the client (without your application needing to fetch all rows from the database and then perform paging on the application side, thus increasing performance.

Here’s an example of how you can use this function to perform paging in T-Sql:

DECLARE @pageIndex int;
DECLARE @pageSize int;

SELECT @pageIndex = 5,
		@pageSize = 20;

DECLARE @start int, @end int;

SELECT @start = (@pageIndex - 1) * @pageSize + 1,
		@end = @start + @pageSize - 1;


WITH cte AS
(
	SELECT ROW_NUMBER() OVER (ORDER BY ReservationID) AS RowNumber, *
	FROM Reservations
)
SELECT *
FROM cte
WHERE RowNumber BETWEEN @start AND @end

The above sql exploits a Common Table Expression and the ROW_NUMBER function to perform server-side paging in Sql Server. And voila, you have an easy way to delegate paging to the database. I have tested the performance of the ROW_NUMBER function over queries returning millions of rows, and haven’t noticed any performance degradation (in some cases, the queries with ROW_NUMBER even performed better than without it).

So far, so good, you would say, but I also need the total number of rows returned to be able to display the number of result pages in the UI.

The first attempt to do:

SELECT @@ROWCOUNT

fails, because it does not return the actual number of rows in the original query (that is inside the Common Table Expression (CTE)) above, but the number of rows in the paged result, which would always be less than or equal to @pageSize.

Further, you can only run a single query against the CTE which means you cannot also do a COUNT(*) from the cte as an independent SELECT statement.

Now, there are multiple ways to extract the total number of rows in the original query all having merits and demerits.

First, you can add a COUNT(*) column to the cte itself, something like the following:

WITH cte AS
(
	SELECT ROW_NUMBER() OVER (ORDER BY ReservationID) AS RowNumber,
		COUNT(*) OVER() as TotalRows,
		r.*
	FROM Reservations r
)
SELECT *
FROM cte
WHERE RowNumber BETWEEN @start AND @end

However, this can impact the performance of the query because of nested aggregation.

Next, you can COUNT the number of rows in the SELECT statement that uses the CTE as follows:

WITH cte AS
(
	SELECT ROW_NUMBER() OVER (ORDER BY ReservationID) AS RowNumber,
		r.*
	FROM Reservations r
)
SELECT *, (SELECT COUNT(*) FROM cte) AS TotalRows
FROM cte
WHERE RowNumber BETWEEN @start AND @end

However the above query has a very subtle problem. This query might not provide consistent results, depending upon the isolation level of the query.
The problem is that TotalRows can give you a different value than the number of rows returned by CTE while doing the SELECT *. This can happen due to data modifications between the duration of execution of the query in the CTE and the sub-query.
Remember, a CTE is not a virtual table of some sort against which you run an Outer query. SQL Server provides only syntax level substitution of CTE references – meaning each CTE reference in a query is expanded to the query expression that defines the CTE. This means that the CTE above is effectively executed twice and the number of rows can differ between the 2 executions if data changed in the base tables between the 2 executions of the CTE.

The performance above can also be an issue depending upon the complexity of the CTE expression (again remember, the CTE is effectively executed twice).

Thirdly, you can CROSS JOIN 2 instances of the CTE like this:

WITH cte AS
(
	SELECT ROW_NUMBER() OVER (ORDER BY ReservationID) AS RowNumber,
		r.*
	FROM Reservations r
),
cte_total AS (SELECT COUNT(*) As TotalRows FROM cte)
SELECT c.*, t.TotalRows
FROM cte c CROSS JOIN cte_total t
WHERE RowNumber BETWEEN @start AND @end

This query again uses a nested reference of the CTE and therefore suffers the same consistency problem with multiple executions of the CTE as with the second approach above. However, performance-wise, this should perform better than the second approach.

Lastly, you can use a UNION ALL clause to save some network bandwidth on an extra INTcolumn in each row (which would matter only if your page size is large).

WITH cte AS
(
	SELECT ROW_NUMBER() OVER (ORDER BY ReservationID) AS RowNumber,
		r.*
	FROM Reservations r
)
SELECT COUNT(*) AS TotalRows, NULL, NULL, NULL -- Add more NULLS to match the number of columns in the second SELECT below.
FROM cte
UNION ALL
SELECT c.*
FROM cte c
WHERE RowNumber BETWEEN @start AND @end

Performance wise, this is equivalent to the third approach, but removes an extra INT column from each row. You would need to take care that TotalRows occurs in the position of an INT/BIGINT column present in the second SELECT, and would also need to add appropriate number of NULLs to match number of columns in the second SELECT. In the application also, you would need to remember that the first row contains number of rows in the resultset, and is not a result row.

As is always the case in database queries, depending upon your performance requirements and whether you can tolerate occasional variation in the total number of rows reported, you can choose between the first and the third approach above.

However, if you find performance being impacted in a severe way by the first approach and need exact results also, then you might have to revert to the old ways of using a temporary table or doing an application-side paging.

Sql Server – Using Ranking functions to perform paging

Submit a Comment Cancel reply

Recent Posts

Recent Comments

Archives

Meta

Categories

Chit Chat

Projects