Yesterday I wrote a blog entry talking about dynamically generating a table of dates in MSSQL 2005. My original solution involved common table expressions (CTE) and involved recursion. While there's nothing wrong with using recursion this way, I mainly was using recursion to simplify the UDF code into a single SQL expression.
However, the more I started thinking about the problem last night, I decided I really needed to revise things a bit further. I decided I really needed to have the query return both a start and end date for each row in the recordset—as this would make it much easier to things such as find all the orders placed during each hour of the day.
So, this morning I came in and re-wrote my query and optimized a few things. The first thing I did was remove the recursion, so this UDF should now work in SQL Server 2000 as well as SQL Server 2005. It's important to note that you could do the same thing with a physical table, but for my uses my date range table generally would only include at most a few dozen rows and there's not enough of a performance difference to warrant the pain of maintaining a table of every date and possible combination you may want to return aggregate results over.
The new code looks like this:
SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO -- ============================================= -- Author: Dan G. Switzer, II -- Create date: Jan 8, 2009 -- Description: Creates a table with a set of date ranges -- ============================================= create function generateDateTable ( -- Add the parameters for the function here @start_date datetime , @end_date datetime , @datepart varchar(20) = 'day' , @step int = 1 ) returns @dates table ( -- Add the column definitions for the TABLE variable here start_date datetime, end_date datetime ) as begin -- if we're doing calculations based on the days (not time) then strip the time out if( @datepart in ('year', 'yy', 'yyyy', 'quarter', 'qq', 'q', 'month', 'mm', 'm', 'dayofyear', 'dy', 'y', 'day', 'dd', 'd', 'week', 'wk', 'ww') ) begin set @start_date = cast(floor(cast(@start_date as float)) as datetime) set @end_date = cast(floor(cast(@end_date as float)) as datetime) end declare @new_start datetime while @start_date <= @end_date begin -- get the new starting row set @new_start = (case when @datepart in ('year', 'yy', 'yyyy') then dateadd(yy, @step, @start_date) when @datepart in ('quarter', 'qq', 'q') then dateadd(qq, @step, @start_date) when @datepart in ('month', 'mm', 'm') then dateadd(mm, @step, @start_date) when @datepart in ('dayofyear', 'dy', 'y') then dateadd(dy, @step, @start_date) when @datepart in ('day', 'dd', 'd') then dateadd(dd, @step, @start_date) when @datepart in ('week', 'wk', 'ww') then dateadd(ww, @step, @start_date) when @datepart in ('hour', 'hh') then dateadd(hh, @step, @start_date) when @datepart in ('minute', 'mi', 'n') then dateadd(n, @step, @start_date) when @datepart in ('second', 'ss', 's') then dateadd(s, @step, @start_date) when @datepart in ('millisecond', 'ms') then dateadd(ms, @step, @start_date) else dateadd(dd, @step, @start_date) end) -- insert a new row insert @dates ( start_date , end_date ) values ( @start_date -- since MSSQL is only accurate to 3.33ms, we need to subtract 3 ms to get the upper range in time , dateadd(ms, -3, @new_start) ) -- update the starting row set @start_date = @new_start end return end GO
The key changes are the query now returns both a start_date and end_date for each row, there's no no recursion and for dateparts that don't involve time intervals, the UDF will strip out the time automatically. This makes it very easy to use with the getdate() or getutcdate() functions in SQL Server.
Now I can easily generate a table of date ranges for the past 30 days:
select * from dbo.generateDateTable(dateadd(d, -30, getutcdate()), getutcdate(), 'day', 1)
This will generate a table that looks like this:
Or, I can get a resultset broken down by hours in the day (from 8am until 5pm:)
select * from dbo.generateDateTable('01-01-2009 8:00:00', '01-01-2009 17:00:00', 'hour', 1)
This would look like this:
The introduction of the start_date and end_date columns also changes the syntax of the queries a bit. To take use of the start_date and end_date columns, you'll want to join rows based on whether the time falls between the start_date and end_date columns.
So, if we wanted to find out the total number of orders placed in each our of our business day (from 8am until 6pm) we'd use a query like this:
select /* we need to add all the order counts together, since there will be multiple rows for each match found within our start_date and end_date range */ d.start_date as OrderDate, sum(isnull(o.OrderCount, 0)) as OrderCount from -- create a table with hour intervals between 8am and 5pm dbo.generateDateTable('1-6-09 8:00:00', '1-6-09 17:00:00', 'hour', 1) as d left outer join -- generate an aggregate count of orders within our timeframe ( select OrderDate, count(OrderId) as OrderCount from Order group by OrderDate ) as o on -- join each match based if they fit within our timeframe o.OrderDate between d.start_date and d.end_date -- we need to by our start_date for our aggregate count group by d.start_date order by d.start_date
This would produce a resultset that might look like this:
Here's what our recordset would look like for orders placed on Jan 6, 2009:
As you can see, our aggregate query is showing the correct results for hours broken down by hour for this day.
If you have any other tips or suggestions, I'd love to hear them.
This approach might not be the best approach for every problem, but it'll work well for when you have date/time ranges that are very dynamic and are limited to a pretty small resultset (under a few hundred rows.) This works perfectly for creating records sets of the last few years broken into quarters, or the past year broken down by month, etc.