dans.blog: Dynamically generating a table of dates in MSSQL 2005

[UPDATED: Thursday, January 08, 2009 at 11:04:25 AM]

UPDATE:
I've revised this concept, so please see the new version: Dynamically generating a table of dates in Microsoft SQL (Part Deux!)

I'm working on some code where I need to show some trend date over various periods of time. One of the issues with working with SQL and aggregate values is that it'll only show you values where there was some data, so if you want to print the number of orders on each day and some days had no orders, they wouldn't show up in the results.

For what I was working on, I needed to ensure that even missing days would show up in my query results. This got me thinking about the best way to accomplish this task. After thinking about the problem for a while, I decided the ideal method would be if I could dynamically generate a table of dates that I could then use in a LEFT OUTER JOIN.

What I came up with is a UDF for MSSQL 2005 that will generate a resultset of dates based upon a start and end dates, an interval and the datepart to implement (see the generateDateTable UDF source code at the bottom of this article.) So, now I can use the following to generate a table containing all the days from 12-1-2008 to 12-31-2008:

select
    *
from
    dbo.generateDateTable('12-01-2008', '12-31-2008', 'day', 1)
/*
    Since the UDF uses a CTE to generate the data, if you're trying to output
    a resultset that produces more than 100 rows, you'll want to use the OPTION
    directive to set the maximum recursion number to use. This is only required
    if your output would generate more than 100 records.
*/
option (maxrecursion 32767)

This code would generate a table that looked like:

Date resultset for 12-1-2008 through 12-31-2008

The nice thing about this UDF is that you can specify any datepart you want and any interval/step to use. So, if you wanted to create for hours in the day, you could do:

dbo.generateDateTable('12-01-2008 8:00:00', '12-01-2008 17:00:00', 'hour', 1)

This would generate a resultset like:

Date resultset for 12-1-2008 8:00am through 12-1-2008 5:00pm

The syntax of the UDF is:

dbo.generateDateTable(start_date, end_date, datepart, step)

The datepart is any valid MSSQL datepart as a string (i.e. d, dd, day, etc.) The step is the incremental value to use and will stop once the date reaches the end_date.

Now, if I want to create an aggregate report that shows the number of orders on each day of the month, I can do:

select
    -- use the isnull() to make sure counts for dates with no orders is zero
    dates.date as OrderDate, isnull(orders.OrderCount, 0) as OrderCount
from
    -- generate a table from 12-1-2008 to 12-31-2008
    dbo.generateDateTable('12-01-2008', '12-31-2008', 'day', 1) as dates
        left outer join
    (
        select
            -- convert the OrderDate to just the date (with the time being 00:00:00)
            cast(floor(cast(OrderDate as float)) as datetime) as OrderDate
            , count(TicketId) as OrderCount
        from
            Orders
        where
            -- grab orders for the range we're showing
            OrderDate between '12-01-2008' and '12-31-2008'
        group by
            -- group by the order date
            cast(floor(cast(OrderDate as float)) as datetime)
    ) as orders
        on
    dates.date = orders.OrderDate
order by
    dates.date

This will generate a resultset which would look like this:

Order resultset

Finally, I've included the code for the UDF below. Just copy and paste the code and run it in your favorite query tool for MSSQL 2005. The UDF isn't really that complicated—the majority of the code comes from the massive CASE statement used to convert the datepart string into something SQL Server 2005 can actually use (since it uses a hardcoded constant, instead of a string that can be supplied dynamically.)

generateDateTable UDF

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author:    Dan G. Switzer, II
-- Create date: Jan 7, 2009
-- Description:  Creates a table with a set of date ranges
-- =============================================
CREATE function [dbo].[generateDateTable]
(
  -- Add the parameters for the function here
  @start_date datetime
  , @end_date datetime
  , @datepart varchar(20) = 'day'
  , @step int = 1
)
returns table
as
return (
  with getdates (date)
    as (

  select
    @start_date as date

  union all

  select
    case
      when @datepart in ('year', 'yy', 'yyyy') then dateadd(yy, @step, date)
      when @datepart in ('quarter', 'qq', 'q') then dateadd(qq, @step, date)
      when @datepart in ('month', 'mm', 'm') then dateadd(mm, @step, date)
      when @datepart in ('dayofyear', 'dy', 'y') then dateadd(dy, @step, date)
      when @datepart in ('day', 'dd', 'd') then dateadd(dd, @step, date)
      when @datepart in ('week', 'wk', 'ww') then dateadd(ww, @step, date)
      when @datepart in ('hour', 'hh') then dateadd(hh, @step, date)
      when @datepart in ('minute', 'mi', 'n') then dateadd(n, @step, date)
      when @datepart in ('second', 'ss', 's') then dateadd(s, @step, date)
      when @datepart in ('millisecond', 'ms') then dateadd(ms, @step, date)
      else dateadd(dd, @step, date)
    end as date
  from
    getdates
  where
    (case
      when @datepart in ('year', 'yy', 'yyyy') then dateadd(yy, @step, date)
      when @datepart in ('quarter', 'qq', 'q') then dateadd(qq, @step, date)
      when @datepart in ('month', 'mm', 'm') then dateadd(mm, @step, date)
      when @datepart in ('dayofyear', 'dy', 'y') then dateadd(dy, @step, date)
      when @datepart in ('day', 'dd', 'd') then dateadd(dd, @step, date)
      when @datepart in ('week', 'wk', 'ww') then dateadd(ww, @step, date)
      when @datepart in ('hour', 'hh') then dateadd(hh, @step, date)
      when @datepart in ('minute', 'mi', 'n') then dateadd(n, @step, date)
      when @datepart in ('second', 'ss', 's') then dateadd(s, @step, date)
      when @datepart in ('millisecond', 'ms') then dateadd(ms, @step, date)
      else dateadd(dd, @step, date)
    end) <= @end_date
  )

  select
    date
  from
    getdates
)

Categories: SQL, HTML/ColdFusion

6 Comments

Daniel D
Jan 7, 2009 @ 6:06 PM

Good idea. Though if this is used often I would recomend using a phisical table with all days you would ever care about and a hour table that you join to also when you need hours. This would save the cpu cost of generating it every time you need it.

With a phisical table you can have additional columns like is holiday. Financial year, days till christmas ect.
Days tables can be very usefull tools. Numbers tables can be very usefull also. Do a search on days table and numbers tables and you will find some good stuff. Would put links but I am on my blackberry right now
Also 'sql for smarties' has a section that covers using days table and some uses.
Rick O
Jan 7, 2009 @ 6:11 PM

An interesting approach. As a thought experiment as to eliminating the recursive factor, how about this?

1. Figure out the final row count using the DateDiff and the DatePart. Call it @Parts. This is a constant-time operation.
2. Calculate the base-2 log of @Parts. Call it @Bits. This is also a constant-time operation.
3. Create a temp table #Digits ( i int ) and insert two rows (0) and (1).
4. Select #Digits into #Digits2.
4. Join #Digits with #Digits2 and have (i = (#Digits.i * 2) + #Digits2.i). Store the result back into #Digits.
5. Repeat Step #4 (@Bits - 1) times, which will give you a table of the numbers (0 .. 2**@Bits).
6. Return the result from a query of #Digits where (i between 0 and @Parts), but instead of returning column i, return date = dateadd(@datepart, i, @startdate).

Voila! A non-recursive version.

If you want to decrease the number of loops over Step #4, make the seed size for #Digits 10 rows instead of just 2, then use the base-10 log.
Dan G. Switzer, II
Jan 8, 2009 @ 9:56 AM

@Daniel D:

I did some looking into creating a physical table, but the pain of maintaining a physical table doesn't seem to provide enough of a performance boost to warrant doing it.

Also, the stuff I'm doing is highly dynamic, so I might need to grab "Orders" in 30 minute intervals throughout a specific day. A physical table would just be too much of a pain to maintain, especially since there is very little performance difference. In this dynamic method you're really creating rows over a few dozen, so there's very little overhead.
Dan G. Switzer, II
Jan 8, 2009 @ 10:26 AM

@Rick:

Before I read your post last night, I was doing some more thinking and realized it would be really nice to also include an "end_date" column (which is the end of the time range.) This will make it easier to things such as show all the orders in 30 minute intervals through a specific day.

Also, to get rid of the recursion issue, I just made the UDF a multi-statement function and I just do a simple loop--which I think should be less CPU intensive than your recommendation and require less cycles.

I'm going to write another blog entry with my new revised function.
Rick O
Jan 8, 2009 @ 10:43 AM

I suspect that yours would win for small n, but wouldn't scale when n is in the hundreds or thousands. But if I get some time today I'll code up a version of my solution and see how they compare.

But, I have to say that I agree with Danial D - creating a pivot/numbers/dates table takes a few minutes and then never has to be worried about again. (At least, not for a few years.) A table with every day between 1990 and 2020 only has 11000 rows, and thus wouldn't even take up 100KB, and yet would solve most date problems you'd have. Toss in denormalized columns for things like the day of the week, etc, and it's even more useful:

http://rickosborne.org/blog/index.php/2006/01/24/d...

-R
Dan G. Switzer, II
Jan 8, 2009 @ 11:14 AM

@Rick:

Take a look at the new article I posted today--I'm interested in your feedback. The problem is I need a variety of ways to break things up--days is just one way. I also need to get break downs by hours (or possible blocks of 30 minutes, etc.) Generally speaking I'm only dealing with resultsets that less than a few dozen rows.

I did some testing with a physical table, but the overhead of the indexes, etc didn't seem to offer any real performance difference between creating these small recordsets dynamically or calling them from a physical table.

If I needed to do this over 1,000 or rows, that's when there's a noticeable difference, but for small dynamic ranges this seems to work very well.

Comments for this entry have been disabled.