Understanding "derived" tables in SQL...

Posted by Dan on Nov 10, 2006 @ 4:09 PM

I was talking to a buddy this morning and he'd never seen the syntax for a derived table in MSSQL before. He couldn't quite grasp why they might be useful and I was having difficulty explaining via instant messenger, so I thought I'd blog a real world example of how derived tables can be useful.

In very basic terms, a derived table is a virtual table that's calculated on the fly from a select statement. They can be tremendously useful in certain situations. For my example below, I'm going to use the Northwind database that included with MSSQL 2000.

Let's say you've been asked to generate a report that shows off the total number of orders each customer placed in 1996. "Not a problem." you think to yourself. "This is just an easy aggregated join query." So, you sit down and generate your query and come up with:

select
    Customers.CustomerID, Customers.CompanyName,
    count(Orders.OrderID) as TotalOrders
from
    Customers
        left outer join
    Orders
        on
    Customers.CustomerID = Orders.CustomerID
where
    year(Orders.OrderDate) = 1996
group by
    Customers.CustomerID, Customers.CompanyName

Looks good, eh? But there's something missing. Customers that didn't place an order in 1996 aren't showing up. You could leave off customers that didn't place an order, but in many cases the customers that didn't buy place orders are the ones that are of the most interest.

If you think you can include those customers with a "is null" check, you'd be wrong. Run the following code and see if you notice what is wrong:

select
    Customers.CustomerID, Customers.CompanyName,
    count(Orders.OrderID) as TotalOrders
from
    Customers
        left outer join
    Orders
        on
    Customers.CustomerID = Orders.CustomerID
where
    (
        year(Orders.OrderDate) = 1996
    or
        Orders.OrderDate is null
    )
group by
    Customers.CustomerID, Customers.CompanyName

At first glance this may look right, but if you examine the results carefully you'll see that the customers with the zero count have never placed an order. If a customer has placed an order, but just not in the year 1996 they won't show up. This is because the "is null" check finds customers that have never placed an order—it still doesn't do anything to add customers who've placed an order, but just not in 1996.

This is where a derived table can come in handy. Instead of using the "Orders" table, we'll reduce the Orders table to a snapshot of itself—including only data from the year 1996.

select
    Customers.CustomerID, Customers.CompanyName,
    count(dOrders.OrderID) as TotalOrders
from
    Customers
        left outer join
    /* start our derived table */
    (
        select
            *
        from
            Orders
        where
            year(Orders.OrderDate) = 1996
    ) as dOrders
    /* end our derived table */
        on
    Customers.CustomerID = dOrders.CustomerID
group by
    Customers.CustomerID, Customers.CompanyName

Run the above query and examine the results. You should now see a row returned for each Customer and the total number or orders placed in the year 1996—including the customers that didn't place an order.

The reason this works is because the LEFT JOIN will include all matches or null values. If the LEFT JOIN has matches (like in our first query,) but non that match the criteria those rows are excluded. In our derived table, since we're only running against the orders from the year 1996 we're ensuring that we return results for each customer.

Hopefully this helps explain why derived tables can be useful.

Categories: SQL, HTML/ColdFusion

10 Comments

  • I know you were just trying to show an example, but ...

    Why not move the year into the join and use a CASE'd SUM instead of a COUNT()?

    SELECT
     Customers.CustomerID,
     Customers.CompanyName,
     COALESCE(SUM(CASE WHEN Orders.OrderID IS NOT NULL THEN 1 END),0) as TotalOrders
    FROM
     Customers LEFT OUTER JOIN
     Orders ON (Customers.CustomerID = Orders.CustomerID) AND (YEAR(Orders.OrderDate) = 1996)
    GROUP BY
     Customers.CustomerID,
     Customers.CompanyName

    That's off the top of my head, but it should work. Since the YEAR() is in the JOIN instead of the WHERE, it won't eliminate customers that didn't order in 1996. You can't use COUNT any more, as a customer with no orders in 1996 will still have the 1 NULL line, so you have to work around it with a SUM. This way, you don't lose performance because derived tables and lost keys and indexes. (Not a big deal on MS SQL Server, but a *huge* deal on older DBMSes like DB2/400.)
  • @Rick:

    You can get rid of the CASE statement altogether by using the isnull() function:

    SELECT
     Customers.CustomerID,
     Customers.CompanyName,
     ISNULL(COUNT(Orders.OrderID),0) as TotalOrders
    FROM
     Customers LEFT OUTER JOIN
     Orders ON (Customers.CustomerID = Orders.CustomerID) AND (YEAR(Orders.OrderDate) = 1996)
    GROUP BY
     Customers.CustomerID,
     Customers.CompanyName

    As we all know, there are always multiple solutions to the same problem. I was just trying to throw out a very simple example of how a derived table might be used (and this was the quickest I could think of.)
  • Fair enough, but I'd personally still use COALESCE() instead of ISNULL(). ;-)
  • Nice example, thanks!
  • Dan, Thank you so much for your blog. I had been struggling for almost a week by myself. Your example of derived tables helped me so much. I was able to follow your example and write my code and have it work within minutes.
  • Wo, just what I need for the changes I am making, thanks a lot, I am sure this would had saved me hours to figure out on my own.
  • Hi Dan,

    Thanks so much for your clear explanation. I've been puzzling over a customers SQL which does not look like ANSI standard. Now I know that what I'm seeing are derived tables. Is this standard SQL? I have an update that looks like this (simplified)

    update a from CUSTOMERS a,
    (select cust_id from cust_appl where typ_cd = 3) x,
    ( select grp_cd, rng_min, from AGE_GRP
    where a.cust_id = x.cust_id) c

    set grp_cd =c.grp_cd where a.cust_id = x.cust_id;

    There are no 'as' to indicate an alias, Is this optional?

    Thanks!
  • @Pam:

    Derived tables are not part of the ANSI standard, but I believe their supported by most major DB engines (I'm not aware of any that don't support the syntax, but 99.9999% of my database work in the past 10 years has been in MS SQL.)

    The "as" directive is purely optional in MSSQL (but I'm not sure if this is part of the ANSI standard or not.)

    However, the query is using an "implied join", which I recommend changing to an actual INNER JOIN statement (an implied join is an old shortcut to an INNER JOIN but is discourage and I believe even depreciated.) While the syntax is shorter, I find it much clearer to write out the SQL fully.
  • Good example to start with for derived tables
  • Thanks Dan. Your blog has taught me the concept of derived tables and I am going to use it a lot now :) .

Comments for this entry have been disabled.