Understanding "derived" tables in SQL...

Categories: HTML/ColdFusion, SQL

I was talking to a buddy this morning and he'd never seen the syntax for a derived table in MSSQL before. He couldn't quite grasp why they might be useful and I was having difficulty explaining via instant messenger, so I thought I'd blog a real world example of how derived tables can be useful.

In very basic terms, a derived table is a virtual table that's calculated on the fly from a select statement. They can be tremendously useful in certain situations. For my example below, I'm going to use the Northwind database that included with MSSQL 2000.

Let's say you've been asked to generate a report that shows off the total number of orders each customer placed in 1996. "Not a problem." you think to yourself. "This is just an easy aggregated join query." So, you sit down and generate your query and come up with:

select
   Customers.CustomerID, Customers.CompanyName,
   count(Orders.OrderID) as TotalOrders
from
   Customers
      left outer join
   Orders
      on
   Customers.CustomerID = Orders.CustomerID
where
   year(Orders.OrderDate) = 1996
group by
   Customers.CustomerID, Customers.CompanyName

Looks good, eh? But there's something missing. Customers that didn't place an order in 1996 aren't showing up. You could leave off customers that didn't place an order, but in many cases the customers that didn't buy place orders are the ones that are of the most interest.

If you think you can include those customers with a "is null" check, you'd be wrong. Run the following code and see if you notice what is wrong:

select
   Customers.CustomerID, Customers.CompanyName,
   count(Orders.OrderID) as TotalOrders
from
   Customers
      left outer join
   Orders
      on
   Customers.CustomerID = Orders.CustomerID
where
   (
      year(Orders.OrderDate) = 1996
   or
      Orders.OrderDate is null
   )
group by
   Customers.CustomerID, Customers.CompanyName

At first glance this may look right, but if you examine the results carefully you'll see that the customers with the zero count have never placed an order. If a customer has placed an order, but just not in the year 1996 they won't show up. This is because the "is null" check finds customers that have never placed an order—it still doesn't do anything to add customers who've placed an order, but just not in 1996.

This is where a derived table can come in handy. Instead of using the "Orders" table, we'll reduce the Orders table to a snapshot of itself—including only data from the year 1996.

select
   Customers.CustomerID, Customers.CompanyName,
   count(dOrders.OrderID) as TotalOrders
from
   Customers
      left outer join
   /* start our derived table */
   (
      select
         *
      from
         Orders
      where
         year(Orders.OrderDate) = 1996
   ) as dOrders
   /* end our derived table */
      on
   Customers.CustomerID = dOrders.CustomerID
group by
   Customers.CustomerID, Customers.CompanyName

Run the above query and examine the results. You should now see a row returned for each Customer and the total number or orders placed in the year 1996—including the customers that didn't place an order.

The reason this works is because the LEFT JOIN will include all matches or null values. If the LEFT JOIN has matches (like in our first query,) but non that match the criteria those rows are excluded. In our derived table, since we're only running against the orders from the year 1996 we're ensuring that we return results for each customer.

Hopefully this helps explain why derived tables can be useful.

Comments

Rick O's Gravatar I know you were just trying to show an example, but ...

Why not move the year into the join and use a CASE'd SUM instead of a COUNT()?

SELECT
Customers.CustomerID,
Customers.CompanyName,
COALESCE(SUM(CASE WHEN Orders.OrderID IS NOT NULL THEN 1 END),0) as TotalOrders
FROM
Customers LEFT OUTER JOIN
Orders ON (Customers.CustomerID = Orders.CustomerID) AND (YEAR(Orders.OrderDate) = 1996)
GROUP BY
Customers.CustomerID,
Customers.CompanyName

That's off the top of my head, but it should work. Since the YEAR() is in the JOIN instead of the WHERE, it won't eliminate customers that didn't order in 1996. You can't use COUNT any more, as a customer with no orders in 1996 will still have the 1 NULL line, so you have to work around it with a SUM. This way, you don't lose performance because derived tables and lost keys and indexes. (Not a big deal on MS SQL Server, but a *huge* deal on older DBMSes like DB2/400.)
Dan G. Switzer, II's Gravatar @Rick:

You can get rid of the CASE statement altogether by using the isnull() function:

SELECT
Customers.CustomerID,
Customers.CompanyName,
ISNULL(COUNT(Orders.OrderID),0) as TotalOrders
FROM
Customers LEFT OUTER JOIN
Orders ON (Customers.CustomerID = Orders.CustomerID) AND (YEAR(Orders.OrderDate) = 1996)
GROUP BY
Customers.CustomerID,
Customers.CompanyName

As we all know, there are always multiple solutions to the same problem. I was just trying to throw out a very simple example of how a derived table might be used (and this was the quickest I could think of.)
Rick O's Gravatar Fair enough, but I'd personally still use COALESCE() instead of ISNULL(). ;-)
Bruce Kutnick's Gravatar Nice example, thanks!
Lisa Miller's Gravatar Dan, Thank you so much for your blog. I had been struggling for almost a week by myself. Your example of derived tables helped me so much. I was able to follow your example and write my code and have it work within minutes.
yongke bill yu's Gravatar Wo, just what I need for the changes I am making, thanks a lot, I am sure this would had saved me hours to figure out on my own.

Add Comment



If you subscribe, any new posts to this thread will be sent to your email address.