Paginating data is one of those commonly required tasks. In previous version of MS SQL Server it's been tricky to handle, but made easy via the use of stored procedures that handle the pagination logic for you. However, if you're using SQL Server 2005 there's a much easier way to handle pagination and that's to use a Common Expression Table (CTE.)
Microsoft created the CTE syntax to make solving complicated tasks easier (such as returning a query of hierarchical tree data.) When you create a CTE, you're essentially creating a virtual table you can query against. CTEs generally perform very well and can often replace the need for creating temp tables.
Let's look at an example of query out records 11-20 from a Employee database:
-- create the Common Table Expression, which is a table called "pagination" with pagination as ( -- your normal query goes here, but you put your ORDER BY clause in the rowNo declaration select row_number() over (order by department, employee) as rowNo, -- a list of the column you want to retrieve employeeId, employee, department from Employee where disabled = 0 ) -- we now query the CTE table select -- add an additional column which contains the total number of records in the query *, (select count(*) from pagination) as totalResults from pagination where RowNo between 11 and 20 order by rowNo
The first part of the query creates the table expression we'll actually query against when we grab the actual results from the database. In the CTE you're going to write the SQL you'd normally write to grab all the records with one exception—you use the row_number() function to generate a virtual column that orders your result rows for you. The actual "order by" clause you want to use will actually go in the row_number() declaration—this makes SQL Server assign the correct row number for each record.
It's important to remember that a CTE can be as complex as you need it to be. You can do joins, pivots, etc—whatever you need.
The next step is to actually query against the CTE you created. This query will virtually be the same for all pagination queries you write. You are simply grabbing all of the columns and then limiting the returned data to just the rows you want to display (in this case rows 11 - 20.)
In this example, I added an additional column that contains the total results that were returned from the CTE. This is not needed, but it's handy if you need to do something like "Results 11 - 20 of 4,567 records." There are many ways you could generate the total records, this is just one method.
If you've upgraded to SQL Server 2005 and haven't started using CTEs yet, you really need to start getting familiar with them as they really simplify many tasks in SQL Server that previously were very difficult to solve.
Someone asked me this morning how they could determine distinct relationships. They wanted to be able to retrieve all the distinct rows where someone had defined another user as a friend or any users who had defined them as a friend. The table schema looked like this:
friend_user1 | friend_user2 |
1 | 10 |
2 | 6 |
10 | 1 |
6 | 1 |
2 | 1 |
10 | 2 |
As you can see friend 1 is pretty popular amongst other users. However, he only has 1 person defined as friend of his—friend 10. What we need to do is to find all distinct cross-column relationships in the table—which means only one row should be returned for the relationship between friend 1 and friend 10.
As I mentioned early I've been working on a SVN post-commit script. We've got a SVN repository that will be modified by several remote developers and I really need to keep an eye on this repository and I need to closely monitor changes to this repository.
There are two major functions that I needed in my post-commit script:
There are an abundant of examples showing off how to do this in various *nix flavors, but I couldn't find any good Windows-based solutions that didn't require Perl to be installed on the server. That led me to create the following post-commit.bat script.
I've been working on a post-commit hook for our Subversion install and was running in to a number of issues. The post-commit.bat file would run fine from command line, but I just could get things to work as I expected from SVN. After much debugging and scouring Google for answers, I've found a few tips that will hopefully help you to troubleshoot your own SVN repository hooks.
This was the biggest issue I was running in to, because I was expecting the my script to be able to find any programs in my %PATH% statement. That's the main reason my scripts were working fine from command line, but were breaking when executing from my SVN hook.
Pinal Dave posted this great little SQL snippet to view most recent queries executed in SQL Server 2005.
This is definitely a useful SQL snippet to see what's just happened when you're in a pinch.
I'm working on an table that uses the adjacency list model to store its hierarchical relationship. In a nutshell, it means my table has a parent-to-child relationship using the a foreign key to point to the parent primary key. While this is efficient for storage (and easy to understand,) this model was always extremely inefficient for retrieving the tree structure in MSSQL—that is until SQL Server 2005.
In SQL Server 2005, Microsoft implemented a new featured called Common Table Expressions (CTE,) which among other things allow you to transverse an adjacency list model table by using recursion.
A simple example looks like this:
Today I was trying to figure out a good way to determine the path through the tree. I started thinking that the MSSQL 2005's ROW_NUMBER() function would be a good way to do that. A quick Google search brought up an excellent article by Adam Machanic titled Swinging From Tree to Tree Using CTEs, Part 2: Adjacency to Nested Intervals.
It gives pretty thorough examples and even shows how using a CTE you can convert an adjacency list model table into a nested set model table.
In the past I've blogged about some free Windows-based Diff tools—like ExamDiff. However I recently came across WinMerge—an "Open Source visual text file differencing and merging tool for Win32 platforms."
I've only been playing around with it for a couple of days, but I'm very impressed so far. It does a very good job visualizing the differences in files and has a lot of configuration options (such as how to interpret whitespace.) You create patch files, merge the changes together and even compare folder contents.
The feature I really like is Windows Explorer Context Menu integration. The context menu allows you to select two files in Windows Explorer and right-click and select the "Compare" option and instantly get a diff from those two files. There's also an option called "Compare To" which allows you to select a single file and then you can explore to another folder highlight a second file and select the "Compare" option to compare those two files. I really like this option. For me this is generally much more convienent than open the program and using an "Open" dialog to compare two files. I just find when I need to compare two files and I'm not already in Eclipse, then I'm probably looking at the files in Explorer—so the context menu integration is extremely convienent.
Occassionally you come across something in a language you never knew existed, but always needed. Today I discovered a stored procedures included in MS SQL that I didn't know existed, but in the past have just used work around for.
Today I discovered the sp_refreshview stored procedure which updates "metadata for the specified non-schemabound view." In plane english that means if you've made changes to your tables that invalidates the view, it should refresh the view (this is problem if your view uses a select * from table—which should be avoid at all costs.) In the past I've always just done an ALTER VIEW to refresh the view—not very elegant, but it worked when I needed the view to pick up changes in the schema. This stored procedure provides to do the same thing.
For a simple explanation of the sp_refreshview see Vadivel's post on "Sp_refreshView explained..."
Sql-Server-Performance.com recently posted a really well written article on a new feature included in MS SQL 2005 SP2 called the Performance Dashboard. This article was written by Brad M. McGehee and really does a good job walking you through installation and usage of this new Performance Dashboard report.
I just installed this report this morning and it looks very promising. One issue I did run into, is that your database must be set to 9.0 compatibility in order to get the benefits of the Performance Dashboard. Here's a brief quote from the article:
In brief, the SQL Server 2005 Dashboard is a custom report (custom reports are a new feature of Service Pack 2) for Management Studio that gathers data from the many Dynamic Management Views (DMVs) and Dynamic Management Functions (DMFs) available in SQL Server 2005. It is not a replacement for other performance tools, but an adjunct tool to help DBAs better identify and troubleshoot performance problems. In many ways, the SQL Server 2005 Dashboard reports are similar to the many built-in reports already offered by Management Studio.
I was talking to a buddy this morning and he'd never seen the syntax for a derived table in MSSQL before. He couldn't quite grasp why they might be useful and I was having difficulty explaining via instant messenger, so I thought I'd blog a real world example of how derived tables can be useful.
In very basic terms, a derived table is a virtual table that's calculated on the fly from a select statement. They can be tremendously useful in certain situations. For my example below, I'm going to use the Northwind database that included with MSSQL 2000.
Let's say you've been asked to generate a report that shows off the total number of orders each customer placed in 1996. "Not a problem." you think to yourself. "This is just an easy aggregated join query." So, you sit down and generate your query and come up with:
You know, sometimes query analyzer in MSSQL 2000 just doesn't get things right. I had a query that was taking a very long time to execute in MSSQL. The query on average would take 30-60 seconds to complete execution. The query was running a LIKE search over a text data type. Not ideal by any means, but the execution was mind boggling slow.
Figuring my problem was in the fact that I'm doing a LIKE operation on a text data type over hundreds of thousands of records, I figured that was the main culprit. So one of the very first things I tried was setting up a Full Text Index on the data, so I could do a FTS over the data instead. Much to my chagrin, this really showed no noticeable performance increase (maybe like 2-3 seconds faster, but not nearly enough to warrant making radical changes to implement FTS.)
So where was my problem? I really started studying the execution plan. Everything I looked at seem to imply that the LIKE operator was the problem, and it was mandatory (especially since FTS provided no real performance increase.)
A while back I found a stored procedure that someone was using to create pagination resultsets in SQL Server 2000. While the stored procedure was nice, it didn't allow for the "DISTINCT" keyword—which is necessary for some of my queries. There were also a few other issues I didn't like about the stored procedure, so I re-wrote the stored procedure pretty much from scratch.
In order to support the "DISTINCT" keyword, I had to change the inner query to use a derived table. Since this provides a little more overhead, I added a bit parameter "@Distinct" which inserts the "DISTINCT" keyword into the query and uses the derived table.
Also, on the initial page of results, I simply use the TOP keyword. The previous stored proc used an inner query, which did a TOP 0—which means the code did nothing.
I'm working on some legacy code that uses a MS SQL Server 2000 database. In one of the tables, there's a phone number field that is a varchar column. The application essentially treats a "Phone Number" as straight text, so there can be any type of value in this column.
Normally when I design a "phone number" field, I always remove formatting and rely on the view to display the phone number correctly. This provides consistency and allows for easy searching of phone numbers.
The problem I've run into is that I need to provide phone searching capabilities that will find a phone number regardless of how it's formatted in the column, but I can't change the existing data. Unfortunately, MS SQL Server 2000 doesn't have support for Regular Expressions and it's built-in pattern searching isn't powerful enough to do what I want to do.
A follow up from my blog post on Wednesday. Looks like peer pressure really got to Microsoft. An official patch to the WMF vulnerability was released late yesterday. If you don't have your PC set to automatically update, make sure to go visit the Windows Update site (make sure you're using Internet Explorer) and patch your system ASAP.
So much for Microsoft sticking to their patch release schedule. Personally, I'm glad they didn't wait. They should be releasing patches as they're available. If corporations want to wait and release things on a schedule, that's fine, but let us contractors and personal users fix our computers as quickly as possible.
This isn't exactly breaking news, but there's a very nasty bug in Windows 2000, XP and 2003 that has just recently been revealed being called the "WMF vulnerability." The bug has to do with a vulnerable function in GDI32.DLL library that can allow a malicious hacker/web site to install a virus/spyware on your computer.
What makes this bug extremely dangerous is any program that views images (such as Internet Explorer, Firefox, etc) is vulnerable.
Microsoft is claiming that there will not be an "official" patch for this bug until next week. However, this thing is nasty enough that everyone running Windows 2000, XP or 2003 should take some kind of action now.