SQL: Searching formatted varchars with non-formatted strings

Posted by Dan on Jan 10, 2006 @ 5:11 PM

I'm working on some legacy code that uses a MS SQL Server 2000 database. In one of the tables, there's a phone number field that is a varchar column. The application essentially treats a "Phone Number" as straight text, so there can be any type of value in this column.

Normally when I design a "phone number" field, I always remove formatting and rely on the view to display the phone number correctly. This provides consistency and allows for easy searching of phone numbers.

The problem I've run into is that I need to provide phone searching capabilities that will find a phone number regardless of how it's formatted in the column, but I can't change the existing data. Unfortunately, MS SQL Server 2000 doesn't have support for Regular Expressions and it's built-in pattern searching isn't powerful enough to do what I want to do.

This lead me to searching Google in which I found two solutions that will solve my problem—both of which are free.

The first solution is a custom function (UDF) for MS SQL Server 2000 called PatternReplace. The PatternReplace() UDF is designed to mimic the built-in SQL Replace() function, but it allows a search pattern for the string to find. This allows allows me to do:

select
    *
from
    Customer
where
    dbo.PatternReplace(PhoneNo, '[^0-9]', '') like '%5551212%'

This would find any phone number in the database that is 555-1212—regardless of how it's formated in the database (i.e. "(800) 555-1212", "800/555-1212", "800 555 1212", etc, etc.)

The other solution is an extended stored procedure written in C++ that gives you true Perl-like Regular Expression support. This project is called xp_pcre - Regular Expressions in T-SQL and was written by Dan Farino. This project has a number of UDFs that uses the extended stored procedure to give you all sorts of functionality. You can do a RegEx match, replace, format and split.

I haven't decided which solution I'm going to use yet. The code I'm working on needs to be portable and has to be able to be distributed to a client. Using an extended stored procedure adds a level of complication in that the client would need the extended stored procedure installed as well, where as PatternReplace() is pure SQL, it can be distributed with the database.

Anyway, I plan on doing some testing over the next few days to see which solution will work best for me.

Categories: SQL, HTML/ColdFusion

6 Comments

  • If you google on these terms:

    "t-sql" query phone number "regular expression"

    Your article is top of the heap on Google. Congrats! So which route did you take? I gotta do the same thing.
  • I did nothing--other than try to have quality information that people might link to. I've noticed the ranking of my blog usually rank quite high. I suspect this is due to well labeled content and other people linking to my content.
  • Sorry, I should have made those separate paragraphs. What I mean was:

    A) Yer top of the heap. Congrats.

    B) In your article you detailed two possible methods to use but had not yet decided which to use. So which route did you take and how did it go?

    see ya!
  • Monty,

    Sorry--I completely misread your comment!

    Anyway, in my load testing the PatternReplace() work just fine, plus it encapsulates better.

    You can also speed things up even more by creating a seperate column that contains a stripped out copy of the phone number. You can use a trigger to update the column in the background. If you do that, you'll have no extra lookup/scans on select queries.
  • Just a note that the link to PatternReplace is broken. The correct article appears to be at http://sqlblog.com/blogs/adam_machanic/archive/200...
  • @AC:

    Thanks for the updated link. I've updated the blog entry to reflect the new URL. I compared the code and it is indeed identical to the code I'm using.

Comments for this entry have been disabled.