dans.blog


The miscellaneous ramblings and thoughts of Dan G. Switzer, II

Simple managing of one-to-many relationships w/audit trails using the OUTPUT clause in MSSQL 2005+

Earlier today I posted a generic SQL solution on easily manage one-to-many data in your SQL database. However, if you're using SQL Server 2005 (or above) you can use the OUTPUT clause to add an resultset to your SQL that will contain an audit trail of what data was removed and what data was added.

In the example, we had a database schema that looked like this:

image[10]

The User table holds the information on all your Users, the UserRole table holds all the roles of each User and the Role table contains a list of valid roles that a user can hold.

If we want to update all the roles for a specific User, I showed you could use the following 2-step SQL process to update all their roles without deleting or changing existing roles that are still valid:

delete
from
  UserRole
where
  UserId = 1
-- IF WE HAVE NO ROLES TO ASSIGN, DO NOT RUN ANY CODE BELOW THIS LINE
-- this deletes only rows not in our new selection
and
  RoleId not in (1,2,3)

insert into
  UserRole
(
  UserId, RoleId
)

select
  1 as UserId
  , RoleId
from
  Role
where
  RoleId in (1,2,3)
-- only grab roles not already inserted into the database
and
  RoleId not in (
    select
      RoleId
    from
      UserRole
    where
      UserId = 1
  )

However, we can take this code one step further we can use the OUTPUT clause in SQL Server 2005+ to capture the changed data so we can return a resultset with our query that provides an audit trail of all the changes:

-- create a table to store results from the various SQL operations
declare @results table (Type varchar(20), UserId int, SecurityLevelId int)

delete
from
  UserRole
-- store the records deleted into the temp table
output
  'deleted' as Type, Deleted.UserId, Deleted.RoleId into @results

where
  UserId = 1
-- IF WE HAVE NO ROLES TO ASSIGN, DO NOT RUN ANY CODE BELOW THIS LINE, EXCEPT FOR THE FINAL SQL STATEMENT
-- this deletes only rows not in our new selection
and
  RoleId not in (1,2,3)

insert into
  UserRole
(
  UserId, RoleId
)

-- store the records deleted into the temp table
output
  'inserted' as Type, Inserted.UserId, Inserted.RoleId into @results

select
  1 as UserId
  , RoleId
from
  Role
where
  RoleId in (1,2,3)
-- only grab roles not already inserted into the database
and
  RoleId not in (
    select
      RoleId
    from
      UserRole
    where
      UserId = 1
  )

-- output the audit trail of the delete & insert operations
select
  *
from
  @results

With the addition of the OUTPUT clause, our query will now return resultset with 3 columns:

  • Type – This a string that will be either "deleted" or "inserted", based on the operation that occurred
  • UserId – The userId that was removed (in this use case would always be the same userId)
  • RoleId – The role that was modified

Now you've got a great little SQL snippet that you can use that builds an audit trail that you can use to track that changes to your UserRole table! While this is a pretty simple example, you could use this basic syntax anytime you need to manage this type of data.


Easily manage one-to-many data in your SQL database without using multiple INSERT INTO statements

A task that most web developers have faced at one point or another is how to cleanly manage updating data in a one-to-many relationship in a relational database. For example, let's say that you have schema looks something like this:

image

The User table holds the information on all your Users, the UserRole table holds all the roles of each User and the Role table contains a list of valid roles that a user can hold. This type of schema gives us a ton of flexibility, but management of the UserRole table can often be tricky. The most common method I've seen developers use to maintain a table like this is to delete all the data in the UserRole table for a user and use a loop to do an insert on each role that needs to be assigned. This leads to SQL that looks something like:

delete
from
  UserRole
where
  UserId = 1

insert into
  UserRole
(
  UserId, RoleId
) values (
  1, 1
)

insert into
  UserRole
(
  UserId, RoleId
) values (
  1, 2
)

insert into
  UserRole
(
  UserId, RoleId
) values (
  1, 3
)

While this works, it has several issues:

  1. If there are many roles being modified, this can lead to a very lengthy SQL statement and/or many database transactions (depending on how your code is implemented.)
  2. There is no inherit constraint to ensure that an invalid role does not get inserted. If your database has foreign key relationships defined between the UserRole.RoleId and Role.RoleId columns your database would throw an error, but wouldn't it be better just to ignore invalid values?
  3. Since you are deleting all the roles on each update, your assigning new UserRoleId to roles that may not have changed. This can lead to more work on your database's side for managing indexes, etc.

Wouldn't it be nicer if you could just supply a list of the RoleIds you want to update instead of doing a bunch of individual INSERT INTO statements?

Well there is actually a very nice way to manage this that should work in any modern database. The trick is to use an INSERT INTO with SELECT statement to manage your data. The basic idea is instead of doing a bunch of individual INSERT INTO statements, we can write a single SQL statement that looks like this:

insert into
  UserRole
(
  UserId, RoleId
)

select
  1 as UserId
  , RoleId
from
  Role
where
  RoleId in (1,2,3)

What's nice about this code is work's around our first two issues by reducing the SQL statement and ensures that only valid Roles will actually get inserted into the the UserRole table—since we're selecting the valid rows right from the Role table.

But how do we solve issue #3, so that we don't alter roles that haven't changed? The trick is to only delete the roles not in our new list of values, then modify our insert to only insert rows not already assigned to us. So our complete code now looks like this:

delete
from
  UserRole
where
  UserId = 1
-- IF WE HAVE NO ROLES TO ASSIGN, DO NOT RUN ANY CODE BELOW THIS LINE
-- this deletes only rows not in our new selection
and
  RoleId not in (1,2,3)

insert into
  UserRole
(
  UserId, RoleId
)

select
  1 as UserId
  , RoleId
from
  Role
where
  RoleId in (1,2,3)
-- only grab roles not already inserted into the database
and
  RoleId not in (
    select
      RoleId
    from
      UserRole
    where
      UserId = 1
  )

This new code gives us 2 simple SQL statements (that can be run in a single transaction) which will now remove any roles that are no longer assigned to a user and only insert the new rows that need to be added. We're not altering data that hasn't changed.

Coming shortly, I'll post a follow up article that's specific to SQL Server 2005 (and above) on how you can modify the above example to return a query that will contains an audit of all the changes applied to the database.


Converting Microsoft Word Documents to Adobe PDF generates blurry images

This is an issue I keep running into I always forget how to resolve the issue, so I thought I'd better blog the solution (that way when I run into it again and search for the fix, I'll find it via Google!)

If you're running into issues when converting a Word 2002/2003/2007 document into a PDF with images from the Word document being extremely blurry, this due to the use of GDI+ technology:

Under some circumstances, Visio and other document types with graphics that are embedded within a Word document may not convert correctly to PDF. This issue affects Word documents opened in Microsoft Word 2002/XP, Word 2003 and Word 2007. This is because these versions of word use the GDI Plus (GDI+) technology to allow printer-independent operations for on-screen rendering. This is most often used when the embedded object uses Postscript fonts. Documents that exhibit this problem might have areas of color, as if someone highlighted the document and missing text and graphics, but only in the embedded object.

There are two solutions to this problem:

Use another document viewer to convert this document
Any document viewer that can read the document and does not use GDI+ can be used to render the embedded object correctly. For example, both Microsoft Word 2000 and OpenOffice.org Writer do not use GDI+ and can be used to convert documents affected by this issue with no problems.

Convert the embedded object to an image
Probably the easiest solution to this issue is to paste the data as an image, rather than embedding the data into the Word document.

  1. From your application from which you wish to embed data (e.g.: Visio), copy the shapes and the text that you want.
  2. Open Microsoft Word.
  3. On the Edit menu, click Paste Special.
  4. In the Paste Special dialog box, click Bitmap (or Picture XXX), and then click OK.

Note: Do not click Picture (Enhanced Metafile).

courtesy: http://support.software602.com/kb/view.aspx?articleID=1063

I've been using the "Bitmap" option for logos and that seems to work very well to keep the images sharp.

UPDATE:
Some asked for a screenshot, so I thought I'd update this entry.

So that you can see the difference, here's a screenshot showing the problem. On the right is a Word document with the same image pasted twice. As you can see, the first version ends up looking extremely blocky when converted to a PDF. The second image was copied from the first image, but pasted using the Paste Special > Bitmap option and looks remarkable better. I wanted both versions in the same PDF so that you can see it's has nothing to do with the PDF conversion settings, but everything to do with Word and GDI+.

image


Safely restart the Application scope in ColdFusion 8 and above

I've been doing some major restructuring of a legacy application lately and was looking for the safest method to "reset" the Application scope. We wanted to be able to safely push out small bug fixes to CFCs stored in the Application scope without having to restart the ColdFusion service just in order for the changes to take affect. We were looking for a way that would do this that was safe across threads, without having to go through and put locks on every one of our Application scope reads.

I did some playing with various techniques, but really wanted something like the ApplicationStop() function that was introduced in ColdFusion 9. Fortunately Dave, over at Mister Dai, had already done the ground work and discovered that ColdFusion 7 & 8 both have a coldfusion.runtime.ApplicationScopeTracker object that manages the Application scope. Inside this object is a method called cleanUp() which appears to do exactly what the ApplicationStop() function does in ColdFusion 9.

After a couple of quick tests, Dave's claim seem to be accurate—calling the cleanUp() method really did seem to safely end an Application lifecycle. However, I wanted to go deeper and really put together some tests I could run under load to see if my simple tests we accurate. Fortunately, even through significant load at the code showed stability. I put together several tests all of which should that the cleanUp() method seems to safely end the Application scope and the next new page request will re-initialize the Application scope as if the application was initialized for the first time.

While I'm not going to bother post all my load test scripts, I will share some very simple code which shows that the cleanUp() method does safely clean up the application:

<cfparam name="url.thread" type="numeric" default="1" />

<cfset secretKey = "your-secret-url-variable-name" />
<cfset secretValue = "your-secret-url-value" />

<!---// create random key //--->
<cfset key = createUUID() />
<cfif not structKeyExists(Application, "_")>
  <cfset Application["_"] = {} />
</cfif>

<cfoutput>
  <!---// show an existing random App keys //--->
  <cfdump var="#duplicate(Application['_'])#" />

  <!---// store the new random key in the App scope //--->
  <cfset Application["_"][key] = key />

  <!---// output key //--->
  <div>
    #Application["_"][key]#
  </div>

  <!---// for the first thread, let's run a CFHTTP to reset the Application scope //--->
  <cfif url.thread eq 1>
    <cfthread action="run" name="threadReinitApp">
      <cfset sleep(1500) />
      <cfhttp url="#getPageContext().getRequest().getRequestURL().toString()#?#secretKey#=#urlEncodedFormat(secretValue)#" method="get" result="thread.cfhttp" redirect="false">
      </cfhttp>
    </cfthread>
  </cfif>

  <!---// delay for 5 seconds, to make sure the reinit request has completed //--->
  <cfset sleep(5000) />

  <!---// output key, if Application scope is completely killed, then this would error //--->
  <div>
    #Application["_"][key]#
  </div>

  <!----// show the results of re-initializing app //--->
  <cfif isDefined("threadReinitApp")>
    <cfthread action="join" name="threadReinitApp" timeout="600" />
    <cfdump var="#threadReinitApp.cfhttp#" />
  </cfif>
</cfoutput>

What this code does is:

  1. Creates a random key in the Application scope
  2. Spawns a thread to re-initialize your application using your secret reinit key/value
  3. While the other thread has been spawned, the main thread waits 5 seconds so we have enough time for the re-init script to run
  4. Attempt to output the variable in the Application scope.

If the Application scope was reset with something like structClear(Application) then the second time we tried to output the Application["_"][key] the script would have generated an error. However, in all my load testing, this code has never thrown an error.

To actually re-initialize my Application, I implemented 2 custom UDFs that I add to my Application.cfc that are based on Dave's original code. The two functions are ApplicationReset() and ApplicationKill(). The ApplicationReset() method takes in your URL secret key parameter name and calls the ApplicationKill() method and then re-load the current page without the secret URL key. The ApplicationKill() is based on Dave's code, but provides a cross-ColdFusion version of ColdFusion 9's ApplicationStop() method.

<cffunction name="ApplicationReset" access="public" returntype="void" output="false" hint="Resets the application scope">
  <cfargument name="secretKey" type="string" required="true" hint="Secret URL key used to reset the application" />

  <!---// declare local variables //--->
  <cfset var currentUrl = getPageContext().getRequest().getRequestURL().toString() />
  <!---// get the current query string, but remove the secret key //--->
  <cfset var queryString = reReplaceNoCase(cgi.query_string, "(^|&)" & arguments.secretKey & "=.*?(&|$)", "\2", "all") />

  <!---// reset the application (from Application.CFC) //--->
  <cfset ApplicationKill() />

  <!---// add the query string //--->
  <cfif len(queryString)>
    <cfset currentUrl = currentUrl & "?" & queryString />
  </cfif>

  <!---// reload the current page w/out the reinit token //--->
  <cflocation url="#currentUrl#" addtoken="false" />
  <cfabort />
</cffunction>

<!---// stops an application, so that on next reque //--->
<cffunction name="ApplicationKill" access="public" returntype="boolean" output="false" hint="Implementation of ApplicationStop() that works with CF7 and CF8">
  <cfif isDefined("application")>
    <cftry>
      <cfif listFind("7,8", listFirst(server.coldfusion.productVersion))>
        <!---// this is just in case there's no app scope but variables.application //--->
        <cfset createObject("java", "coldfusion.runtime.ApplicationScopeTracker").cleanUp(application) />
      <cfelse>
        <!---// use CF9+'s thread safe method of restarting the application //--->
        <cfset applicationStop() />
      </cfif>

      <cfreturn true />
      <cfcatch type="any"></cfcatch>
    </cftry>
  </cfif>

  <cfreturn false />
</cffunction>

While you should always test stuff for yourself, I did spend a good amount of time running various load tests to ensure that these methods were safe. I was running 20 concurrent users simulating heavy load and would randomly cycle in a request to reset the Application scope at least 10% of the time. I ran dozens of these tests, resulting in 50,000+ page views without seeing any errors or unexpected results—and that was just with one specific load test, I actually ran a couple different tests and they all worked as expected.

So, if you're looking for a safe way to "restart" (or "reset") the Application scope in ColdFusion 8 it appears the safest method would be to use the ApplicationKill() method listed above.


Easy AJAX using ColdFusion, jQuery and CFCs

A recent post on CF-Talk asked whether ColdFusion could use AJAX to do a database lookup. This is actually extremely easy to do in ColdFusion 8+, because it natively supports returning data in the JSON format.

To show how easy this is to do, I decided to throw together a little demo. This took me about 10 minutes to write—most of which was writing the markup. However, the bulk of the work is going to be handled automatically by ColdFusion, which will handle converting your data to JSON and by the jQuery Field Plug-in (which I wrote) which will handle populating your form from the data it receives from ColdFusion.

To show just how easy this all is, here's the jQuery code required to make an AJAX call to a CFC:

$.ajax({
  // the location of the CFC to run
    url: "example.cfc"
  // send a GET HTTP operation
  , type: "get"
  // tell jQuery we're getting JSON back
  , dataType: "json"
  // send the data to the CFC
  , data: {
    // the method in the CFC to run
      method: "getUserById"
    /*
      send other arguments to CFC
    */
    // send the ID entered by the user
    , userId: $("#userId").val()
  }
  // this gets the data returned on success
  , success: function (data){
    // this uses the "jquery.field.min.js" library to easily populate your form with the data from the server
    $("#frmMain").formHash(data);
  }
  // this runs if an error
  , error: function (xhr, textStatus, errorThrown){
    // show error
    alert(errorThrown);
  }
});

Our CFC looks like this:

<cfcomponent output="false">
  <cffunction name="getUserById" access="remote" returnType="struct" returnFormat="json" output="false">
    <cfargument name="userId" type="numeric" required="false" />

    <cfset var user = structNew() />
    <!---//
      The only tricky part here is to use the bracketed notation
      to match the case in your HTML, JS is case sensentive.
      If you use the dot notation (user.name) then the keys will
      be returned in uppercase.
    //--->
    <cfset user["name"] = "User " & arguments.userId />
    <cfset user["email"] = "user_" & arguments.userId & "@example.com" />
    <cfif (arguments.userId mod 2) eq 0>
      <cfset user["gender"] = "f" />
    <cfelse>
      <cfset user["gender"] = "m" />
    </cfif>

    <cfreturn user />
  </cffunction>
</cfcomponent>

I've posted a working example so that you can see how this code looks. For the "User ID" just enter any number. If an error occurs, the error callback will handle displaying the error to the screen. I've also posted the source code as a zip file you can download.

NOTE:
If you're still on ColdFusion MX & 7, there's a little more work because non-string data automatically gets converted to WDDX. While there are JS libraries for converting WDDX to native JS objects, they can be hard to find now that the OpenWDDX site has been shut down. You can find tools at RIAForge though that can convert your data into a JS string--which jQuery will automatically evaluate when it's received.
UPDATE:
I've added some sample code to the download to show how you can use a proxy template in you're using ColdFusion 7 to get the same results. Just look at the example_cf7.cfm. The only difference is we call the example_cf7_proxy.cfm template instead of calling the example.cfc directly. The proxy template uses CFJSON to convert the results to JSON.

[UPDATED: Friday, March 04, 2011 at 9:59:11 AM]


Bug fixes to CFCParser.cfc used by various open source projects (like CFCDocs)

I recently installed a copy of CFCDocs from RIAForge to give our developers better documentation for all of our various components. However, I quickly ran into a few problems related to the CFCParser component that it uses to analyze components.

The CFCParser doesn't create an instance of the object, but instead analyzes the component using a custom parser. The problem is the parser uses some overly simple logic for detecting attribute name/value pairs and for finding starting/ending tag values.

The problem really presents itself when you have a piece of code as follows:

<cfargument name="hightlightStart" type="string" required="false" default="<span style=""background-color:##ffff66;"">" />

In XML, it would be invalid to have an attribute that looked like:  default="<span style=""background-color:##ffff66;"">". However, in ColdFusion this is perfectly legally.

After spending way too much time debugging the issue, I found there were two distinct issues:

  1. The getTagAttributes() function used a regex that could not properly handle attributes that had escaped quotes.
  2. The findTags() function did not analyze if the ending tag token was inside a quoted string. So, if you had a > character inside an attribute in a tag the parser would use first match it found as the ending tag token—instead of only the first match outside of the attribute value.

I was able to fix both of these issues by re-writing portions of both the findTags() and getTagAttributes() functions.

Since I don't know who wrote this code or who manages it, I thought I'd at least post the code here in case anyone else has issues w/this component.

<cfcomponent hint="The CFCParser component provides methods to parse a ColdFusion Component.">

  <cffunction name="init" access="public" returntype="CFCParser" output="false" hint="The init method currently does nothing.">
    <cfreturn this>
  </cffunction>

  <cffunction name="findTags" access="private" returntype="array" output="true" hint="The findTags method searches the document for the given startTag and endTag. It returns an array of structures containing the locations in the document of the start and end position of each tag, and the full contents of the tag itself.">
    <cfargument name="document" type="string" required="yes">
    <cfargument name="startTag" type="string" required="yes">
    <cfargument name="endTag" type="string" required="yes">

    <!--- Find and remove comments --->
    <cfset var tagLocations = arrayNew(1)>
    <cfset var nestingLevel = 1>
    <cfset var searchMode = "start">
    <cfset var position = 1>
    <cfset var i = 0>
    <cfset var j = 0>
    <cfset var tagBegin = 0>
    <cfset var tagEnd = 0>
    <cfset var tagBlock = "">
    <cfset var tmpPosition = 0>
    <cfset var nestCount = 0>
    <cfset var padding = "">
    <cfset var lastReturn = "">
    <cfset var lastSpace = "">
    <cfset var stTag = "">
    <!----// captures quoted strings and the end tag //--->
    <cfset var regexFindEndTag = '(((("")|".*?[^"]"(?!"))|(('''')|''.*?[^'']''(?!''))))|(#arguments.endTag#)' />
    <cfset var findEndTag = 0 />
    <cfset var findEndTagMatch = "" />

    <cfloop from="1" to="#len(document)#" index="i">

      <cfif searchMode is "start">

        <cfset tagBegin = findNoCase(startTag,document,position)>

        <cfif tagBegin>
          <cfset position = tagBegin + len(startTag)>
          <cfset searchMode = "end">
          <!--- <cfoutput>Start Tag found at character #tagBegin#<br></cfoutput> --->
        <cfelse>
          <cfbreak>
        </cfif>

      <cfelse>
        <cfset findEndTagMatch = "" />
        <!---// if finding complex ending tokens, use original simple find //--->
        <cfif arguments.endTag neq ">">
          <cfset tagEnd = find(endTag,document,position)>
        <!---// for other logic, we need to make sure we don't accidentally find the ending tag inside a quoted string //--->
        <cfelse>

          <!---// loop through quotes strings and end tag matches looking for the first end tag match //--->
          <cfloop condition="findEndTagMatch neq arguments.endTag">
            <!---// we're going to loop through quoted strings and matching end tags looking for a match //--->
            <cfset findEndTag = reFindNoCase(regexFindEndTag, document, position, true) />

            <!---// if a match was found, get it //--->
            <cfif findEndTag.pos[1]>
              <!---// get the match--which is either a quoted string or the matching end tag //--->
              <cfset findEndTagMatch = mid(document, findEndTag.pos[1], findEndTag.len[1]) />
              <!---// update the position in the search //--->
              <cfset position = findEndTag.pos[1] + findEndTag.len[1] />

            <!---// if no matches, stop //--->
            <cfelse>
              <cfbreak />
            </cfif>
          </cfloop>

          <!---// return the ending position //--->
          <cfset tagEnd = findEndTag.pos[1] />
        </cfif>

        <cfif tagEnd>
          <cfset tagEnd = tagEnd + len(endTag)>
          <cfset position = tagEnd>
          <!--- <cfoutput>End Tag found at character #tagEnd#<br></cfoutput> --->
        <cfelse>
          <cfbreak>
        </cfif>

        <cfset tagBlock = mid(document,tagBegin,tagEnd-tagBegin)>

        <cfset tmpPosition = 1>
        <cfset nestCount = 0>
        <cfloop from="1" to="#len(tagBlock)#" index="j">
          <cfif findNoCase(startTag,tagBlock,tmpPosition)>
            <cfset tmpPosition = findNoCase(startTag,tagBlock,tmpPosition) + len(startTag)>
            <cfset nestCount = nestCount + 1>
          <cfelse>
            <cfbreak>
          </cfif>
          <!--- <cfoutput>TmpPosition: #tmpPosition#(#htmlEditFormat(mid(tagBlock,tmpPosition,len(tagBlock)))#)<br></cfoutput> --->
        </cfloop>

        <!--- <cfoutput>count - #nestCount# :: Level - #nestingLevel#<br></cfoutput> --->
        <cfif nestCount EQ nestingLevel>

          <cfset lastSpace = reFindNoCase('[#chr(32)##chr(9)#][^#chr(32)##chr(9)#]+$',tagBlock)>
          <cfset lastReturn = reFindNoCase('[#chr(10)##chr(13)#][^#chr(10)##chr(13)#]+$',tagBlock)>

          <cfset padding = "">

          <cfif lastReturn AND lastSpace AND lastReturn LT lastSpace>
            <cfset padding = mid(tagBlock,lastReturn+1,lastSpace-lastReturn)>
          </cfif>

          <cfset stTag = structNew()>
          <cfset stTag.start = tagBegin>
          <cfset stTag.end = tagEnd>
          <cfset stTag.tagBlock = padding & tagBlock>
          <cfset arrayAppend(tagLocations,stTag)>
          <cfset searchMode = "start">
        <cfelse>
          <cfset nestingLevel = nestingLevel + 1>
        </cfif>

      </cfif>

    </cfloop>

    <cfreturn tagLocations>
  </cffunction>

  <cffunction name="removeComments" access="private" output="false" returntype="string" hint="Strips the comments from a document so that code inside comments gets ignored by the findTags method">
    <cfargument name="document" type="string" required="yes">

    <cfset var tagLocations = findTags(arguments.document,"<!---","--->")>

    <cfset var offset = 0>
    <cfset var i = 0>

    <cfset var start = 0>
    <cfset var count = 0>

    <cfloop from="1" to="#arrayLen(tagLocations)#" index="i">
      <cfset start = tagLocations[i].start - offset>
      <cfset count = tagLocations[i].end - tagLocations[i].start>
      <cfset arguments.document = removeChars(arguments.document,start,count)>
      <cfset offset = offset + count>
    </cfloop>

    <cfreturn document>
  </cffunction>

  <cffunction name="getMethods" access="private" returntype="array" output="false" hint="Calls the findTags method to retrieve all cffunction tags in the given document.">
    <cfargument name="document" type="string" required="true">
    <cfreturn findTags(document,"<cf"&"function ","</cf"&"function>") />
  </cffunction>

  <cffunction name="getProperties" access="private" returntype="array" output="false" hint="Calls the findTags method to retrieve all cffunction tags in the given document.">
    <cfargument name="document" type="string" required="true">
    <cfreturn findTags(document,"<cf"&"property ",">")>
  </cffunction>

  <cffunction name="getArguments" access="private" returntype="array" output="false" hint="Calls the findTags method to retrieve all cfarguments tags in the given document. This method should be passed the body of a cffunction tag as the document argument.">
    <cfargument name="document" type="string" required="true">
    <cfreturn findTags(document,"<cf"&"argument ",">")>
  </cffunction>

  <cffunction name="getTagAttributes" access="private" returntype="struct" output="false" hint="Parses the attributes out of the given document for the first occurrence of the tag specified and returns a structure containing name value pairs for the tag attributes.">
    <cfargument name="document" type="string" required="true">
    <cfargument name="tagname" type="string" required="true">

    <cfset var startTag = "">
    <cfset var stAttributes = structNew()>
    <!--- fails on escaped quotes inside the attribute
    <cfset var regex = '[[:space:]][^=]+="[^"]*"' >
    --->
    <!---// get attributes, make sure we get escaped quotes inside quoted strings //--->
    <cfset var regex = '\s[^=]+=((("")|".*?[^"]"(?!"))|(('''')|''.*?[^'']''(?!'')))' />
    <cfset var aTmp = reFindNoCase('<#arguments.tagname#(#regex#)*[^>]*>',document,1,true)>
    <cfset var i = 1>
    <cfset var position = 1>
    <cfset var attribute = "">
    <cfset var attrName = "">
    <cfset var attrValue = "">

    <cfif NOT aTmp.pos[1]>
      <cfreturn stAttributes>
    </cfif>

    <!---// refactored code to use reMatch //--->
    <cfset startTag = mid(document,aTmp.pos[1],aTmp.len[1])>
    <cfset aTmp = reMatchNoCase(regex, startTag) />

    <cfloop index="i" array="#aTmp#">
      <cfset attribute = trim(i) />
      <cfset attrName = listFirst(attribute, "=") />
      <cfset attrValue = listRest(attribute, "=") />
      <cfset stAttributes[attrName] = mid(attrValue, 2, len(attrValue)-2) />
    </cfloop>

    <cfset stAttributes.fullTag = startTag>

    <cfreturn stAttributes>
  </cffunction>

  <cffunction name="parse" access="public" returntype="struct" output="false" hint="Provides the public interface to the CFC parser. This method should be passed the contents of a full ColdFusion component file.">
    <cfargument name="document" type="string" required="true">

    <cfset var cleanDocument = "">
    <cfset var stComponent = "">
    <cfset var aMethods = "">
    <cfset var i = "">
    <cfset var j = "">
    <cfset var stMethod = "">
    <cfset var stArgument = "">
    <cfset var aProperties = "">
    <cfset var stProperty = "">
    <cfset var aArguments = "">
    <cfset var attribStruct = structNew()>

    <cfset cleanDocument = removeComments(document)>

    <cfset stComponent = structNew()>
    <cfset stComponent.isInterface = false>
    <cfset stComponent.attributes = structNew()>
    <cfset stComponent.attributes.hint = "">
    <cfset stComponent.attributes.extends = "cfcomponent">
    <cfset stComponent.attributes.implements = "cfinterface">
    <cfset stComponent.attributes.displayname = "">
    <cfset stComponent.attributes.output = "">

    <!--- check to see if it is a component --->
    <cfset attribStruct = getTagAttributes(cleanDocument,'cfcomponent')>
    <cfif structIsEmpty(attribStruct)>
      <!--- if no attribs found, it might be an interface --->
      <cfset attribStruct = getTagAttributes(cleanDocument,'cfinterface')>
      <cfif NOT structIsEmpty(attribStruct)>
        <cfset stComponent.isInterface = true>
      </cfif>
    </cfif>
    <cfset structAppend(stComponent.attributes,attribStruct,true)>

    <cfset stComponent.properties = structNew()>

    <cfset aProperties = getProperties(cleanDocument)>

    <cfloop from="1" to="#arrayLen(aProperties)#" index="j">
        <cfset stProperty = structNew()>
        <cfset stProperty.name = "">
        <cfset stProperty.type = "any">
        <cfset stProperty.required = "false">
        <cfset stProperty.default = "_an_empty_string_">
        <cfset stProperty.displayName = "">
        <cfset stProperty.hint = "">
        <cfset structAppend(stProperty,getTagAttributes(aProperties[j].tagBlock,'cfproperty'),true)>
        <cfset stComponent.properties[stProperty.name] = stProperty>
      </cfloop>

    <cfset stComponent.methods = structNew()>

    <cfset aMethods = getMethods(cleanDocument)>

    <cfloop from="1" to="#arrayLen(aMethods)#" index="i">
      <cfset stMethod = structNew()>
      <cfset stMethod.name = "">
      <cfset stMethod.access = "public">
      <cfset stMethod.returnType = "any">
      <cfset stMethod.roles = "">
      <cfset stMethod.output = "">
      <cfset stMethod.displayname = "">
      <cfset stMethod.hint = "">
      <cfset structAppend(stMethod,getTagAttributes(aMethods[i].tagblock,'cffunction'),true)>
      <cfset stMethod.fullTag = aMethods[i].tagBlock>
      <cfset stComponent.methods[stMethod.name] = stMethod>

      <cfset stMethod.arguments = arrayNew(1)>
      <cfset aArguments = getArguments(aMethods[i].tagBlock)>

      <cfloop from="1" to="#arrayLen(aArguments)#" index="j">
        <cfset stArgument = structNew()>
        <cfset stArgument.name = "">
        <cfset stArgument.type = "any">
        <cfset stArgument.required = "false">
        <cfset stArgument.displayName = "">
        <cfset stArgument.hint = "">
        <cfset stArgument.default = "_an_empty_string_">
        <cfset structAppend(stArgument,getTagAttributes(aArguments[j].tagBlock,'cfargument'),true)>
        <cfset arrayAppend(stMethod.arguments,stArgument)>
      </cfloop>

    </cfloop>

    <cfreturn stComponent>
  </cffunction>

</cfcomponent>
NOTE:
I also updated all the unscoped variables!

[UPDATED: Thursday, February 24, 2011 at 2:21:00 PM]


Excluding a folder or file from being searched in Eclipse

The code base I work in every day includes a few external development tools that are included in our source code. We include the tools in the source code, because it's the easiest way to ensure that every developer has access to the tool and can run them quickly.

However, one disadvantage to include external development tools into your Eclipse projects is they can slow down searching your enclosing projects because they include files you generally don't really want to search. If you have a library such as "MXUnit" in your source, that adds hundreds of files that would be added to the search which you don't want.

One way you can avoid searching specific folders is to set up a "Workspace" and stick to searching workspaces. This works great, but requires that you maintain your workspace and keep it up to date. Our code base has thousands of files and hundreds of folders over several different projects, so I just find maintaining the workspace a little more tedious than I'd like.

I much prefer using the "Enclosing projects" option in Eclipse, because it searches the files in your active project. I find these is much easier than changing the workspace when I'm looking at files in a different project. So I did a little research this morning to see if there was an option to exclude folders/files from showing up in the search by default.

Turns out Eclipse has a feature called Derived resources:

Many resources get created in the course of translating, compiling, copying, or otherwise processing files that the user creates and edits.  Derived resources are resources that are not original data, and can be recreated from their source files.  It is common for derived files to be excluded from certain kinds of processing.

When you mark a folder or file as derived, it's excluded searches. The caveat is Eclipse assumes derived resources are probably compiled resources, so if you try to edit a derived file you will be warned that the file is derived. However, since these are external tools that I'm not needing to regularly update, this behavior is fine—and actually a nice reminder.

To set a folder as derived:

  1. Right-click on the file/folder
  2. Click on the "Properties" option
  3. Select the "Resource" option
  4. Place a check next to the "Derived" attribute

Eclipse Resource Screenshot

This has done the job nicely. I've been able to exclude several hundred files from showing up in my searches that I really don't care about!

NOTE: Generally derived resources are typically not files you'd keep in a repository, because they can easily be rebuilt from source code. While I had no problems setting a folder already in my SVN to derived, it's possible some source control plug-ins might treat a derived resource as being "ignored".


Fixing Subclipse/SVNKit in Eclipse from constantly asking for credentials

I've been working on migrating to Eclipse v3.6.1 x64, but was having a ton of issues getting Subversion/SVN support working in Eclipse. I prefer using Subclipse, so it was my goal to get that up and running because I'm used to the interface.

I originally tried using SVNKit, because I wanted to use native libraries. However, when I tried to use the SVNKit connector, every single time an SVN operation would occur it would prompt me over and over for my credentials—like it was saving them (although it was updating the .keystore.) After getting frustrate with this, I decided to just use the JavaHL libraries.

If you're using Eclipse 32-bit, then everything you need is included with the distributed versions of Subclipse. Since I'm using the 64-bit version of Eclipse, I needed to download Slik's Subversion v1.6.16 (x64). So, I closed down Eclipse and install the 64-bit client. I verified that everything was working correctly by running the javahltests.jar.

However, after restarting and configuring Subclipse to use JavaHL, I started experiencing issues with Eclipse hanging up. I'd constantly get messages like "Building workspace"—which would stay at 0% and never progress. Switching back to SVNKit was working, but the constant credential checking was driving me crazy. So, after lots of researching, I finally found a thread on authentication issues on the SVNKit mailing list. Turns out I am using NTML  and Basic authentication on our server and that was causing an issue with SVNKit.

The fix is to edit your eclipse.ini file and add this line somewhere after the "-vmargs" line:

-Dsvnkit.http.methods=Digest,Basic,Negotiate,NTLM

Depending on your configuration, you may need to tweak the order of the methods, but as soon as I restarted Eclipse with this line in my eclipse.ini everything started working like I expected.


Duplex scanning in Adobe Acrobat using a Brother MFC

I was trying to scan in a bunch of pages that had printing on both sides. Adobe Acrobat supports duplex scanning, even for scanners that don't have native support for duplex scanning. The problem is I couldn't get it to work with my Brother MFC-8820D printer/scanner. The problem appears to be using the "TW-Brother MFC-8820D USB". As soon as I switched to using the "WIA-Brother MFC-8820D" scanner driver everything starting working well.

So, if you need to perform a duplex scan on an automatic feeder that doesn't support it, here's how you do it in Adobe Acrobat 9:

  1. Go to File > Create PDF > From Scanner > Custom Scan…
  2. In the "Scanner" box, select "WIA-Brother MFC-8820D."

    image
    NOTE: You can safely ignore this message.
  3. In the "Sides" box, select "Both Sides"
  4. Configure the rest of the options as meet your needs
  5. Click "Scan"
  6. When the "Scan using Brother MFC-8820D" dialog appears, select "Document Feeder" as the "Paper source"
  7. Select the appropriate scanning options (recommend using at least 300dpi in the "Custom Settings")
  8. Make sure to choose the "Page size" that matches your paper size

    image
  9. Click "Scan"
  10. Once the scanning has finished, take the stack of paper that just went through the document feeder and put it face up in the exact order it just came out in (you feed in the last sheet first.)
  11. Acrobat will have already prompted with you the dialog to scan the other sides. From this dialog, choose the "Scan reverse sides" option:

    image
  12. Click "Ok"
  13. Repeat the steps in 6-9 to finish scanning the back side pages

That's all there is to it!


Performance issues with <cfqueryparam /> on first execution when using Microsoft SQL Server

The last couple of weeks have been interesting for me. I spent a bulk of the time tuning queries to optimize for performance and a learned a number of interesting things about Microsoft SQL Server and where some of the bottlenecks in our application actually lie.

One interesting situations I ran into was that I wasn't able to replicate some of the performance issues I was seeing in our application when I'd run the same query from Query Analyzer. Now there's a number of reasons why this could happen, but in my case I thought I was running the query in manner that mimicked what was being executed on the server—but it turns out I was missing a key step in the process and I'll get to that in a minute.

So, what was the problem? Let me explain.

Let's say we have the following query in ColdFusion. We'll call this "Method 1":

select
  Name, Email, Phone
from
  Employee
where
  DepartmentId = <cfqueryparam cfsqltype="cf_sql_integer" value="1" />

When you use the <cfqueryparam /> tag, ColdFusion translates the code into a bound parameter. This has the benefit of helping prevent SQL injections, but it often helps with performance because it allows the backend database (like Microsoft SQL Server) to cache the execution plan—which can help speed up performance. So roughly speaking, when the following query is executed on the server, it's translated to (which we'll call "Method 2":)

declare @p1 int
set @p1 = 1

select
  Name, Email, Phone
from
  Employee
where
  DepartmentId = @p1

You'll see this if you turn on ColdFusion debugging, and you can verify that even the name @p1 is used if you try manually adding a declare @p1 statement to your ColdFusion query.

However, when I'd run the bound version of my query in Microsoft SQL Server Management Studio, I was seeing dramatically different results. With a cleared cache, the "Method 1" query was running 7 times slower than "Method 2" when I'd run it in a query analyzer window. In the process of trying to troubleshoot the problem, I tried running the following in my ColdFusion query (which we'll call "Method 3":)

declare @p1 int
set @p1 = <cfqueryparam cfsqltype="cf_sql_integer" value="1" />

select
  Name, Email, Phone
from
  Employee
where
  DepartmentId = @p1

What's interesting about "Method 3" is I was now seeing the exact same performance as I was seeing when I run "Method 2" from my desktop locally. This had my confused and I knew I had to be missing something. I was obviously missing a key execution step that was missing between "Method 1" and "Method 2". What's even more confusing is that I had already been running SQL Profiler on the database server to catch the TSQL statements being executed—it's how I originally captured the query I was using in "Method  2" to run locally. The problem was I was missing a key event that I should have been logging—the RPC:Starting and RPC:Completed events.

Once I enabled the RPC:Starting event, I saw where the bottleneck was coming through.

When you use <cfqueryparam /> in SQL Server, the query isn't just translated directly into a bound statement, but instead it's first passed into the sp_prepexec system stored procedure.  So, instead of actually executing "Method 2" immediately, something like the following executed first:

declare @p1 int
set @p1=68
exec sp_prepexec @p1 output,N'@P1 int',N'select
  Name, Email, Phone
from
  Employee
where
  DepartmentId = @p1',190
select @p1

It was the call to the sp_prepexec stored procedure that was causing the huge difference in performance between "Method 1" and "Method 2". It turns out it was taking this stored procedure a lot of time to figure out the best execution plan to use before it could cache it.

To fix the problem, I just spent some time with my query (which actually ended up involving modifying some core system views) so that sp_prepexec could analyze my query and returned the cached plan faster.

So, if you're having performance issues—especially on first run execution—you might want to monitor your server and see if the bottleneck isn't with the sp_prepexec trying to figure out the best plan to cache.


Search for SQL in Microsoft SQL Server Management Studio with Red Gate's SQL Search

If you do any amount of SQL Server development, at some point in time you've come across a situation where you need to search your database for all occurrences of a table, column or some specific SQL statement that might be used in a view, stored procedure, trigger etc. In the past I've ended up just dump the db schema to a SQL file and searched the file manually, but that's pretty slow and not very elegant.

This morning though I found a better way using Red Gate's SQL Search, which is a free add-in for Microsoft SQL Server Management Studio. This add-in (which is currently free, but will eventually be integrated into the commercial SQL Prompt add-in) allows you to quickly search for strings within your database schema. What's really nice is double-clicking on a matching object takes you right to the object in the Object Explorer—which can save you a lot of time when you have a really large database (I work in an application w/several hundred tables!)

If you use SQL Server Management Studio, I highly recommend you download SQL Search and try it out.

NOTE:
If you're using Microsoft SQL Server Management Studio Express version, or you're "The Add-in 'SQL Search 1.0' failed to load or caused an exception. (Error number: 80070002)" when trying to start Management Studio after installing the add-in, you'll need the extensibility.dll installed before the add-in will work. You'll need to un-installing SQL Search, then download and install extensibility.dll and then re-install SQL Search.


Problems with BlogCFC and new Live Writer 2011

I haven't done a ton of blogging lately, it's amazing how busy you stay with a 6 month old baby in the house. Quite frankly, I'm enjoying being a father immensely, so that's certainly my highest priority with my free time nowadays. Anyway, I did finally get around to blogging a few things lately that I've run into at work.

After one recent blog entry, Brian Ghidinelli pinged me to tell my RSS feed was messed up. After a little researched I learned the problem was some of the characters (namely the em dash) were not properly escaped into HTML entities. I use Raymond Camden's BlogCFC for my blogging platform and new that at one time the xmlrpc code did work properly—because I had made a lot of bug fixes to get it working with Live Writer. However, my last few blog entries were written with the newer version of Live Writer, so what could have changed?

So I busted out my Charles HTTP Proxy and started debugging and I found the problem is that Live Writer does not send the charset being used when it sends the HTTP request to ColdFusion. Because it doesn't send that it's using UTF-8, ColdFusion ends up not handling the content in the request properly. UTF-8 characters end up coming through as a pattern of characters—for example, the em dash ("—") ends up coming through as —.

Now that I knew what the problem was, I needed to find a way to fix it. My first logical step was to make sure that the xmlrpc.cfm template was setting the page encoding to UTF-8, so I added this to the template:

<cfprocessingdirective pageencoding="utf-8" />

However, that didn't change the behavior at all. After much digging, I found I could use the Java page context object to explicitly set the character encoding for the incoming request using:

<cfset getPageContext().getRequest().getHttpRequest().setCharacterEncoding("UTF-8") />

Thankfully, this does fix the problem. So, if you're having problems getting Live Writer to work with BlogCFC, try changing the first few lines of your xmlrpc.cfm template to:

<cfprocessingdirective pageencoding="utf-8" />

<!---// this is required for Live Writer to ensure that the request is processed as UTF-8 //--->
<cfset responseCharset = getPageContext().getRequest().getHttpRequest().getCharacterEncoding() />
<cfif not structKeyExists(variables, "responseCharset")>
  <cfset getPageContext().getRequest().getHttpRequest().setCharacterEncoding("UTF-8") />
</cfif>

<cfcontent type="text/xml; charset=utf-8">

<cfsetting enablecfoutputonly=true>

NOTE: I use the responseCharset to check to see if an explicit character encoding has been supplied. If the value is null, then no charset was supplied so I override the encoding to UTF-8.


jNotify – A Light-weight Notification System for jQuery

One thing that's essential for any web application to be successful, is to implement a method for providing users with feedback based on their input. People want feedback based on their actions in real time. Users want to know that their actions had the intended consequence and if not, why it failed.

In the golden age of web applications, most applications solved this problem by returning a completely new page to the user any time they took an action on your website. If a user filled out a form on your website, the form's data would be sent to the server and the server would then generate a completely new page that either told the user that their action was successful or would show them the errors that occurred.

As web applications began to advance, more applications began to rely on JavaScript to provide more instantaneous feedback to users. If a user would leave a required field empty, developers would use a JavaScript alert() functions to notify users that they left the field incomplete. Over time, more and more applications have come to rely on JavaScript to provide users with a better user experience. The problem is that while the browsers have made many great strides over the years, we're still left with these very rudimentary UI controls for providing native feedback with users.

The alert(), prompt() and confirm() functions basically haven't changed at all since they were introduced. They offer essentially no control over the presentation and they're completely obtrusive to the user's experience. They all halt execution and force user action before any more processing occurs. While there are certain times this behavior is desired, there are many times when it's less than ideal.

For example, let's say you have you have a page that allows users to update bulk records in real time. On each row there is an "Update" button that allows a user to save updates back to the server. You obviously want to provide users with feedback after each update operation has completed. A quick solution is to use the alert() function when the update is successful.

The problems with this method are:

  • It's obtrusive—every time I display the "success" message to the user, I'm required to interact w/the alert box. This can really slow down the user's workflow.
  • No design control—I have no control over how the alert() dialog looks to the user. There's no difference between a successful message and an error message.
  • Only one message at at a time—I can never display multiple alerts to the user at the same time.
  • The alert() halts all activity—including any background tasks.

What we need is a better notification system that's just as easy to use, but allows us greater control over how the messages are display. This is where the jNotify Plug-in comes in. At just about 3KB, it's a very light-weight solution that gives us lots of flexibility over how we display notifications to the user. It allows us:

  • To display multiple messages to the user at one time
  • Use an HTML in the notification
  • Specify whether a notification is "sticky" or not (sticky messages require the user's interaction to close them)
  • Displays notifications in an unobtrusive manor—messages appear on the screen for short time, then fade away
  • Complete control over the positioning and appearance of the notification

I put together a short little video that illustrates some of the typical UI problems that the use of the alert() function introduces and show how jNotify can be used to solve those problems.

The jNotify Plug-in for jQuery

Implementing jNotify on your site is extremely easy. Just include the jquery.jnotify.js and jquery.notify.css files on your page, then substitute your use of alert() with $.jnotify().

So, instead of:
alert("An e-mail notification has been sent!");

You'd use:
$.jnotify("An e-mail notification has been sent!");

We've been using this plug-in in our own applications for over a year and we've had great success with it—especially when used with conjunction of providing feedback to users when AJAX operations complete.

You can see a live example on the jNotify jQuery Plug-in page.


Weird CSS issues with Internet Explorer 8 on <select> elements when using min-width

IE8 has this weird min-width issue with the <select> element, where the box model on the <select> element correctly honors the min-width style property, but the actual <option> elements when using a multiple <select> element end up using the maximum width of the widest <option> element.

Setting the width on the <option> elements has no effect and I want a pure CSS solution.

Here's an example of the bug:

Single selects work as expected

Multi-selects do not work as expected

Here's what the CSS looks like:

<style type="text/css">
    select {
        min-width: 100px;
    }
</style>

Does anyone know a pure CSS solution to fixing the <option> elements so that they inherit the width of the <select> element?

Here's what it looks like in Internet Explorer 8:

image

Notice how the selected elements don't span to the edge of the select element.

Here's how I would expect it to look:

image


Bug in ColdFusion's cfcontent within CustomTag execution end

I was working on a ColdFusion custom tag this afternoon, that is to throw an error if tag is not properly constructed. In order to clean out the output, I'm using the following ColdFusion code snippet to clear the output buffer:

<cfcontent reset="true" />

This usually works great, but for some reason the output was always been displayed twice. The code was not actually executing twice, just being displayed in the output stream twice when the custom tag was in the "end" execution method.

To illustrate this problem, save the following as "contentBug.cfm":

<cfif (thisTag.executionMode is "start")>
  <cfparam name="attributes.reset" type="boolean" default="false" />

  <!---// if we have an end tag, stop processing now (the rest of the code will be run on the tag end) //--->
  <cfif thisTag.hasEndTag>
    <cfexit method="exitTemplate" />
  </cfif>
</cfif>

<cfif attributes.reset><cfcontent reset="true" /></cfif>
<cfoutput><h1>test!</h1></cfoutput>
<cfdump var="#thisTag#">
<cfabort />

Now save the following in a file called "test.cfm":

<cf_contentBug reset="true" />
NOTE:
Pay attention to the tag syntax, the "/>" makes the code execute in the "end" execution method where the bug reveals itself.

When you run the test.cfm template, you'll see the following output:

image

Note that the "test!" data outputted twice. If you change the "reset" attribute to "false" or change the code to <cf_contentBug reset="true"> (no ending slash) the code is only generated once.

This bug shows up in both ColdFusion v8.0.1 and v9.0.

If I find a workaround (other than not using <cfcontent reset="true" /> in my code) I'll make sure to update this post!