Calculating the visual length of a string

Categories: HTML/ColdFusion, Source Code, Java

Sorry I haven't blogged in a while, but I've been very busy working on a project. Part of the project requires that I convert HTML to formatted plain text. On the surface, this may seem simple (just use a regex to strip the HTML), but the key word in that sentence is "formatted."

One of the many issues I've run into is that none of the built-in ColdFusion string manipulation functions account for the "visual" length of a string. Since one of the things I needed to do was wrap text after a given number of visual characters, I needed a function that, unlike the standard len() function, would return the length of a string as it would appear on the screen. This means I have to take into account how many "spaces" a tab would occupy on the screen.

My first attempt was simply to count every tab character (chr(9)) as 8 spaces. While this number ensured I would never go past the right edge of the content, it wasn't very accurate (a tab can occupy anywhere from 1 to 8 spaces in Windows, depending on the column where it starts). I quickly started running into problems when I realized that for some functionality (like centering text), I'd really need an accurate count of the total number of visual spaces a string was occupying.
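To make the inaccuracy concrete, here is a minimal sketch of that first approach. The name naiveVisualLen() is hypothetical and only for illustration; it is not the function I ended up using.

```cfml
<cffunction name="naiveVisualLen" output="false" returntype="numeric"
   hint="Hypothetical helper: counts every tab as a fixed number of columns">

   <cfargument name="text" type="string" required="true" />
   <cfargument name="tabSize" type="numeric" required="false" default="8" />
   <cfscript>
   // expand each tab to a fixed-width run of spaces, then measure the result
   var sExpanded = replace(arguments.text, chr(9), repeatString(" ", arguments.tabSize), "all");
   return len(sExpanded);
   </cfscript>
</cffunction>

<cfscript>
// "ab" + tab + "c": on screen the tab only needs 6 columns to reach the tab stop at column 8
sSample = "ab" & chr(9) & "c";
// prints 11, although the string is only 9 columns wide on screen (2 + 6 + 1)
writeOutput(naiveVisualLen(sSample));
</cfscript>
```

The overshoot grows with every tab that starts partway between two tab stops, which is why this is safe for wrapping but useless for centering.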

As I was thinking about the problem, I decided to do a quick Google search to see if I could find anything that solved it. I came across a post from a mailing list dedicated to NEdit (an X Window editor). While the solution is written as an NEdit macro, the logic was easily replicated in ColdFusion.

So, here's the code translated for ColdFusion. If you're wondering, the wrapText() UDF I wrote supports auto-indenting, smart indenting (for ordered/unordered lists), and prepending/appending data to each line. It also wraps lines correctly based on the visual representation of the string, unlike the built-in wrap() function, which assumes a tab occupies a single space.

<cffunction name="getVisualLen" access="private" output="false" returntype="numeric"
   hint="Gets the visual length of a string; converting tabs to actual visual space used">

   <cfargument name="text" type="string" required="true" />
   <cfargument name="tabSize" type="numeric" required="false" default="8" />
   <cfscript>
   // get the text to check the visual length of
   var sText = arguments.text;
   // get the visual column length
   var iColumn = 0;
   // find the first run of tabs (reFind start positions are 1-based)
   var oFindTab = reFind("\t+", sText, 1, true);
   // get the end position of the first tab
   var iEndPos = oFindTab.pos[1] + oFindTab.len[1];
   // number of characters already accounted for (everything up to and including the last tab run)
   var iLastPos = 0;
   // loop through the string find each tab position
   while( iEndPos GT 0 ){
      // add non-tab char widths (pos is 1-based, so subtract 1 to count only the chars before the tab)
      iColumn = iColumn + (oFindTab.pos[1] - 1 - iLastPos);
      // add variable width of first tab in tab sequence
      iColumn = iColumn + (arguments.tabSize - (iColumn MOD arguments.tabSize));
      // add width of following tabs in tab sequence
      iColumn = iColumn + ((iEndPos - oFindTab.pos[1] - 1) * arguments.tabSize);
      // set the last position checked
      iLastPos = iEndPos - 1;
      // find the next set of tabs
      oFindTab = reFind("\t+", sText, iEndPos, true);
      // set the new end position
      iEndPos = oFindTab.pos[1] + oFindTab.len[1];
   }
   // add the width of the remaining characters after the last tab run
   // (or the whole string, if it contained no tabs)
   iColumn = iColumn + (len(sText) - iLastPos);
   // return the visual length of the string
   return iColumn;
   </cfscript>
</cffunction>
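
As a quick sanity check, here's a small usage sketch. It assumes getVisualLen() is callable from the current scope (for example, from another method in the same component, since the function is declared private); the sample string is made up for illustration.

```cfml
<cfscript>
// "ab" + tab + "c"
sLine = "ab" & chr(9) & "c";
// len() counts the tab as one character: prints 4
writeOutput(len(sLine) & "<br />");
// default tab size of 8: "ab" (2) + tab to column 8 + "c" (1), prints 9
writeOutput(getVisualLen(sLine) & "<br />");
// tab size of 4: "ab" (2) + tab to column 4 + "c" (1), prints 5
writeOutput(getVisualLen(sLine, 4) & "<br />");
</cfscript>
```

One thing to keep in mind when using the result for centering: prepending spaces to a line that contains tabs shifts where those tabs land, so the visual length may need to be measured again after padding.
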
NOTE:
The above UDF does not account for some hidden characters. You may need to modify the code to handle various other characters (e.g. carriage returns). In my project, I'm dealing with individual lines from a block of text, where each line has the CR/LF stripped out.
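
If your lines still carry their line endings, one way to strip them before measuring might look like the sketch below. The variable names and the sample input are assumptions for illustration, and it again assumes getVisualLen() is callable from the current scope.

```cfml
<cfscript>
// hypothetical input: a line that still carries a Windows line ending
sRaw = "ab" & chr(9) & "c" & chr(13) & chr(10);
// remove carriage returns and line feeds so they are not counted as columns
sLine = replace(replace(sRaw, chr(13), "", "all"), chr(10), "", "all");
// prints 6: the CR and LF are counted as characters
writeOutput(len(sRaw) & "<br />");
// prints 4: only "ab", the tab, and "c" remain
writeOutput(len(sLine) & "<br />");
// prints 9: the visual width with the default tab size of 8
writeOutput(getVisualLen(sLine) & "<br />");
</cfscript>
```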
