Calculating the visual length of a string

Posted by Dan on Aug 31, 2006 @ 2:39 PM

Sorry I haven't blogged in a while, but I've been very busy working on a project. Part of the project requires that I convert HTML to formatted plain text. On the surface, this may seem simple (just use a regex to strip out the HTML), but the key word in the previous sentence is "formatted."

One of the many issues I've run into is that none of the built-in ColdFusion string manipulation functions account for the "visual" length of a string. Since one of the things I needed to do was wrap text after a given number of visual characters, I needed a function that, unlike the standard len() function, would return the length of a string as it would appear on the screen. That means taking into account how many spaces a tab actually occupies on the screen.

My first attempt was simply to count every tab character (chr(9)) as 8 spaces. While this guaranteed I would never run past the right edge of the content, it wasn't very accurate: a tab advances to the next tab stop, so in Windows it can occupy anywhere from 1 to 8 spaces depending on where it starts. I quickly ran into problems when I realized that for some functionality (like centering text) I'd really need an accurate count of the total number of visual spaces a string occupies.
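To make the difference concrete, here is a small sketch in Java (chosen only because ColdFusion runs on the JVM; the class and method names are hypothetical). It compares the naive "8 spaces per tab" count with the tab-stop rule, where a tab advances to the next multiple of the tab size, and shows how the naive count throws off centering.

```java
// Compares a naive "every tab = 8 columns" count with tab-stop arithmetic.
class TabWidthDemo {

    // Naive count: every tab is charged a full tabSize columns.
    static int naiveLen(String s, int tabSize) {
        int col = 0;
        for (int i = 0; i < s.length(); i++) {
            col += (s.charAt(i) == '\t') ? tabSize : 1;
        }
        return col;
    }

    // Tab-stop count: a tab advances to the next multiple of tabSize,
    // so it occupies between 1 and tabSize columns depending on where it starts.
    static int tabStopLen(String s, int tabSize) {
        int col = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '\t') {
                col += tabSize - (col % tabSize);
            } else {
                col++;
            }
        }
        return col;
    }

    public static void main(String[] args) {
        String line = "1234567\tX"; // the tab follows 7 characters, so it fills only 1 column
        System.out.println(naiveLen(line, 8));   // 16
        System.out.println(tabStopLen(line, 8)); // 9

        // Left padding needed to center the line in a 40-column area
        System.out.println((40 - naiveLen(line, 8)) / 2);   // 12 (too little, line sits left of center)
        System.out.println((40 - tabStopLen(line, 8)) / 2); // 15
    }
}
```

The naive count is only right when every tab happens to start on a tab stop; otherwise it overstates the width by up to 7 columns per tab.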

As I was thinking about the problem, I did a quick Google search to see if anyone had already solved it. I came across a post on a mailing list dedicated to NEdit (an X Window System text editor). While that solution was written as an NEdit macro, the logic was easy to replicate in ColdFusion.

So, here's the code translated to ColdFusion. If you're wondering, the wrapText() UDF I wrote supports auto-indenting, smart indenting (for ordered/unordered lists), and prepending/appending data to each line. It also wraps lines based on the visual representation of the string, unlike the built-in wrap() function, which assumes a tab occupies a single space.
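The wrapText() UDF itself isn't shown in this post, so the following is only a hypothetical minimal sketch in Java (VisualWrap, visualLen() and wrap() are illustrative names) of the core idea: a greedy word wrap that measures each candidate line by its visual width rather than its character count. It does none of the indenting or prepending/appending described above.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal greedy word wrap that measures lines by visual width (tab stops),
// not by character count. Hypothetical helper, not the author's wrapText() UDF.
class VisualWrap {

    // Visual width of a single line: tabs advance to the next multiple of tabSize.
    static int visualLen(String s, int tabSize) {
        int col = 0;
        for (int i = 0; i < s.length(); i++) {
            col = (s.charAt(i) == '\t') ? col + tabSize - (col % tabSize) : col + 1;
        }
        return col;
    }

    // Splits text on runs of spaces and packs words greedily so that no output line
    // is visually wider than width (a single word wider than width gets its own line).
    static List<String> wrap(String text, int width, int tabSize) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.split(" +")) {
            if (word.isEmpty()) continue; // leading spaces produce an empty first token
            String candidate = current.length() == 0 ? word : current + " " + word;
            if (current.length() == 0 || visualLen(candidate, tabSize) <= width) {
                current.setLength(0);
                current.append(candidate);
            } else {
                lines.add(current.toString());
                current.setLength(0);
                current.append(word);
            }
        }
        if (current.length() > 0) lines.add(current.toString());
        return lines;
    }

    public static void main(String[] args) {
        // "\titem one" already spans 8 + 8 = 16 columns, so only " two" still fits in 20.
        for (String line : wrap("\titem one two three", 20, 8)) {
            System.out.println("[" + line + "] width=" + visualLen(line, 8));
        }
    }
}
```

A character-count wrap would treat "\titem one two three" (19 characters) as fitting on one 20-column line, even though it actually spans 26 columns.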

<cffunction name="getVisualLen" access="private" output="false" returntype="numeric"
    hint="Gets the visual length of a string, counting each tab as the columns it actually occupies">

    <cfargument name="text" type="string" required="true" />
    <cfargument name="tabSize" type="numeric" required="false" default="8" />

    <cfscript>
    // get the text to check the visual length of
    var sText = arguments.text;
    // running visual column count
    var iColumn = 0;
    // find the first run of one or more tabs (reFind() positions are 1-based, so start at 1)
    var oFindTab = reFind("\t+", sText, 1, true);
    // position just past the end of the tab run (0 when no tabs are found)
    var iEndPos = oFindTab.pos[1] + oFindTab.len[1];
    // number of characters consumed so far (through the last tab of the previous run)
    var iLastPos = 0;

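    // loop through the string, processing each run of tabs
    while( iEndPos GT 0 ){
        // add the non-tab characters between the previous run and this one;
        // pos[1] is 1-based while iLastPos is a character count, hence the "- 1"
        iColumn = iColumn + (oFindTab.pos[1] - iLastPos - 1);
        // the first tab in the run advances to the next tab stop (1 to tabSize columns)
        iColumn = iColumn + (arguments.tabSize - (iColumn MOD arguments.tabSize));
        // each following tab in the run occupies a full tabSize columns
        iColumn = iColumn + ((iEndPos - oFindTab.pos[1] - 1) * arguments.tabSize);
        // remember how many characters have been consumed (through the last tab of this run)
        iLastPos = iEndPos - 1;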

        // find the next set of tabs
        oFindTab = reFind("\t+", sText, iEndPos, true);
        // set the new end position
        iEndPos = oFindTab.pos[1] + oFindTab.len[1];
    }

    // add the characters after the last tab run (or the whole string if there
    // were no tabs); iLastPos is the number of characters already consumed
    iColumn = iColumn + (len(sText) - iLastPos);

    // return the visual length of the string
    return iColumn;
    </cfscript>
</cffunction>
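For comparison, here is the same run-based approach sketched in Java (the class name is hypothetical), using java.util.regex to jump from one run of tabs to the next instead of visiting every character. Matcher.start() and end() are 0-based, so the number of non-tab characters before a run is simply start - lastEnd; with ColdFusion's 1-based reFind() positions the same quantity is pos[1] - iLastPos - 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Run-based visual length: jump from one run of tabs to the next with a regex.
class VisualLen {

    private static final Pattern TAB_RUN = Pattern.compile("\t+");

    static int getVisualLen(String text, int tabSize) {
        int column = 0;   // visual column reached so far
        int lastEnd = 0;  // 0-based index just past the previous tab run
        Matcher m = TAB_RUN.matcher(text);
        while (m.find()) {
            // non-tab characters between the previous run and this one
            column += m.start() - lastEnd;
            // the first tab in the run advances to the next tab stop (1..tabSize columns)
            column += tabSize - (column % tabSize);
            // every further tab in the run is a full tabSize columns
            column += (m.end() - m.start() - 1) * tabSize;
            lastEnd = m.end();
        }
        // trailing non-tab characters after the last run (or the whole string if no tabs)
        return column + (text.length() - lastEnd);
    }

    public static void main(String[] args) {
        System.out.println(getVisualLen("abc", 8));        // 3
        System.out.println(getVisualLen("1234567\tX", 8)); // 9
        System.out.println(getVisualLen("\t\tab\tc", 8));  // 25
        System.out.println(getVisualLen("ab\tc", 4));      // 5
    }
}
```

The "1234567\tX" case is the one worth testing in any port: the tab starts one column before a tab stop, so it must count as a single column, not a full tab.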
NOTE:
The getVisualLen() UDF above does not account for other non-printing characters. You may need to modify the code to handle them (e.g., carriage returns). In my project, I'm dealing with individual lines from a block of text, where each line has already had its CR/LF characters stripped out.
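If you need to measure a multi-line block instead, one approach (a hypothetical Java sketch; BlockWidths and lineWidths() are illustrative names) is to split on CRLF, CR or LF first and measure each line on its own, since the visual column starts over at 0 after every line break.

```java
import java.util.ArrayList;
import java.util.List;

// Measures each line of a multi-line block separately, so CR/LF characters are
// treated as line breaks (the column resets) rather than counted as visible width.
class BlockWidths {

    // Visual width of a single line: tabs advance to the next multiple of tabSize.
    static int visualLen(String s, int tabSize) {
        int col = 0;
        for (int i = 0; i < s.length(); i++) {
            col = (s.charAt(i) == '\t') ? col + tabSize - (col % tabSize) : col + 1;
        }
        return col;
    }

    // Splits on CRLF, CR or LF (limit -1 keeps trailing empty lines) and measures each piece.
    static List<Integer> lineWidths(String block, int tabSize) {
        List<Integer> widths = new ArrayList<>();
        for (String line : block.split("\r\n|\r|\n", -1)) {
            widths.add(visualLen(line, tabSize));
        }
        return widths;
    }

    public static void main(String[] args) {
        System.out.println(lineWidths("ab\tc\r\n\tx\nlast", 8)); // [9, 9, 4]
    }
}
```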
Categories: HTML/ColdFusion, Java, Source Code
