Problems with CFINCLUDE & UTF-8 files...

Posted by Dan on Dec 19, 2005 @ 4:38 PM

I was talking with a friend this afternoon and we were discussing an issue his client was having. They were using the <cfinclude> tag to read in a UTF-8 file to display some "cached" data but were having problems. It turns out that the UTF-8 encoded characters were not displaying correctly because the included files were written with <cffile>, which does not write a BOM (Byte Order Mark) when saving UTF-8 data.

Fortunately, Tim Blair had already run into this problem and come up with a solution for writing UTF-8 files that have a BOM. I've taken his code and wrapped it up into a UDF:

function fileWriteUTF8(sFilePath, sInput){
    // declare the jWriter object (var declarations must come first)
    var jWriter = "";
    // create the file stream
    var jFile = createobject("java", "java.io.File").init(sFilePath);
    var jStream = createobject("java", "java.io.FileOutputStream").init(jFile);
    // output the UTF-8 BOM byte by byte directly to the stream
    jStream.write(239); // 0xEF
    jStream.write(187); // 0xBB
    jStream.write(191); // 0xBF
    // create the UTF-8 file writer and write the file contents
    jWriter = createobject("java", "java.io.OutputStreamWriter");
    jWriter.init(jStream, "UTF-8");
    jWriter.write(sInput);
    // flush the output, clean up and close
    jWriter.flush();
    jWriter.close();
    jStream.close();

    return true;
}

You can now use the fileWriteUTF8() function to write UTF-8 files that contain the correct BOM so that <cfinclude> will correctly render the output.
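Here's a quick usage sketch (the cache path and sample string are just placeholders for illustration); assuming the UDF above is available on the page, you write the cached file with it and then include it as usual:

<!--- sample content containing multi-byte UTF-8 characters --->
<cfset sHtml = "<p>Some cached content with UTF-8 characters: café, résumé</p>">
<!--- write the cached file with a proper UTF-8 BOM --->
<cfset fileWriteUTF8(expandPath("cache/content.cfm"), sHtml)>
<!--- cfinclude now detects the encoding from the BOM and renders correctly --->
<cfinclude template="cache/content.cfm">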

NOTE:
There are a couple of other solutions to this problem. First, you can use <cffile action="read" charset="utf-8" variable="someVar"> to read the file and then output the contents of the variable. This works, but in this client's case the cached files also contained some CFML, so the files actually need to be parsed as well. The other solution, which should also work, would be to write a <cfprocessingdirective pageencoding="utf-8"> to the file when it's originally created. However, that's pretty darn ugly. Sounds to me like CFMX really should have a charset attribute for <cfinclude> or, at a bare minimum, the <cffile> tag should write the BOM for UTF-8 encoded files.
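For reference, here's roughly what those two alternatives could look like (the file path, sCached and sHtml variables are just placeholders):

<!--- alternative 1: read the file with the correct charset and output it (static content only) --->
<cffile action="read" file="#expandPath('cache/content.cfm')#" charset="utf-8" variable="sCached">
<cfoutput>#sCached#</cfoutput>

<!--- alternative 2: prepend a processing directive when writing the file, so <cfinclude> parses it as UTF-8 --->
<cffile action="write" file="#expandPath('cache/content.cfm')#" charset="utf-8"
    output='<cfprocessingdirective pageencoding="utf-8">#sHtml#'>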
Categories: HTML/ColdFusion, Java

1 Comment

  • by definition, a BOM isn't strictly required for utf-8 encoding. a cfprocessingdirective in the included file usually helps in these situations. also i think you can just stuff the BOM into the file's data (at the beginning) and write it out via cffile without the charset option.
