Problems with BlogCFC and new Live Writer 2011

Categories: HTML/ColdFusion

I haven't done a ton of blogging lately; it's amazing how busy you stay with a 6-month-old baby in the house. Quite frankly, I'm enjoying being a father immensely, so that's certainly my highest priority for my free time nowadays. Anyway, I did finally get around to blogging about a few things I've run into at work recently.

After one recent blog entry, Brian Ghidinelli pinged me to tell me my RSS feed was messed up. After a little research, I learned the problem was that some characters (namely the em dash) were not properly escaped into HTML entities. I use Raymond Camden's BlogCFC as my blogging platform and knew that at one time the XML-RPC code worked properly, because I had made a lot of bug fixes to get it working with Live Writer. However, my last few blog entries were written with the newer version of Live Writer, so what could have changed?

So I busted out Charles HTTP Proxy, started debugging, and found the problem: Live Writer does not include a charset in the Content-Type header when it sends the HTTP request to ColdFusion. Because the request never declares that it's UTF-8, ColdFusion decodes the content with the wrong character set. Multi-byte UTF-8 characters end up coming through as a sequence of unrelated characters; for example, the em dash ("—") comes through as â€”.
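
ColdFusion runs on the JVM, so the following minimal Java sketch illustrates the mechanism. It encodes an em dash as UTF-8 (as Live Writer does) and then decodes those bytes with a single-byte charset. windows-1252 is an assumption, chosen because it produces the familiar â€” pattern; the charset ColdFusion actually falls back to may differ. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Hypothetical helper: encode text as UTF-8 (what the client sends), then
    // decode those bytes with whatever charset the server assumed.
    static String misdecode(String text, Charset assumed) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, assumed);
    }

    public static void main(String[] args) {
        String emDash = "\u2014";

        // Prints E2 80 94: the em dash is three bytes in UTF-8
        for (byte b : emDash.getBytes(StandardCharsets.UTF_8)) {
            System.out.printf("%02X ", b);
        }
        System.out.println();

        // Wrong charset: each byte becomes its own character (U+00E2 U+20AC U+201D)
        String garbled = misdecode(emDash, Charset.forName("windows-1252"));
        System.out.println(garbled.length()); // 3

        // Right charset: the round trip is lossless
        System.out.println(misdecode(emDash, StandardCharsets.UTF_8).equals(emDash)); // true
    }
}
```

Decoding with ISO-8859-1 instead gives U+00E2 followed by two invisible control characters, which is no better: any single-byte charset splits the one character into three.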

Now that I knew what the problem was, I needed to find a way to fix it. My first logical step was to make sure that the xmlrpc.cfm template was setting the page encoding to UTF-8, so I added this to the template:

<cfprocessingdirective pageencoding="utf-8" />

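However, that didn't change the behavior at all. The reason is that pageencoding only tells ColdFusion which character encoding the template's own source file uses; it has no effect on how the incoming request is decoded. After much digging, I found I could use the underlying Java page context to explicitly set the character encoding for the incoming request: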

<cfset getPageContext().getRequest().getHttpRequest().setCharacterEncoding("UTF-8") />

Thankfully, this does fix the problem. So, if you're having problems getting Live Writer to work with BlogCFC, try changing the first few lines of your xmlrpc.cfm template to:

<cfprocessingdirective pageencoding="utf-8" />

<!---// this is required for Live Writer to ensure that the request is processed as UTF-8 //--->
<cfset responseCharset = getPageContext().getRequest().getHttpRequest().getCharacterEncoding() />
<cfif not structKeyExists(variables, "responseCharset")>
  <cfset getPageContext().getRequest().getHttpRequest().setCharacterEncoding("UTF-8") />
</cfif>

<cfcontent type="text/xml; charset=utf-8">

<cfsetting enablecfoutputonly="true">

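NOTE: I use responseCharset to check whether the request supplied an explicit character encoding. When getCharacterEncoding() returns null, ColdFusion leaves responseCharset undefined, so structKeyExists() returns false; that means no charset was supplied, and I override the encoding to UTF-8.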
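
For readers more at home in Java (the platform ColdFusion runs on), here is a minimal sketch of the same decision outside a servlet container: use the declared charset if the client sent one, otherwise fall back to UTF-8. In the template itself, the declared value comes from getCharacterEncoding() and the override is setCharacterEncoding("UTF-8"); the helper names below are hypothetical and only model that logic.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RequestCharsetDemo {

    // Hypothetical helper modeling the xmlrpc.cfm check: a null declaredCharset
    // means the client (e.g. Live Writer) sent no charset, so fall back to UTF-8.
    static Charset effectiveCharset(String declaredCharset) {
        if (declaredCharset == null) {
            return StandardCharsets.UTF_8;
        }
        // An explicit charset from the client is honored as-is
        return Charset.forName(declaredCharset);
    }

    // Hypothetical helper: decode raw request-body bytes with the effective charset
    static String decodeBody(byte[] body, String declaredCharset) {
        return new String(body, effectiveCharset(declaredCharset));
    }

    public static void main(String[] args) {
        byte[] body = "<string>A \u2014 B</string>".getBytes(StandardCharsets.UTF_8);

        // No charset declared: decoded as UTF-8, so the em dash survives
        System.out.println(decodeBody(body, null).contains("\u2014")); // true

        // Explicit charset declared: it wins over the UTF-8 fallback
        System.out.println(effectiveCharset("ISO-8859-1").name()); // ISO-8859-1
    }
}
```

Only overriding when nothing was declared means a client that does send a correct charset keeps working unchanged.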

Comments

Raymond Camden:
I'll get this into source next week. Thanks!

Dan G. Switzer, II:
@Raymond:

The code could also benefit from something that scans for any non-ASCII characters and replaces them with HTML entities. The code currently checks for the common characters (like the em dash, en dash, and ellipsis), but it doesn't check for every character. Something along the lines of what Ben Nadel did:

http://www.bennadel.com/blog/1155-Cleaning-High-As...

(NOTE: I'd remove his check for 8230 and leave that as an actual entity. Actually, I might remove the checks for smart quotes too, in case people like them.)
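
The entity-escaping idea in this comment can be sketched in Java (the platform ColdFusion runs on). This is a hypothetical helper, not BlogCFC's or Ben Nadel's actual code: it replaces every character above code point 127 with a numeric character reference and, in line with the note about 8230 and smart quotes, keeps those as entities rather than mapping them to plain ASCII.

```java
public class EntityEscaper {

    // Hypothetical helper: replace every non-ASCII character with a numeric
    // character reference so the output is pure ASCII.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        // Walk code points (not chars) so a character outside the BMP becomes
        // a single reference instead of two surrogate halves
        text.codePoints().forEach(cp -> {
            if (cp > 127) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // em dash (8212), en dash (8211), ellipsis (8230)
        System.out.println(escapeNonAscii("A \u2014 B \u2013 C\u2026"));
        // prints: A &#8212; B &#8211; C&#8230;
    }
}
```

Numeric references such as &#8212; are valid in any XML document, whereas named entities like &mdash; are only defined by HTML, so the numeric form is the safer choice for an RSS feed. The sketch deliberately leaves &, < and >, which still need ordinary XML escaping.
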
Robert Zehnder:
Good catch, Dan. When I first started using WLW with BlogCFC ages ago, I had to hack up the XML-RPC handler to get it working. It's definitely a beast.

Raymond Camden:
Any reason why you put the cfcontent up there? It's at the bottom already.

Dan G. Switzer, II:
@Raymond:

I have it at the top of my script, not the bottom. It shouldn't really matter where it is, as long as it comes before any output.

Raymond Camden:
OK. It will be in the next release.
