Problems with BlogCFC and new Live Writer 2011
I haven't done a ton of blogging lately; it's amazing how busy you stay with a 6-month-old baby in the house. Quite frankly, I'm enjoying being a father immensely, so that's certainly my highest priority for my free time nowadays. Anyway, I did finally get around to blogging about a few things I've run into at work recently.
After one recent blog entry, Brian Ghidinelli pinged me to tell me my RSS feed was messed up. After a little research, I learned that some of the characters (namely the em dash) were not being properly escaped into HTML entities. I use Raymond Camden's BlogCFC as my blogging platform, and I knew that at one time the xmlrpc code did work properly, because I had made a lot of bug fixes to get it working with Live Writer. However, my last few blog entries were written with the newer version of Live Writer, so what could have changed?
So I busted out my Charles HTTP Proxy and started debugging, and I found that Live Writer does not specify a charset when it sends its HTTP request to ColdFusion. Because the request never declares that its content is UTF-8, ColdFusion does not decode it properly, and each UTF-8 character ends up coming through as a pattern of unrelated characters. For example, the em dash ("—") comes through as three separate characters, one for each byte of its UTF-8 encoding.
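To see why the em dash breaks, it helps to look at the bytes. Since ColdFusion runs on Java, here is a minimal Java sketch that encodes an em dash as UTF-8 and then decodes those bytes with and without the right charset. It assumes the server falls back to ISO-8859-1 when the request declares no charset (the servlet specification's default); the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Sketch: what happens to an em dash when its UTF-8 bytes are decoded with the wrong charset.
// Assumption: with no declared charset, the server decodes the body as ISO-8859-1.
final class EmDashMojibake {

    // Decode the UTF-8 bytes as ISO-8859-1, which maps each byte to its own character.
    static String decodeWithoutCharset(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Decode the same bytes with the charset they were actually written in.
    static String decodeAsUtf8(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String emDash = "\u2014";
        String wrong = decodeWithoutCharset(emDash);
        // One line per character: three characters come back instead of one.
        for (char c : wrong.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
        System.out.println(decodeAsUtf8(emDash).equals(emDash));
    }
}
```

Running main prints U+00E2, U+0080 and U+0094 followed by true: one character per byte when the charset is wrong, and the original em dash when the same bytes are decoded as UTF-8.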
Now that I knew what the problem was, I needed to find a way to fix it. My first logical step was to make sure that the xmlrpc.cfm template was setting the page encoding to UTF-8, so I added this to the template:
<cfprocessingdirective pageencoding="utf-8" />
However, that didn't change the behavior at all. After much digging, I found I could use the Java page context object to explicitly set the character encoding for the incoming request using:
<cfset getPageContext().getRequest().getHttpRequest().setCharacterEncoding("UTF-8") />
Thankfully, this does fix the problem. So, if you're having problems getting Live Writer to work with BlogCFC, try changing the first few lines of your xmlrpc.cfm template to:
<cfprocessingdirective pageencoding="utf-8" />
<!---// this is required for Live Writer to ensure that the request is processed as UTF-8 //--->
<!---// getCharacterEncoding() returns null when the client sent no charset, which leaves responseCharset undefined //--->
<cfset responseCharset = getPageContext().getRequest().getHttpRequest().getCharacterEncoding() />
<cfif not structKeyExists(variables, "responseCharset")>
	<cfset getPageContext().getRequest().getHttpRequest().setCharacterEncoding("UTF-8") />
</cfif>
<cfcontent type="text/xml; charset=utf-8">
<cfsetting enablecfoutputonly="true">
NOTE: I use the responseCharset variable to check whether the client supplied an explicit character encoding. If getCharacterEncoding() returns null, no charset was supplied, so I override the encoding to UTF-8.
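The null check boils down to a simple rule: honor a charset if the client declared one, otherwise assume UTF-8. Here is a minimal Java sketch of that rule working from a raw Content-Type header value. The helpers declaredCharset and resolveCharset are hypothetical and only approximate how a servlet container parses the header; they are not part of BlogCFC or the servlet API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Sketch of "only override when no charset was sent".
// declaredCharset and resolveCharset are hypothetical helpers for illustration.
final class RequestCharset {

    // Returns the charset parameter of a Content-Type value, or null if none is declared.
    static String declaredCharset(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional quotes, e.g. charset="utf-8".
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    // Honor an explicit charset; fall back to UTF-8 when the client sent none.
    static Charset resolveCharset(String contentType) {
        String declared = declaredCharset(contentType);
        return declared == null ? StandardCharsets.UTF_8 : Charset.forName(declared);
    }

    public static void main(String[] args) {
        System.out.println(resolveCharset("text/xml"));                     // UTF-8
        System.out.println(resolveCharset("text/xml; charset=ISO-8859-1")); // ISO-8859-1
    }
}
```

In the ColdFusion template the container has already parsed the header, so getCharacterEncoding() plays the role of declaredCharset here, and setCharacterEncoding("UTF-8") corresponds to the fallback branch.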
Comments
The code could also benefit from something that parses the text for any non-ASCII characters and replaces them with an HTML entity. The code currently checks for the common characters (like the em dash, en dash, and ellipsis), but it doesn't check for every character. Something along the lines of what Ben Nadel did:
http://www.bennadel.com/blog/1155-Cleaning-High-As...
(NOTE: I'd remove his check for 8230 and leave that as an actual entity. Actually, I might remove the checks for smart quotes too, just in case people like them.)
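The suggestion to replace every non-ASCII character with an HTML entity can be sketched in a few lines of Java. This is a hypothetical helper, not BlogCFC's existing code or Ben Nadel's function: it emits a numeric entity such as &#8212; for every code point above 127, so characters like 8230 (the ellipsis) and smart quotes are kept as entities rather than flattened to ASCII.

```java
// Sketch: replace every non-ASCII character with a numeric HTML entity (&#N;).
// toNumericEntities is a hypothetical helper for illustration.
final class HtmlEntities {

    static String toNumericEntities(String text) {
        StringBuilder out = new StringBuilder(text.length());
        // Walk by code point so a character outside the BMP becomes one entity, not two.
        text.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.appendCodePoint(cp); // plain ASCII passes through unchanged
            } else {
                out.append("&#").append(cp).append(';'); // decimal code point
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toNumericEntities("Wait\u2014really\u2026")); // Wait&#8212;really&#8230;
    }
}
```

This only deals with non-ASCII characters; markup characters such as & and < still need the usual XML escaping before the text goes into the feed.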
I have it at the top of my script, not the bottom. It shouldn't really matter where it is, as long as it occurs before any output.
