UDF for converting a PDF page to Images using CF8 & Java
I'm working on a project where I'm trying to create thumbnails for documents the user uploads. Since CF8 has introduced the <cfpdf /> tag, I thought it would be pretty straightforward to turn page 1 of a PDF into a thumbnail image—turns out I was wrong.
While the <cfpdf /> tag does work, it made me jump through several hoops, only some of which I could easily overcome. The issues I had were:
- The <cfpdf /> tag only lets you create images scaled by a percentage. That's pretty pointless if you ask me, since I suspect PDFs with different page sizes will generate different image sizes. I wanted my thumbnails scaled to fit specific dimensions.
- You can't specify the exact name of the file to be generated. You specify a "prefix" which is attached to each image, and the tag automatically appends the string "_page_N" (where "N" is the current page number). This is perfectly logical when you're exporting multiple pages, but in my case I only want the first page and I need to specify the exact file name.
- The <cfpdf /> tag holds a lock on the images it creates. I believe this is because it never closes the java.io.File objects it creates for the new images. Since CF holds a lock on the file, I can't rename or delete the thumbnail image until the lock is released, whenever that might happen.
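For reference, the built-in approach looks roughly like the sketch below. The paths, the "thumb" prefix, and the 25 percent scale are placeholder values picked for illustration only. It shows the first two limitations: the size can only be given as a percentage, and the file name is assembled by CF from the prefix.

```cfml
<!---// illustrative values only: the paths, prefix and scale are assumptions //--->
<cfpdf
  action="thumbnail"
  source="#expandPath('./attachments/my.pdf')#"
  destination="#expandPath('./attachments/thumbnails/')#"
  pages="1"
  scale="25"
  format="png"
  imagePrefix="thumb"
  overwrite="true"
/>
<!---// CF picks the file name itself: thumb_page_1.png //--->
```

There is no attribute for a target width and height or for an exact output file name, so both have to be handled after the tag has run.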
I was able to work around issues #1 and #2, but issue #3 was the one causing me real trouble. I could write the images to temp files and clean them up later, but I already felt like I was hacking too many things to get this all to work.
So, I thought I'd play around with the native Java objects to see if it wouldn't be easy to just write a ColdFusion UDF that would allow me to do exactly what I wanted with the image. It turns out it's pretty straightforward.
The power all lies in the PdfDecoder class (org.jpedal.PdfDecoder). There are a lot of methods in that class (including some text-extraction methods which I didn't get around to playing with). Converting a page to a BufferedImage object is as easy as invoking either the getPageAsImage() or getPageAsTransparentImage() method.
Since the PdfDecoder class returns a BufferedImage object, that made it really easy to manipulate further with CF8's built-in image handling functions. You just need to pass the BufferedImage to the imageNew() function.
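That hand-off can be tried without a PDF at all. The sketch below is a standalone illustration, not part of the UDF: it builds a blank BufferedImage directly in Java (the 200 x 100 size and the variable names are arbitrary assumptions), wraps it with imageNew(), and then treats it like any other CF8 image.

```cfml
<cfscript>
  // a blank 200x100 image built in Java; 1 is the value of BufferedImage.TYPE_INT_RGB
  buffered = createObject("java", "java.awt.image.BufferedImage").init(
    javaCast("int", 200), javaCast("int", 100), javaCast("int", 1)
  );

  // wrap the Java object in a native CF image
  img = imageNew(buffered);

  // from here on the regular CF8 image functions apply
  imageScaleToFit(img, 64, 64, "highestQuality");
  info = imageInfo(img);
</cfscript>
<!---// show the dimensions of the scaled image //--->
<cfoutput>#info.width# x #info.height#</cfoutput>
```

Because imageScaleToFit() keeps the aspect ratio, neither side of the result should be larger than 64 pixels.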
So, within just a few minutes I was able to put together this little UDF:
<!---// define the arguments //--->
<cfargument name="source" type="string" required="true" hint="The full path to the PDF document." />
<cfargument name="destination" type="string" required="true" hint="The destination folder where to save the image." />
<cfargument name="page" type="numeric" default="1" hint="The PDF page to convert to an image." />
<cfargument name="type" type="string" default="png" hint="The type of image to write." />
<cfargument name="width" type="numeric" default="-1" hint="If you specify both a width and height, the image will scale-to-fit those dimensions, otherwise the image is fullsize." />
<cfargument name="height" type="numeric" default="-1" hint="If you specify both a width and height, the image will scale-to-fit those dimensions, otherwise the image is fullsize." />
<cfargument name="highResolution" type="boolean" default="true" hint="Decide whether or not to use high quality rendering." />
<cfargument name="transparent" type="boolean" default="false" hint="Should the image contain transparencies?" />
<cfargument name="interpolation" type="string" default="highestQuality" hint="Specify the interpolation type when scaling the image." />
<cfargument name="quality" type="numeric" default="0.8" hint="The JPG quality (if writing to JPG.)" />
<!---// declare variables //--->
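<cfset var pdfDecode = "" />
<cfset var imageToSave = "" />
<cfset var pdfImage = "" />
<!---// build the output path: the PDF's file name with its extension swapped for the image type //--->
<cfset var newFile = arguments.destination & reReplaceNoCase(getFileFromPath(arguments.source), "\.pdf$", "." & arguments.type) />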
<cftry>
  <cfscript>
    // create the decoder and configure how the page is rendered
    pdfDecode = createObject("java", "org.jpedal.PdfDecoder").init(javaCast("boolean", true));
    //createObject("java", "coldfusion.document.JPedalFontRegistry").init(pdfDecode);
    pdfDecode.showAnnotations = javaCast("boolean", false);
    pdfDecode.useHiResScreenDisplay(javaCast("boolean", arguments.highResolution));
    pdfDecode.setExtractionMode(javaCast("int", 0));
    pdfDecode.openPdfFile(javaCast("String", arguments.source));

    // if a password has been supplied, use the password
    if( structKeyExists(arguments, "password") ){
      pdfDecode.setEncryptionPassword(javaCast("String", arguments.password));
    }

    // render the requested page to a java.awt.image.BufferedImage
    if( arguments.transparent ){
      imageToSave = pdfDecode.getPageAsTransparentImage(javaCast("int", arguments.page));
    } else {
      imageToSave = pdfDecode.getPageAsImage(javaCast("int", arguments.page));
    }

    // close the PDF file
    pdfDecode.closePdfFile();

    /*
     * go back to native CF functions
     */
    // create a native CF image from the BufferedImage
    pdfImage = imageNew(imageToSave);

    // if we've specified a width/height, scale to those dimensions
    if( (arguments.width gt 0) and (arguments.height gt 0) ){
      imageScaleToFit(pdfImage, arguments.width, arguments.height, arguments.interpolation);
    }

    // write the image to disk
    imageWrite(pdfImage, newFile, arguments.quality);
  </cfscript>
  <!---// if an error has occurred, just return an empty string to indicate we couldn't process the PDF //--->
  <cfcatch type="any">
    <cfreturn "" />
  </cfcatch>
</cftry>
<cfreturn newFile />
</cffunction>
The file will be saved with the same name as the original PDF, but it will have whatever you specified for the file "type" as the extension. For example:
expandPath(".") & "\attachments\" & "my.pdf"
, expandPath(".") & "\attachments\thumbnails\"
, 1
, "png"
, 64
, 64
) />
<cfif len(imgPath)>
<cfoutput>
<img src="./attachments/thumbnails/#getFileFromPath(imgPath)#" />
</cfoutput>
<cfelse>
<h1>Could not process PDF</h1>
</cfif>
The code above would create a file in the "thumbnails" folder named "my.png" that is scaled to fit within 64 x 64 pixels. The UDF returns the path to the file it wrote; if it could not write an image from the PDF, it returns an empty string instead.
There's actually a lot of interesting looking things in the PdfDecoder class. When I have more time, I'll have to go back and play with some of the other methods.

Comments
I'll definitely consider it, but it's probably a better fit for CFLib.org. Of course if I end up developing a number of other PDF-related functions, then I'd move those to a CFC.
I sent Raymond an e-mail letting him know he was welcome to include the function in the pdfUtils component. That seems like a good fit for the function.
I am running v8.01, so perhaps it's an issue that was just introduced. Sometimes CF does release the lock, but more often than not for me it fails. Also, I believe if I dumped out multiple pages, it was only the last page in the series that had the lock (but I could be wrong--I didn't test it too thoroughly).
Also, just a note on the locking: I can overwrite the file or copy the file, I just can't delete it or rename it.
Need to report this as a regression to Adobe. Meantime thank goodness for your UDF!
Yes, that's the error that can occur when a file is locked.
So I've just modified your function to allow an optional "imageFilename" argument which defaults to the PDF filename:
<cfargument name="imageFilename" type="string" required="false" default="#ReReplaceNoCase(getFileFromPath(arguments.source),'.pdf$','')#" hint="The filename (without extension) of the image to be created">
The "newFile" var then just concatenates the destination, imageFilename and type arguments.
I've also submitted a bug/regression report to Adobe for the cfpdf tag in 8.0.1.
Cheers, Julian.
A hot fix for several cfimage and image functions has been released from engineering. A technote, "Patch for CFImage and Image functions in ColdFusion 8.0.1" with the hot fix will post at http://www.adobe.com/go/kb403411 in the next week or so.
Thanks,
Hemant