UDF for converting a PDF page to Images using CF8 & Java

Categories: HTML/ColdFusion, Source Code

I'm working on a project where I'm trying to create thumbnails for documents the user uploads. Since CF8 has introduced the <cfpdf /> tag, I thought it would be pretty straightforward to turn page 1 of a PDF into a thumbnail image—turns out I was wrong.

While the <cfpdf /> tag does work, it made me jump through various hoops, only some of which I could easily overcome. The issues I had were:

  1. The <cfpdf /> tag only lets you create the images at a scaled percentage. This is pretty pointless if you ask me, since I suspect different PDFs will generate different image sizes. I wanted my thumbnails scaled to fit specific dimensions.
  2. You can't specify the exact name of the file to be generated. You specify a "prefix" that is attached to each image, and the tag automatically appends the string "_page_N" (where "N" is the current page number). This is perfectly logical when you're exporting multiple pages, but in my case I only want the first page and I need to specify the exact file name.
  3. The <cfpdf /> tag holds a lock on the images it creates. I believe this is because it never closes the Java file streams it opens for the new images. Since CF holds a lock on the file, I can't rename or delete the thumbnail image until the lock is released, whenever that might happen.
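
For reference, here is a rough sketch of the kind of <cfpdf /> call that runs into the first two issues. The paths and the "thumb" prefix are made up for illustration; the attributes are the standard ones for the thumbnail action.

```cfml
<!--- Sketch only: the paths and the "thumb" prefix are hypothetical. --->
<cfset pdfPath = expandPath(".") & "\attachments\my.pdf" />
<cfset thumbDir = expandPath(".") & "\attachments\thumbnails" />

<!--- scale is a percentage of the page size, not a target width/height (issue 1) --->
<cfpdf
   action="thumbnail"
   source="#pdfPath#"
   destination="#thumbDir#"
   pages="1"
   scale="25"
   imagePrefix="thumb"
   format="png"
   overwrite="yes" />

<!--- Per the naming rule in issue 2, page 1 should end up as "thumb_page_1.png". --->
```

There is no attribute for a target width and height, and no attribute for the full output file name, which is what pushed me toward the Java route.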

I was able to work around issues #1 and #2, but issue #3 was the one causing me real trouble. I could write the files to a temp folder and clean them up later, but I already felt like I was hacking too many things to get this all to work.

So, I thought I'd play around with the native Java objects to see if it wouldn't be easy to just write a ColdFusion UDF that would allow me to do exactly what I wanted with the image. It turns out it's pretty straightforward.

The power all lies in JPedal's PdfDecoder class (org.jpedal.PdfDecoder). There are a lot of methods in that class (including some text extraction methods which I didn't get around to playing with). Converting a page to a BufferedImage object is as easy as invoking either the getPageAsImage() or getPageAsTransparentImage() method.

Since the PdfDecoder class returns a BufferedImage object, that made it really easy to manipulate further with CF8's built-in image handling functions. You just need to pass the BufferedImage to the imageNew() function.

So, within just a few minutes I was able to put together this little UDF:

<cffunction name="convertPdfToImage" access="public" returntype="string" output="false" hint="Attempts to create a thumbnail from a file.">
   <!---// define the arguments //--->
   <cfargument name="source" type="string" required="true" hint="The full path to the PDF document." />
   <cfargument name="destination" type="string" required="true" hint="The destination folder (including the trailing path separator) where the image is saved." />
   <cfargument name="page" type="numeric" default="1" hint="The PDF page to convert to an image." />
   <cfargument name="type" type="string" default="png" hint="The type of image to write." />
   <cfargument name="width" type="numeric" default="-1" hint="If you specify both a width and height, the image will scale-to-fit those dimensions, otherwise the image is fullsize." />
   <cfargument name="height" type="numeric" default="-1" hint="If you specify both a width and height, the image will scale-to-fit those dimensions, otherwise the image is fullsize." />
   <cfargument name="highResolution" type="boolean" default="true" hint="Decide whether or not to use high quality rendering." />
   <cfargument name="transparent" type="boolean" default="false" hint="Should the image contain transparencies?" />
   <cfargument name="interpolation" type="string" default="highestQuality" hint="Specify the interpolation type when scaling the image." />
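   <cfargument name="quality" type="numeric" default="0.8" hint="The JPG quality (if writing to JPG)." />
   <cfargument name="password" type="string" required="false" hint="The password to use if the PDF is encrypted." />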
   <!---// declare variables //--->
   <cfset var pdfDecode = "" />
   <cfset var imageToSave = "" />
   <cfset var pdfImage = "" />
   <cfset var newFile = arguments.destination & reReplaceNoCase(getFileFromPath(arguments.source), "\.pdf$", "." & arguments.type) />
   <cftry>
      <cfscript>
      pdfDecode = createObject("java", "org.jpedal.PdfDecoder").init(javaCast("boolean", true));
      //createObject("java", "coldfusion.document.JPedalFontRegistry").init(pdfDecode);
      pdfDecode.showAnnotations = javaCast("boolean", false);
      pdfDecode.useHiResScreenDisplay(javaCast("boolean", arguments.highResolution));
      pdfDecode.setExtractionMode(javaCast("int", 0));
      pdfDecode.openPdfFile(javaCast("String", arguments.source));
      // if a password has been supplied, use the password
      if( structKeyExists(arguments, "password") )
         pdfDecode.setEncryptionPassword(javaCast("String", arguments.password));
      // if creating a transparent image, do so now
      if(arguments.transparent)
         imageToSave = pdfDecode.getPageAsTransparentImage(javaCast("int", arguments.page));
      // otherwise, get the standard image
      else
         imageToSave = pdfDecode.getPageAsImage(javaCast("int", arguments.page));
      // close the PDF file
      pdfDecode.closePdfFile();
      /*
       * go back to native CF functions
       */
      // create a native CF image from the BufferedImage
      pdfImage = imageNew(imageToSave);
      // if we've specified a width/height, scale to those dimensions
      if( (arguments.width gt 0) and (arguments.height gt 0) )
         imageScaleToFit(pdfImage, arguments.width, arguments.height, arguments.interpolation);
      // write the image to disk
      imageWrite(pdfImage, newFile, arguments.quality);
      </cfscript>   
      <!---// if an error has occurred, just return an empty string to indicate we couldn't process the PDF //--->
      <cfcatch type="any">
         <cfreturn "" />
      </cfcatch>
   </cftry>
   <cfreturn newFile />
</cffunction>

The file will be saved with the same name as the original PDF, but it will have whatever you specified for the file "type" as the extension. For example:

<cfset imgPath = convertPdfToImage(
   expandPath(".") & "\attachments\" & "my.pdf"
   , expandPath(".") & "\attachments\thumbnails\"
   , 1
   , "png"
   , 64
   , 64
) />

<cfif len(imgPath)>
   <cfoutput>
      <img src="./attachments/thumbnails/#getFileFromPath(imgPath)#" />
   </cfoutput>
<cfelse>
   <h1>Could not process PDF</h1>
</cfif>

The code above would create a file in the "thumbnails" folder named "my.png" that is scaled to fit within 64 x 64 pixels. The UDF returns the path to the file it wrote; if it could not write an image from the PDF, it returns an empty string.
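
The UDF also checks structKeyExists(arguments, "password"), so an optional password is easiest to pass by name. Here is a sketch of a call using named arguments; the "secure.pdf" file and the password value are hypothetical.

```cfml
<!--- Sketch only: "secure.pdf" and the password value are hypothetical. --->
<cfset imgPath = convertPdfToImage(
   source = expandPath(".") & "\attachments\secure.pdf"
   , destination = expandPath(".") & "\attachments\thumbnails\"
   , type = "jpg"
   , width = 128
   , height = 128
   , quality = 0.6
   , password = "secret"
) />

<!--- An empty string means the PDF could not be converted. --->
<cfif not len(imgPath)>
   <cfoutput>Could not process PDF</cfoutput>
</cfif>
```

Named arguments also let you skip the ones you don't care about, such as page, instead of filling in every position up to the one you want to change.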

There's actually a lot of interesting looking things in the PdfDecoder class. When I have more time, I'll have to go back and play with some of the other methods.

Comments

Sami Hoda: Very nice and very useful! You should put this on RiaForge.
Dan G. Switzer, II: @Sami:

I'll definitely consider it, but it's probably a better fit for CFLib.org. Of course if I end up developing a number of other PDF-related functions, then I'd move those to a CFC.
Ed: Nice work. This looks like an ideal candidate for an addition to Ray Camden's pdfUtils CFC on RiaForge.
Dan G. Switzer, II: @Ed:

I sent Raymond an e-mail letting him know he was welcome to include the function in the pdfUtils component. That seems like a good fit for the function.
Julian Halliwell: Great work, Dan. It does seem odd that the naming and sizing options in cfpdf-thumbnail are so limited, but I've not experienced the locking issue when renaming the thumbnail file immediately after creation.
Dan G. Switzer, II: @Julian:

I am running v8.0.1, so perhaps it's an issue that was just introduced. Sometimes CF does release the lock, but more often than not it fails for me. Also, I believe that when I dumped out multiple pages, only the last page in the series had the lock (but I could be wrong; I didn't test it too thoroughly).

Also, just a note on the locking: I can overwrite the file or copy the file, I just can't delete it or rename it.
Julian Halliwell: Dan, I'm afraid you seem to be right. Previously I was renaming the generated thumbnail images with 100% success, but I just tested with 8.0.1 applied and it consistently fails with the "invalid source" error that I guess means it's locked.

Need to report this as a regression to Adobe. Meantime thank goodness for your UDF!
Dan G. Switzer, II: @Julian:

Yes, that's the error that can occur when a file is locked.
Julian Halliwell: Dan, using your UDF I still got a locking error when I tried to rename the image, arising from the CF imageNew/Write functions no doubt. I know renaming isn't necessary if you're happy using the original PDF filename, but I have a different naming scheme for images.

So I've just modified your function to allow an optional "imageFilename" argument which defaults to the PDF filename:

<cfargument name="imageFilename" type="string" required="false" default="#ReReplaceNoCase(getFileFromPath(arguments.source),'\.pdf$','')#" hint="The filename (without extension) of the image to be created">

The "newFile" var then just concatenates the destination, imageFilename and type arguments.
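
A minimal sketch of that change, assuming the imageFilename argument above has been added to the UDF. The two pieces below belong in different places (the first inside the function, the second in the calling page), and the "doc-42-thumb" name is made up for illustration.

```cfml
<!--- Inside convertPdfToImage: replaces the original newFile declaration. --->
<cfset var newFile = arguments.destination & arguments.imageFilename & "." & arguments.type />

<!--- In the calling page: pass the image name by name ("doc-42-thumb" is hypothetical). --->
<cfset imgPath = convertPdfToImage(
   source = expandPath(".") & "\attachments\my.pdf"
   , destination = expandPath(".") & "\attachments\thumbnails\"
   , imageFilename = "doc-42-thumb"
   , width = 64
   , height = 64
) />
```

With the default type of "png", that call should write "doc-42-thumb.png" to the thumbnails folder, so no rename is needed afterwards.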

I've also submitted a bug/regression report to Adobe for the cfpdf tag in 8.0.1.

Cheers, Julian.
Hemant: Hi,

A hot fix for several cfimage and image functions has been released from engineering. A technote, "Patch for CFImage and Image functions in ColdFusion 8.0.1" with the hot fix will post at http://www.adobe.com/go/kb403411 in the next week or so.

Thanks,
Hemant
