UDF for converting a PDF page to Images using CF8 & Java

Posted by Dan on Apr 8, 2008 @ 3:51 PM

[UPDATED: Monday, November 21, 2011 at 8:33:18 AM]

I'm working on a project where I'm trying to create thumbnails for documents the user uploads. Since CF8 has introduced the <cfpdf /> tag, I thought it would be pretty straightforward to turn page 1 of a PDF into a thumbnail image—turns out I was wrong.

While the <cfpdf /> tag does work, it made me jump through various hoops, only some of which I could easily overcome. The issues I had were:

  1. The <cfpdf /> tag only allows you to create the images based on a scaled percentage. This is pretty pointless if you ask me, since I suspect different PDFs might generate different image sizes. I wanted my thumbnails scaled to fit a specific dimension.
  2. You can't specify the exact name of the file to be generated. You specify a "prefix" which is attached to each image, and CF then automatically appends the string "_page_N" (where "N" is the current page number). This is perfectly logical when you're exporting multiple pages, but in my case I only want the first page and I need to specify the exact file name.
  3. The <cfpdf /> tag holds a lock on the images created. I believe this is because it never closes the file streams it opens for the new images. Since CF holds a lock on the file, this prevents me from being able to rename or delete the thumbnail image until the lock is released, whenever that might happen.

I was able to work around issues #1 & 2, but issue #3 was the one causing the real trouble. I could have written the images to a temp file and cleaned them up later, but I already felt like I was hacking too many things to get this all to work.
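
For comparison, the <cfpdf /> route with a workaround for issue #2 looks roughly like the sketch below. The paths and the "thumb" prefix are made-up values for illustration, not anything from my project, and the rename step is exactly where the lock from issue #3 can get in the way.

```cfml
<!--- hypothetical paths, for illustration only --->
<cfset pdfFile  = expandPath("./attachments/my.pdf") />
<cfset thumbDir = expandPath("./attachments/thumbnails") & "/" />

<!--- export page 1 only; the size can only be given as a percentage (scale) --->
<cfpdf action="thumbnail"
    source="#pdfFile#"
    destination="#thumbDir#"
    pages="1"
    format="png"
    scale="25"
    imagePrefix="thumb"
    overwrite="yes" />

<!--- CF names the file [prefix]_page_[N].[format], so rename it to the name we actually want --->
<cffile action="rename"
    source="#thumbDir#thumb_page_1.png"
    destination="#thumbDir#my.png" />
```

If the rename fails, the thumbnail is most likely still locked by CF, which is the problem described in issue #3.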

So, I thought I'd play around with the native Java objects to see if it wouldn't be easy to just write a ColdFusion UDF that would allow me to do exactly what I wanted with the image. It turns out it's pretty straightforward.

The power all lies in the PdfDecoder class. There are a lot of methods in that class (including some text-extraction methods which I didn't get around to playing with). Converting a page to a BufferedImage object is as easy as invoking either the getPageAsImage() or getPageAsTransparentImage() method.

Since the PdfDecoder class returns a BufferedImage object, that made it really easy to manipulate further with CF8's built-in image handling functions. You just need to pass the BufferedImage to the imageNew() function.
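
To show that hand-off on its own, here is a minimal sketch that does not involve PdfDecoder at all. It builds a blank BufferedImage directly in Java as a stand-in for what PdfDecoder would return (the 200 x 100 size is an arbitrary assumption), then treats it like any other CF image.

```cfml
<cfscript>
    // stand-in for the page image: a blank 200x100 BufferedImage
    // (the third argument, 1, is the value of BufferedImage.TYPE_INT_RGB)
    buffered = createObject("java", "java.awt.image.BufferedImage").init(
        javaCast("int", 200)
        , javaCast("int", 100)
        , javaCast("int", 1)
    );

    // wrap the Java object in a native CF image
    img = imageNew(buffered);

    // from here on, the normal CF8 image functions apply
    imageScaleToFit(img, 64, 64, "highestQuality");
    writeOutput(imageGetWidth(img) & " x " & imageGetHeight(img));
</cfscript>
```

imageScaleToFit() keeps the aspect ratio, so the result fits inside the 64 x 64 box rather than being stretched to fill it.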

So, within just a few minutes I was able to put together this little UDF:

<cffunction name="convertPdfToImage" access="public" returntype="string" output="false" hint="Attempts to create a thumbnail from a file.">
    <!---// define the arguments //--->
    <cfargument name="source" type="string" required="true" hint="Full filepath to source file" />
    <cfargument name="destination" type="string" required="true" hint="Destination folder for image (must end with a path separator)" />
    <cfargument name="page" type="numeric" default="1" hint="Page number in the PDF to convert to an image" />
    <cfargument name="type" type="string" default="png" hint="Type of image to create (e.g. PNG, JPG, etc.)" />
    <cfargument name="width" type="numeric" default="-1" hint="Width of image (specify both a width and a height to scale the image to fit; otherwise the image is full size)" />
    <cfargument name="height" type="numeric" default="-1" hint="Height of image (specify both a width and a height to scale the image to fit; otherwise the image is full size)" />
    <cfargument name="highResolution" type="boolean" default="true" hint="Indicates whether or not to use high quality rendering" />
    <cfargument name="transparent" type="boolean" default="false" hint="Indicates whether or not the page should be retrieved as a transparent image" />
    <cfargument name="interpolation" type="string" default="highestQuality" hint="Interpolation method used for resampling" />
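    <cfargument name="quality" type="numeric" default="0.8" hint="Defines the JPEG quality (0 to 1) used to encode the image" />
    <cfargument name="password" type="string" required="false" hint="Password used to open an encrypted PDF (optional)" />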
    <!---// declare variables //--->
    <cfset var pdfDecode = "" />
    <cfset var pdfImage = "" />
    <cfset var imageToSave = "" />
    <cfset var newFile = arguments.destination & reReplaceNoCase(getFileFromPath(arguments.source), "\.pdf$", "." & arguments.type) />
    <cftry>
        <cfscript>
            pdfDecode = createObject("java", "org.jpedal.PdfDecoder").init(javaCast("boolean", true));
            // the version of PdfDecoder in CF8 exposes showAnnotations; CF9's does not, so only set it when present
            if( structKeyExists(pdfDecode, "showAnnotations") ) pdfDecode.showAnnotations = javaCast("boolean", false);
            pdfDecode.useHiResScreenDisplay(javaCast("boolean", arguments.highResolution));
            pdfDecode.setExtractionMode(javaCast("int", 0));
            pdfDecode.openPdfFile(javaCast("String", arguments.source));
            // if a password has been supplied, use the password
            if( structKeyExists(arguments, "password") )
                pdfDecode.setEncryptionPassword(javaCast("String", arguments.password));
            // if creating a transparent image, do so now
            if( arguments.transparent )
                imageToSave = pdfDecode.getPageAsTransparentImage(javaCast("int", arguments.page));
            // otherwise, get the standard image
            else
                imageToSave = pdfDecode.getPageAsImage(javaCast("int", arguments.page));
            // close the PDF file
            pdfDecode.closePdfFile();
            /*
             * go back to native CF functions
             */

            // create a native CF image from the BufferedImage
            pdfImage = imageNew(imageToSave);
            // if we've specified a width/height, scale to those dimensions
            if( (arguments.width gt 0) and (arguments.height gt 0) )
                imageScaleToFit(pdfImage, arguments.width, arguments.height, arguments.interpolation);
            // write the image to disk
            imageWrite(pdfImage, newFile, arguments.quality);
        </cfscript>
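        <!---// if an error has occurred, make sure the PDF is released, then return an empty string to indicate we couldn't process the PDF //--->
        <cfcatch type="any">
            <!---// closing can itself fail (e.g. the decoder was never created), so ignore any error here //--->
            <cftry><cfset pdfDecode.closePdfFile() /><cfcatch type="any"></cfcatch></cftry>
            <cfreturn "" />
        </cfcatch>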
    </cftry>
    <cfreturn newFile />
</cffunction>

UPDATE:
I've updated the code to work with ColdFusion 9. CF9 does not support the "showAnnotations" option, so we just need to exclude it.

The file will be saved with the same name as the original PDF, but it will have whatever you specified for the file "type" as the extension. For example:

<cfset imgPath = convertPdfToImage(
    expandPath(".") & "\attachments\" & "my.pdf"
    , expandPath(".") & "\attachments\thumbnails\"
    , 1
    , "png"
    , 64
    , 64
) />

<cfif len(imgPath)>
    <cfoutput>
        <img src="./attachments/thumbnails/#getFileFromPath(imgPath)#" />
    </cfoutput>
<cfelse>
    <h1>Could not process PDF</h1>
</cfif>

The code above would create a file in the "thumbnails" folder titled "my.png" that is scaled to fit within 64 x 64. The UDF returns the path to the file it wrote; if it was not able to write an image from the PDF, it returns an empty string instead.
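
If you only want to override a few of the defaults, named arguments save you from passing every positional parameter. The sketch below assumes the same hypothetical "attachments" folders as the example above and asks for a JPEG instead of a PNG.

```cfml
<!--- same hypothetical folders as above; page is omitted, so it defaults to 1 --->
<cfset imgPath = convertPdfToImage(
    source = expandPath("./attachments/my.pdf")
    , destination = expandPath("./attachments/thumbnails") & "/"
    , type = "jpg"
    , width = 200
    , height = 200
    , quality = 0.6
) />
```

This should write "my.jpg" to the "thumbnails" folder. The quality argument only matters for JPEG output, and note that the destination has to end with a path separator because the UDF simply concatenates it with the file name.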

There's actually a lot of interesting looking things in the PdfDecoder class. When I have more time, I'll have to go back and play with some of the other methods.

Categories: HTML/ColdFusion, Source Code

20 Comments

  • Very nice and very useful! You should put this on RiaForge.
  • @Sami:

    I'll definitely consider it, but it's probably a better fit for CFLib.org. Of course if I end up developing a number of other PDF-related functions, then I'd move those to a CFC.
  • Nice work. This looks like an ideal candidate for an addition to Ray Camden's pdfUtils CFC on RiaForge.
  • @Ed:

    I sent Raymond an e-mail letting him know he was welcome to include the function in the pdfUtils component. That seems like a good fit for the function.
  • Great work, Dan. It does seem odd that the naming and sizing options in cfpdf-thumbnail are so limited, but I've not experienced the locking issue when renaming the thumbnail file immediately after creation.
  • @Julian:

    I am running v8.0.1, so perhaps it's an issue that was just introduced. Sometimes CF does release the lock, but more often than not for me it fails. Also, I believe if I dumped out multiple pages, it was only the last page in the series that had the lock (but I could be wrong--I didn't test it too thoroughly).

    Also, just a note on the locking: I can overwrite the file or copy the file, I just can't delete it or rename it.
  • Dan, I'm afraid you seem to be right. Previously I was renaming the generated thumbnail images with 100% success, but just tested with 8.0.1 applied and it consistently fails with the "invalid source" error that I guess means it's locked.

    Need to report this as a regression to Adobe. Meantime thank goodness for your UDF!
  • @Julian:

    Yes, that's the error that can occur when a file is locked.
  • Dan, using your UDF I still got a locking error when I tried to rename the image, arising from the CF imageNew/Write functions no doubt. I know renaming isn't necessary if you're happy using the original PDF filename, but I have a different naming scheme for images.

    So I've just modified your function to allow an optional "imageFilename" argument which defaults to the PDF filename:

    <cfargument name="imageFilename" type="string" required="false" default="#ReReplaceNoCase(getFileFromPath(arguments.source),'.pdf$','')#" hint="The filename (without extension) of the image to be created">

    The "newFile" var then just concatenates the destination, imageFilename and type arguments.

    I've also submitted a bug/regression report to Adobe for the cfpdf tag in 8.0.1.

    Cheers, Julian.
  • Hi,

    A hot fix for several cfimage and image functions has been released from engineering. A technote, "Patch for CFImage and Image functions in ColdFusion 8.0.1" with the hot fix will post at http://www.adobe.com/go/kb403411 in the next week or so.

    Thanks,
    Hemant
  • HOLY CRAP that was too easy. The only thing I had to do was comment out one line:

    pdfDecode.showAnnotations = javaCast("boolean", false);

    I'm running CF 9.0.1 and when I dumped out the pdfDecode object, that property wasn't shown, so I'm guessing it's been removed or changed, but I'll look into that later. For now, it works.....so THANK YOU for saving me a few hours of research. I was going down the path of using Ghostscript to convert it, but this is "native" to CF so it's easier to deploy.

    Not sure if this would run on my Railo install, but I'll try it anyway :)

    THANKS and have a good evening.
  • @pb_

    I've updated the code to the version I'm using in production that works in both CF8 and CF9.

    I would imagine this code would work in Railo, provided the jPedal library is on the classpath. I have no idea if Railo includes this library or not, but jPedal is available for purchase. However, it's not cheap and you can probably find an open source alternative.
  • Hi, we're experiencing problems with CFPDF and wonder if your function handles it: images in a PDF don't always show up in the generated thumbnail images... the space is just blank where the images should be. This is the same for both CF8 and CF9. I've read it may be because of the compression algorithm used to compress the images (JPEG2000). Please let me know. Thanks.
  • @Martin:

    I have no idea. I'd suggest you just try the function and then tell me if it works. :)
  • I haven't yet run the JPEG2000 test I mentioned earlier, but I've tried your function successfully so far. My next question is: is your function supposed to be more performant than CFPDF (in terms of memory usage, CPU, speed, etc.)? Our real problem at my work with CFPDF is its memory usage (when we have multiple concurrent users converting large PDFs)... it brings our server down :(. Any suggestions?
  • ...Forgot to mention that my performance tests with your function seem to reveal similar performance with CFPDF
  • @Martin:

    The only claims I make are the ones stated in this blog entry.

    If performance is an issue, then you can either look at queuing conversion w/a background task (so that only one PDF is ever processed at a time) or potentially creating a separate server for processing the images.

    You can also look into 3rd party conversion libraries or server solutions designed for high performance document conversion.
  • I've adapted this code to a project where I needed a higher quality output file and in some cases I need to work with the resulting CF image. Mine is tag based to match my other case, but I thought it might be useful to someone to see how to create the high res version of the image.

    http://pastebin.com/6gmmBT6U
  • @Daniel:

    Thanks for sharing!
  • Here is an updated version with a few catch blocks added, so that the PDF file does not stay locked if an error occurs.

    http://pastebin.com/DtAdE0NN

    @Dan - On a side note, I am getting an error when I create a PDF with a PDF417 barcode or DataMatrix barcode that states the dimensions are too large. Any idea on how to handle this? I can share the PDF file if you wish.

Comments for this entry have been disabled.