CFMX UDF: Creating a query from a zip file

Posted by Dan on Sep 18, 2006 @ 11:10 AM

The other day I needed to unzip the contents of a zip file. However, I did not want to unzip the file to disk; I wanted to grab the binary data in memory so I could write it to MS SQL as an image data type.

As I normally do, I first did a quick search of Google and CFLib.org to see if I could find anything that did exactly what I wanted. The only examples I could find wrote the zip contents to disk. While I could have written to disk and then read the file back, I knew I could do what I wanted by just creating a ByteArrayOutputStream. So I took the unzipFile UDF I found (written by Samuel Neff) and basically rewrote it.

I decided I actually wanted to return a complete query object containing all the zip file information. I thought this would be useful for other projects, in case I ever need to "browse" a zip file.

Returning the binary data is optional and is controlled by the returnBinary argument. Returning the binary data of a zip file can be dangerous: each entry is decompressed into memory, so if the contents are very large you could quickly use up all the memory allotted to your JVM. That is why it's off by default. In my case, I'm getting single image files via a web service that have been compressed using zip compression, and I'm also restricting the size of the file that can be uploaded.
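
One way to act on that warning is to inspect the entry sizes before asking for any binary data. The following is only a sketch: it assumes the queryZipFile UDF listed further down is already loaded, and the file path and the 5 MB ceiling are made-up values for illustration.

```cfm
<cfscript>
// hypothetical path and limit, for illustration only
zipPath = expandPath("./upload.zip");
maxBytes = 5 * 1024 * 1024;

// first pass: metadata only, so no entry data is read into memory
qInfo = queryZipFile(zipPath);

// add up the uncompressed size of every file entry
totalBytes = 0;
for( i = 1; i lte qInfo.recordCount; i = i + 1 ){
    // skip directories and any entry whose size is reported as unknown (-1)
    if( (qInfo["type"][i] eq "file") and (qInfo["size"][i] gt 0) ){
        totalBytes = totalBytes + qInfo["size"][i];
    }
}

// second pass: only request the binary data when the total is acceptable
if( totalBytes lte maxBytes ){
    qFiles = queryZipFile(zipPath, true);
}
</cfscript>
```

The zip file is opened twice this way, but the first pass only reads the entry metadata, so it should be cheap compared with decompressing the contents.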

I also added some very simple filtering via the regexFilter argument. This lets you supply a regular expression so that only entries whose name matches it are returned. Note that directory entries are skipped whenever a filter is supplied. Because the result is a query object, you can always use a query-of-queries (QoQ) if you need more complex filtering; I just wanted to provide a simple way of filtering the result set. In most cases a RegEx filter should be enough to grab just certain file types (e.g. "\.gif$").
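
To make the two filtering options concrete, here is a short sketch of each. It assumes the queryZipFile UDF listed further down is already loaded; the zip path, the variable names and the 10 KB threshold are hypothetical. The square brackets in the QoQ are there because, as far as I know, SIZE is a reserved word in query-of-queries; bracketing type as well is just a precaution.

```cfm
<!--- RegEx filter: only entries whose name ends in .gif are returned --->
<cfset qGifs = queryZipFile(expandPath("./images.zip"), false, "\.gif$") />

<!--- unfiltered listing, narrowed afterwards with a query-of-queries --->
<cfset qAll = queryZipFile(expandPath("./images.zip")) />

<cfquery name="qLargeFiles" dbtype="query">
    SELECT name, [size]
    FROM qAll
    WHERE [type] = 'file'
      AND [size] > 10240
    ORDER BY [size] DESC
</cfquery>
```

The RegEx route is the lighter of the two when a name pattern is all you need; the QoQ route is only worth it when you want to filter or sort on the other columns.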

So, here's the end result:

<cfscript>
/**
* Opens a zip file and extracts all the information on the files into a query object
*
* NOTE: based on unzipFile by Samuel Neff
*
* @param filePath Path to the zip file (Required)
* @param returnBinary Should binary data be returned (Optional)
* @param regexFilter A regex string to filter query results by name (Optional)
*
* @return query
* @author Dan G. Switzer, II
* @version 1, September 15, 2006
*/

function queryZipFile(filePath) {
    // create a new zip file object
    var zipFile = createObject("java", "java.util.zip.ZipFile").init(filePath); // ZipFile
    // used to enumerate the ZipEntry objects
    var entries = "";
    // the current enumerated ZipEntry
    var entry = "";
    // should we return binary data
    var bReturnBinary = false;
    // the regex filter to use to filter out specific entries
    var sRegExMatch = "";
    // the query object we'll return
    var getZipInfo = queryNew("id, name, size, date, mimeType, compressedSize, crc, method, type", "integer, varchar, integer, date, varchar, integer, varchar, varchar, varchar");
    // the current name of the entry
    var sName = "";
    // if the current entry is a directory
    var bDirectory = false;
    // the current entry's compression method
    var sMethod = "";
    // running row counter, used as the id of each returned entry
    var iFilesLen = 1;
    // a Java Date object, for converting time to CF
    var jDate = createObject("java", "java.util.Date");
    // a Java Long object, for converting CRC to Hex string
    var jLong = createObject("java", "java.lang.Long");
    // the Servlet context, for attempting to determine mime type of file
    var jServerContext = getPageContext().getServletContext();
    // 1K byte buffer used for reading the file contents
    var buffer = repeatString(" ", 1024).getBytes();
    // the input stream of the zip file
    var inStream = "";
    // create a ByteArrayOutputStream as the output stream, this will allow us to store the file in memory
    var outStream = createObject("java", "java.io.ByteArrayOutputStream");
    // the length of the current stream
    var length = 0;
    // track valid compression methods
    var stMethods = structNew();

    // if the second argument is supplied, check to see if we should return binary data
    if( arrayLen(arguments) gt 1 ) bReturnBinary = arguments[2];
    // if the third argument is supplied, check to see if we should filter data based on a string
    if( arrayLen(arguments) gt 2 ) sRegExMatch = trim(arguments[3]);

    // if we're to add the binary data, add the column now
    if( bReturnBinary ) queryAddColumn(getZipInfo, "binary", "binary", arrayNew(1));

    // define the valid methods for a compression
    stMethods["-1"] = "unspecified";
    stMethods["0"] = "stored";
    stMethods["8"] = "deflated";

    // get the entries in the zip file
    entries = zipFile.entries();

    // loop through all the entries
    while( entries.hasMoreElements() ){
        // get the next element
        entry = entries.nextElement();

        // get the current name
        sName = entry.getName();
        // is the entry a directory
        bDirectory = entry.isDirectory();
        // the method of compression
        sMethod = entry.getMethod();

        // if no filter was supplied, or the entry is a file whose name matches the filter, grab the entry
        if( (len(sRegExMatch) eq 0) or (NOT bDirectory and (reFindNoCase(sRegExMatch, sName) gt 0)) ){
            // convert the epoch time to a Java Date object
            jDate.setTime(entry.getTime());

            // add a row to the query for the current entry
            queryAddRow(getZipInfo);
            querySetCell(getZipInfo, "id", iFilesLen);
            querySetCell(getZipInfo, "name", sName);
            querySetCell(getZipInfo, "size", entry.getSize());
            querySetCell(getZipInfo, "date", createODBCDateTime(jDate.toString()));
            querySetCell(getZipInfo, "compressedSize", entry.getCompressedSize());
            if( structKeyExists(stMethods, sMethod) ){
                querySetCell(getZipInfo, "method", stMethods[sMethod]);
            } else {
                querySetCell(getZipInfo, "method", sMethod);
            }
            // return a type similar to cfdirectory (either "dir" or "file")
            if( bDirectory ){
                querySetCell(getZipInfo, "type", "dir");
            } else {
                // convert the CRC-32 to a hex string
                querySetCell(getZipInfo, "crc", uCase(jLong.toHexString(entry.getCrc())));
                querySetCell(getZipInfo, "mimeType", jServerContext.getMimeType(sName));
                querySetCell(getZipInfo, "type", "file");
            }

            // only grab the uncompressed binary data if it's a file and we've requested it
            if( NOT bDirectory and bReturnBinary ){
                // get the current entry's file stream
                inStream = zipFile.getInputStream(entry);

                // read in the first 1K buffer chunk
                length = inStream.read(buffer);
                // loop through the inStream and grab each data chunk
                while( length GTE 0 ){
                    outStream.write(buffer, 0, length);
                    length = inStream.read(buffer);
                }
                // close the input stream
                inStream.close();

                // save the binary stream to the query
                querySetCell(getZipInfo, "binary", outStream.toByteArray());
                // reset the outStream; close() has no effect on it, but reset() clears the contents
                outStream.reset();
            }

            // increment the row counter
            iFilesLen = iFilesLen + 1;
        }
    }

    // close the zip file
    zipFile.close();

    // return the query object
    return getZipInfo;
}
</cfscript>
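
For completeness, here is roughly how the result could be used for the original goal of storing the unzipped data in an MS SQL image column. Treat it as a sketch under assumptions rather than tested code: the zip path, the datasource name (myDsn), the table (ZipImages) and its column names are all hypothetical, and cf_sql_blob is assumed to be the right cfqueryparam type for an image column, so check that against your driver.

```cfm
<!--- ask for the binary data, but only for entries that look like images --->
<cfset qZip = queryZipFile(expandPath("./photo.zip"), true, "\.(gif|jpe?g|png)$") />

<!--- insert one row per matching entry (hypothetical datasource and table) --->
<cfloop query="qZip">
    <cfquery datasource="myDsn">
        INSERT INTO ZipImages (FileName, FileData)
        VALUES (
            <cfqueryparam cfsqltype="cf_sql_varchar" value="#qZip.name#" />,
            <cfqueryparam cfsqltype="cf_sql_blob" value="#qZip.binary#" />
        )
    </cfquery>
</cfloop>
```

Because a filter is supplied, directory entries are never returned, so every row in the loop carries binary data.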
Categories: HTML/ColdFusion, Java, Source Code
