CFMX UDF: Creating a query from a zip file

Categories: HTML/ColdFusion, Source Code, Java

The other day I needed to unzip the contents of a zip file. However, I did not want to unzip the file to disk; I wanted to grab the binary data in memory so I could write it to MS SQL as an image data type.

As I normally do, I first did a quick Google and CFLib.org search to see if I could find anything that did exactly what I wanted. The only examples I could find wrote the zip contents to disk. While I could have written to disk and then read the file back, I knew I could do what I wanted by just creating a ByteArrayOutputStream. So I took the unzipFile UDF I found (written by Samuel Neff) and basically rewrote it.

I decided I actually wanted to return a complete query object containing all the information about the files in the zip. I thought this would be useful for other projects (in case I ever need to "browse" a zip file).

I made returning the zip's binary data optional via an argument (returnBinary). Returning the binary data can be dangerous: if the contents of the zip file are very large, you could quickly use up all the memory allotted to your JVM. That is why it's off by default. In my case, I'm getting single image files via a web service that have been compressed using zip compression, and I'm also restricting the size of the file that can be uploaded.
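
One way to reduce that risk, sketched below, is to call the UDF twice: once for the metadata only, and a second time with returnBinary enabled only if the total uncompressed size is under a limit. The zip path and the 5 MB cap are made up for illustration. Since ZipEntry.getSize() returns -1 when the size is unknown, treat this as a rough guard rather than a guarantee.

<cfscript>
// hypothetical path to an uploaded zip file
zipPath = expandPath("./uploads/photo.zip");
// arbitrary cap (5 MB) on the total uncompressed size we're willing to load
iMaxBytes = 5 * 1024 * 1024;
// first pass: metadata only, so no file contents are read into memory
qInfo = queryZipFile(zipPath);
iTotal = 0;
bSizesKnown = true;
for( i = 1; i lte qInfo.recordCount; i = i + 1 ){
   if( qInfo["type"][i] eq "file" ){
      if( qInfo["size"][i] lt 0 ){
         // a size of -1 means the entry doesn't report its uncompressed size
         bSizesKnown = false;
      } else {
         iTotal = iTotal + qInfo["size"][i];
      }
   }
}
// second pass: only pull the binary data when every size is known and the total fits under the cap
if( bSizesKnown and iTotal lte iMaxBytes ){
   qFiles = queryZipFile(zipPath, true);
}
</cfscript>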

I also added some very simple filtering via the regexFilter argument. This allows you to supply a RegEx string so that only entries whose name matches the expression are returned. The match is case-insensitive, and directory entries are always skipped when a filter is supplied. Since the results are a query object, you can always use a query-of-queries (QoQ) if you need more complex filtering; I just wanted to provide a simple way of filtering the result set. In most cases a RegEx filter should be enough to grab just certain file types (e.g. "\.gif$").
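
As a rough illustration of the two approaches, the sketch below first uses the regex filter on its own and then runs a QoQ against the unfiltered results. The zip path, the query names and the 10 KB threshold are all made up for the example. I've wrapped name, size and type in square brackets on the assumption that some of them may clash with QoQ reserved words.

<cfset zipPath = expandPath("./archive.zip") />

<!--- simple case: only entries whose name ends in .gif --->
<cfset qGifs = queryZipFile(zipPath, false, "\.gif$") />

<!--- more complex case: get everything, then filter and sort with a QoQ --->
<cfset qAll = queryZipFile(zipPath) />
<cfquery name="qLargeFiles" dbtype="query">
   select [name], [size], compressedSize
   from qAll
   where [type] = 'file'
   and compressedSize > 10240
   order by [name]
</cfquery>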

So, here's the end result:

<cfscript>
/**
* Opens a zip file and extracts all the information on the files into a query object
*
* NOTE: based on unzipFile by Samuel Neff
*
* @param filePath Path to the zip file (Required)
* @param returnBinary Should binary data be returned (Optional)
* @param regexFilter A regex string to filter query results by name (Optional)
*
* @return query
* @author Dan G. Switzer, II
* @version 1, September 15, 2006
*/
function queryZipFile(filePath) {
   // create a new zip file object
   var zipFile = createObject("java", "java.util.zip.ZipFile").init(filePath); // ZipFile
   // used to enumerate the ZipEntry objects
   var entries = "";
   // the current enumerated ZipEntry
   var entry = "";
   // should we return binary data
   var bReturnBinary = false;
   // the regex filter to use to filter out specific entries
   var sRegExMatch = "";
   // the query object we'll return
   var getZipInfo = queryNew("id, name, size, date, mimeType, compressedSize, crc, method, type", "integer, varchar, integer, date, varchar, integer, varchar, varchar, varchar");
   // the current name of the entry
   var sName = "";
   // if the current entry is a directory
   var bDirectory = false;
   // the current entry's compression method
   var sMethod = "";
   // the number of entries
   var iFilesLen = 1;
   // a Java Date object, for converting time to CF
   var jDate = createObject("java", "java.util.Date");
   // a Java Long object, for converting CRC to Hex string
   var jLong = createObject("java", "java.lang.Long");
   // the Servlet context, for attempting to determine mime type of file
   var jServerContext = getPageContext().getServletContext();
   // buffer string used for getting the file contents
   var buffer = repeatString(" ", 1024).getBytes();
   // the input stream of the zip file
   var inStream = "";
   // create a BAOS as the output stream, this will allow us to store the file in memory
   var outStream = createObject("java", "java.io.ByteArrayOutputStream");
   // the length of the current stream
   var length = 0;
   // track valid compression methods
   var stMethods = structNew();
   // if the second argument is supplied, check to see if we should return binary data
   if( arrayLen(arguments) gt 1 ) bReturnBinary = arguments[2];
   // if the third argument is supplied, check to see if we should filter data based on a string
   if( arrayLen(arguments) gt 2 ) sRegExMatch = trim(arguments[3]);
   // if we're to add the binary data, add the column now
   if( bReturnBinary ) queryAddColumn(getZipInfo, "binary", "binary", arrayNew(1));
   // define the valid methods for a compression
   stMethods["-1"] = "unspecified";
   stMethods["0"] = "stored";
   stMethods["8"] = "deflated";
   // get the entries in the zip file
   entries = zipFile.entries();
   // loop through all the entries
   while( entries.hasMoreElements() ){
      // get the next element
      entry = entries.nextElement();
      // get the current name
      sName = entry.getName();
      // is the entry a directory
      bDirectory = entry.isDirectory();
      // the method of compression
      sMethod = entry.getMethod();
      // if there hasn't been a search string supplied, or the match is found grab the entry
      if( (len(sRegExMatch) eq 0) or (NOT bDirectory and (reFindNoCase(sRegExMatch, sName) gt 0)) ){
         // convert the epoch time to a Java Date object
         jDate.setTime(entry.getTime());
         // add a row to the query for the current entry
         queryAddRow(getZipInfo);
         querySetCell(getZipInfo, "id", iFilesLen);
         querySetCell(getZipInfo, "name", sName);
         querySetCell(getZipInfo, "size", entry.getSize());
         querySetCell(getZipInfo, "date", createODBCDateTime(jDate.toString()));
         querySetCell(getZipInfo, "compressedSize", entry.getCompressedSize());
         if( structKeyExists(stMethods, sMethod) ){
            querySetCell(getZipInfo, "method", stMethods[sMethod]);
         } else {
            querySetCell(getZipInfo, "method", sMethod);
         }
         // return a type similar to cfdirectory (either "dir" or "file")
         if( bDirectory ){
            querySetCell(getZipInfo, "type", "dir");
         } else {
            // convert the CRC-32 to a hex string
            querySetCell(getZipInfo, "crc", uCase(jLong.toHexString(entry.getCrc())));
            querySetCell(getZipInfo, "mimeType", jServerContext.getMimeType(sName));
            querySetCell(getZipInfo, "type", "file");
         }
         // only grab the uncompressed binary data if it's a file and we've requested it
         if( NOT bDirectory and bReturnBinary ){
            // get the current entry's file stream
            inStream = zipFile.getInputStream(entry);
            // read in the first 1K buffer chunk
            length = inStream.read(buffer);
            // loop through the inStream and grab each data chunk
            while( length GTE 0 ){
               outStream.write(buffer, 0, length);
               length = inStream.read(buffer);
            }
            // close the input stream
            inStream.close();
            // save the binary stream to the query
            querySetCell(getZipInfo, "binary", outStream.toByteArray());
            // reset the outStream -- close() has no effect, this will clear the contents
            outStream.reset();
         }
         // increase the zip file count
         iFilesLen = iFilesLen + 1;
      }
   }
   // close the zip file
   zipFile.close();
   // return the query object
   return getZipInfo;
}
</cfscript>
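
For what it's worth, here is a minimal sketch of how the result might be written to a database, which was my original goal. The datasource name (myDSN), the table (ZipImages) and its columns are hypothetical, and I'm assuming cf_sql_blob is a suitable type for an MS SQL image column with your driver, so adjust it if yours needs something else.

<!--- grab only image files, including their binary data --->
<cfset qZip = queryZipFile(expandPath("./upload.zip"), true, "\.(gif|jpe?g|png)$") />

<cfloop query="qZip">
   <cfquery datasource="myDSN">
      insert into ZipImages (FileName, ImageData)
      values (
         <cfqueryparam cfsqltype="cf_sql_varchar" value="#qZip.name#" />,
         <cfqueryparam cfsqltype="cf_sql_blob" value="#qZip.binary#" />
      )
   </cfquery>
</cfloop>

Because a regex filter is supplied, directory entries are skipped, so every row in the loop should carry binary data.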
