The other day I needed to unzip the contents of a zip file. However,
I did not want to unzip the file to disk; I wanted to grab the
binary data in memory so I could write it to MS SQL as an image data
type.
As I normally do, I first did a quick Google and CFLib.org search to see
if I could find anything that did exactly what I wanted. The only examples
I could find wrote the zip contents to disk. While I could have
written to disk and then read the file back from disk, I knew I could do
what I wanted by just creating a ByteArrayOutputStream. So I took the
unzipFile UDF I found (written by Samuel Neff) and basically rewrote it.
I decided I actually wanted to return a complete query object of all the
zip file information. I thought this would be useful for other projects
(in case I ever need to "browse" a zip file).
Returning the binary data in the zip is optional and is controlled by an
argument (returnBinary).
Returning the binary data of a zip file can potentially be very dangerous:
if the contents of the file are very large, you could quickly use up
all the memory allotted to your JVM. This is why it's optional. In my case,
I'm getting single image files via a webservice that have been compressed
using zip compression. I'm also restricting the size of the file that can
be uploaded.
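One way to guard against the memory problem is to call the function twice: once without the binary data to inspect the uncompressed sizes, and again with returnBinary only if the total is acceptable. This is a minimal sketch, not part of the original UDF; the 5 MB limit, the "upload.zip" path and the variable names are assumptions for illustration. Keep in mind the sizes come from the zip file's own headers, so treat this as a sanity check rather than a guarantee.

```cfml
<cfscript>
// hypothetical limit; pick one that suits your JVM heap
maxBytes = 5 * 1024 * 1024;
// placeholder path to the uploaded zip file
zipPath = expandPath("upload.zip");

// first pass: metadata only, no file contents are read into memory
info = queryZipFile(zipPath);
totalBytes = 0;
for( i = 1; i lte info.recordCount; i = i + 1 ){
    if( info["size"][i] lt 0 ){
        // ZipEntry.getSize() reports -1 when the size is unknown; treat that as too big
        totalBytes = maxBytes + 1;
    } else {
        totalBytes = totalBytes + info["size"][i];
    }
}

// second pass: only ask for the binary data when the total looks safe
if( totalBytes lte maxBytes ){
    info = queryZipFile(zipPath, true);
}
</cfscript>
```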
I also added some very simple filtering via the regexFilter
argument. This allows you to supply a RegEx string, and only entries whose
name matches the expression will be returned. Since the results are a
query object, you can always use a query-of-queries (QoQ) if you need
more complex filtering. I just wanted to
provide a simple method of filtering the resultset. In most cases a RegEx
filter should be enough to grab back just certain file types, etc.
(e.g. "\.gif$").
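As a rough illustration of the QoQ route, the sketch below lists everything in an archive and then keeps only files of 100 KB or less. The "archive.zip" file name and the variable names are placeholders. The column names are wrapped in square brackets because some of them (size, for example) may collide with QoQ reserved words.

```cfml
<!--- list everything in the archive; "archive.zip" is a placeholder file name --->
<cfset zipInfo = queryZipFile(expandPath("archive.zip")) />

<!--- keep only files of 100 KB or less, largest first --->
<cfquery name="smallFiles" dbtype="query">
    SELECT [name], [size], mimeType
    FROM zipInfo
    WHERE [type] = 'file'
      AND [size] <= 102400
    ORDER BY [size] DESC
</cfquery>
```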
So, here's the end result:
<cfscript>
/**
 * Opens a zip file and extracts all the information on the files into a query object
 *
 * NOTE: based on unzipFile by Samuel Neff
 *
 * @param filePath Path to the zip file (Required)
 * @param returnBinary Should binary data be returned (Optional)
 * @param regexFilter A regex string to filter query results by name (Optional)
 *
 * @return query
 * @author Dan G. Switzer, II
 * @version 1, September 15, 2006
 */
function queryZipFile(filePath) {
    // create a new zip file object
    var zipFile = createObject("java", "java.util.zip.ZipFile").init(filePath); // ZipFile
    // used to enumerate the ZipEntry objects
    var entries = "";
    // the current enumerated ZipEntry
    var entry = "";
    // should we return binary data
    var bReturnBinary = false;
    // the regex filter to use to filter out specific entries
    var sRegExMatch = "";
    // the query object we'll return
    var getZipInfo = queryNew("id, name, size, date, mimeType, compressedSize, crc, method, type", "integer, varchar, integer, date, varchar, integer, varchar, varchar, varchar");
    // the current name of the entry
    var sName = "";
    // if the current entry is a directory
    var bDirectory = false;
    // the current entry's compression method
    var sMethod = "";
    // the id to assign to the next row added to the query
    var iFilesLen = 1;
    // a Java Date object, for converting time to CF
    var jDate = createObject("java", "java.util.Date");
    // a Java Long object, for converting CRC to Hex string
    var jLong = createObject("java", "java.lang.Long");
    // the Servlet context, for attempting to determine mime type of file
    var jServerContext = getPageContext().getServletContext();
    // a 1K byte buffer used for reading the file contents
    var buffer = repeatString(" ", 1024).getBytes();
    // the input stream of the zip file
    var inStream = "";
    // create a BAOS as the output stream, this will allow us to store the file in memory
    var outStream = createObject("java", "java.io.ByteArrayOutputStream");
    // the number of bytes read into the buffer on the last read
    var length = 0;
    // track valid compression methods
    var stMethods = structNew();

    // if the second argument is supplied, check to see if we should return binary data
    if( arrayLen(arguments) gt 1 ) bReturnBinary = arguments[2];
    // if the third argument is supplied, check to see if we should filter data based on a string
    if( arrayLen(arguments) gt 2 ) sRegExMatch = trim(arguments[3]);

    // if we're to add the binary data, add the column now
    if( bReturnBinary ) queryAddColumn(getZipInfo, "binary", "binary", arrayNew(1));

    // define the valid methods for a compression
    stMethods["-1"] = "unspecified";
    stMethods["0"] = "stored";
    stMethods["8"] = "deflated";

    // get the entries in the zip file
    entries = zipFile.entries();

    // loop through all the entries
    while( entries.hasMoreElements() ){
        // get the next element
        entry = entries.nextElement();
        // get the current name
        sName = entry.getName();
        // is the entry a directory
        bDirectory = entry.isDirectory();
        // the method of compression
        sMethod = entry.getMethod();

        // if there hasn't been a search string supplied, or the match is found grab the entry
        if( (len(sRegExMatch) eq 0) or (NOT bDirectory and (reFindNoCase(sRegExMatch, sName) gt 0)) ){
            // convert the epoch time to a Java Date object
            jDate.setTime(entry.getTime());

            // add a row to the query for the current entry
            queryAddRow(getZipInfo);
            querySetCell(getZipInfo, "id", iFilesLen);
            querySetCell(getZipInfo, "name", sName);
            querySetCell(getZipInfo, "size", entry.getSize());
            querySetCell(getZipInfo, "date", createODBCDateTime(jDate.toString()));
            querySetCell(getZipInfo, "compressedSize", entry.getCompressedSize());
            // use the friendly method name if we know it, otherwise the raw method number
            if( structKeyExists(stMethods, sMethod) ){
                querySetCell(getZipInfo, "method", stMethods[sMethod]);
            } else {
                querySetCell(getZipInfo, "method", sMethod);
            }

            // return a type similar to cfdirectory (either "dir" or "file")
            if( bDirectory ){
                querySetCell(getZipInfo, "type", "dir");
            } else {
                // convert the CRC-32 to a hex string
                querySetCell(getZipInfo, "crc", uCase(jLong.toHexString(entry.getCrc())));
                querySetCell(getZipInfo, "mimeType", jServerContext.getMimeType(sName));
                querySetCell(getZipInfo, "type", "file");
            }

            // only grab the uncompressed binary data if it's a file and we've requested it
            if( NOT bDirectory and bReturnBinary ){
                // get the current entry's file stream
                inStream = zipFile.getInputStream(entry);
                // read in the first 1K buffer chunk
                length = inStream.read(buffer);
                // loop through the inStream and grab each data chunk (read() returns -1 at the end)
                while( length GTE 0 ){
                    outStream.write(buffer, 0, length);
                    length = inStream.read(buffer);
                }
                // close the input stream
                inStream.close();
                // save the binary stream to the query
                querySetCell(getZipInfo, "binary", outStream.toByteArray());
                // reset the outStream -- close() has no effect, this will clear the contents
                outStream.reset();
            }

            // increase the zip file count
            iFilesLen = iFilesLen + 1;
        }
    }

    // close the zip file
    zipFile.close();

    // return the query object
    return getZipInfo;
}
</cfscript>
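For completeness, here is a rough sketch of how the function might be used to store zipped images in a database, which was the original goal. The "upload.zip" file, the "myDsn" datasource, and the "zipImages" table with its columns are all hypothetical names; adjust the cfsqltype values to match your own schema. Note that mimeType can be empty when the servlet container does not recognize the file extension.

```cfml
<!--- grab only image entries, including their binary data --->
<cfset files = queryZipFile(expandPath("upload.zip"), true, "\.(gif|jpe?g|png)$") />

<cfloop query="files">
    <!--- "myDsn" and "zipImages" are placeholders for your own datasource and table --->
    <cfquery datasource="myDsn">
        INSERT INTO zipImages (fileName, mimeType, fileData)
        VALUES (
            <cfqueryparam cfsqltype="cf_sql_varchar" value="#files.name#" />,
            <cfqueryparam cfsqltype="cf_sql_varchar" value="#files.mimeType#" />,
            <cfqueryparam cfsqltype="cf_sql_blob" value="#files['binary'][files.currentRow]#" />
        )
    </cfquery>
</cfloop>
```

Because a non-empty regexFilter skips directory entries, every row in the loop above is a file with binary data attached.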