BufferOverflowException when invoking ByteArrayOutputStream.toString() on large arrays in CFMX 6.1

Posted by Dan on Dec 9, 2005 @ 6:38 PM

I'm working on a project that involves transferring large XML files between a client application and the server. To increase bandwidth efficiency, we're using gzip on the XML data to shrink the file size down. This works great--we're seeing about an 80% reduction in file size.

However, I ran into a problem trying to expand the GZIP file on the server. I wanted to expand the file directly to a string in memory, avoiding writing the file to disk. Using a java.io.FileInputStream wrapped in a java.util.zip.GZIPInputStream, I read the GZIP file into a java.io.ByteArrayOutputStream. I then used the toString() method on the ByteArrayOutputStream to convert the OutputStream into a string ColdFusion could use.

On files under 10MB I wasn't having a problem, but on some really large files I was getting a strange java.nio.BufferOverflowException when trying to convert the OutputStream to a string. It appears there's some kind of size threshold between CFMX and Java.

To work around the issue, what I ended up doing is calling the size() method on the OutputStream after each write. If the size of the stream is greater than 10,000,000 bytes, I invoke toString() on the OutputStream and append the result to my return string. Next, I invoke the reset() method to flush the OutputStream.

So far, this has worked with every size file I've thrown at it. Here's what the code looks like:

function gzipExpandToString(infile){
    var outStream = createObject("java", "java.io.ByteArrayOutputStream");
    var inStream = createObject("java", "java.io.FileInputStream");
    var inGzipStream = createObject("java", "java.util.zip.GZIPInputStream");

    // repeatString().getBytes() is a trick to get a 1024-byte Java byte[] buffer
    var buffer = repeatString(" ", 1024).getBytes();
    var length = 0;
    var sOutput = "";

    inStream.init(arguments.infile);
    inGzipStream.init(inStream);

    do {
        // read up to 1KB of uncompressed data at a time
        length = inGzipStream.read(buffer, 0, 1024);
        if( length neq -1 ){
            outStream.write(buffer, 0, length);
            // flush to a CF string before the stream grows past ~10MB,
            // which is where CFMX started throwing BufferOverflowException
            if( outStream.size() gt 10000000 ){
                sOutput = sOutput & outStream.toString();
                outStream.reset();
            }
        }
    } while( length neq -1 );

    inGzipStream.close();
    inStream.close();

    // grab whatever is left in the stream
    sOutput = sOutput & outStream.toString();
    outStream.reset();

    return sOutput;
}

The key change was the addition of the if( outStream.size() gt 10000000 ) check.

I probably should have tested things further to find out exactly what number CF was having problems with, but I didn't have time to troubleshoot the problem any further.

Categories: HTML/ColdFusion, Java

9 Comments

  • Hi Dan,

    I need to do something similar, but different. I'm trying to compress (gzip) a block of HTML to store in SQL Server, and then uncompress it when it's selected from SQL server.

    Can you please give a very basic example on how to gzip a string and unzip it back to the original string?
  • @Roman:

    I'd recommend against compressing data before storing it in the database--it would make any type of querying based on that data impossible.

    If you're using SQL Server 2008 (or newer) you have the ability to create a compressed table:

    http://msdn.microsoft.com/en-us/library/cc280449.a...

    That would certainly be a better option (although I'd generally reserve that for archived/warehoused data.)

    As for your actual question, I do not have any code that compresses a string to GZIP, but you can look at the java.util.zip.GZIPOutputStream class--which does what you want.
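A rough sketch of that round trip in plain Java (the gzip()/gunzip() helper names here are mine; the classes are the java.util.zip ones mentioned above--wrap a ByteArrayOutputStream in a GZIPOutputStream to compress, and the reverse pair to expand):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipString {
    // Compress a string to gzip bytes by wrapping a ByteArrayOutputStream
    // in a GZIPOutputStream.
    static byte[] gzip(String input) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(bos);
        gz.write(input.getBytes(StandardCharsets.UTF_8));
        gz.close(); // close() finishes the gzip stream
        return bos.toByteArray();
    }

    // Expand gzip bytes back to the original string with the reverse pair.
    static String gunzip(byte[] compressed) throws IOException {
        GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed));
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int length;
        while ((length = gz.read(buffer, 0, 1024)) != -1) {
            bos.write(buffer, 0, length);
        }
        gz.close();
        return bos.toString("UTF-8");
    }

    public static void main(String[] args) throws IOException {
        String original = "<p>some block of HTML to store compressed</p>";
        System.out.println(gunzip(gzip(original)).equals(original)); // prints true
    }
}
```

In ColdFusion you'd reach the same classes via createObject("java", ...), just like the function in the post above.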
  • I'm trying to figure this out, but without much success. It seems to encode fine, but when I try to decode I get an error that says "Not in GZIP format", which is odd--I'm passing the output from the encoder straight to the decoder, so it should be in the correct format.

    <code>
    streamOutputByteArray = CreateObject("java","java.io.ByteArrayOutputStream");
    streamGZIPOutput = CreateObject("java","java.util.zip.GZIPOutputStream");
    streamOutputByteArray.init();
    streamGZIPOutput.init(streamOutputByteArray);

    streamGZIPOutput.write(largeString.getBytes(), 0, Len(largeString.getBytes()));

    streamGZIPOutput.finish();
    streamGZIPOutput.close();
    streamOutputByteArray.flush();
    streamOutputByteArray.close();

    //compressedString=BinaryEncode(streamOutputByteArray.toByteArray(),"Base64");
    compressedString=streamOutputByteArray.toString("utf-8");
    streamOutputByteArray.reset();

    /*Done with Compressing*/

    compressedInputString = CharsetDecode(compressedString,"utf-8");//BinaryDecode(compressedString,"Base64");

    length = 0;
    sOutput = "";

    streamInputByteArray = CreateObject("java","java.io.ByteArrayInputStream");
    streamGZIPInput = CreateObject("java","java.util.zip.GZIPInputStream");
    streamInputByteArray.init(compressedInputString);
    streamGZIPInput.init(streamInputByteArray);

    do {
        length = streamGZIPInput.read(compressedInputString, 0, Len(compressedInputString));
        if( length neq -1 ){
          streamOutputByteArray.write(compressedInputString, 0, length);
          if( streamOutputByteArray.size() gt 10000000 ){
            sOutput = sOutput & streamOutputByteArray.toString();
            streamOutputByteArray.reset();
          }
        }
      } while( length neq -1 );

      streamGZIPInput.close();
      streamInputByteArray.close();

      sOutput = sOutput & streamOutputByteArray.toString();
      streamOutputByteArray.reset();
    </code>
  • @Roman:

    Did you check out the gzip() UDF on CFLib.org?

    http://www.cflib.org/udf/gzip

    There's also a corresponding ungzip():

    http://www.cflib.org/udf/ungzip

    These functions might be more along the lines of what you want, since they do everything from string inputs.
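For what it's worth, the "Not in GZIP format" error in the code above is almost certainly caused by round-tripping the compressed bytes through toString("utf-8") and CharsetDecode()--gzip output is arbitrary binary, and decoding it as UTF-8 text mangles it. The commented-out BinaryEncode()/BinaryDecode() Base64 lines are the right idea. Here's a rough sketch of that round trip in plain Java (the helper names are mine, and java.util.Base64 is a Java 8 class, so treat this as illustrative rather than something CFMX 6.1 could call directly):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipBase64 {
    // gzip a string, then Base64-encode the binary result so it can
    // safely live in a string variable (or a varchar column).
    static String compressToBase64(String input) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(bos);
        gz.write(input.getBytes(StandardCharsets.UTF_8));
        gz.close(); // close() finishes the gzip stream
        return Base64.getEncoder().encodeToString(bos.toByteArray());
    }

    // Reverse both steps: Base64 back to the exact gzip bytes, then expand.
    static String expandFromBase64(String stored) throws IOException {
        byte[] compressed = Base64.getDecoder().decode(stored);
        GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int length;
        while ((length = in.read(buffer, 0, 1024)) != -1) {
            out.write(buffer, 0, length);
        }
        in.close();
        return out.toString("UTF-8");
    }

    public static void main(String[] args) throws IOException {
        String original = "foo\t\tbar";
        System.out.println(expandFromBase64(compressToBase64(original)).equals(original)); // prints true
    }
}
```

Base64 inflates the compressed data by about a third, but unlike a UTF-8 decode it round-trips every byte exactly.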
  • I know this is a really old post, but it was definitely a nice little read. I tried your original method and it seemed to run fine. One odd thing I noticed is that the outputted text ends up converting double tabs into single tabs. Meaning I had some data that was something like foo#Chr(9)##Chr(9)#bar, and when I gunzipped the file to get at that data, it appeared as foo#Chr(9)#bar. I have not tried the function on cflib yet to see if it does that too.
  • @Aaron:

    That's a strange behavior. Have you tried ungzipping the file w/another tool to see if the problem isn't with the source material?

    There's nothing in the code above that should interfere w/whitespace characters.
  • Dan,

    To test it, I made a text file in Notepad that was basically like the example above. I then used a Windows-based gzip program to zip it up. Then I fed the file into the routine. I tried doing a Find() for two Chr(9)'s and found nothing. I tried a Find() for one and found that. I then tried CFFILE on the text file, and the find for two tabs found it.

    I am going to mess with it more today once I make it to work. But so far it seems like the Java is changing two tabs to one tab, which throws things off when parsing a tab-delimited file that fails to put double quotes around blank fields.
  • @Aaron:

    What JDK are you using? I can't say I tested multiple tabs, as I was dealing with XML and the formatting of the tabs wouldn't have made a difference in what we were doing. Very strange issue indeed!
  • Dan,

    I am running 1.6.0_14

    Not having much luck with the UDF from cflib in regards to getting it to unzip. I'm actually getting some errors thrown; it may just be how I am trying to encode things:

    <cffile action="read" file="#App.RootPath#\Testing.txt.gz" variable="strExcel">
    <cfset compressed=ToBase64(strExcel) />
    <cfset Test = ungzip(compressed, "base64") />

    That tosses the following error for line 35 of the UDF on cflib:

    An exception occurred while instantiating a Java object. The class must not be an interface or an abstract class. Error: ''.

    Think I might just go the route of using cfexecute to unzip my file via WinZip, then cffile to read it. My real file that I need to do this with is pretty big--114,000 KB; Excel even bogs down trying to open that sucker. I hate even using CF for this and have been wondering if I could get Oracle's SQLLDR to work with it, but I have been having to use CF to clean up the data for an import.

Comments for this entry have been disabled.