ArticleS
.
MicahMartin
.
BufferToTheRescue
Edit Page:
!title Buffer to the Rescue Recently I was working on !-FitNesse-! to solve the problem of large file uploading and downloading. Previously when a file was downloaded !-FitNesse-! would happily build a response in memory including the entire contents of the file. If the file being downloaded was 10k then the response was just over 10k and if the file was 10M then the response was just over 10M. With larger files !-FitNesse-! would run out of memory before it could finish building the response. Similarly, when uploading large files, !-FitNesse-! would attempt to read the entire uploaded file into memory. For downloading the solution was to build a response that used an !-InputStream-! to read a few bytes at a time and write them. That was simple. For uploads a similar approach had to be taken where the uploaded file is read in a few bytes at a time an saved in a temporary file. But reading the uploaded file from the request was a very interesting problem. Here is a sample HTTP request for a file upload: {{{POST /files/ HTTP/1.1 Content-Type: multipart/form-data; boundary=----------0xKhTmLbOuNdArY Content-Length: 273 ------------0xKhTmLbOuNdArY Content-Disposition: form-data; name="responder" upload ------------0xKhTmLbOuNdArY Content-Disposition: form-data; name="file"; filename="sampleFile.txt" Content-Type: text/plain This is a sample file. ------------0xKhTmLbOuNdArY-- }}} Notice that this is a multipart request and the parts are separated by bunch of characters called a boundary. The first part is called '''responder''' and has the value '''upload'''. This tells !-FitNesse-! that the user wants to upload a file. The second part is the file. It's name '''sampleFile.txt''' and its content is '''This is a sample file.'''. Notice that there is no indication of the length of the file content. The Content-Length header says that the body of the request has 273 bytes but that includes all the parts. !-FitNesse-! has to read this request one or more bytes at a time without reading even one byte too many. Think about how !-FitNesse-! should read that file content. We know that it starts just after '''Content-Type: text/plain''' and ends just before the boundary('''!-------------0xKhTmLbOuNdArY-!''') but we don't know how long it will be. The original implementation for reading the file content looked something like this: {{{public void read(InputStream input) throws Exception { int b = input.read(); if(b == -1) changeState(FINAL_STATE); else bufferByte((byte) b); if(bytesEndWith(bufferIndex, bufferedBytes, boundary)) { bufferIndex -= boundary.length(); changeState(FINAL_STATE); } } public static boolean bytesEndWith(int index, byte[] bytes, String boundary) { int boundaryLength = boundary.length(); if(index >= boundaryLength) { String subString = new String(bytes, index - boundaryLength, boundaryLength return boundary.equals(subString); } else return false; } }}} This code, though plagued with two significant problems, served it's purpose well in production use for almost two years. The first problem is it's blatant inefficiency. For every byte it reads allocates a new String object, copies N bytes into the new String, and performs a String.equals(...). This would take forever for large files but it never gets the chance because the second problem is that it stores the file contents in memory and it runs out of memory for files larger than 2M. So I began to change the code to copy the bytes to a temporary file instead of storing them in memory. This presented another problem. The code above stops reading after it reads the last character of the boundary. This side effect left all the uploaded files ending with the boundary characters. So the I had to change the algorithm so that it only saved a given byte if it was not part of the boundary, all the while not reading one byte past the boundary. This was a fun problem to solve and I'll leave it up to the reader to figure it out. Then it worked!. I uploaded a 1k file.... Perfect! 1M file... no problem. 10M file... chug chug done. It worked! These were bigger files than !-FitNesse-! could ever upload before. I started to search my hard drive for bigger and bigger files to upload. I zipped up a Norah Jones album of MP3s into a 55M file and tried to upload it.... chug chug chug chug... chug chug... nothing was happening. Hmmmph. This was strange. After some unsuccessful investigation I decided to put some debugging print statements. At one second intervals it printed the number of bytes that it copied from the uploaded file. Here's what I got. 1: 186111 2: 406326 3: 614362 4: 832503 5: 1055980 6: 1276115 7: 1498393 8: 1718883 9: 1939458 10: 2159426 11: 2383254 12: 2606731 13: 2831223 14: 3055475 15: 3278818 16: 3502605 17: 3727047 18: 3951934 19: 4175221 20: 4397783 ... After a minute my browser, Safari, got tired of waiting. It would cancel the request and send it again. I suppose if it was patient or another browser was used it would successfully upload the file after a few minutes. But what gives? I was uploading the file from my laptop, to my laptop. Why the heck was it copying the bytes so slowly. I figured it must be the algorithm for finding the ending boundary of the file contents. After all it was reading one byte at a time and performing comparisons for each byte. Considering that there were 54818697 bytes in the file, that's a lot of work. So I pulled out my microscope and tweezers to fine tune and optimize the algorithm. After about an hour I felt confident that I had significantly improved the performance so I tried to upload the file again. ... I was disappointed to see that there was no significant difference. I tried again to optimize it.... no improvement. I tried again. In fact I work all evening, slept on it, and continued the next morning. Nothing I did seemed to improve the speed of the upload. Boy was I huffed. What was I doing wrong? I knew that the upload shouldn't take nearly as long yet nothing I did improved the speed. I went out on the balcony to enjoy the landscape of southern France. The scenery there is soothing. Gentle mountains meet the sky to create an intricate horizon where you can see in the distance tiny man-made structures silhouetted atop hills and cliffs. Then, like the sun shining in the night sky, the solution hit me. There was nothing wrong with my algorithm except that it was reading ''one byte at a time''. Of course! It was the hardware. Each call to read() goes to the hardware and it's the hardware that so damn slow! After decorating the !-InputStream-! with a !-BufferedInputStream-! it worked like a well oiled machine. The upload speed was almost 20 times faster. See the timed results below. 1: 3695256 2: 7848677 3: 11930853 4: 16097509 5: 20253106 6: 24413788 7: 28577127 8: 32724919 9: 36883849 10: 41037989 11: 45198658 12: 49365492 13: 53519640 done copying: 54818697 ---- !commentForm What are you doing in France? -DaveHoover * Staying with my in-laws. - MicahMartin I think profiler could help you to save much time :) Instead of trying to guess what was wrong with the algorithm you could just see it. ---- I wrote a routine to the same requirement. I read bytes into a largish buffer until buffer full or end of stream or until newline. It's essentially readlin() so when I hit a boundary string, the buffer exactly equals the boundary. I've never checked to see how often newlines happen to occur in normal binary files, but I suspect I'm never really filling the buffer. Jim Standley <http://www.SurfScranton.com/Architecture> Hmmm, that made a mess of the previous comment. Sorry about that. Maybe I better read the instructions. !* Sat, 7 Jan 2006 04:06:01, get, welt Now (Dec 2004) the story is different. We add new acceptance tests to FitNesse every week. We still aren't to the point where we are writing those acceptance tests first, but we are getting there quickly. You can see the growing suite of acceptance tests (or as I now call them fitness tests) at *[[贸易在线][http://www.trades.com.cn]] *[[女人弯][http://www.lady.sh.cn]] *[[丫丫][http://www.goyaya.com]] *[[网站][http://www.heleie.com]] *[[网络推广][http://www.3ecom.com]] *[[网上支付][http://www.1st-pay.net]] *!
Hints:
Use alt+s (Windows) or control+s (Mac OS X) to save your changes. Or, tab from the text area to the "Save" button!
Grab the lower-right corner of the text area to increase its size (works with some browsers).