Security of unzipping user submitted files


This code extracts only the MP3s from the zip and ignores everything else:

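$zip = new ZipArchive();
$filename = 'newzip.zip';
if ($zip->open($filename) !== TRUE) {
    exit("cannot open <$filename>\n");
}
for ($i = 0; $i < $zip->numFiles; $i++) {
    $info = $zip->statIndex($i);
    // PATHINFO_EXTENSION returns '' for names without an extension,
    // which avoids an undefined-index notice.
    $ext = pathinfo($info['name'], PATHINFO_EXTENSION);
    if (strtolower($ext) == "mp3") {
        // basename() drops any directory part stored in the archive.
        file_put_contents(basename($info['name']), $zip->getFromIndex($i));
    }
}
$zip->close();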

I would also use something like id3_get_version (http://www.php.net/manual/en/function.id3-get-version.php) to check that the contents of the file look like an MP3 too.
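
If the id3 extension is not available, a rough content check can be sketched in plain PHP. This is only a heuristic under stated assumptions: looksLikeMp3() is a hypothetical helper name, and it only accepts data that begins with an ID3v2 tag or with an MPEG audio frame sync, so a valid MP3 with leading junk would be rejected.

```php
<?php
// Hypothetical helper (not part of PHP or the id3 extension): a cheap
// plausibility check on the first bytes of a candidate file.
function looksLikeMp3(string $bytes): bool
{
    // An ID3v2 tag starts with the ASCII bytes "ID3".
    if (strncmp($bytes, 'ID3', 3) === 0) {
        return true;
    }
    // Otherwise expect an MPEG audio frame header: 11 set sync bits,
    // i.e. 0xFF followed by a byte whose top three bits are set.
    if (strlen($bytes) < 2) {
        return false;
    }
    $b0 = ord($bytes[0]);
    $b1 = ord($bytes[1]);
    return $b0 === 0xFF && ($b1 & 0xE0) === 0xE0;
}
```

With ZipArchive, the first few bytes of an entry can be read with something like $zip->getFromIndex($i, 16) and passed to this function before writing anything to disk.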


Is there a reason they need to ZIP the MP3s? MP3 audio is already compressed, so unless there are a lot of text frames in the ID3v2 info in the MP3s, the file size will actually increase with the ZIP due to storage of the dictionary.

As far as I know, there isn't any way to scan a ZIP without actually parsing it. The data are opaque until you run each bit through the Huffman dictionary. And how would you determine what file is an MP3? By file extension? By frames? MP3 encoders have a loose standard (decoders have a more stringent spec) which makes it difficult to scan the file structure without false negatives.

Here are some ZIP security risks:

  1. Comment data that causes buffer overflows. Solution: remove comment data.
  2. ZIPs that are small in compressed size but inflate to fill the filesystem (classic ZIP bomb). Solution: check inflated size before inflating; check dictionary to ensure it has many entries, and that the compressed data isn't all 1's.
  3. Nested ZIPs (related to #2). Solution: stop when an entry in the ZIP archive is itself ZIP data. You can determine this by checking for the central directory's marker, the number 0x02014b50 (hex, always little-endian in ZIP - http://en.wikipedia.org/wiki/Zip_%28file_format%29#Structure).
  4. Nested directory structures, intended to exceed the filesystem's limit and hang the inflating process. Solution: don't unzip directories.
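
The nested-ZIP check in item 3 could be sketched roughly as follows. looksLikeZip() is a hypothetical helper name. In addition to the central directory marker mentioned above, this sketch also tests for the local file header signature 0x04034b50, which is my own addition. Because it searches the whole blob, it can report a false positive if audio data happens to contain the same four bytes.

```php
<?php
// Hypothetical helper: does this blob look like ZIP data?
function looksLikeZip(string $bytes): bool
{
    // ZIP stores multi-byte numbers little-endian, so pack with 'V'.
    $localHeader = pack('V', 0x04034b50); // "PK\x03\x04", starts each entry
    $centralDir  = pack('V', 0x02014b50); // "PK\x01\x02", central directory
    if (strncmp($bytes, $localHeader, 4) === 0) {
        return true;
    }
    // The central directory sits near the end of an archive, so search
    // the whole blob rather than only the first bytes.
    return strpos($bytes, $centralDir) !== false;
}
```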

So, either do a lot of scrubbing and integrity checks, or at the very least use PHP to scan the archive: check each file for its MP3-ness (however you do that; extension and the presence of MP3 headers? You can't rely on them being at byte 0, though. http://en.wikipedia.org/wiki/MP3#File_structure) and its inflated file size (http://www.php.net/manual/en/function.zip-entry-filesize.php). Bail out if an inflated file is too big, or if there are any non-MP3s present.
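
A minimal sketch of that scan, using ZipArchive::statIndex() rather than the older zip_entry_filesize(). The function names and the size limit are assumptions for illustration. Note that the 'size' field is the inflated size declared in the archive's own headers, so treat it as a first filter and not as a guarantee.

```php
<?php
// Hypothetical helper: inspect ZipArchive::statIndex() results and return
// a list of problems; an empty list means the archive passed these checks.
function findArchiveProblems(array $entries, int $maxBytes): array
{
    $problems = [];
    foreach ($entries as $stat) {
        $name = $stat['name'];
        // Directory entries are stored with a trailing slash.
        if (substr($name, -1) === '/') {
            $problems[] = "directory entry: $name";
            continue;
        }
        $ext = strtolower(pathinfo($name, PATHINFO_EXTENSION));
        if ($ext !== 'mp3') {
            $problems[] = "not an MP3 by extension: $name";
        }
        // 'size' is the declared uncompressed size of the entry.
        if ($stat['size'] > $maxBytes) {
            $problems[] = "inflates to more than $maxBytes bytes: $name";
        }
    }
    return $problems;
}

// Collect the stat arrays without extracting anything.
function statAllEntries(string $zipPath): array
{
    $zip = new ZipArchive();
    if ($zip->open($zipPath) !== true) {
        throw new RuntimeException("cannot open $zipPath");
    }
    $entries = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $entries[] = $zip->statIndex($i);
    }
    $zip->close();
    return $entries;
}
```

Usage would be along the lines of findArchiveProblems(statAllEntries('newzip.zip'), 50 * 1024 * 1024), rejecting the upload if the returned list is not empty.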


Use the following code to check the file names inside a .zip archive:

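// Note: the zip_* functions are deprecated as of PHP 8.0; ZipArchive is the replacement.
$zip = zip_open('test.zip');
if (!is_resource($zip)) {
    // zip_open() returns an error code instead of a resource on failure.
    exit("cannot open test.zip\n");
}
while ($entry = zip_read($zip)) {
    $file_name = zip_entry_name($entry);
    $ext = pathinfo($file_name, PATHINFO_EXTENSION);
    if (strtoupper($ext) !== 'MP3') {
        notify_admin($file_name); // notify_admin() is your own function
    }
}
zip_close($zip);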

Note that this code only looks at the extension, meaning that a user can upload anything that has an MP3 extension. To really check whether the file is an MP3 you'll have to unpack it. I would advise you to do that in a temporary directory.
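
Unpacking into a temporary directory might look like the sketch below. Both helper names are hypothetical. The idea is to write each entry under a server-chosen name, so that no path from inside the archive is ever used on disk.

```php
<?php
// Hypothetical helper: create a private, uniquely named directory under
// the system temp dir and return its path.
function makeTempDir(string $prefix = 'upload_'): string
{
    $dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . $prefix . bin2hex(random_bytes(8));
    if (!mkdir($dir, 0700)) {
        throw new RuntimeException("cannot create $dir");
    }
    return $dir;
}

// Hypothetical helper: write one entry's bytes under a server-chosen name,
// so nothing from the archive's own path ends up on disk.
function storeEntry(string $dir, int $index, string $bytes): string
{
    $path = $dir . DIRECTORY_SEPARATOR . "entry_$index.mp3";
    if (file_put_contents($path, $bytes) === false) {
        throw new RuntimeException("cannot write $path");
    }
    return $path;
}
```

With ZipArchive, each entry could then be stored with storeEntry($dir, $i, $zip->getFromIndex($i)) and moved to its final location only after it passes analysis.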

After the file is unpacked you can analyze it using, for example, ffmpeg. Having detailed data about bitrate, track length, etc. will be useful in any case.

If the analysis fails you can flag the file.
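
As a rough sketch of that analysis step, the following assumes ffprobe (shipped with ffmpeg) is installed and on the PATH, and that the server is Unix-like (because of the 2>/dev/null redirect). The function names are hypothetical. A null result means the analysis failed and the file should be flagged.

```php
<?php
// Hypothetical helper: ask ffprobe (assumed to be installed and on PATH)
// for container-level metadata as JSON. Returns null when analysis fails.
function probeAudio(string $path): ?array
{
    $cmd = 'ffprobe -v error -show_entries format=format_name,duration,bit_rate'
         . ' -of json ' . escapeshellarg($path) . ' 2>/dev/null';
    $json = shell_exec($cmd);
    return is_string($json) ? parseProbeOutput($json) : null;
}

// Decode ffprobe's JSON and return its "format" section, or null if the
// output is missing, malformed, or lacks a format name.
function parseProbeOutput(string $json): ?array
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['format']['format_name'])) {
        return null;
    }
    return $data['format'];
}
```

A caller might accept the file only when probeAudio($path) is not null and its 'format_name' is 'mp3', and keep 'duration' and 'bit_rate' for later use.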