ZPP Library -- A set of C++ classes for reading .ZIP archives

Copyright ©1999 Michael Cuddy, Fen's Ende Software
minor modifications Copyright ©2000-2003 Eero Pajarre
Version 1.0 (first SourceForge release)

Overview

Many programs, games especially, need to access large numbers of data files. These data files generally take up quite a bit of disk space, not only for the storage of their data, but also as "wasted" disk space due to cluster size limitations (on FAT12/FAT16 filesystems -- FAT32 has alleviated some of this wastage by using smaller cluster sizes). Traditionally, game programmers have relied on one of two main methods of data storage: keeping each data file as a separate file on disk, or packing the data files into a custom archive format.

The drawback to the first solution is the wasted disk space problem, as well as the problem of slower installations (it's much slower to create one thousand 1KB files than it is to create a single 1MB file) and access (DOS/Win95 uses a linear search through the directory to open files -- the more files you have in a single directory, the longer it takes to open the files further down the list).

The second solution has its own pitfalls. First, you must write all your own image/sound/etc. loading routines, which use a custom API for accessing the archived data. A further drawback is that you have to write your own archive utility to build the archives in the first place.

The ZPP library addresses all of these concerns:

In addition to providing "transparent" access to members of .ZIP archives, ZPP provides a versioning feature which can be used to implement "patch" zip files to fix problems found after the release of a program without having to re-issue an entire .ZIP file.

These benefits are a compelling reason to use .ZIP archives instead of "rolling your own". Indeed, a couple of popular games (Starsiege: Tribes and Quake III) have recently moved to using .ZIP files (previously, the Quake series went the custom route with the famous ".WAD" files). The extensions are different (".PK3" in the case of Quake III, and ".VOL" in the case of Starsiege: Tribes), but the files are standard .ZIP files and can be manipulated with off-the-shelf tools. This can have a BIG impact on time to market for a game, because no tools have to be written to manipulate the archives; well developed and supported tools already exist in the marketplace and can be had for free (Info-ZIP) or for a small fee (PkZip).

ZPP uses the free ZLIB library to handle its decompression / compression. It is compatible with files stored in .ZIP archives using the "STORE" method and the "DEFLATE" method.

Classes

The core of ZPP consists of three classes, which wrap up the functionality of the ZLib library.

For interface to the rest of the world, objects of the following classes are created:

The zppZipArchive class encapsulates all of the information about a .ZIP archive file (XXXX.zip). It also supports global searching of files and text attributes (see below) attached to the .zip file comment which are parsed by the library and accessed through member functions in the zppZipArchive class.

zppZipArchive objects contain a vector of zppZipFileInfo objects, each one representing the information required to uncompress a particular member of a .ZIP archive.

The zppZipReader class encapsulates the decompression state for a single file, and is the real worker behind the zppstreambuf class.

The zppstreambuf class is an iostream wrapper around the zppZipReader class. It supports all of the standard "istream" functions for reading and can be passed to any file parsing routine which uses istreams.

IMPORTANT NOTE: Because the compressed file data is decompressed on the fly, zppstreambuf::seekg() will fail if the offset is not 0. A future enhancement to the library will allow seeking on a file which is not compressed (ZIP refers to this as the "stored" compression method), or will allow an entire file to be decompressed to memory when its zppZipReader object is created.
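The seek restriction can be modelled without ZPP at all. The following is a minimal standalone sketch, not ZPP's implementation: ForwardOnlyBuf is a hypothetical class that hands out a string in small chunks (standing in for data decompressed on the fly) and accepts a seek only to offset 0, which restarts the stream from the beginning.

```cpp
#include <algorithm>
#include <cstddef>
#include <ios>
#include <istream>
#include <streambuf>
#include <string>
#include <utility>

// Hypothetical stand-in for a decompressing stream buffer: data is produced
// strictly front to back, a few bytes at a time.
class ForwardOnlyBuf : public std::streambuf {
public:
    explicit ForwardOnlyBuf(std::string data) : source_(std::move(data)) {}

protected:
    // Refill the small get area with the next chunk of "decompressed" data.
    int_type underflow() override {
        if (gptr() < egptr()) return traits_type::to_int_type(*gptr());
        if (consumed_ >= source_.size()) return traits_type::eof();
        const std::size_t n =
            std::min(sizeof chunk_, source_.size() - consumed_);
        source_.copy(chunk_, n, consumed_);
        consumed_ += n;
        setg(chunk_, chunk_, chunk_ + n);
        return traits_type::to_int_type(chunk_[0]);
    }

    // Only a rewind to the very beginning is supported.
    pos_type seekpos(pos_type pos, std::ios_base::openmode) override {
        if (pos != pos_type(0)) return pos_type(off_type(-1));
        consumed_ = 0;
        setg(chunk_, chunk_, chunk_);  // empty get area forces a refill
        return pos_type(0);
    }

    // Relative seeks are rejected, except "offset 0 from the beginning".
    pos_type seekoff(off_type off, std::ios_base::seekdir dir,
                     std::ios_base::openmode which) override {
        if (dir == std::ios_base::beg && off == 0)
            return seekpos(pos_type(0), which);
        return pos_type(off_type(-1));
    }

private:
    std::string source_;
    std::size_t consumed_ = 0;
    char chunk_[4];
};
```

Wrapping a ForwardOnlyBuf in a std::istream and calling seekg(0) rewinds the stream, while seekg() with any other position sets failbit on the stream, which mirrors the behaviour described above.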

The izppstream class wraps a zppstreambuf for formatted input, and is a standard "istream" type class.

Attributes

Most .ZIP archive utilities let you attach a "zip comment" to a zip file. This is text data that can be up to 64KB in length. The ZPP library can (optionally) treat this data as a set of key/value pairs, and make those values available to the program. The ZPP library expects the key/value data to be in a specific (but relatively flexible) format. The data is in the form of a text file.

The unit of parsing is a line. The line-end character can be UNIX style (a single NL, ASCII 10), MAC style (a single CR, ASCII 13) or DOS/Windows style (CR, ASCII 13, followed by NL, ASCII 10). Continuation lines are not supported.

The first line of the file MUST be "%ZPP%" (without the quotes). If the ZPP library does not find this signal string, it will not parse the attribute file.

Each line of the attribute file defines a single key/data pair. Blank lines (lines consisting of only spaces and a line-terminator) are ignored. Comment lines, beginning with '#', are also ignored.
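As an illustration of the line rules above, here is a minimal standalone sketch (not ZPP's own parser). The function names splitLines, isIgnorable and attributeLines are hypothetical. The sketch assumes that a comment line has '#' as its very first character and that tabs count as blank space along with spaces.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split text on LF, CR, or CR+LF line endings.
std::vector<std::string> splitLines(const std::string& text) {
    std::vector<std::string> lines;
    std::string current;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '\r' || c == '\n') {
            // Treat CR immediately followed by LF as a single line end.
            if (c == '\r' && i + 1 < text.size() && text[i + 1] == '\n') ++i;
            lines.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty()) lines.push_back(current);  // unterminated last line
    return lines;
}

// Blank lines and '#' comment lines carry no attribute.
// Assumption: '#' must be the first character of the line.
bool isIgnorable(const std::string& line) {
    if (!line.empty() && line[0] == '#') return true;
    return line.find_first_not_of(" \t") == std::string::npos;
}

// Return the candidate key/value lines of a zip comment, or nothing at all
// if the comment does not start with the "%ZPP%" signal line.
std::vector<std::string> attributeLines(const std::string& comment) {
    const std::vector<std::string> lines = splitLines(comment);
    std::vector<std::string> result;
    if (lines.empty() || lines[0] != "%ZPP%") return result;
    for (std::size_t i = 1; i < lines.size(); ++i) {
        if (!isIgnorable(lines[i])) result.push_back(lines[i]);
    }
    return result;
}
```

Splitting first and filtering afterwards keeps the three line-ending styles out of the rest of the parsing logic.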

A Key/value line is of one of the following four forms:

In all four forms, the KEY is made up of non-whitespace characters and is terminated by the first whitespace character. A whitespace character is ASCII 32 (space) or ASCII 9 (tab).

After the KEY, there can be optional whitespace. An EQUALS sign "=" separates the KEY from the DATA. After the "=", any whitespace will be skipped.

In the first form, the data associated with the key is all non-whitespace characters up to the end-of-line or first whitespace encountered. Note that the data cannot have embedded whitespace in it.

The final three forms are used when there is a need to embed whitespace in the data. The data starts after the quote or parenthesis character, and continues to a matching, closing quote or parenthesis character. The data string cannot contain the character used to bracket it, and there is no way to escape an embedded quote character, thus the three acceptable forms.

Example attributes:

	ZPP_PRIORITY = 2
	ProgramTitle = "Mike's Wonderful program"
	Quote3 = ("Hey, bob's dog bit me!", said Joe.)

NOTE: Attribute names are CASE SENSITIVE.
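The key/value rules can be sketched as a small standalone function. This is an illustration under stated assumptions, not ZPP's actual parser: Attribute and parseAttributeLine are hypothetical names, the key is assumed to end at '=' as well as at whitespace (so that KEY="value" without spaces works), the bracketing characters are assumed to be double quote, single quote and parenthesis, and a line with nothing after the '=' is treated as malformed.

```cpp
#include <cstddef>
#include <string>

struct Attribute {
    bool ok = false;    // false if the line could not be parsed
    std::string key;
    std::string value;
};

static bool isBlank(char c) { return c == ' ' || c == '\t'; }

Attribute parseAttributeLine(const std::string& line) {
    Attribute attr;
    const std::size_t n = line.size();
    std::size_t i = 0;

    // KEY: non-whitespace characters (assumption: '=' also ends the key).
    while (i < n && !isBlank(line[i]) && line[i] != '=') attr.key += line[i++];
    if (attr.key.empty()) return attr;

    // Optional whitespace, the '=' separator, then more optional whitespace.
    while (i < n && isBlank(line[i])) ++i;
    if (i >= n || line[i] != '=') return attr;
    ++i;
    while (i < n && isBlank(line[i])) ++i;
    if (i >= n) return attr;  // assumption: an empty value is malformed

    // Bracketed forms run to the matching closing character (no escapes).
    char close = 0;
    if (line[i] == '"') close = '"';
    else if (line[i] == '\'') close = '\'';   // assumed third bracket form
    else if (line[i] == '(') close = ')';

    if (close != 0) {
        const std::size_t end = line.find(close, i + 1);
        if (end == std::string::npos) return attr;  // unterminated value
        attr.value = line.substr(i + 1, end - i - 1);
    } else {
        // Bare form: data stops at the first whitespace or end of line.
        std::size_t end = i;
        while (end < n && !isBlank(line[end])) ++end;
        attr.value = line.substr(i, end - i);
    }
    attr.ok = true;
    return attr;
}
```

Applied to the example lines above (without their leading indentation), this yields the values 2, Mike's Wonderful program, and "Hey, bob's dog bit me!", said Joe. respectively.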

The zppZipArchive::findAttr() member function is used to locate attributes.

	string zipName("foo.zip");	// the constructor takes a non-const string reference
	zppZipArchive myZip(zipName);
	string z;

	z = myZip.findAttr("Quote3");

	if (z == "") {
		cout << "Quote3 attribute not found" << endl;
	} else {
		cout << "Quote3 = " << z << endl;
	}

The zppZipArchive::attrExists() member function can be used to determine if an attribute exists.

NOTE: Attributes beginning with the string "ZPP_" are reserved for use by the ZPP library. There are currently two such attributes used: ZPP_PRIORITY (see "Priorities", below), and ZPP_DIR_PREFIX (see "Directory Prefix", below).

Priorities

When programs are distributed, there frequently arises the need to replace files which contained errors in the original distribution. With all files packed inside of archives, it can be difficult to patch a multi-megabyte file in place on the user's system, so the ZPP library has the ability to assign a priority to a ZIP archive. When ZIP archives are opened by the ZPP library, all contained files are added to a global map of files with an attached priority. The priority comes from a class static variable which can be set before the zppZipArchive object is constructed, or from a ZPP_PRIORITY attribute (see "Attributes", above) contained in the .ZIP file when it's opened. Files of higher priority replace those of lower priority in the global map of files.

NOTE: files not in a .ZIP archive on the hard disk have a higher priority than any .ZIP file -- they will be found first before the open .ZIP archives are searched.
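The replacement rule for the global map can be sketched as follows. This is a minimal standalone model, not ZPP's internal data structure: FileEntry and GlobalFileMap are hypothetical names, and the sketch assumes that on equal priority the entry registered first is kept (the text above only specifies what happens when priorities differ). Files on the hard disk, which take precedence over every archive, are not modelled.

```cpp
#include <map>
#include <string>

// Which archive currently provides a file, and at what priority.
struct FileEntry {
    std::string archive;
    int priority;
};

class GlobalFileMap {
public:
    // Register a file; a higher-priority archive replaces a lower one.
    void add(const std::string& name, const std::string& archive,
             int priority) {
        std::map<std::string, FileEntry>::iterator it = files_.find(name);
        if (it == files_.end() || priority > it->second.priority) {
            files_[name] = FileEntry{archive, priority};
        }
    }

    // Return the winning entry for a file name, or nullptr if unknown.
    const FileEntry* find(const std::string& name) const {
        std::map<std::string, FileEntry>::const_iterator it =
            files_.find(name);
        return it == files_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, FileEntry> files_;
};
```

With this model, a "patch" archive opened at priority 2 takes over any file name it shares with a base archive opened at priority 1, regardless of the order in which the two archives are opened.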

Directory Prefix

When a .ZIP archive is opened and the zppZipArchive object created, a string can optionally be prepended to all files contained within. This string is taken from the attribute "ZPP_DIR_PREFIX" contained in the zip comment.

This feature can be used to move all of the files contained in the zip file "down" in the file hierarchy without having to have a separate subdirectory for them, or storing that subdirectory name when the archive is built. For example, you might have a single zip file for each level of a game ("level1.zip", "level2.zip", etc.), with the files stored in each .ZIP file named "images/shot.bmp", "map.bsp", etc. If both level1.zip and level2.zip contained the same file names, it could be unclear which file you're trying to access, so the level1.zip archive could contain a ZPP_DIR_PREFIX="Level1/" attribute. The files contained within level1.zip then become "Level1/images/shot.bmp" and "Level1/map.bsp".
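The effect of the prefix on member names amounts to simple string concatenation. A minimal sketch, assuming the prefix is prepended verbatim with no separator added or normalized (applyDirPrefix is a hypothetical helper, not part of ZPP):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Prepend a directory prefix to every member name of an archive.
// The prefix is used as-is, so it should end in '/' if one is wanted.
std::vector<std::string> applyDirPrefix(
        const std::string& prefix,
        const std::vector<std::string>& names) {
    std::vector<std::string> result;
    result.reserve(names.size());
    for (std::size_t i = 0; i < names.size(); ++i) {
        result.push_back(prefix + names[i]);
    }
    return result;
}
```

Calling applyDirPrefix("Level1/", ...) on the names "images/shot.bmp" and "map.bsp" gives "Level1/images/shot.bmp" and "Level1/map.bsp", so the same names stored in level2.zip under a different prefix no longer collide.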

Getting Started

In the distribution archive, I've provided VC 5.0 (Visual Studio 97) project files to build the library and sample executable. I don't provide the built library.

To use the library in your own programs, put the generated library (zpp.lib) and its header files (zpp.h, zpplib.h, zreader.h, izstream.h) where your compiler can find them. Add #include "zpp.h" to your program, and you're off and running. See the example source code included in the archive.

Opening Archives

A zppZipArchive object is associated with each .ZIP archive opened. When the zppZipArchive object is constructed, the name of the file to open is passed in to the constructor:

	zppZipArchive(string &_fn, ios::openmode _mode = ios::in, bool _makeGlobal = true);

The _fn parameter is the name of the .ZIP file to open.

The _mode argument currently MUST be ios::in, as only reading of .ZIP files is supported. ios::binary is OR'ed into the file mode on platforms where it's needed to access files in an untranslated way.

The _makeGlobal argument, if true (the default), causes all files in the archive to be added to the global list of files (see Opening Files, below).

When the zppZipArchive object is constructed, some basic integrity checks are done on the .ZIP file (placement of the central directory, making sure that the archive is complete, not a part of a multi-part archive, as these are not supported), and then the table of contents (list of files) is read in and stored in a <vector> of zppZipFileInfo objects. Each file's name is also added to a <map> for quick lookup on opening.
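To make the integrity checks more concrete, here is a standalone sketch of one way to locate the "end of central directory" record and reject multi-part archives. It follows the record layout of the ZIP file format (signature bytes 50 4B 05 06, 22 fixed bytes, then the zip comment), but it is not taken from ZPP's source; the names EndOfCentralDir, findEndOfCentralDir and isSingleVolume are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct EndOfCentralDir {
    bool found = false;
    std::uint16_t diskNumber = 0;        // number of this disk
    std::uint16_t centralDirDisk = 0;    // disk where the directory starts
    std::uint16_t entriesOnDisk = 0;     // directory entries on this disk
    std::uint16_t totalEntries = 0;      // directory entries overall
    std::uint32_t centralDirSize = 0;    // size of the directory in bytes
    std::uint32_t centralDirOffset = 0;  // file offset of the directory
    std::uint16_t commentLength = 0;     // length of the zip comment
};

static std::uint16_t readLE16(const std::vector<unsigned char>& b,
                              std::size_t at) {
    return static_cast<std::uint16_t>(b[at] | (b[at + 1] << 8));
}

static std::uint32_t readLE32(const std::vector<unsigned char>& b,
                              std::size_t at) {
    return static_cast<std::uint32_t>(b[at]) |
           (static_cast<std::uint32_t>(b[at + 1]) << 8) |
           (static_cast<std::uint32_t>(b[at + 2]) << 16) |
           (static_cast<std::uint32_t>(b[at + 3]) << 24);
}

// Scan backwards from the end of the file image for the record.
EndOfCentralDir findEndOfCentralDir(const std::vector<unsigned char>& bytes) {
    const std::size_t kFixedSize = 22;
    EndOfCentralDir rec;
    if (bytes.size() < kFixedSize) return rec;
    for (std::size_t pos = bytes.size() - kFixedSize + 1; pos-- > 0;) {
        if (bytes[pos] != 0x50 || bytes[pos + 1] != 0x4b ||
            bytes[pos + 2] != 0x05 || bytes[pos + 3] != 0x06) {
            continue;
        }
        // The comment must run exactly to the end of the file; otherwise
        // the signature bytes were just part of some other data.
        const std::uint16_t commentLen = readLE16(bytes, pos + 20);
        if (pos + kFixedSize + commentLen != bytes.size()) continue;
        rec.found = true;
        rec.diskNumber = readLE16(bytes, pos + 4);
        rec.centralDirDisk = readLE16(bytes, pos + 6);
        rec.entriesOnDisk = readLE16(bytes, pos + 8);
        rec.totalEntries = readLE16(bytes, pos + 10);
        rec.centralDirSize = readLE32(bytes, pos + 12);
        rec.centralDirOffset = readLE32(bytes, pos + 16);
        rec.commentLength = commentLen;
        return rec;
    }
    return rec;
}

// A complete, single-part archive keeps its whole directory on disk 0.
bool isSingleVolume(const EndOfCentralDir& rec) {
    return rec.found && rec.diskNumber == 0 && rec.centralDirDisk == 0 &&
           rec.entriesOnDisk == rec.totalEntries;
}
```

The 16-bit comment length field in this record is also the reason a zip comment, and therefore the attribute text, is limited to roughly 64KB.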

In the case of a "global" .zip file (one constructed with _makeGlobal set to true), references to the files in the archive are inserted into a global map.

If present, the .ZIP archive's attributes are parsed and put in a map for future access.

To open up all archives in a particular directory (or matching a particular wildcard), use the class static zppZipArchive::openAll() function. This function relies on a function in util.cpp which enumerates a directory matching a wildcard. Since directory enumeration is inherently a non-portable operation, this file will need to be modified to support different operating systems.

Opening Files

In general, files are opened by creating a set of iostream objects for the file: the zppstreambuf object is the underlying stream buffer object, derived from streambuf, and is used for raw file I/O. For formatted file I/O, an izppstream object can be constructed. Derived from istream, it can be used in nearly all places that an ifstream object can be used.

When opening files, there are two ways to decide which file to open: a file can be opened from a specific .ZIP archive (zppZipArchive object), or it can be opened by searching the global list of files. Whether or not a file is opened from the global list of files depends on whether a zppZipArchive object is passed to the zppstreambuf or izppstream constructor.

Both the zppstreambuf and the izppstream objects support an "open and close" model as well as an open-at-construction-time model.

Examples

Several examples will eventually be included with the distribution. Since this is an alpha release, only a small example is included.

Portability

I'm making the source code to the ZPP library available for porting. Currently, it compiles and runs using the Microsoft-provided STL with Microsoft Visual C++ 5.0 (Service Pack 3). I will be testing it on other compilers / operating systems as those systems become available.

The underlying ZLIB library is very portable, and should port cleanly to any platform.

If you are interested in porting ZPP to another platform, please use the SourceForge forum interface to contact the developers so that we can coordinate efforts.

Future Directions

There are several planned enhancements for the ZPP library. These features will be available in a future release. If you have new ideas about where the ZPP library should go, or even better, code to implement those ideas, contact us!

Known Bugs

This section will be for known, but not-yet-fixed bugs.

Disclaimer

This library is provided AS-IS. Full source is provided, that's about as much disclosure as you could want. If you run this code without looking at it, and it blows up your hard-drive, tough. This code is relatively stable, but is under active development -- and it hasn't blown up my hard drive ... yet.

License

Copyright 1999 Michael Cuddy. Some modifications Copyright 2000-2003 Eero Pajarre. Code is licensed under the MIT license.

The original license from Michael Cuddy contained the following:

"This code is based on code from the ZLib library, by Jean-Loup Gailly and Mark Adler. They wrote the real meat-and-potatoes code here, I just wrapped it up with some semantic sugar. See the links section for the official ZLib home page.

You may NOT use this code in any mission-critical, or life-support project; I wouldn't trust my code with my life, and neither should you."

Version History

I don't expect to update this document, or this code very often, but when I do, I'll put that information here.

Links

Related and not-so-related links: Project summary page