zlib’s Software Library for Data Compression: Playing a Role Behind the Scenes in Gzip and Many Mainstream Software Platforms

TL;DR: zlib, an open-source software library used for data compression, is a vital component in both the gzip software program and many widely recognized software platforms, such as Linux, iOS, and macOS. Initially released in 1995, the free solution still enjoys widespread use, with regular enhancements provided by the open-source community. Moving forward, zlib Co-Creator Mark Adler told us fans should keep an eye out for possible performance improvements and compatibility with new standards.

If you had told zlib Co-Creator Mark Adler in 1995 that the data compression library he had just released would still be in widespread use in 2020, he said he would have had a good laugh.

“I would have known that 25 years later, computers would be completely different — bigger, faster, higher throughput — and compression technologies would be much more advanced,” Mark told us. “There was no way I’d think zlib would still be in use. But it is.”

Mark is the first to admit that better compression technologies are available today for some use cases, but he notes that each of them makes different tradeoffs.

Co-Creator Mark Adler gave us an inside look at zlib, an open-source software library used for data compression.

“Zlib lives in a particular space in terms of how long it takes to compress versus how much compression you get out of it, and how fast it decompresses.”

The open-source technology is also free, highly regarded, and exceptionally portable. In addition to supporting the gzip file format and software application, zlib is a vital software component in some of today’s best-known operating systems, including Linux, macOS, and iOS.

Intel and Cloudflare both maintain high-performance forks of the commonly used library. It’s also featured in popular gaming consoles, including PlayStation 4, Wii, Xbox One, and Xbox 360.

Today, the free software is regularly enhanced by the open-source community. Still, Mark told us he’s looking forward to potentially improving the software with performance upgrades and better compatibility with new standards.

Contributing Open-Source Compression Code for Decades

The first public version of the zlib software library was released on May 1, 1995, under the open-source zlib license. But Mark said that zlib’s roots stretch back to the late 1980s, when he upgraded from an IBM PC running MS-DOS to the NeXT Computer System while completing his doctoral degree in physics. (Fun fact: NeXT, Inc., founded in 1985 by Steve Jobs, was later acquired by Apple.)

“I wanted to transfer a bunch of files from my old PC to my NeXT Computer, but I didn’t have a good way to do it, other than using zip files,” Mark said. “The problem was, on the NeXT, there was no way to unzip. I thought, maybe this is something I can do myself.”

zlib was created as part of an open-source response to patent concerns.

As it turns out, he could. Not only did Mark churn out some code for extracting files, but he ultimately contributed it to the open-source project UnZip, now known as Info-ZIP. Then he wrote a zip program for UnZip, further cementing his entry into the open-source world.

At the same time, Jean-loup Gailly of France wrote the compression code for Info-ZIP’s portable archiver, zip. “We realized we could use Jean-loup’s compressor and my decompressor to provide something better than what was currently available in the Unix system, which was an appropriately named program known as compress,” Mark said. “So that’s what we did: put them together in a program we called gzip.”

Jean-loup was the primary author of gzip, while Mark wrote the main decompression routines for both gzip and UnZip.

Expanding Use Cases & Formation of the PNG Image Format

gzip was released on October 31, 1992. In 1994, Unisys Corporation unexpectedly decided to enforce its 1983 patent on the LZW compression algorithm, which is used to create Graphics Interchange Format (GIF) files, by charging developers a fee for software that used the technology.

Needless to say, the open-source community was less than thrilled with this announcement. In response, Mark, Jean-loup, and multiple digital graphics specialists formed what would become known as the PNG Working Group.

Using Jean-loup’s compression software and Mark’s decompression software, plus filtering techniques that make image data more amenable to compression, the group developed the PNG file format. The acronym officially stands for Portable Network Graphics, but it is also unofficially read as the more pointed “PNG’s Not GIF.”
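
The filtering idea is easier to see in code. The PNG specification defines five per-row filters: None, Sub, Up, Average, and Paeth. The minimal sketch below implements only the Sub filter, which stores each byte as its difference from the byte to its left. It makes several simplifying assumptions:

- The input is a single hypothetical 8-bit grayscale scanline, not a full PNG file.
- The `bpp` (bytes per pixel) parameter defaults to 1.
- Python's built-in `zlib` module stands in for the compressor.

```python
import random
import zlib

def sub_filter(row: bytes, bpp: int = 1) -> bytes:
    """PNG filter type 1 (Sub): store each byte minus the byte `bpp` positions
    to its left, modulo 256. Bytes with no left neighbour subtract 0."""
    out = bytearray(len(row))
    for i, x in enumerate(row):
        left = row[i - bpp] if i >= bpp else 0
        out[i] = (x - left) & 0xFF
    return bytes(out)

def unsub_filter(filtered: bytes, bpp: int = 1) -> bytes:
    """Undo the Sub filter by adding back each already-reconstructed left byte."""
    out = bytearray(len(filtered))
    for i, d in enumerate(filtered):
        left = out[i - bpp] if i >= bpp else 0
        out[i] = (d + left) & 0xFF
    return bytes(out)

# Hypothetical smooth 8-bit grayscale scanline: each pixel differs from its
# left neighbour by -1, 0, or +1 (a seeded random walk, so runs are repeatable).
rng = random.Random(1995)
pixels = bytearray()
value = 128
for _ in range(8192):
    value = (value + rng.choice((-1, 0, 1))) & 0xFF
    pixels.append(value)
raw = bytes(pixels)

filtered = sub_filter(raw)
assert unsub_filter(filtered) == raw  # filtering loses no information

print("raw row, deflated:     ", len(zlib.compress(raw, 9)), "bytes")
print("filtered row, deflated:", len(zlib.compress(filtered, 9)), "bytes")
```

On smooth data like this, every filtered byte after the first is 0, 1, or 255. Deflate encodes that far more compactly than the raw pixel values. A real PNG encoder also prefixes each scanline with a filter-type byte and may choose a different filter for every row.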

Zlib Co-Creators Mark Adler and Jean-loup Gailly helped develop the PNG image format.

“We came up with a new lossless alternative to GIF that addressed multiple issues, not just the fact that we wanted a license-free, open-source image format,” Mark said. “It also provided transparency, color-map specification, and more depth.”

Ensuring the industry could adopt the new format required more coding, which resulted in the creation of PNG lib (now known as libpng) and zlib.

“PNG lib was software that would construct the format,” Mark said. “It would take whatever input you had, with various optional information such as bit depth, number of colors, and transparency. That library allowed you to construct PNG images, decompress PNG images, and turn them back into raw image information. The other library was zlib, which was used by PNG lib for raw compression and decompression.”

While writing these libraries, Jean-loup and Mark realized that zlib, in particular, could be applied in a much broader context.
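
That general-purpose interface can be illustrated with Python's standard `zlib` module, a thin wrapper around the C library. In C, the corresponding calls would be `compress2()` for one-shot use or `deflate()` and `inflate()` for streaming. The sample data and the 1,000-byte chunk size below are arbitrary assumptions chosen for illustration.

```python
import zlib

data = b"zlib was written for PNG, but it works on any byte stream. " * 200

# One-shot API: compress a whole buffer at the default level (6).
packed = zlib.compress(data)
assert zlib.decompress(packed) == data

# Streaming API: feed data in chunks, the way a caller that produces output
# a piece at a time would, then flush to finish the stream.
comp = zlib.compressobj(level=9)
chunks = [comp.compress(data[i:i + 1000]) for i in range(0, len(data), 1000)]
chunks.append(comp.flush())
streamed = b"".join(chunks)

decomp = zlib.decompressobj()
restored = decomp.decompress(streamed) + decomp.flush()
assert restored == data

print(len(data), "bytes in;", len(packed), "bytes one-shot;", len(streamed), "bytes streamed")
```

The streaming form suits callers such as an image encoder, which may hand over data a row at a time rather than all at once.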

Ongoing Enhancements via the GitHub Community

Mark told us zlib hasn’t changed fundamentally since its inception, though there have been portability and compatibility updates over the years.

“The format, of course, is used in many places: in PNG files, HTTP, storage formats, and in many other protocols that are simply transferring data from one place to another to reduce transmission time,” he said. “Since it’s free and anyone can use it, we don’t know exactly how often it’s used, but based on the questions I get, I would say it’s being used very widely.”
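
Many of those uses share the same DEFLATE data wrapped in different containers. The sketch below uses Python's `zlib` module and a made-up JSON-like payload, and selects the container through the `wbits` parameter. It produces three outputs:

- a bare DEFLATE stream (RFC 1951);
- a zlib stream (RFC 1950), the form stored in PNG files;
- a gzip stream (RFC 1952), the form used by `.gz` files and the HTTP `gzip` content coding.

```python
import zlib

payload = b'{"status": "ok", "items": [1, 2, 3]}' * 50

def deflate(data: bytes, wbits: int) -> bytes:
    """Compress `data`, choosing the container via zlib's wbits convention."""
    c = zlib.compressobj(level=6, wbits=wbits)
    return c.compress(data) + c.flush()

raw_deflate = deflate(payload, -15)  # negative wbits: no header or trailer
zlib_stream = deflate(payload, 15)   # 8..15: zlib header + Adler-32 trailer
gzip_stream = deflate(payload, 31)   # 16 + 15: gzip header + CRC-32/length trailer

for name, blob in [("raw", raw_deflate), ("zlib", zlib_stream), ("gzip", gzip_stream)]:
    print(f"{name:>4}: {len(blob)} bytes, first two bytes {blob[:2].hex()}")

# wbits=47 (32 + 15) asks the decoder to auto-detect a zlib or gzip header.
for blob in (zlib_stream, gzip_stream):
    assert zlib.decompress(blob, wbits=47) == payload
assert zlib.decompress(raw_deflate, wbits=-15) == payload
```

The compressed core is identical in all three, so the outputs differ only in their framing. The zlib wrapper adds 6 bytes, ending in an Adler-32 checksum. The gzip wrapper adds 18 bytes, ending in a CRC-32 and the input length.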

Mark hasn’t released an update since 2017, but he said that the open-source community has made changes as recently as the past few weeks. Industry giants such as Google and IBM have also made improvements over the years for their own use, and Facebook has made progress on the technology to boost performance on the company’s web servers and clients.

“There are other alternatives to zlib, other libraries out there on GitHub with desirable characteristics,” he added. “For example, LZ4 is lossless data compression code that doesn’t compress as well, but it compresses, and especially decompresses, much faster. Or there’s XZ, based on 7-Zip code, which can provide much higher compression but is much slower overall.”
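
One way to see these tradeoffs is to time a round trip through each codec. The sketch below compares zlib at levels 1 and 9 with XZ, using Python's standard `lzma` module. LZ4 and Zstandard are left out because they require third-party packages. The input is synthetic, and the printed sizes and timings depend on the data and the machine, so treat them as illustrative rather than as a benchmark.

```python
import lzma
import random
import time
import zlib

def measure(label, compress, decompress, data):
    """Round-trip `data` through one codec; print size and timings, return size."""
    t0 = time.perf_counter()
    packed = compress(data)
    t1 = time.perf_counter()
    restored = decompress(packed)
    t2 = time.perf_counter()
    assert restored == data  # every codec compared here is lossless
    print(f"{label:<8}{len(packed):>9} bytes   "
          f"compress {1000 * (t1 - t0):8.1f} ms   "
          f"decompress {1000 * (t2 - t1):6.1f} ms")
    return len(packed)

# Hypothetical input: roughly 600 KB of text drawn from a small vocabulary.
rng = random.Random(42)
vocab = ["zlib", "deflate", "inflate", "gzip", "png", "window", "match", "literal"]
data = " ".join(rng.choice(vocab) for _ in range(100_000)).encode()

sizes = {
    "zlib -1": measure("zlib -1", lambda b: zlib.compress(b, 1), zlib.decompress, data),
    "zlib -9": measure("zlib -9", lambda b: zlib.compress(b, 9), zlib.decompress, data),
    "xz -6": measure("xz -6", lambda b: lzma.compress(b, preset=6), lzma.decompress, data),
}
```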

Newer technologies, such as Zstandard, offer both better compression and faster speeds.

“You could imagine zlib being replaced by Zstandard compression, and that may happen, but because of the widespread use of zlib and all of the formats that it’s used in, it may take a little while for new technology to take over,” Mark said.

To Come: Performance Improvements, Incorporating New Standards

It’s hard to believe, but Mark’s work in the data compression space is just a side hobby. After earning his doctorate in physics from the California Institute of Technology, he joined the Space and Communications Group at Hughes Aircraft, where he worked on video compression, error-correcting codes, and the effects of X-ray bursts on satellite cables.

After that, he headed to NASA’s Jet Propulsion Laboratory, where he worked as Lead Mission Engineer on the Cassini–Huygens research mission to Saturn. He was also responsible for planning the Mars Exploration Rover missions and served as Mission and Systems Manager and Chief Engineer for the Mars Sample Return project. Today, he’s working on hardware and technology development at Apple.

“I never intended to work in data compression, but I feel an obligation to keep the project alive,” he said.

Moving forward, Mark said he sees three key areas of improvement for zlib. “The first one is portability,” he said. “It’s extremely portable, but there are stale makefiles and things that need to be done with CMake, Microsoft Visual Studio, and other build systems to make the build process more seamless.”

He said there are multiple possibilities for improving the performance of zlib on certain architectures using assembly instructions, CRC instructions, and accelerators for the Adler-32 checksum, among other technologies. “There’s a lot of things that can and have been done to improve performance of the deflation or inflation code, but those haven’t been integrated into the main Zlib distribution,” Mark said.
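
The Adler-32 checksum Mark mentions is simple enough to sketch in a few lines. It keeps two running sums modulo 65521, the largest prime below 65,536:

- `a` is 1 plus the sum of the bytes;
- `b` is the sum of every intermediate value of `a`.

The Python version below follows the definition in RFC 1950. It borrows one trick from zlib's C code: the modulo is deferred over blocks of up to 5,552 bytes, the longest run that cannot overflow 32-bit sums. It is meant as a readable reference, not an accelerated implementation.

```python
import zlib

MOD_ADLER = 65521  # largest prime below 2**16
NMAX = 5552        # longest run of bytes summable before 32-bit sums could overflow

def adler32(data: bytes, value: int = 1) -> int:
    """Adler-32 per RFC 1950; pass a previous result as `value` to continue it."""
    a, b = value & 0xFFFF, (value >> 16) & 0xFFFF
    for start in range(0, len(data), NMAX):
        # Defer the modulo to the end of each block instead of doing a
        # division per byte (Python ints never overflow; C's do).
        for byte in data[start:start + NMAX]:
            a += byte
            b += a
        a %= MOD_ADLER
        b %= MOD_ADLER
    return (b << 16) | a

sample = b"Wikipedia"
print(hex(adler32(sample)))  # 0x11e60398
assert adler32(sample) == zlib.adler32(sample)
```

Because the checksum's entire state is its previous value, data can be fed in pieces: `adler32(b"pedia", adler32(b"Wiki"))` equals `adler32(b"Wikipedia")`. The tight per-byte loop is the kind of code that vectorized or architecture-specific versions aim to speed up.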

There’s also the possibility of incorporating new compression standards.

“For example, Zstandard could be another compression method that’s added to zlib to provide a better performance, better compression, at a faster speed. That’s another longer-term action that could be considered.”

Christine Preusler