Zstandard Compression for Nodejs
Jun 30, 2019
4 minute read

After several attempts to work with Zstandard in Node.js, I finally decided to write my own package to handle the problem. simple-zstd is a small wrapper that exposes a transform stream interface by calling the system-installed zstd binary.

Squash to ZSTD

Some time ago I was faced with a compression problem in an application. Having never really looked at different compression algorithms in depth, I knew I needed to take the time to understand the existing solutions, and I came across the Squash library on GitHub. Squash is an abstraction layer written in C for a number of different compression libraries. As a result, its authors created one of the best benchmarking sites I have ever seen. After a long rabbit hole of looking at round-trip analysis with this tool, I decided ZSTD was the right compression tool for my application.

ZSTD on NPM

As I expected, there were a number of packages on NPM that interface with ZSTD. After reading and testing node-zstandard and node-zstd, I opted for node-zstd because of its better stream interface, its native bindings, and its newer (faster) version of ZSTD.

Issue 1 - Node 10

I ran into my first major issue when attempting to update to Node 10. The native binding API changed, breaking the package. Having seen this happen with nodegit, I wanted to understand these native bindings better. Treating it as a learning opportunity, I spent a couple of evenings fixing the binding and updating to a newer version of ZSTD. You can take a look at the pull request here. Sadly the maintainer appears to be inactive, and to this date the PR sits idle. I continued with my project by installing my own version of node-zstd directly from GitHub. On a side note, as a first for me, I noticed someone forked my work and continued to update the package. :)

Issue 2 - Pkg

My next issue arose when I attempted to package my work with pkg. Native binding support in pkg is difficult, so I decided to investigate how node-zstandard interfaces with the zstd binary. By replacing the native binding with a child process call, I wouldn't have to worry about a number of unknowns I had around pkg and native addons.

simple-zstd

Having gained a better perspective on the problem, I decided to take what I believed to be the best traits of these two packages and create my own: node-zstandard's child process call to a binary and node-zstd's transform stream interface. Finally, the name was inspired by simple-git, which is a git binding to the system-installed git.

duplex-child-process

I knew that a child process exposes streams for stdin and stdout, and from my initial design thoughts I knew I wanted to wrap those into a single stream object. While investigating how to wrap these correctly I came across duplex-child-process. This was exactly what I was attempting to do to interface with the binary, so I decided to depend on it for my implementation. As a result, while simple-zstd itself is simple, coming in at a whopping 32 lines of code, the supporting package.json is larger. :S
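To illustrate the idea, here is a minimal sketch of wrapping a child process's stdin and stdout in one duplex stream. It is not the source of simple-zstd or duplex-child-process. The helper names zstdArgs and zstdStream are hypothetical, it assumes a zstd binary on the PATH, and it relies on Duplex.from, which needs Node 16.8 or newer.

```javascript
const { spawn } = require('child_process');
const { Duplex } = require('stream');

// Build CLI arguments for the zstd binary: -c writes to stdout,
// -d decompresses, and -<n> selects the compression level.
// (Hypothetical helper, not part of any package.)
function zstdArgs(mode, level = 3) {
  return mode === 'decompress' ? ['-d', '-c'] : [`-${level}`, '-c'];
}

// Spawn zstd and expose it as one stream: writes go to the child's
// stdin, reads come from the child's stdout.
function zstdStream(mode, level) {
  const child = spawn('zstd', zstdArgs(mode, level), {
    stdio: ['pipe', 'pipe', 'ignore'],
  });
  const duplex = Duplex.from({ writable: child.stdin, readable: child.stdout });
  // Surface spawn failures (for example, zstd not installed) on the stream.
  child.on('error', (err) => duplex.destroy(err));
  return duplex;
}
```

With this in place, something like fs.createReadStream('in.txt').pipe(zstdStream('compress')).pipe(fs.createWriteStream('in.txt.zst')) would compress a file, assuming those paths exist.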

Performance Testing

ZSTD is all about speed, so I was curious which implementation was fastest. I put together a simple script that compares compressing and then decompressing a folder with gzip and with each of the zstd packages in play. To my surprise, simple-zstd was the most performant! Of course this depends on the system-installed binary, but comparing the native binding to zstd v1.3.4 (node-zstd) with the system binary v1.3.3 (simple-zstd), simple-zstd was faster. If anything I expected it to be slower, as v1.3.4 boasts a speed improvement over v1.3.3. I suspect there are some inefficiencies in the native binding implementation. ZSTD continues to get faster with every release, and at this time the latest version (1.4.0) appears to be significantly faster than the 1.3.x versions, especially when compiled with the correct tools. If someone needed even more performance out of simple-zstd, they could compile a new version and replace their system binary!
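A rough sketch of this kind of round-trip timing follows. It is not the script used for the results above: it times an in-memory buffer instead of a folder, compares only Node's built-in gzip against the system zstd binary, and the names roundTrip and zstdSync are made up for illustration. The zstd candidate is skipped when the binary is not on the PATH.

```javascript
const zlib = require('zlib');
const { spawnSync } = require('child_process');

// Time one compress/decompress round trip and verify the data survives it.
function roundTrip(name, compress, decompress, input) {
  const start = process.hrtime.bigint();
  const packed = compress(input);
  const unpacked = decompress(packed);
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  if (!unpacked.equals(input)) {
    throw new Error(`${name}: round trip changed the data`);
  }
  return { name, elapsedMs, ratio: input.length / packed.length };
}

// Run the system zstd synchronously over a buffer (assumes zstd is on PATH).
function zstdSync(args, input) {
  const result = spawnSync('zstd', args, { input, maxBuffer: 64 * 1024 * 1024 });
  if (result.error) throw result.error;
  if (result.status !== 0) throw new Error(`zstd exited with ${result.status}`);
  return result.stdout;
}

const candidates = [
  { name: 'gzip', compress: zlib.gzipSync, decompress: zlib.gunzipSync },
  {
    name: 'zstd (system binary)',
    compress: (buf) => zstdSync(['-c'], buf),
    decompress: (buf) => zstdSync(['-d', '-c'], buf),
  },
];

// Hypothetical sample payload standing in for real files.
const sample = Buffer.from('simple-zstd wraps the system installed zstd. '.repeat(5000));

for (const c of candidates) {
  try {
    console.log(roundTrip(c.name, c.compress, c.decompress, sample));
  } catch (err) {
    console.log(`${c.name}: skipped (${err.message})`);
  }
}
```

A single run on a small, repetitive buffer says little about real workloads, and the synchronous zstd call includes process start-up cost, so treat the numbers as a smoke test and not a benchmark.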

Results TLDR

simple-zstd solves both of my issues, and I do not have to worry about future native binding API changes or other unforeseen issues around native addons. As zstd is available in many package repositories, most systems can install it with little to no effort. The implementation is the fastest I have tested, and because it decouples zstd from both Node.js and the package, it is easy to swap in a new, shiny (faster) version of zstd.
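Since everything depends on the system binary, it can be handy to check which zstd version is installed before relying on it. The sketch below is one way to do that. The function names are hypothetical, and it assumes the banner printed by zstd -V contains a version of the form v1.3.3.

```javascript
const { spawnSync } = require('child_process');

// Pull a version such as "1.3.3" out of the banner printed by `zstd -V`.
function parseZstdVersion(banner) {
  const match = /v(\d+\.\d+\.\d+)/.exec(banner);
  return match ? match[1] : null;
}

// Return the installed version, or null when zstd is not on the PATH.
function systemZstdVersion() {
  const result = spawnSync('zstd', ['-V'], { encoding: 'utf8' });
  if (result.error || result.status !== 0) return null;
  // Check both streams in case the banner is not written to stdout.
  return parseZstdVersion(`${result.stdout}${result.stderr}`);
}

console.log(systemZstdVersion() || 'zstd not found; install it from your package manager');
```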


