Tarballs - Why?

Published on Thursday, July 22, 2010

More and more I begin to wonder why we generate tarballs at all these days. Is it just because it's easy - a function of "make distcheck"? There's certainly value in the actual distcheck process to ensure you have a sane build, but why actually distribute the tarball? What's the meaningful difference between a tarball and a git tag?

Now, I won't even touch on the subject of how badly I want to throw autotools in the trash, but we're so entrenched in its ways, and so comfortable with its quirks, that energy is better spent on actual improvements. So for now the distcheck process is here to stay. For now.

So I ask a very serious question, one others have asked as well: why publish tarballs? Most users get their packages in binary form from their distribution. Most users who build from source, I would argue, are already using git, have git installed on their system, or can easily install it. Instructions for cloning, checking out the tag, and building with autogen/autoreconf/etc. can be provided easily and clearly.
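To give a rough idea of what such instructions might boil down to, here is a minimal sketch. The function name `checkout_release` is made up for illustration, and the repository URL and tag in the usage comment are placeholders, not the real Banshee locations.

```shell
# Sketch: fetch a tagged release straight from git instead of a tarball.
# checkout_release is a hypothetical helper, not part of any project.
checkout_release() {
    repo_url=$1   # where to clone from (placeholder in the usage below)
    tag=$2        # release tag to build
    dest=$3       # directory to clone into

    git clone --quiet "$repo_url" "$dest" || return 1
    # Checking out a tag leaves a detached HEAD, which is fine for building.
    git -C "$dest" checkout --quiet "$tag"
}

# Hypothetical usage (URL and tag are placeholders):
#   checkout_release git://example.org/banshee.git 1.7.3 banshee-1.7.3
```

From there the build is whatever the project normally uses, run from inside the clone.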

I migrated Banshee to Linode and consequently from Apache to lighttpd about a month ago. The logs start on June 20, 2010:

    $ grep -E 'banshee-1.+\.tar\.(gz|bz2)' \
        download.banshee.fm.access.log | wc -l
    7066

So in one month, we've only had 7066 tarball downloads, and that accounts for any and all released versions of Banshee over the past 5 years. Certainly the bulk of those downloads would be version 1.6.1, since that was the newest available tarball over the last month. 284 of those downloads were version 1.7.3, released less than 24 hours ago. I could generate better statistics, but that's not the point here. The point is that number is pretty small compared to the reach of the distributions.
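For what it's worth, better statistics would not take much. The sketch below breaks the same count down per version. It assumes each request line mentions the tarball file name exactly once (a referrer field repeating the name would be counted twice), and `downloads_per_version` is a name invented for this example.

```shell
# Count tarball downloads per release from an access log.
# Assumes the tarball name appears once per request line.
downloads_per_version() {
    log_file=$1
    # Pull out just the tarball names, strip them down to the version,
    # then count each version and list the most downloaded first.
    grep -oE 'banshee-1[0-9.]*\.tar\.(gz|bz2)' "$log_file" \
        | sed -E 's/^banshee-//; s/\.tar\.(gz|bz2)$//' \
        | sort | uniq -c | sort -rn
}

# Usage: downloads_per_version download.banshee.fm.access.log
```

This lumps the gz and bz2 downloads of a release together, which seems like the more useful number here.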

I roughly estimate the average size of a Banshee tarball (bzip2) is 3MB. At 7066 downloads, eliminating tarballs would save us roughly 20GB/mo in bandwidth - and that's during a quiet time in development when the servers are less active (1.6.1 was released in May). I expect a spike around 1.7.3, which I'll be monitoring.
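The back-of-the-envelope math, spelled out. Both inputs are the rough figures from this post, so the result is only an estimate.

```shell
downloads=7066      # tarball requests counted in the log
avg_size_mb=3       # rough average tarball size (an estimate, not measured)

total_mb=$((downloads * avg_size_mb))
total_gb=$((total_mb / 1024))   # integer division, rounds down

echo "${total_mb} MB, roughly ${total_gb} GB over the logged period"
```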

So, if we ditched tarballs, how would you be affected? Would you care?

Update: to clarify a few things, you would still build and install like normal. For instance:

    $ ./autogen.sh --prefix=$HOME/local --disable-whatever \
        && make && make install

Packagers would, however, have an additional minor burden. If their package system (e.g. rpm) requires an archive (i.e. it can't build directly from git), then the packager would be responsible for creating one. They could either just archive the git clone directory, or actually run their own "make distcheck" from their clone. It would be up to the packager to best integrate the git clone into whatever system they are using.
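As a sketch of the first option, `git archive` can turn a tag into an archive without touching the working tree. The helper name `tarball_from_tag` and the names in the usage comment are illustrative only. Note that an archive made this way contains only the files tracked in git, so unlike a "make distcheck" tarball it would still need the autogen step before building.

```shell
# Build a release archive from a tag in an existing clone.
# tarball_from_tag is a hypothetical helper for illustration.
tarball_from_tag() {
    clone_dir=$1   # path to the git clone
    tag=$2         # tag to archive
    name=$3        # used as the top-level directory and the file name

    # --prefix puts everything under "$name/" inside the archive;
    # the result is written to the current directory.
    git -C "$clone_dir" archive --format=tar --prefix="$name/" "$tag" \
        | gzip > "$name.tar.gz"
}

# Hypothetical usage: tarball_from_tag ./banshee 1.7.3 banshee-1.7.3
```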