Audio profile configuration for the masses

Published on Saturday, January 6, 2007

Welcome to the second part of the "Things you may not know about Banshee" series of posts, where I highlight some cool features about Banshee that have been introduced in the 0.11.x series. I'm making up for all the blogging I haven't done in the last 4-5 months.

Just before the GNOME Summit last year, I started working on a user-friendly way to configure audio profiles, for example to select the desired bitrate for an MP3 stream. I had a few goals in mind at the time:

  1. Must be audio framework agnostic. Banshee supports both GStreamer and Helix, and this needs to work with both frameworks without any issues. The user should be able to configure profiles using the same interface and not know which audio framework will be doing the heavy lifting. Essentially, the audio profiles framework should not actually need to know anything about specific audio frameworks. Ever.
  2. Must provide a straightforward, sensible user interface for configuring complex pipelines. The primary point here is that the user should never have to edit a raw pipeline. A user should not have to "know GStreamer" to change their desired encoding bitrate.
    The current GNOME audio profiles editor :-(
  3. Multiple configurations should be supported on the same profile. Profiles are things like "MP3", "Ogg Vorbis", "FLAC". The profile contains the pipeline and interface description. Configurations are sets of values that can be merged into and saved from a profile. This allows a user to configure a 128 Kbps MP3 encoding setting for their iPod and a 192 Kbps encoding setting for ripping CDs to their local library. Each configuration uses the same base profile, but its settings are different.
  4. Never show profiles that the user won't be able to use. Not all users have the necessary components installed to be able to encode AAC or MP3 for example. Profiles for these formats should be provided, but if they won't be able to run, they should not be shown. This means profiles should be tested against their default configuration values before ever presenting a user interface.
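Goals 3 and 4 can be sketched in a few lines of Python. This is only an illustration, not Banshee's code: the `Profile` class, its `required_elements` field, and the `usable_profiles` helper are names made up for the example, and the availability check is a stand-in for actually testing a profile against its default values.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """A base profile: a name plus default values (hypothetical shape)."""
    name: str
    required_elements: list  # hypothetical: elements the pipeline needs
    defaults: dict = field(default_factory=dict)

    def merge(self, configuration: dict) -> dict:
        """Overlay a saved configuration on top of the profile defaults."""
        values = dict(self.defaults)
        values.update(configuration)
        return values


def usable_profiles(profiles, is_available):
    """Keep only profiles whose required elements are all available."""
    return [p for p in profiles
            if all(is_available(e) for e in p.required_elements)]


mp3 = Profile("MP3", ["lame"], {"bitrate": 128, "channels": 2})
flac = Profile("FLAC", ["flacenc"], {"channels": 2})

# Two configurations that share the same base profile.
ipod = mp3.merge({})                   # keeps the 128 Kbps default
library = mp3.merge({"bitrate": 192})  # CD ripping at 192 Kbps

# Pretend only the FLAC encoder is installed, so MP3 is hidden.
installed = {"flacenc"}
visible = usable_profiles([mp3, flac], lambda name: name in installed)
print(ipod["bitrate"], library["bitrate"], [p.name for p in visible])
```

The point of keeping configurations as plain overlays is that the profile itself never changes; each device or task just carries its own small set of values.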

With all this in mind, I set out to write the beast. The user interface is defined in XML. Variables define a UI control type, possible values, etc. "Processes" are also defined in the XML with an audio framework ID, for instance "gstreamer". For GStreamer, the process is the pipeline definition.

However, as of early this morning, the process definition is now an S-Expression. Before, it was simply a pipeline string that had $variables in it, which would be expanded based on the user configuration.
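To give a feel for the older $variable approach, here is a rough Python equivalent using the standard library's `string.Template`, which happens to use the same `$name` syntax. The pipeline string and the values are invented for the example; they are not taken from a real Banshee profile.

```python
from string import Template

# Hypothetical pipeline string in the older "$variable" style.
pipeline = Template("audioconvert ! lame mode=$mode bitrate=$bitrate")

# Values as they might come from a user's saved configuration.
configuration = {"mode": 0, "bitrate": 192}

expanded = pipeline.substitute(configuration)
print(expanded)  # audioconvert ! lame mode=0 bitrate=192
```

Plain substitution like this can only fill in values. It has no way to add or leave out a piece of the string depending on what the user picked.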

Since the GNOME Summit, I have been working with this profile stuff on and off. It's been functional since Banshee 0.11.2 (a few months ago), but has been evolving in various ways since then. During this time it became clear that more expressiveness was needed for generating the actual process/pipeline definition. For example, in GStreamer if a user chooses to use VBR in LAME, the xingmux element should be added to the pipeline. However, xingmux is in gst-plugins-bad, and chances are not many users actually have xingmux. This means xingmux should only be appended to the pipeline if VBR is enabled and xingmux is actually available. Another reason for needing more expressiveness is that arguments for GStreamer elements may be mutually exclusive. If I use mode X, I must provide arguments A and B but not C. If I use mode Y, I must provide argument C but neither A nor B.

Last night I decided I needed to write an S-Expression evaluator to make this expressiveness a reality. 10 hours later, we now have SExpEngine, and it can do some really cool things. Functions are very easy to add to it and there are a number of built-in functions for logic, conditionals, comparisons, casting, arithmetic, and strings. It also supports variables, which can either be value types or a callback method that returns a tree.
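For readers who have never seen one, a tiny S-Expression evaluator fits in a screenful of Python. To be clear, this is not SExpEngine; it is a minimal sketch, and `tokenize`, `parse`, `evaluate`, and the `FUNCTIONS` table are names chosen for the example. It handles integers, quoted strings, `$variables` with plain values, a lazy `if`, and a small table of functions.

```python
import re

TOKEN = re.compile(r'\s*(\(|\)|"[^"]*"|[^\s()"]+)')


def tokenize(text):
    """Split source text into parens, quoted strings, and atoms."""
    return TOKEN.findall(text)


def parse(tokens):
    """Build a nested-list tree from a token list (consumes the list)."""
    token = tokens.pop(0)
    if token == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # drop the closing paren
        return node
    if token.startswith('"'):
        return ("str", token[1:-1])  # string literal
    if re.fullmatch(r"-?\d+", token):
        return int(token)
    return token  # symbol: a function name or a $variable


def evaluate(node, variables, functions):
    """Evaluate a parsed tree against variables and a function table."""
    if isinstance(node, int):
        return node
    if isinstance(node, tuple):  # string literal
        return node[1]
    if isinstance(node, str):    # symbol: look up "$name" as "name"
        return variables[node.lstrip("$")]
    head, *args = node
    if head == "if":             # special form: only one branch is evaluated
        condition, then_branch, else_branch = args
        taken = evaluate(condition, variables, functions)
        return evaluate(then_branch if taken else else_branch,
                        variables, functions)
    return functions[head](*[evaluate(a, variables, functions) for a in args])


def plus(*xs):
    """Add integers, or concatenate when any argument is not an integer."""
    if all(isinstance(x, int) for x in xs):
        return sum(xs)
    return "".join(str(x) for x in xs)


# Adding a function is just adding an entry to this table.
FUNCTIONS = {"+": plus, "-": lambda a, b: a - b, "=": lambda a, b: a == b}

source = ('(+ "lame " (if (= $vbr_mode 0) (+ "bitrate=" $bitrate) '
          '(+ "vbr-quality=" (- 9 $vbr_quality))))')
tree = parse(tokenize(source))
print(evaluate(tree, {"vbr_mode": 0, "bitrate": 192, "vbr_quality": 4}, FUNCTIONS))
print(evaluate(tree, {"vbr_mode": 4, "bitrate": 192, "vbr_quality": 4}, FUNCTIONS))
```

The first call prints `lame bitrate=192` and the second prints `lame vbr-quality=5`. Treating `if` as a special form matters: only the chosen branch is evaluated, so a branch that tests for a missing element never runs unless it is needed.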

I added a function to allow process S-Expressions to test sub-parts of a pipeline before merging them into the final pipeline (think xingmux, from above). Additional GStreamer functionality can be added to build variations of a pipeline based on available elements, differences in GStreamer versions, etc. S-Expressions mean configurability (woo - fake words), reliability, and compatibility.

The result is something I'm quite happy with. For example, here is the S-Expression for the GStreamer LAME process:

    + "audioconvert ! "
      "audio/x-raw-int,rate=" $sample_rate ",channels=" $channels " ! "
      "audioconvert ! "

      "lame mode=" $mode " "

      (if (= $vbr_mode 0)
          (+ "bitrate=" $bitrate)
          (+ "vbr-mode=" $vbr_mode
             " vbr-quality=" (- 9 $vbr_quality)
             (if (gst-element-is-available "xingmux")
                 " ! xingmux"
                 ""
             )
          )
      )

      (if (gst-element-is-available "id3v2mux")
          " ! id3v2mux"
          " ! id3mux"
      )

To make sense of the variables, take a look at the full XML LAME profile.
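If S-Expressions are not your thing, the LAME expression reads roughly like the following Python. This is a hand translation for illustration only, assuming the leading `+` concatenates every term that follows it. The `lame_pipeline` function, the `element_is_available` callback (standing in for `gst-element-is-available`), and the sample values are all made up for the example.

```python
def lame_pipeline(values, element_is_available):
    """Build the LAME pipeline string the way the expression reads."""
    parts = [
        "audioconvert ! ",
        f"audio/x-raw-int,rate={values['sample_rate']},"
        f"channels={values['channels']} ! ",
        "audioconvert ! ",
        f"lame mode={values['mode']} ",
    ]
    if values["vbr_mode"] == 0:
        # No VBR mode selected: a fixed bitrate is the only extra argument.
        parts.append(f"bitrate={values['bitrate']}")
    else:
        parts.append(f"vbr-mode={values['vbr_mode']}")
        parts.append(f" vbr-quality={9 - values['vbr_quality']}")
        # xingmux is only appended when it is actually installed.
        parts.append(" ! xingmux" if element_is_available("xingmux") else "")
    # Prefer id3v2mux, fall back to id3mux.
    parts.append(" ! id3v2mux" if element_is_available("id3v2mux")
                 else " ! id3mux")
    return "".join(parts)


# Hypothetical configuration values, not taken from a real profile.
values = {"sample_rate": 44100, "channels": 2, "mode": 0,
          "vbr_mode": 0, "bitrate": 192, "vbr_quality": 4}

cbr = lame_pipeline(values, lambda name: name == "id3v2mux")
vbr = lame_pipeline(dict(values, vbr_mode=4), lambda name: False)
print(cbr)
print(vbr)
```

With these values, the first pipeline ends in `lame mode=0 bitrate=192 ! id3v2mux`, and the second, where neither muxer is available, ends in `lame mode=0 vbr-mode=4 vbr-quality=5 ! id3mux`.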

Now, getting back to the "okay, why should I, as a user, care" side of things, I'll close the post with a screencast (ooooh, fancy, I've never done one of these!) that shows all of the profile stuff in action. For the sake of also demoing how the S-Expression evaluates into a proper GStreamer pipeline, I ran Banshee in debug mode for most of the screencast, which shows a text view and a "Test S-Expr" button. Rest assured, if you're running Banshee like a normal user, you'll never see this part of the profile configuration dialog :-).

Banshee's audio profiles
What I hope can one day replace the GNOME audio profiles editor, so applications other than Banshee can take advantage of the sweetness. Click the screenshot to watch the screencast (Ogg/Theora).

I'm still working a lot of things out with this, but I hope to someday make it work outside of Banshee. It's written with that in mind. At the very least, I'd like to make the XML profile and S-Expression format some kind of standard.