
Experiments with the HTML5 Audio Data API

by Oliver

About a month ago we had a hack day at New Bamboo, a day where all the bambinos were free to work on any kind of project they wanted to. I decided to experiment with the HTML5 Audio Data API, and canvas.fm (code) is the outcome of that experiment.

Canvas.fm is a small app that allows you to listen to music available on SoundCloud. Whilst the audio is playing it draws a visualisation of the current song using HTML5 canvas. It uses a server component, powered by Node.js, to convert SoundCloud's streaming MP3 audio into the OGG format, which Firefox supports.

Audio Data API

Browser specifics

The audio data API was originally introduced in Firefox 4 in the second half of 2010. Using this API it is possible to read & write audio data from both audio and video elements.

It is worth noting that more recently a WebKit audio data API has also emerged. The Mozilla implementation is, at the time of writing, significantly simpler and comes with much more comprehensive documentation and examples.

Because I originally only had a day to put something together I decided to focus on the Mozilla implementation of the audio data API. I hope in the future to be able to make canvas.fm work on both WebKit and Firefox.

Implementation Details

There are a number of events that the Mozilla audio API gives us which we can hook into to manipulate the audio data. In canvas.fm we are only reading audio data, so we are primarily interested in the loadedmetadata event and the MozAudioAvailable event.

Loaded Meta Data

The loadedmetadata event is fired, as the name suggests, when the browser has loaded the metadata for the audio element. This event is not specific to the audio data API, but it signals that we can now access the meta information about the audio that we will later need to process the data; the sketch after the following list shows these values being grabbed.

Specifically these are:

  • mozChannels — the number of channels available in the audio source: 1 for mono, 2 for stereo etc.
  • mozSampleRate — the number of samples per second in the audio, typically 44.1kHz.
  • mozFrameBufferLength — this will be discussed in the next section.
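Here is a minimal sketch of reading those values; the audio element's id is an assumption for illustration:

var audio = document.getElementById('player'),
    channels, sampleRate, frameBufferLength

audio.addEventListener('loadedmetadata', function () {
  // The moz-prefixed attributes only hold meaningful values
  // once the metadata has loaded
  channels = audio.mozChannels                    // 1 = mono, 2 = stereo
  sampleRate = audio.mozSampleRate                // e.g. 44100
  frameBufferLength = audio.mozFrameBufferLength  // total samples per event, across all channels
}, false)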

MozAudioAvailable

This event is where the bulk of the processing happens. The event handler is passed the frame buffer and the time in the audio track that the first element of this frame buffer represents.

The frame buffer is a typed array of floats that represents the decoded audio data. Each element of the array is a tiny sample of the audio being played, with values normalised between -1 and 1. Each value actually represents the sum of all the amplitudes of frequencies for a particular channel at a particular point in time.

The length of this array is given by the audio element's mozFrameBufferLength, which we grabbed in the loadedmetadata event handler. This value covers all of the channels, because the channel data is interleaved, e.g. [channel1, channel2, channeln, channel1, channel2, channeln...].
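In outline, the handler looks something like this; processFrame is a hypothetical stand-in for the FFT and drawing steps covered in the next sections:

audio.addEventListener('MozAudioAvailable', function (event) {
  var frameBuffer = event.frameBuffer,  // interleaved Float32Array of samples
      time = event.time                 // track position of the first sample, in seconds

  processFrame(frameBuffer, time)       // hypothetical: FFT + drawing, see below
}, false)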

Fast Fourier Transform

Ultimately we want the amplitudes of the various frequencies that make up the audio being played, rather than the sum of all the frequencies for a channel, which is what we've currently got.

This transformation is possible using Fourier analysis, the detail of which is in the realm of physics and way beyond the scope of this article. Essentially we are transforming the original data from the time domain into the frequency domain, which is the representation we need to graph.

A Fast Fourier Transform algorithm, implemented in JavaScript, does all the hard work for us. What we end up with is an array of frequency amplitudes, which is precisely what we need to draw our visualisation.
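The post doesn't name the implementation used, but a sketch assuming a dsp.js-style API (an FFT constructor taking a per-channel buffer size and sample rate, a forward() method, and a spectrum array of amplitudes) might look like:

// The FFT operates on one channel's worth of samples
var fft = new FFT(frameBufferLength / channels, sampleRate)

var toSpectrum = function (frameBuffer) {
  var signal = new Float32Array(frameBuffer.length / channels)

  // Deinterleave: take only the first channel's samples
  for (var i = 0; i < signal.length; i++) {
    signal[i] = frameBuffer[i * channels]
  }

  fft.forward(signal)  // time domain -> frequency domain
  return fft.spectrum  // array of frequency amplitudes
}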

Visualisation

An obvious choice for visualising our audio data would be the very familiar graphic equaliser. There are already several of these around, and although they are a good first step I wanted to do more. These kinds of visualisation only show you what is happening right now, without giving any impression of the song as a whole.

I decided to use a variation on the traditional graphic equaliser: a disc showing the changes in frequency strength throughout a piece of music. By the end you would have a circle showing the frequency shifts and variations of the whole song, and hopefully a visually interesting image too.

To make this possible without having to keep the state of the whole frequency spectrum for an entire song, we can take advantage of canvas transformations. The code doing the drawing doesn't change its position as the audio is played; instead we rotate the canvas, which prevents new lines overwriting existing ones and gives us the circular image we want.

var draw = function (samples) {
  samples.forEach(function (sample, i) {
    // Map each sample's amplitude onto the opacity of a 1x1 pixel
    var intensity = sample * 10
    ctx.fillStyle = 'rgba(0,0,0,' + intensity + ')'
    ctx.fillRect(0, 512 - i, 1, 1)
  })
}

The above snippet shows the draw function that takes care of drawing on the canvas drawing context, ctx. Each time a new set of samples is drawn we rotate the canvas by an amount that means it has turned a full 360°, or 2π radians, by the time the audio is complete.

var rotate = function (rad) {
  ctx.rotate(rad)
}
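The rotation amount per frame isn't shown in the post; one way to derive it, assuming the track's duration is known and using the metadata grabbed earlier, is:

// Each MozAudioAvailable event delivers frameBufferLength / channels
// samples per channel, so the whole track produces
// duration * sampleRate / (frameBufferLength / channels) events.
// Dividing 2π by that count completes the circle as the song ends.
var framesPerTrack = audio.duration * sampleRate / (frameBufferLength / channels),
    radPerFrame = (2 * Math.PI) / framesPerTrack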

Streaming Audio

The SoundCloud API is very simple and, most importantly for canvas.fm, allows us to get access to a streaming MP3 of any track available on the SoundCloud service. Unfortunately Firefox doesn't support the MP3 audio format due to licensing issues.

Thankfully it is relatively trivial to convert MP3s to Firefox's preferred audio format, OGG. Since canvas.fm allows users to search and choose from the entire SoundCloud music catalogue, we cannot pre-convert all of these audio streams.

Instead, canvas.fm uses Node.js to handle SoundCloud's incoming MP3 stream, and then pipes this into ffmpeg, giving us an OGG formatted audio stream which we can then pipe into an HTTP response. Using Node.js makes this kind of stream handling incredibly simple and efficient.

var track = Track.create({id: request.params['track_id']}),
    converter = Converter.create()

track.stream(function (err, trackStream) {
  if (err) throw err

  // Pipe SoundCloud's MP3 into ffmpeg's stdin, and ffmpeg's OGG
  // output straight into the HTTP response
  trackStream.pipe(converter.process.stdin)
  converter.process.stdout.pipe(response)
})

Above is an excerpt from the HTTP request handler in the canvas.fm Node.js app. track is a simple wrapper around the SoundCloud API. Calling the stream method makes a few HTTP requests to SoundCloud's API and then yields the MP3 stream to the callback.
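The Track wrapper itself isn't shown in the post; a rough sketch, assuming SoundCloud's /tracks/:id/stream endpoint (which redirects to the MP3 itself) and a hypothetical CLIENT_ID API credential, might look like:

var http = require('http'),
    url = require('url')

var Track = {
  create: function (attrs) {
    return {
      id: attrs.id,
      stream: function (callback) {
        // /tracks/:id/stream responds with a redirect to the actual MP3
        var streamUrl = 'http://api.soundcloud.com/tracks/' + this.id +
                        '/stream?client_id=' + CLIENT_ID  // hypothetical credential
        http.get(url.parse(streamUrl), function (redirect) {
          http.get(url.parse(redirect.headers.location), function (mp3Stream) {
            callback(null, mp3Stream)
          })
        })
      }
    }
  }
}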

Inside the callback we use the pipe method built into Node.js to take the MP3 stream and pipe it into the converter object. This is just a wrapper around an ffmpeg child process. We can then pipe the output of this ffmpeg process back into the response.
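The Converter wrapper isn't shown either; a minimal sketch, assuming ffmpeg is available on the PATH and reads MP3 from its stdin while writing OGG to its stdout, could be:

var spawn = require('child_process').spawn

var Converter = {
  create: function () {
    return {
      // pipe:0 is stdin (the incoming MP3), pipe:1 is stdout (the OGG output)
      process: spawn('ffmpeg', ['-i', 'pipe:0', '-acodec', 'libvorbis', '-f', 'ogg', 'pipe:1'])
    }
  }
}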

The pipe method takes care of any complexity in handling these streams, making sure that they are always kept in sync by pausing, draining and resuming streams as required. All this means that, regardless of the size of the MP3 file from SoundCloud, the memory footprint of the Node.js process doesn't grow.

Conclusion

Combining these technologies made it fairly simple to put together a sophisticated-looking visualisation. Below are a few samples of the images that canvas.fm can produce.

[Images: visualisations of la mezcla, hot chip and tensnake]