New Bamboo Web Development

Bamboo blog. Our thoughts on web technology.

getUserMedia on the server, with Sinatra and Say Cheese

by Lee Machin

Say Cheese is a small library that makes it easier to integrate webcams into your website or app, using the recent getUserMedia API. It opens up a whole range of opportunities for web developers, provided their users run modern, up-to-date browsers (sorry Safari, IE).

One such opportunity is working with the user's webcam stream on the server side. This could be as simple as sending a single frame (what Say Cheese calls a snapshot), or pushing the entire stream itself with, say, WebSockets. I'll start simple and show how you can send a 'snapshot' to the server via AJAX, and then do something interesting with it.

Snapshots via AJAX

You will need:

  • A web server (local or otherwise), Ruby environment optional
  • A copy of the example code
  • Google Chrome or Opera (as they work best)

Here's one I made earlier

http://saycheese-mustache.herokuapp.com

The server

At New Bamboo we roll with Ruby, so Sinatra is the minimal web framework of choice. If Ruby is too high level, feel free to use Node and Express... [1]

For now, open up lib/saycheese-mustache.rb in the example code to see what the server will do. The post '/image' route is of particular interest:

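post '/image' do
  name  = "#{Time.now.to_i}.png"

  # `image` isn't defined in this excerpt; presumably it is a helper elsewhere
  # in the app that returns the Base64-decoded upload.
  ImageStore.store name, image
  image_url = ImageStore.find(name).url
  "http://mustachify.me/?src=#{image_url}"
end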

The easiest way to get the image from the client to the server is to send a base64 encoded string containing the data. Once decoded, you have the image in binary format, and can write it to a file...

A detour into saving data

... And I would have done exactly that, had I not forgotten that Heroku doesn't give you a persistent filesystem to write to. The image is sent to an S3 bucket instead, using this class (in lib/saycheese-mustachify/image_store.rb):

class ImageStore < AWS::S3::S3Object
  set_current_bucket_to ENV['S3_BUCKET_NAME']

  # Class-level override, so ImageStore.store always uploads publicly readable objects
  def self.store(name, image)
    super name, image, :access => :public_read
  end
end

I used aws-s3, which allowed me to store stuff on S3 without too much faff. Subclassing its S3Object let me keep the S3 specifics out of my route, and tweak the library's behaviour a bit. In this case, I redefined store so it always made the images public.

(If you use rbenv then rbenv-vars is great for ensuring your environment variables are set, without committing them to the repo.)

This isn't about the trials and tribulations of writing a cloud-based app though, so back to the real code.

Moustachifying

I'd like to call the process I went through to create this app Bone Idle Driven Development, because at each step I thoroughly considered what would be quickest and easiest to write. This attitude is pervasive throughout the code, and also in the chosen concept: adding moustaches to people's faces.

image_url = ImageStore.find(name).url
"http://mustachify.me/?src=#{image_url}"

You'll notice that the request responds with a URL to mustachify.me that has the fresh S3 image URL appended to the query string. The webcam image has now been mustachified, and is ready to send to the client.
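One detail worth a second look is that the S3 URL goes into the query string exactly as it comes. As a rough sketch of the same step in JavaScript (the example app does this in Ruby), here's a hypothetical mustachifyUrl helper that percent-encodes the image URL first, so any '&' or '?' inside it can't be mistaken for part of the outer query string. I'm assuming the service decodes its src parameter in the usual way; the bucket name is made up.

```javascript
// Hypothetical helper, not part of the example app.
function mustachifyUrl(imageUrl) {
  // encodeURIComponent escapes ':', '/', '?', '&' and '=' in the nested URL
  return 'http://mustachify.me/?src=' + encodeURIComponent(imageUrl);
}

// "example-bucket" is a placeholder, not the bucket used by the demo
console.log(mustachifyUrl('http://example-bucket.s3.amazonaws.com/1345000000.png'));
// prints http://mustachify.me/?src=http%3A%2F%2Fexample-bucket.s3.amazonaws.com%2F1345000000.png
```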

The rest of the Ruby code isn't of much concern, and doesn't have anything to do with the webcam functionality.

The client

The client-side JavaScript is unfortunately a little more involved than the Sinatra app. First, the webcam has to be set up (and access has to be granted by the user), and then an AJAX request has to be made when the user clicks the 'Snap!' button.

Open up index.html in the example project root, and skip to the inline JS at the bottom.

Say Cheese makes half of this process a lot simpler than it otherwise would be, and all the basic functionality is encapsulated in the following code:

var webcam = new SayCheese('#webcam-container');

webcam.on('start', function() {
  $('#snap')
    .attr('disabled', false)
    .on('click', function() {
      webcam.video.pause();
      return webcam.takeSnapshot();
    });
});

webcam.start();

The last line does all the hard work behind the scenes. It makes sure the browser supports webcams, that the user has granted access to theirs, and that the webcam is started in the correct way for the browser in question. getUserMedia is still in its infancy, but this code is designed to be forwards compatible with the finalised implementations. The point is, you don't have to care about it, because Say Cheese does it for you.

The function in the middle simply enables the 'Snap!' button on the page when the webcam is ready to be used. This button is used to call our Sinatra app.
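To give a feel for the sort of browser check that webcam.start() saves you from writing, here's a simplified sketch of getUserMedia feature detection. This is not Say Cheese's actual source: findGetUserMedia is a name I've made up, and the prefixed method names are the ones browsers shipped while the API was still being standardised.

```javascript
// Hypothetical helper, not Say Cheese's implementation.
// Returns a function (constraints, onSuccess, onError), or null if unsupported.
// `nav` is passed in so the function can be exercised outside a browser.
function findGetUserMedia(nav) {
  // Standardised, promise-based API
  if (nav.mediaDevices && typeof nav.mediaDevices.getUserMedia === 'function') {
    return function (constraints, onSuccess, onError) {
      nav.mediaDevices.getUserMedia(constraints).then(onSuccess, onError);
    };
  }

  // Older callback-based API, with and without vendor prefixes
  var legacy = nav.getUserMedia || nav.webkitGetUserMedia || nav.mozGetUserMedia;
  if (typeof legacy === 'function') {
    return function (constraints, onSuccess, onError) {
      legacy.call(nav, constraints, onSuccess, onError);
    };
  }

  return null;
}
```

In a browser you'd call findGetUserMedia(navigator) and show some sort of fallback message if it comes back null.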

This is what makes the request when the button is clicked:

webcam.on('snapshot', function(evt, snapshot) {
  var req  = new XMLHttpRequest(),
      img  = snapshot.toDataURL('image/jpeg').split(',')[1];

  req.open('POST', '/image');

  req.onload = function(evt) {
    var url = this.response,
        img = document.createElement('img');

    img.onload = function() {
      webcam.video.play();
      return $('#results').prepend(this);
    };

    img.src = url;
  };

  var data = new FormData();
  data.append('img', img);
  req.send(data);
});

It seems to do a lot, so let's split it up.

the snapshot event

webcam.on('snapshot', function(evt, snapshot) {
  ...
});

The snapshot event is triggered each time takeSnapshot is called, once the result is ready to use. We hook into this particular event so we can do interesting things with the canvas element it hands us. In this case, we're making an XMLHttpRequest so we can send the canvas' image data.

getting the image data

var req  = new XMLHttpRequest(),
    img  = snapshot.toDataURL('image/jpeg').split(',')[1];

Every canvas element has a function called toDataURL, which returns a data URL: a short header (here data:image/jpeg;base64), then a comma, then the canvas image encoded in the requested format and base64 encoded. You can copy and paste the output into your location bar, and see the image straight away. We're not interested in the full URL though; just the base64 encoded string that appears after the first comma.
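For illustration, the split can be wrapped in a small function that also hands back the MIME type. parseDataUrl is a hypothetical helper, not part of Say Cheese, and it only copes with base64 data URLs of the kind toDataURL produces.

```javascript
// Hypothetical helper: split a base64 data URL into its MIME type and payload.
function parseDataUrl(dataUrl) {
  var comma = dataUrl.indexOf(',');
  if (dataUrl.slice(0, 5) !== 'data:' || comma === -1) {
    throw new Error('Not a data URL');
  }

  // The part between "data:" and the comma looks like "image/jpeg;base64"
  var header = dataUrl.slice(5, comma).split(';');
  if (header[header.length - 1] !== 'base64') {
    throw new Error('Only base64 data URLs are handled here');
  }

  return { mimeType: header[0], base64: dataUrl.slice(comma + 1) };
}
```

The one-liner in the snippet above gets away with split(',')[1] because the base64 alphabet has no comma in it, so the first comma is always the one that ends the header.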

making the request

There are two parts to making an XMLHttpRequest: building the request, and dealing with the response.

req.open('POST', '/image');

...

var data = new FormData();
data.append('img', img);
req.send(data);

I've used the FormData API for this, because it gets the job done without having to mess about with setting headers by hand. It's essentially an object representation of an HTML form, without the HTML, and makes your AJAX uploads ridiculously simple.
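As a rough sketch of that building step on its own, here's the form wrapped in a function. buildSnapshotForm is a name I've made up for illustration; the only real APIs involved are the FormData constructor and its append method. When a FormData object is passed to send, the browser sets the multipart/form-data Content-Type header (boundary included) for you, which is the header fiddling you get to skip.

```javascript
// Hypothetical helper: package the base64 payload as multipart form data.
function buildSnapshotForm(base64Image) {
  var data = new FormData();
  // 'img' is the field name used in the snippet above
  data.append('img', base64Image);
  return data;
}
```

With that in place, the last three lines of the snippet boil down to req.send(buildSnapshotForm(img)).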

Either way, that sends the request, but without a callback the response will vanish into the ether. Enter onload:

req.onload = function(evt) {
  var url = this.response,
      img = document.createElement('img');

  img.onload = function() {
    webcam.video.play();
    return $('#results').prepend(this);
  };

  img.src = url;
};

Jumping back to the Sinatra app for a second, the API responds with this:

"http://mustachify.me/?src=#{image_url}"

The response is a plain text string, and so we don't need to do anything fancy to make it usable. The next step is to create an image element, and set the source to be that URL. The onload callback on the image is triggered when the image has been downloaded, and it's safe to assume at that point that we can add it to the DOM and it'll appear on the page.

Unless you're camera shy, or Mustachio couldn't find where to put a 'tache, you should see a picture of yourself with some freshly cultivated facial growth when this is invoked.
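One thing the onload handler takes on trust is that the request actually succeeded: XMLHttpRequest fires load for 4xx and 5xx responses too. As a hedged sketch, a small check could sit at the top of the handler. imageUrlFromResponse is a hypothetical helper, not part of the example code, and the 'starts with http' test is just my assumption about what a good response looks like.

```javascript
// Hypothetical helper: pull the image URL out of a finished request.
// Returns null when the status is not 2xx or the body isn't an http(s) URL.
function imageUrlFromResponse(req) {
  if (req.status < 200 || req.status >= 300) {
    return null;
  }
  var body = String(req.responseText || '').trim();
  return /^https?:\/\//.test(body) ? body : null;
}
```

Inside onload you could then write var url = imageUrlFromResponse(this); and, if it comes back null, call webcam.video.play() and bail out rather than pointing an image at an error page.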

And you're done!

The future, and WebRTC

This is just one thing you can do, in one way, to work with data from a webcam stream on the server side. You might not need to make an HTTP request, when you can stream data over a web socket; and you might prefer to work with binary data or byte arrays rather than encoded strings.
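If you do fancy bytes over strings, the conversion is short. Here's a minimal sketch, assuming you already have the base64 payload from the snapshot; base64ToBytes is a hypothetical helper name, and atob is the standard base64 decoder available in browsers (and as a global in recent versions of Node).

```javascript
// Hypothetical helper: decode a base64 string into raw bytes.
function base64ToBytes(base64) {
  // atob gives a "binary string" where each character code is one byte
  var binary = atob(base64);
  var bytes = new Uint8Array(binary.length);
  for (var i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}
```

The resulting Uint8Array can be passed straight to a WebSocket's send method, or wrapped in a Blob (new Blob([bytes], { type: 'image/jpeg' })) and appended to a FormData object in place of the string.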

What else could you do? Maybe people don't need to upload a photo to set an avatar anymore, if you can snap a picture from their webcam. If you get into streaming, you can perform some sort of analysis on the webcam data, or make a game around it. You can capture the audio from the stream and do similar.

As WebRTC support improves in browsers you'll even have the ability to share webcam feeds between clients (with PeerConnection), and with it the opportunity to create a built-in Skype competitor with real-time moustache support.

Either way, though, I hope you'll now have a good idea of where to start, and how Say Cheese can simplify part of the process.

[1] https://twitter.com/shit_hn_says/status/234856345579446272