Week 2: Feature Extraction Photo Booth

Also my first sketch using ml5.js, a friendly machine learning library for the browser built and maintained by my colleagues at ITP.

I’ve been looking forward to training neural nets to respond to images, in other words, to create my own custom classifiers. Just my luck that many of the ml4a demos incorporate video from the laptop’s webcam as a data source, and after sampling these for project ideas, I realized this:

The output of these demos looks and sounds nearly the same no matter who uses them.

And remembered this:

The input is just as important as the output (thank you, Collective Play).

So how might my project incorporate expressive input to create meaningful output for the user? Could I create a photo booth that builds a new portrait based on different facial expressions?

The Feature Extraction Photo Booth uses transfer learning to train on images in which it recognizes different features (such as those present in various funny faces). When it “sees” one of those faces, it copies a section of the video and pastes it into an empty area of the canvas. Each facial expression is thus mapped to a different segment of the video, which gets copied into the new portrait. Currently the user may train the model on three different facial expressions (or three entirely different faces, I suppose). This process repeats, and eventually a new collaged portrait emerges. The more exaggerated the expressions, the better! So yes: a photo booth that uses feature extraction to extract features (but really to screen-grab areas where features are likely present).
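In rough strokes, the whole flow looks like the sketch below. This is a simplified reconstruction rather than my actual code: the button labels, the regions lookup, and the canvas dimensions are placeholders, and the ml5 featureExtractor calls are the ones documented for the releases I was using (callback signatures have shifted in later versions).

```javascript
let video;
let featureExtractor;
let classifier;

function setup() {
  createCanvas(1280, 480);          // video on the left, portrait on the right
  video = createCapture(VIDEO);
  video.size(640, 480);
  video.hide();

  // Extract features from MobileNet, then re-train on my own labels
  featureExtractor = ml5.featureExtractor('MobileNet', () => console.log('model ready'));
  classifier = featureExtractor.classification(video, () => console.log('video ready'));

  // One button per expression: each click adds the current frame as a training example
  ['faceA', 'faceB', 'faceC'].forEach((label) => {
    createButton('add ' + label).mousePressed(() => classifier.addImage(label));
  });

  // Train, then start the classify() -> gotResults() loop
  createButton('train').mousePressed(() => {
    classifier.train((loss) => {
      if (loss === null) classifier.classify(gotResults);   // loss is null when training ends
    });
  });
}

function draw() {
  image(video, 0, 0, 640, 480);     // live video on the left half
}

function gotResults(error, result) {
  if (error) return console.error(error);
  // result is the predicted label; paste a band of video into the portrait
  // area on the right, at a position keyed by that label
  const regions = { faceA: 0, faceB: 160, faceC: 320 };
  const y = regions[result];
  copy(video, 0, y, 640, 160, 640, y, 640, 160);
  classifier.classify(gotResults);  // keep classifying (this is the part that ran too fast)
}
```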

Notes on my process:

  1. First I wrote a sketch in P5 to help me remember how to make images from areas of the canvas where the video displays, and to design the basic layout I envisioned: live video on the left (along with the buttons to train my model) and the resulting portrait on the right. Of note: this used P5 version 0.7.1.

  2. Then I dug into the ml5 code and this example to learn how to build my own custom classifier based on feature extraction. As indicated in the documentation, the featureExtractor() “class allows you to extract features of an image via a pre-trained model [in this case, MobileNet] and re-train that model with new data [in my case, different funny faces].” Of note: this used P5 version 0.6.0.

  3. Finally, I combined my original P5 sketch with the classifier sketch. And it worked! Sorta.

    1. First, it was waaay too fast. Once the classifying started, new images were popping up far too quickly to create an interesting new portrait. It took me some time to identify how to modify the code to slow this process. I tried creating timers in various locations, but eventually realized that I could run a loop from classify() to gotResults() and back. Within gotResults() I inserted setTimeout() to delay the call to classify() by a particular amount of time. (I also added a counter here to eventually stop the whole thing. See the sketch after this list.)

    2. Also, the copied images were landing in random locations in the portrait area, and the result quickly turned into a nondescript composition. So I changed the sketch to copy and paste segments into even vertical thirds, but the copied areas came out distorted. Through troubleshooting I realized that the distortion only occurred when using P5 v0.6.0. However, upon changing to v0.7.1, the sketch broke… My workaround right now is to keep it at 0.6.0 and adjust the size of the screen captures from the video. This kinda gets me there, depending on my monitor resolution…ugh. SOLVED! (see below)
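Roughly, the fix for 3.1 and 3.2 amounts to rewriting gotResults() from the earlier sketch: instead of re-classifying immediately (the free-running loop that was way too fast), it pastes into one of three even bands and then re-classifies after a delay. The delay, the counter limit, and the band sizes below are placeholders, and reading “even vertical thirds” as three stacked 640x160 bands is my own interpretation.

```javascript
let pasteCount = 0;
const maxPastes = 30;    // stop the loop after this many pastes
const delayMs = 1000;    // breathing room between classifications

function gotResults(error, result) {
  if (error) return console.error(error);

  // Each label owns one even third of the 640x480 portrait area
  const thirds = { faceA: 0, faceB: 160, faceC: 320 };
  const y = thirds[result];

  // copy(src, sx, sy, sw, sh, dx, dy, dw, dh): grab a 640x160 band of the
  // video and paste it into the matching band of the portrait on the right
  copy(video, 0, y, 640, 160, 640, y, 640, 160);

  pasteCount++;
  if (pasteCount < maxPastes) {
    // throttle the classify() -> gotResults() loop instead of letting it free-run
    setTimeout(() => classifier.classify(gotResults), delayMs);
  }
}
```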

There are many improvements yet to make. Here’s my punch list:

  • Figure out the snafu from working with different versions of the P5 library. SOLUTION: change mousePressed() to mouseReleased() when using P5 v0.7.1.

  • Refactor my code!

  • Styling!! Oh geez, my fluency with HTML/CSS is still nearly nil. But it’s projects like these that motivate me to work on it.

  • Display the video and portrait in frames that retain the same proportions no matter the width or height of the browser window, adjusting automatically as the window resizes.

  • Mirror the video playback. Currently the image is reversed. I’m familiar with the operations needed to make this happen (see Painting Mirror): it requires flipping the underlying grid of the canvas and moving the origin to the upper right. But this impacts all the other layout calculations, and I just ran out of time here. (A minimal sketch is at the end of this list.)

  • How do I clear my model without having to refresh the browser page to reload a fresh, empty one to train?

    • SOLUTION from Reference: classifier.customModel = null

    • and since classifier.classify will always set classifier.isPredicting = true, I wrapped .classify in a boolean check to stop the automatic predicting until I started training again.

    • also, I got better results by adding classifier.hasAnyTrainedClass = false in my clearModel() function and setting classifier.hasAnyTrainedClass = true just before classifier.train. Before this update, it seemed that the model was remembering residual training data, but I should revisit this…
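Pulled together, the clear-and-retrain pieces from the bullets above look something like the sketch below. It leans on ml5 internals (customModel, hasAnyTrainedClass, isPredicting), so it may well break in other versions, and isClassifying is my own flag that gates the classify() loop.

```javascript
let isClassifying = false;   // my own flag; gotResults() only re-classifies while this is true

function clearModel() {
  isClassifying = false;                  // stop the classify() -> gotResults() loop
  classifier.customModel = null;          // drop the re-trained layers
  classifier.hasAnyTrainedClass = false;  // forget that any training happened
}

function trainModel() {
  classifier.hasAnyTrainedClass = true;   // set just before training
  classifier.train((loss) => {
    if (loss === null) {                  // training finished
      isClassifying = true;
      classifier.classify(gotResults);
    }
  });
}
```

Inside gotResults(), the call back into classify() is wrapped in if (isClassifying) so that clearing the model actually halts the loop.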
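And for the mirroring item above: the usual p5 move is to flip the x-axis and shift the origin before drawing the video, wrapped in push()/pop() so the rest of the layout math stays untouched. A minimal illustration (the 640x480 size is again a placeholder):

```javascript
function draw() {
  push();
  translate(640, 0);              // move the origin to the video frame's top-right corner
  scale(-1, 1);                   // flip the x-axis so the video reads like a mirror
  image(video, 0, 0, 640, 480);
  pop();                          // restore the normal grid for everything else
}
```

The wrinkle, as noted above, is that copy() still samples the un-mirrored video, so the source coordinates need the same rethinking, which is exactly the layout work I ran out of time for.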

Try it here!
Feature Extraction Photo Booth (reload browser to restart)
Feature Extraction Photo Booth with Restart