In my mid-20s, I quit my job as a machine learning engineer at Wayfair and locked myself in a basement for 9 months, writing several hundred lines of code per day, all for the sake of my passion for music! I wanted a certain set of music chops and had big problems with music education. I was focused on real-time creativity, composition, and execution - the exciting stuff, not the boring and tedious stuff taught in school. After I broke into the software industry and my skills began to develop, I started to think I could implement some of the wild ideas I had been dreaming up, so I decided to quit my job, jump in, and see what I could produce. My goal was ambitious: I wanted to create a software platform that could automate the acquisition of virtuosic musical ability for anyone.
There are various important tasks in music, e.g. recognizing notes and chords in audio recordings, generating new notes and chords on the fly, recognizing song structure, ... - but some musicians seem to have better, easier, more efficient ways of completing them than others. Some musicians randomly choose notes within a scale without knowing beforehand what they will sound like; others already have a sound in their head and know which notes on the fretboard will produce it. Some people use the rules of harmony to draw from a range of sensible chords to play after the current one; others hear a good-sounding chord in their head and know the frets that produce those sounds. I dreamt of finding principled ways of completing and training these tasks. I also wanted to explore others, like vocal harmonizing, perfect pitch, the voice-instrument connection, ...
What you see here are some highlights from a demo video I shot of the software prototype that I used to train myself on these various tasks. After a bunch of experimentation, I made progress on some abilities that seemed very out of reach before, including:
I explored the possibility that each of these various musical faculties could be trained like a machine-learning model. If we take that as an assumption, then maybe other things would follow:
There was the problem of how to extract information (concise training data) from a high-dimensional digital artifact like a WAV or MP3 file. When we think about a song, we are interested in details like:
Then, there were problems of underlying technical infrastructure:
I ended up putting together a setup that greatly simplified the playing of these musical games. If I wanted to send information to my guitar, I could easily specify which fret to send it to (using a high-level programming language like Scala, TypeScript, or Python). Moreover, I could specify which color I wanted the fret to take. The colors were useful 1) to distinguish between different parts when multiple parts were streamed to the fretboard, 2) to visualize the "error signal"/feedback during training games (e.g. colors "close" to green were close to the ground truth, while colors "close" to red signified the most loss), and 3) when key center information was available, to label in real time the solfege syllable associated with a plucked note.
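To make that concrete, here is a purely hypothetical sketch of what driving the fretboard could look like from Python. The OSC address, message layout, and port are invented for illustration; only the general pattern (send a string/fret position plus an RGB color, and map training loss onto a green-to-red gradient) reflects the description above.

```python
# Hypothetical sketch only: the OSC address ("/fret/color") and message layout
# are made up for illustration; the real fretboard protocol is not shown here.
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("127.0.0.1", 9000)  # wherever the fretboard controller listens

def light_fret(string: int, fret: int, rgb: tuple[int, int, int]) -> None:
    """Ask the LED mesh to light one fret position with an RGB color."""
    client.send_message("/fret/color", [string, fret, *rgb])

def loss_to_color(loss: float) -> tuple[int, int, int]:
    """Map a normalized training loss (0 = perfect, 1 = worst) onto a green-to-red gradient."""
    loss = min(max(loss, 0.0), 1.0)
    return (int(255 * loss), int(255 * (1.0 - loss)), 0)

# e.g. show feedback on string 2, fret 5 after a mostly-correct answer
light_fret(2, 5, loss_to_color(0.2))
```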
Here is an example of multiple parts being played on the fretboard, and one part being labelled with the corresponding solfege colors as I play it.
Choosing notes to play on an instrument is really unintuitive if you think about it. Most people can easily and naturally mimic a melody they've heard, but how many of those same people, even professionals, can play it just as quickly on an instrument? Almost none.
In response to this idea, I wrote software to map my voice to its corresponding fret in real time, in order to make finding a note on the guitar as natural as finding it with my voice.
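The core of that mapping can be sketched offline (the real system ran in real time): track the sung pitch, convert it to a MIDI note number, then look up which string/fret combinations in standard tuning produce that note. The file name and the use of librosa's pyin tracker here are assumptions for illustration.

```python
# Offline sketch of the voice-to-fret idea: sung pitch -> MIDI -> (string, fret) pairs.
import librosa

OPEN_STRINGS = [40, 45, 50, 55, 59, 64]  # E2 A2 D3 G3 B3 E4 as MIDI note numbers

def midi_to_frets(midi_note: int, max_fret: int = 15):
    """Return every (string_index, fret) in standard tuning that produces this MIDI note."""
    return [(s, midi_note - open_note)
            for s, open_note in enumerate(OPEN_STRINGS)
            if 0 <= midi_note - open_note <= max_fret]

y, sr = librosa.load("sung_phrase.wav", sr=None, mono=True)  # placeholder file
f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                             fmax=librosa.note_to_hz("C6"), sr=sr)

for hz in f0[voiced][:10]:  # first few voiced frames
    midi = int(round(librosa.hz_to_midi(hz)))
    print(midi, midi_to_frets(midi))
```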
Before I had thought of using an LED mesh on a real guitar, my idea was to use a touch surface and program any musical interface I wanted (here is one that roughly simulates a guitar). Though the possibilities were endless with respect to designing your own instrument, it was too annoying to play everything on a flat, rectangular surface. It wasn't comfortable or satisfying. I was glad to discover a better way to achieve this...
We listen to songs on YouTube and Spotify, not books of sheet music, so I wanted to be able to extract training material from those sources automatically and quickly - here is an example of doing so from YouTube.
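As a rough sketch of the download-and-extract step, assuming yt-dlp and librosa (the actual transcription pipeline that ran downstream is not shown, and the URL is a placeholder):

```python
# Pull a song's audio from YouTube, then compute beats and per-frame pitch-class content
# as raw material for training games. The real transcription logic lived elsewhere.
import yt_dlp
import librosa

def download_audio(url: str, out_stem: str = "song") -> str:
    """Download a YouTube video's audio track and convert it to WAV via ffmpeg."""
    opts = {
        "format": "bestaudio/best",
        "outtmpl": f"{out_stem}.%(ext)s",
        "postprocessors": [{"key": "FFmpegExtractAudio", "preferredcodec": "wav"}],
    }
    with yt_dlp.YoutubeDL(opts) as ydl:
        ydl.download([url])
    return f"{out_stem}.wav"

path = download_audio("https://www.youtube.com/watch?v=...")  # placeholder URL
y, sr = librosa.load(path, sr=None)
tempo, beats = librosa.beat.beat_track(y=y, sr=sr)
chroma = librosa.feature.chroma_cqt(y=y, sr=sr)  # per-frame pitch-class energy
print("tempo estimate:", tempo, "| beats:", len(beats), "| chroma shape:", chroma.shape)
```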
Here's an example of interfacing with other applications - we can also see that the software representation of the automatically transcribed song could be rendered to sheet music, producing a jazz "lead sheet"-like artifact.
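Here is a small sketch of the rendering idea using music21 as a stand-in notation backend (not necessarily what the project used): chord symbols and notes are appended to a stream and written out as MusicXML, which notation software can display as a lead sheet. The chords and melody below are made up.

```python
# Render a tiny chord-symbols-plus-melody fragment to MusicXML, lead-sheet style.
from music21 import stream, note, harmony, meter

score = stream.Part()
score.append(meter.TimeSignature("4/4"))

score.append(harmony.ChordSymbol("Dm7"))
for pitch, dur in [("D4", 1.0), ("F4", 1.0), ("A4", 1.0), ("C5", 1.0)]:
    score.append(note.Note(pitch, quarterLength=dur))

score.append(harmony.ChordSymbol("G7"))
for pitch, dur in [("B4", 2.0), ("F4", 2.0)]:
    score.append(note.Note(pitch, quarterLength=dur))

score.write("musicxml", fp="lead_sheet.musicxml")  # open in MuseScore, Finale, etc.
```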
I love working backwards when solving problems in general, so I wanted to do the same thing when playing these musical games over songs. Slowing things down is also great, since it gives your brain time to process all this information. As for speeding things up, I was under the impression that after practicing at higher speeds, playing at the regular tempo would feel like a piece of cake.
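The slow-down/speed-up part is easy to sketch with librosa's time stretch, which changes tempo while preserving pitch (file names are placeholders; the project's actual playback engine may have worked differently):

```python
# Time-stretch a target phrase to half and double speed without changing its pitch.
import librosa
import soundfile as sf

y, sr = librosa.load("target_phrase.wav", sr=None)

half_speed = librosa.effects.time_stretch(y, rate=0.5)    # twice as long, same pitch
double_speed = librosa.effects.time_stretch(y, rate=2.0)  # half as long, same pitch

sf.write("phrase_slow.wav", half_speed, sr)
sf.write("phrase_fast.wav", double_speed, sr)
```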
Key centers are important, but they can change frequently, especially in jazz, so I wanted a robust method to estimate them with software.
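One standard way to estimate a key center from audio is Krumhansl-Schmuckler profile matching: average the chroma over a section and correlate it against all 24 rotated major/minor key profiles. This is not necessarily the method the project used, but it sketches the idea; running it per section rather than per song is what lets you follow key changes.

```python
# Krumhansl-Schmuckler key estimation over a chunk of audio.
import numpy as np
import librosa

MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17])
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def estimate_key(y: np.ndarray, sr: int) -> str:
    """Correlate the averaged chroma vector against all 24 rotated key profiles."""
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr).mean(axis=1)
    best, best_score = "", -np.inf
    for tonic in range(12):
        for profile, mode in [(MAJOR, "major"), (MINOR, "minor")]:
            score = np.corrcoef(np.roll(profile, tonic), chroma)[0, 1]
            if score > best_score:
                best, best_score = f"{NOTES[tonic]} {mode}", score
    return best

y, sr = librosa.load("section.wav", sr=None)  # placeholder: one section of a song
print(estimate_key(y, sr))                    # e.g. "F major"
```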
I didn't want to loop over measures; I wanted to loop over meaningful things like phrases, verses, choruses, ... so I used a model that could help me achieve that.
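As a hedged sketch of that kind of structural segmentation (the model the project actually used may differ), librosa can cluster chroma frames into a handful of larger sections and report the boundaries:

```python
# Cluster chroma frames into k larger sections and print the candidate boundaries.
import librosa

y, sr = librosa.load("song.wav", sr=None)  # placeholder file
chroma = librosa.feature.chroma_cqt(y=y, sr=sr)

k = 8  # rough number of sections to look for (an assumption, tune per song)
boundary_frames = librosa.segment.agglomerative(chroma, k)
boundary_times = librosa.frames_to_time(boundary_frames, sr=sr)

print("candidate section boundaries (s):", [round(t, 1) for t in boundary_times])
```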
I was really a perfectionist about rhythm - I wanted to be able to loop over a target phrase indefinitely and get into a trance-like state, so I didn't want any abrupt starting and stopping, pausing, gaps, ...
Of course, not all recordings in the wild keep the same tempo throughout, but I still wanted to automatically produce a "sheet music like" representation of them.
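A simple way to cope with drifting tempo is to track beats first and then read a local tempo off the spacing between consecutive beats, rather than assuming a single BPM for the whole recording. The sketch below, using librosa's beat tracker, illustrates that idea (the file name is a placeholder):

```python
# Track beats, then derive a per-beat tempo curve from the gaps between them.
import numpy as np
import librosa

y, sr = librosa.load("live_recording.wav", sr=None)
_, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
beat_times = librosa.frames_to_time(beat_frames, sr=sr)

local_bpm = 60.0 / np.diff(beat_times)  # one tempo estimate per beat interval
print(f"tempo ranges from ~{local_bpm.min():.0f} to ~{local_bpm.max():.0f} BPM")
```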
This one didn't end up being too useful, but it was kind of cool... I converted a vocal signal into a Hertz time series and fed it into a synthesizer.
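A minimal offline version of that experiment, assuming librosa and soundfile: track the sung pitch with pyin, upsample the resulting frequency curve to audio rate, and drive a bare sine oscillator with it (the real synthesizer was presumably more interesting than a sine wave):

```python
# Voice -> Hertz time series -> sine oscillator, written back out as audio.
import numpy as np
import librosa
import soundfile as sf

y, sr = librosa.load("vocal.wav", sr=None)  # placeholder file
hop = 512
f0, voiced, _ = librosa.pyin(y, fmin=80, fmax=800, sr=sr, hop_length=hop)

f0 = np.where(voiced, f0, 0.0)              # silence unvoiced frames
f0_per_sample = np.repeat(f0, hop)[: len(y)]  # stretch the frame-rate curve to sample rate

phase = 2 * np.pi * np.cumsum(f0_per_sample) / sr   # integrate frequency to get phase
synth = 0.3 * np.sin(phase) * (f0_per_sample > 0)   # mute where no pitch was detected

sf.write("vocal_as_synth.wav", synth.astype(np.float32), sr)
```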
Here are a few examples of the musical "games" I created and played. Of course, these games have a lot in common, so I created a software framework that took care of all that shared functionality, along with places where I could inject my own game-specific rules.
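Purely as a hypothetical sketch of that architecture (none of these names come from the real codebase), the shared concerns can live in a base class while each game overrides the prompt and scoring hooks:

```python
# Hypothetical framework sketch: the run loop is shared, the rules are injected.
from abc import ABC, abstractmethod

class FretboardGame(ABC):
    def run(self, phrase, n_rounds: int = 10) -> None:
        """Common loop shared by every game: present, listen, score, show feedback."""
        for _ in range(n_rounds):
            prompt = self.make_prompt(phrase)
            answer = self.capture_answer()     # e.g. listen for plucked notes
            loss = self.score(prompt, answer)  # game-specific notion of "wrong"
            self.show_feedback(loss)           # e.g. green-to-red fret colors

    @abstractmethod
    def make_prompt(self, phrase): ...

    @abstractmethod
    def score(self, prompt, answer) -> float: ...

    def capture_answer(self):
        raise NotImplementedError("wire this to the guitar's pitch-to-MIDI input")

    def show_feedback(self, loss: float) -> None:
        print(f"loss={loss:.2f}")  # stand-in for lighting up the fretboard
```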
You could play this guitar just like any other electric guitar - of course, playing something like this would be impossible on a touch surface like an iPad.
Since I primarily drove the software with Ableton Live (and Max), I decided to learn how to produce music inside of the DAW as well. I did a deep dive for a couple of weeks and made this proof-of-concept EDM remix to show that you could produce real music alongside these software libraries.