Chris Chen

Holliday Shuler

William Xiao

Professor: Bryan Pardo of Northwestern University

EECS 352: Machine Perception of Music and Audio


Train Your Pitch

Very few people are born with absolute (or perfect) pitch. However, research [1] has suggested that absolute pitch recognition can be learned to some degree. In fact, relative pitch (the recognition of pitch intervals using a reference note) can be developed even more easily.

The Motivation

Ear training to recognize notes and intervals is a fundamental skill for musicians and very useful for the musically inclined. A solid understanding of intervals and pitches serves as the foundation for more advanced concepts like chords and harmonies. An easy-to-use learning tool built into a web browser allows anyone to improve these skills in their free time without expensive lessons or an instrument.

Our Solution

We believe that the best way to develop one's listening and singing accuracy is to practice both skills. Repeatedly. So, we've created an application that will allow people to quickly and easily train both their pitch recognition and the pitch accuracy of their singing voice.


1. Van Hedger, Stephen C., et al. "Auditory working memory predicts individual differences in absolute pitch learning." Cognition 140 (2015): 95-110.

Using this Application

Users will have the option of either listening to a sound and classifying its pitch/interval, or recording their own voice in an attempt to match a specified pitch/interval.

Interval or Pitch Training

Training absolute pitch means recognizing or singing pitches without a reference note. Training relative pitch, on the other hand, means recognizing notes relative to a known starting note. In practice, absolute pitch training works with a single note, while relative pitch training requires two.

Before each session, then, the user can choose whether to train with pitches or intervals. With pitches, the application will play or request a single note. With intervals, it will play or request two notes separated by a half step, a third, a fifth, or an octave.
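As a concrete illustration of the two modes, the sketch below turns a note, or a note plus an interval, into playback frequencies. It assumes MIDI note numbers and equal temperament tuned to A4 = 440 Hz. The function names are hypothetical, and reading "third" as a major third (4 half steps) is our assumption, since the text does not say whether the app uses major or minor thirds.

```javascript
// Hypothetical helpers; assumes 12-tone equal temperament with A4 (MIDI 69) = 440 Hz.
function midiToFrequency(midi) {
  return 440 * Math.pow(2, (midi - 69) / 12);
}

// Intervals offered by the app, measured in half steps.
// Assumption: "third" means a major third (4 half steps); a minor third would be 3.
const INTERVAL_SEMITONES = {
  halfStep: 1,
  third: 4,
  fifth: 7, // perfect fifth
  octave: 12,
};

// Pitch mode needs one note; interval mode needs a reference note plus a second note.
function intervalFrequencies(rootMidi, intervalName) {
  const steps = INTERVAL_SEMITONES[intervalName];
  if (steps === undefined) {
    throw new Error(`Unknown interval: ${intervalName}`);
  }
  return [midiToFrequency(rootMidi), midiToFrequency(rootMidi + steps)];
}

// Example: an octave starting on A3 (MIDI 57).
// intervalFrequencies(57, "octave"); // [220, 440]
```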

Learn by Ear

After choosing pitches or intervals, users who want to practice their pitch recognition should select the "Learn" option. The application will play a recording of a note or interval, and the user will try to identify it.

If the user identifies the pitch or interval incorrectly, the app will tell them how far they are from the correct answer. The user can then try again or, at any time, generate a new recording to identify.
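For the pitch case, that feedback could be computed as a signed distance in half steps between the correct note and the guess. This is a minimal sketch, not the app's code: the note-name format (for example "C4", "F#3", "Bb2"), the MIDI convention C4 = 60, and the functions `noteToMidi` and `guessFeedback` are all assumptions.

```javascript
// Hypothetical helper: converts a note name such as "C4", "F#3", or "Bb2" to a MIDI number.
const PITCH_CLASS = { C: 0, D: 2, E: 4, F: 5, G: 7, A: 9, B: 11 };

function noteToMidi(name) {
  const match = /^([A-G])([#b]?)(-?\d)$/.exec(name);
  if (!match) throw new Error(`Unrecognized note name: ${name}`);
  const [, letter, accidental, octave] = match;
  let pitchClass = PITCH_CLASS[letter];
  if (accidental === "#") pitchClass += 1;
  if (accidental === "b") pitchClass -= 1;
  return (Number(octave) + 1) * 12 + pitchClass; // MIDI convention: C4 = 60
}

// Hypothetical feedback: how far the guess is from the answer, in half steps.
function guessFeedback(answer, guess) {
  const diff = noteToMidi(guess) - noteToMidi(answer);
  if (diff === 0) return "Correct!";
  const steps = Math.abs(diff);
  const unit = steps === 1 ? "half step" : "half steps";
  return `Off by ${steps} ${unit} (${diff > 0 ? "too high" : "too low"}).`;
}
```

Enharmonic spellings such as F#3 and Gb3 map to the same MIDI number, so they count as the same answer.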

Test by Voice

The other option is to "Test," which lets users sing in an attempt to match a specified pitch or interval requested by the application.
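One plausible way to judge a sung attempt is to measure the error in cents (hundredths of a half step) between the detected frequency and the target. The helpers below are hypothetical, as is the 50-cent pass tolerance. For intervals, `firstHz` is the reference note, whether the app plays it or the user sings it; the text does not specify which.

```javascript
// Signed distance from a target frequency to a sung frequency in cents (100 cents = 1 half step).
function centsOff(sungHz, targetHz) {
  return 1200 * Math.log2(sungHz / targetHz);
}

// Hypothetical scoring rule: a sung note "matches" if it is within the tolerance.
// The 50-cent default (a quarter tone) is an assumption.
function scoreSungNote(sungHz, targetHz, toleranceCents = 50) {
  const cents = centsOff(sungHz, targetHz);
  return { cents: Math.round(cents), pass: Math.abs(cents) <= toleranceCents };
}

// For intervals, compare the size of the interval actually sung (second note
// relative to the reference note) with the target size in half steps.
function scoreSungInterval(firstHz, secondHz, targetSemitones, toleranceCents = 50) {
  const error = centsOff(secondHz, firstHz) - 100 * targetSemitones;
  return { cents: Math.round(error), pass: Math.abs(error) <= toleranceCents };
}
```

Scoring the interval relative to the reference note, rather than to fixed frequencies, means a singer who starts slightly flat can still produce a correct interval.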

How We Did It

We used a lot of different libraries, man.

The Machine Processing

We made extensive use of the Web Audio API for playback and recording of sounds. The Web Audio API provided an easy-to-use abstraction for accessing the user's microphone and speakers. Its only downside is that support varies between browsers. For example, Chrome requires a page to be served over HTTPS before it can access the microphone, while Firefox does not. Some of the functions we used are deprecated, but we chose them because their newer replacements are not supported by all major browsers.
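Because support differs between browsers, microphone access is usually wrapped in feature detection. The sketch below assumes the app prefers the standard promise-based `navigator.mediaDevices.getUserMedia` and falls back to the deprecated callback-based `navigator.getUserMedia` (and its vendor-prefixed variants). The text does not say which deprecated functions were actually used, and `getUserMediaCompat` is a hypothetical name.

```javascript
// Returns a promise-based getUserMedia function, preferring the modern API and
// falling back to the deprecated callback-based (and vendor-prefixed) versions.
// `nav` is the browser's `navigator` object; passing it in keeps the helper testable.
function getUserMediaCompat(nav) {
  if (nav.mediaDevices && typeof nav.mediaDevices.getUserMedia === "function") {
    return (constraints) => nav.mediaDevices.getUserMedia(constraints);
  }
  const legacy = nav.getUserMedia || nav.webkitGetUserMedia || nav.mozGetUserMedia;
  if (typeof legacy === "function") {
    // Legacy signature: getUserMedia(constraints, successCallback, errorCallback).
    return (constraints) =>
      new Promise((resolve, reject) => legacy.call(nav, constraints, resolve, reject));
  }
  // No usable API (e.g. Chrome does not expose navigator.mediaDevices on insecure HTTP pages).
  return null;
}

// Usage in a browser (not run here):
// getUserMediaCompat(navigator)({ audio: true })
//   .then((stream) => new AudioContext().createMediaStreamSource(stream));
```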

All of our pitch tracking is performed directly in the browser in JavaScript, eliminating the need for server-side processing. We used the McLeod Pitch Method (MPM) implementation in the pitchfinder.js library [1].
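pitchfinder.js's own implementation is not reproduced here. To show roughly what the McLeod Pitch Method computes, here is a simplified, unoptimized sketch of its core steps: the normalized square difference function (NSDF), selection of one "key maximum" in each positive region, a cutoff constant k for choosing among them, and parabolic interpolation of the chosen lag. The function names and k = 0.9 are assumptions, and the real library differs in detail and speed.

```javascript
// Simplified McLeod Pitch Method sketch (not pitchfinder.js's code).
// Normalized square difference function: n(tau) = 2 r(tau) / m(tau), always in [-1, 1].
function nsdf(x, maxLag) {
  const out = new Float64Array(maxLag);
  for (let tau = 0; tau < maxLag; tau++) {
    let acf = 0;
    let m = 0;
    for (let j = 0; j < x.length - tau; j++) {
      acf += x[j] * x[j + tau];
      m += x[j] * x[j] + x[j + tau] * x[j + tau];
    }
    out[tau] = m > 0 ? (2 * acf) / m : 0;
  }
  return out;
}

// Returns { freq, clarity } or null when no periodicity is found.
function mpmPitch(x, sampleRate, k = 0.9) {
  const maxLag = Math.floor(x.length / 2);
  const n = nsdf(x, maxLag);

  // Skip the initial positive lobe around tau = 0 (every signal matches itself at zero lag).
  let tau = 1;
  while (tau < maxLag && n[tau] > 0) tau++;

  // Collect one key maximum per positive region between zero crossings.
  const keyMaxima = [];
  let best = -1;
  for (; tau < maxLag; tau++) {
    if (n[tau] > 0) {
      if (best < 0 || n[tau] > n[best]) best = tau;
    } else if (best >= 0) {
      keyMaxima.push(best); // region ended at a negative-going zero crossing
      best = -1;
    }
  }
  if (best >= 0) keyMaxima.push(best);
  if (keyMaxima.length === 0) return null;

  // Choose the first key maximum that is at least k times the highest one.
  const highest = Math.max(...keyMaxima.map((t) => n[t]));
  const peak = keyMaxima.find((t) => n[t] >= k * highest);

  // Refine the lag with a parabola through the peak and its two neighbours.
  let period = peak;
  if (peak + 1 < maxLag) {
    const a = n[peak - 1];
    const b = n[peak];
    const c = n[peak + 1];
    const denom = a - 2 * b + c;
    if (denom !== 0) period = peak + (0.5 * (a - c)) / denom;
  }
  return { freq: sampleRate / period, clarity: n[peak] };
}
```

The nested loop is quadratic in the window length, which is acceptable for a sketch; faster implementations typically compute the autocorrelation term with an FFT.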

The User Interface

Our web application's user interface is built directly in native HTML5, CSS, and JavaScript. We used the Materialize CSS framework [2] to create responsive, interactive, and attractive pages.

The Audio Samples

The piano audio samples used in our application were obtained from Bigcat Instruments [3]. They were recorded on a Baldwin Baby Grand piano.


1. pitchfinder.js

2. Materialize

3. Bigcat Instruments piano samples


The flow chart above describes the steps taken to process the Learning (listening) portion of the application.


This second flow chart describes the steps taken to process the Testing (singing) portion of the application. Unlike the Learning portion, this portion involves machine processing of the user's recorded audio.

Product Testing

Two factors affect the effectiveness of pitch tracking: the user and the software.

Complications with User Input

In processing the user input, there are a few sources of error to consider. Users might not start singing as soon as the record button is clicked, adding silence at the beginning of a recording. Some users may waver in pitch at the beginning of their recording as they hear themselves and adjust toward the pitch they actually intend (pitch gliding). Others may start shaking or dip in pitch toward the end of the recording as they run out of air. Users may also sing with vibrato, oscillating rapidly by as much as a half step on either side of their intended note. Finally, they may sing in rooms with substantial background noise.

To combat these issues, we trim the audio samples to remove silence, lead-ins, and fade-outs, then take the median frequency of the remaining signal as the intended pitch. If there is too much noise or variance in the signal to determine the pitch, the interface prompts the user to re-record.
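These steps can be sketched as a post-processing pass over per-frame pitch estimates (in Hz, with `null` for silent or unpitched frames). The 20% trim at each end, the 100-cent spread limit, the minimum frame count, and the function names are illustrative assumptions, not the values used in the app.

```javascript
// Median of a numeric array (copied before sorting so the input is untouched).
function median(values) {
  const s = [...values].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Hypothetical post-processing; all thresholds are assumptions.
function intendedPitch(frames, { trimFraction = 0.2, maxSpreadCents = 100, minFrames = 5 } = {}) {
  // 1. Remove silence before the singer starts and after they stop.
  let start = 0;
  let end = frames.length;
  while (start < end && frames[start] == null) start++;
  while (end > start && frames[end - 1] == null) end--;

  // 2. Drop a fixed share of frames at each end to skip glides and fade-outs.
  const cut = Math.floor((end - start) * trimFraction);
  const core = frames.slice(start + cut, end - cut).filter((f) => f != null);
  if (core.length < minFrames) return { ok: false, reason: "not enough pitched audio" };

  // 3. The median resists vibrato and occasional wild estimates.
  const hz = median(core);

  // 4. Spread = median absolute deviation in cents; too much means "please re-record".
  const spread = median(core.map((f) => Math.abs(1200 * Math.log2(f / hz))));
  if (spread > maxSpreadCents) return { ok: false, reason: "pitch too unstable" };
  return { ok: true, hz };
}
```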

Pitch Detection Testing Methods

We ran several tests to choose a pitch detection method and to verify the accuracy of the method we chose. The four methods we tested were the Average Magnitude Difference Function (AMDF) [1], YIN [2], Dynamic Wavelet (DW) [3], and the McLeod Pitch Method (MPM) [4]. All four are implemented in JavaScript in the pitchfinder.js library [5].
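A comparison like this can be organized as a small harness that runs each detector over labeled samples and scores the share of estimates within a tolerance of the known pitch. This is a hypothetical sketch: the adapter signature `(samples, sampleRate) => Hz or null` is our assumption, not pitchfinder.js's API, and the 50-cent tolerance is arbitrary.

```javascript
// Hypothetical evaluation harness. Each detector is wrapped as
// (samples, sampleRate) => estimated frequency in Hz, or null if none was found.
function compareDetectors(detectors, testCases, toleranceCents = 50) {
  const results = {};
  for (const [name, detect] of Object.entries(detectors)) {
    let correct = 0;
    for (const { samples, sampleRate, expectedHz } of testCases) {
      const hz = detect(samples, sampleRate);
      // Count an estimate as correct only if it lands within the tolerance, in cents.
      if (hz != null && hz > 0 && Math.abs(1200 * Math.log2(hz / expectedHz)) <= toleranceCents) {
        correct++;
      }
    }
    results[name] = correct / testCases.length; // fraction of samples identified
  }
  return results;
}
```

Measuring error in cents rather than hertz treats low and high notes evenly, which matters when the test set spans several octaves.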

In addition to the piano samples played by our interface, we used a vocal vibrato sample, a vocal pitch glide sample, and a violin vibrato sample. We also built an interface that lets users record a live vocal sample and run each pitch detection method on it. Finally, we used the Waves tuner application [6], a widely used and highly rated tuning app for phones, to verify the pitches independently.
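Cross-checking against a tuner amounts to mapping each detected frequency to the nearest equal-tempered note and reporting the remaining offset in cents. A minimal sketch, assuming A4 = 440 Hz and sharp-only note names; it is not taken from the Waves app, and `nearestNote` is a hypothetical name.

```javascript
// Maps a frequency to the nearest equal-tempered note (A4 = 440 Hz) and the
// offset from that note in cents, the way a chromatic tuner displays a reading.
const NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"];

function nearestNote(hz) {
  const midiFloat = 69 + 12 * Math.log2(hz / 440); // fractional MIDI number
  const midi = Math.round(midiFloat);
  const pitchClass = ((midi % 12) + 12) % 12; // keep the index non-negative
  const name = NOTE_NAMES[pitchClass] + (Math.floor(midi / 12) - 1);
  return { name, cents: Math.round(100 * (midiFloat - midi)) };
}
```

For example, `nearestNote(85)` reports F2 about 46 cents flat, so 85 Hz sits just below F2 (about 87.3 Hz).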

Accuracy of the Pitch Tracker

In all of the tests, MPM was the most consistently accurate method. It correctly identified all of the voice and violin samples, as well as live singing and humming. Although it failed to identify some of the lowest piano samples, it performed well for frequencies within the typical range of adult voices, whose fundamental frequency in speech runs from about 85 Hz (near F2) to 180 Hz for men and from 165 Hz to 255 Hz for women. No other algorithm performed as consistently well as MPM. However, the weaker results of the other algorithms may partly reflect implementation errors in the pitchfinder.js library rather than the algorithms themselves: the library's GitHub issue tracker listed several open implementation issues, and in our own testing some algorithms returned unreasonably high frequencies of around 20 kHz for sounds that MPM detected easily.
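A simple guard against outliers like the ~20 kHz readings described above is to discard estimates outside a plausible singing range before using them. This is a hypothetical sketch; the 60 Hz to 1500 Hz bounds are our assumption, chosen generously around human voices rather than taken from the app.

```javascript
// Hypothetical sanity check: keep an estimate only if it is a finite number
// inside an assumed vocal range (60 Hz to 1500 Hz by default); otherwise return null.
function plausibleVocalPitch(hz, minHz = 60, maxHz = 1500) {
  return Number.isFinite(hz) && hz >= minHz && hz <= maxHz ? hz : null;
}

// Example: filter a run of frame estimates.
// [220, 19980, 221, NaN].map((hz) => plausibleVocalPitch(hz)); // [220, null, 221, null]
```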

We compiled the results over all of the pitch samples into the graph below:


1. Efficient algorithms for speech pitch estimation

2. YIN, a fundamental frequency estimator for speech and music

3. Real-Time Time-Domain Pitch Tracking Using Wavelets

4. A Smarter Way to Find Pitch

5. pitchfinder.js

6. Waves - Tuner