Ear training, the practice of recognizing notes and intervals by ear, is a fundamental skill for musicians and useful for anyone musically inclined. A solid understanding of intervals and pitches serves as the foundation for more advanced concepts like chords and harmonies. An easy-to-use learning tool built into a web browser allows anyone to improve these skills in their free time, without expensive lessons or an instrument.
We believe that the best way to develop one's listening and singing accuracy is to practice both skills repeatedly. We have therefore created an application that lets people quickly and easily train both their pitch recognition and the pitch accuracy of their singing voice.
Training absolute pitch involves recognizing or singing pitches without using a reference note. Relative training, on the other hand, allows for the recognition of notes based on some initial known note. Essentially, this distinction means that one form of training (for absolute pitch) requires the generation or processing of one note, while the other form (for relative pitch) requires two notes.
Before each session, then, the user chooses whether to train with pitches or intervals. With pitches, the application will play or request a single note. With intervals, it will play or request a combination of two notes, separated by a half-step, third, fifth, or octave.
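In equal temperament, each of these intervals corresponds to a fixed number of semitones, and each semitone multiplies frequency by 2^(1/12). A minimal sketch of how the second note of an interval could be derived from a root frequency (the interval names and the assumption of a major third and perfect fifth are ours, not taken from the application's source):

```javascript
// Semitone spans for the supported intervals (assuming major third and
// perfect fifth; the paper only says "third" and "fifth").
const INTERVALS = {
  halfStep: 1,
  majorThird: 4,
  perfectFifth: 7,
  octave: 12,
};

// Equal temperament: each semitone scales frequency by 2^(1/12), so the
// second note of an interval is rootHz * 2^(semitones / 12).
function secondNoteHz(rootHz, semitones) {
  return rootHz * Math.pow(2, semitones / 12);
}
```

For example, `secondNoteHz(440, INTERVALS.octave)` yields 880 Hz, an octave above A4.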
After choosing whether to train with pitches or intervals, users who wish to practice recognition should select the "Learn" option. The application presents them with a recording of a note or interval, and the user attempts to identify it correctly.
If the user identifies the pitch or interval incorrectly, the app notifies them of how far they are from the correct answer. The user can then try again or, at any time, generate a new recording to identify.
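One natural way to express "how far off" an answer is, is the signed distance in semitones between the guessed and correct pitch; this is a sketch of that conversion, not necessarily how the application computes its feedback:

```javascript
// Signed distance in semitones between a guessed pitch and the target.
// A frequency ratio r corresponds to 12 * log2(r) semitones, so an octave
// (ratio 2) comes out as exactly 12.
function semitoneError(guessHz, targetHz) {
  return 12 * Math.log2(guessHz / targetHz);
}
```

A positive result means the guess was sharp of the target, a negative one means it was flat.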
The other option is to "Test," which lets users sing in an attempt to match a pitch or interval requested by the application.
We made extensive use of the Web Audio API for playback and recording of sounds. The Web Audio API provided an easy-to-use abstraction for accessing the user's microphone and speakers. The only downside to using the Web Audio API is the varying level of support between browsers. For example, Chrome requires web pages to be served over HTTPS in order to access the microphone, while Firefox does not. Some functions we used are deprecated, but were chosen because their newer replacements are not supported in all major browsers.
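A typical capture path looks like the sketch below. The secure-context helper mirrors Chrome's HTTPS requirement mentioned above; the capture function uses one modern Web Audio route (an `AnalyserNode`), which is not necessarily the exact node graph the application uses:

```javascript
// Chrome only exposes the microphone in a secure context: HTTPS, or a
// localhost origin during development. (Firefox is more permissive.)
function isSecureContextFor(protocol, hostname) {
  return protocol === 'https:' ||
         hostname === 'localhost' ||
         hostname === '127.0.0.1';
}

// Sketch of a microphone capture path (browser-only).
async function startCapture() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext();
  const source = ctx.createMediaStreamSource(stream);
  const analyser = ctx.createAnalyser();
  analyser.fftSize = 2048;
  source.connect(analyser);
  return analyser; // poll analyser.getFloatTimeDomainData(buf) per frame
}
```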
The piano audio samples used in our application were obtained from Bigcat Instruments. The piano that produced the sound was a Baldwin Baby Grand.
In processing the user input, there are a few sources of error to consider. Users might not start singing as soon as the record button is clicked, adding silence at the beginning of a recording. Some users may waver in their pitch during the beginning of their recording as they hear themselves and adjust to the pitch they actually desire (pitch gliding). Other users may start shaking or experience dips in their pitch toward the end of the recording as they run out of air. Furthermore, users may sing with vibrato, rapidly oscillating between notes as much as a half step on either side of their intended note. They may also sing in rooms with substantial background noise.
To combat these issues, we trim the audio samples to remove silence, lead-ups, and fade-outs, then take the overall median frequency of the recorded signal as the intended pitch. If there is too much noise or variance present in the signal to determine the pitch, the interface prompts the user to re-record.
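The median step can be sketched as follows, operating on per-frame pitch estimates; the edge-trim fraction and the use of `null` for unvoiced frames are illustrative assumptions, not the application's actual parameters:

```javascript
// Given per-frame pitch estimates (Hz, or null for silent/unvoiced frames),
// drop silence and trim the unstable edges (pitch glides at the start,
// fading breath at the end), then take the median as the intended pitch.
// edgeTrim = 0.1 (10% off each end) is an illustrative guess.
function intendedPitch(frameHz, edgeTrim = 0.1) {
  const voiced = frameHz.filter(f => f !== null && f > 0);
  const skip = Math.floor(voiced.length * edgeTrim);
  const core = voiced.slice(skip, voiced.length - skip);
  if (core.length === 0) return null; // too little signal: prompt a re-record
  const sorted = [...core].sort((a, b) => a - b);
  return sorted[Math.floor(sorted.length / 2)];
}
```

The median is robust to the outliers described above: a few vibrato excursions or glide frames shift the mean but leave the median essentially unchanged.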
In addition to the piano samples played by our interface, we tested with a vocal vibrato sample, a vocal pitch-glide sample, and a violin vibrato sample. We also built an interface that allows users to easily record a live vocal sample and run the various pitch detection methods on it, and we used the Waves tuner, a highly reviewed and widely used mobile tuning application, to independently verify the pitches.
In all of the tests, we found the McLeod pitch method (MPM) to be the most consistently accurate. It correctly identified all of the voice and violin samples, as well as live singing and humming. While it failed to identify some of the lowest piano samples, it performed well for frequencies within the average human vocal range, which is roughly 85 Hz (F2) to 180 Hz for males and 165 Hz to 255 Hz for females. No other algorithm performed as consistently well as MPM. Unfortunately, this may have been due to implementation errors in the pitchfinder.js library that we used: the library's GitHub issue queue showed several outstanding implementation issues, and our own testing showed that some algorithms were prone to returning unreasonably high frequencies around 20 kHz on sounds that MPM detected easily.
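The core idea of MPM is to compute a normalized square difference function (NSDF) over candidate lags and pick the first strong peak after the lag-0 lobe. This is a simplified sketch of that idea (without the parabolic interpolation a production implementation would add), not the pitchfinder.js code we actually used:

```javascript
// Simplified McLeod pitch method: NSDF n(tau) = 2*r(tau) / m(tau), where
// r is the autocorrelation and m the sum of squared terms, then pick the
// first local maximum (after the lag-0 lobe) within a threshold of the
// highest peak. Parabolic interpolation of the peak is omitted here.
function mpmPitch(signal, sampleRate, threshold = 0.9) {
  const n = signal.length;
  const maxTau = Math.floor(n / 2);
  const nsdf = new Float64Array(maxTau);
  for (let tau = 0; tau < maxTau; tau++) {
    let acf = 0, div = 0;
    for (let j = 0; j + tau < n; j++) {
      acf += signal[j] * signal[j + tau];
      div += signal[j] ** 2 + signal[j + tau] ** 2;
    }
    nsdf[tau] = div > 0 ? (2 * acf) / div : 0;
  }
  // Skip the lag-0 lobe, then collect local maxima of the NSDF.
  let tau = 1;
  while (tau < maxTau && nsdf[tau] > 0) tau++;
  const peaks = [];
  for (; tau + 1 < maxTau; tau++) {
    if (nsdf[tau] > nsdf[tau - 1] && nsdf[tau] >= nsdf[tau + 1]) peaks.push(tau);
  }
  if (peaks.length === 0) return null; // no periodicity found
  const highest = Math.max(...peaks.map(t => nsdf[t]));
  const chosen = peaks.find(t => nsdf[t] >= threshold * highest);
  return sampleRate / chosen;
}
```

The threshold keeps the method from locking onto an octave-lower peak while still preferring the earliest (lowest-lag, highest-frequency) strong candidate.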
We compiled the results over all of the pitch samples into the graph below: