The Science Behind Music Recognition

Source: https://www.radio10.nl/

Last week the Dutch radio station Radio 10 presented the ‘Week of the 90s’. Legendary bangers of that decade resounded through my living room in full glory. Most of them I recognized as I am a 90s kid myself. While singing along to the tune of ‘Never Be The Same Again’ I realised that I had no clue who actually sang that song. So, I went and searched for the lyrics online, and as it turns out, my phonetic interpretation of the lyrics did not provide the results that I hoped for. The solution: Shazam!

Source: https://www.shazam.com/

Shazam is a music identifier that recognizes music and TV around you. A simple press of the button in the middle of the screen lets the app listen to your surroundings. Shazam started as a company in 2000, and the service was already up and running in 2002. Back then, one had to dial ‘2580’ and hold the phone close to the source of the music. The user would then receive a text message with the title and artist of the song. Nowadays, Shazam has developed into an application that can ignore background noise and can even identify TV series, and it provides additional information for both TV and music. The app is, as they state themselves, “the start of a journey”.

So, as it happened, I started my Shazam journey during the ‘Week of the 90s’ when I pressed the big Shazam button in the middle of my screen. While the app was loading, I wondered how Shazam actually manages to recognize a song from just a snippet of it. Thus I decided to do some research to satisfy my (and hopefully your) curiosity concerning the music identifier app.

Digital Fingerprints

When you press the button to identify a piece of music, the application creates a digital fingerprint of the audio it records. This fingerprint is then matched against a database of millions of tracks and, before you know it, you are provided with the title, artist, producer, year of release, album, and everything else that you may not even have wanted to know about the song. This sounds good in theory, but how can it work when only a tiny snippet of maybe 4-8 seconds is required to identify a song, no matter at what point in the song you use the app? An answer on Quora elaborates on this and states that identifying music this way was long deemed computationally impractical. To make it feasible, Shazam started creating spectrograms of songs. A spectrogram is a visual representation of how the amplitude of each frequency changes over time. The high peaks of the spectrogram (the highest-energy points of a song) are used to build the digital fingerprint of the song. According to the Quora answer, this amounts to up to three data points per second, per song.

Example Spectrogram. Source: organ.chorale.bach
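To make the peak-picking idea a bit more concrete, here is a minimal sketch in Python. It is not Shazam's actual code: the frame size, hop length, the naive Fourier transform and the choice of one peak per frame are all my own assumptions, and the function names spectrogram and strongest_peaks are made up for illustration. It only shows how a signal can be cut into short frames, turned into frequency magnitudes, and reduced to its highest-energy points.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Cut the signal into overlapping frames and return, per frame,
    the magnitude of each frequency bin (naive DFT, fine for a demo)."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # A Hann window softens the frame edges before the transform.
        windowed = [
            s * (0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1)))
            for n, s in enumerate(frame)
        ]
        magnitudes = []
        for k in range(frame_size // 2):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(windowed)
            )
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


def strongest_peaks(frames, peaks_per_frame=1):
    """Keep only the highest-energy bin(s) of every frame,
    returned as (frame_index, frequency_bin) pairs."""
    peaks = []
    for t, magnitudes in enumerate(frames):
        ranked = sorted(range(len(magnitudes)),
                        key=lambda k: magnitudes[k], reverse=True)
        peaks.extend((t, k) for k in sorted(ranked[:peaks_per_frame]))
    return peaks


# Toy "song": one pure tone followed by a higher one (assumed test data).
tone_low = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
tone_high = [math.sin(2 * math.pi * 20 * n / 64) for n in range(256, 512)]
print(strongest_peaks(spectrogram(tone_low + tone_high)))
```

In this toy example, frames taken entirely from the first tone report frequency bin 8 and frames taken entirely from the second tone report bin 20. A real recording would of course contain many more frequencies, which is exactly why keeping only the strongest peaks makes the fingerprint so compact.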

Avery Wang, the interviewee in the Quora piece, elaborates further on this in his own paper. He states that Shazam was looking for an algorithm that could search through audio data files. The digital fingerprints created from the spectrogram can be cross-referenced with the database within 5-500 milliseconds. Shazam truly revolutionized the world of audio recognition; it will truly ‘Never Be The Same Again’ (which is by Mel C, by the way!).
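For the curious, the cross-referencing step can be sketched as well. The example below loosely follows the pair-hashing idea from Wang (2003): peaks are combined into pairs, each pair becomes a lookup key, and a snippet matches a track when many keys agree on the same time offset. Everything specific here is an assumption on my part: the fan_out value, the function names, the track names and the peak lists are invented for illustration, and a real system would be far more elaborate.

```python
from collections import Counter, defaultdict


def pair_hashes(peaks, fan_out=3):
    """Combine each (time, freq_bin) peak with the next few peaks.
    Yields ((freq1, freq2, time_gap), time_of_first_peak)."""
    peaks = sorted(peaks)
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            yield (f1, f2, t2 - t1), t1


def build_index(tracks):
    """Map every hash to the tracks and positions where it occurs."""
    index = defaultdict(list)
    for name, peaks in tracks.items():
        for key, t in pair_hashes(peaks):
            index[key].append((name, t))
    return index


def best_match(index, sample_peaks):
    """Vote per (track, time offset); a true match piles up votes
    on one consistent offset. Returns (track, offset, votes) or None."""
    votes = Counter()
    for key, t_sample in pair_hashes(sample_peaks):
        for name, t_track in index.get(key, []):
            votes[(name, t_track - t_sample)] += 1
    if not votes:
        return None
    (name, offset), count = votes.most_common(1)[0]
    return name, offset, count


# Invented peak lists standing in for two fingerprinted songs.
tracks = {
    "track_a": [(0, 10), (1, 22), (2, 15), (3, 30),
                (4, 12), (5, 27), (6, 18), (7, 25)],
    "track_b": [(0, 11), (1, 19), (2, 28), (3, 14),
                (4, 21), (5, 16), (6, 29), (7, 13)],
}
index = build_index(tracks)

# A snippet "recorded" from the middle of track_a, with its own clock.
snippet = [(0, 30), (1, 12), (2, 27), (3, 18)]
print(best_match(index, snippet))
```

Because the lookup is a plain dictionary access per hash, the cost depends on the length of the snippet rather than on the length of every song in the database, which gives some intuition for why a search can finish within milliseconds.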

References

  • Sharma, Yakup. 2016. In answer to: “How does Shazam work? What is the logic behind Shazam tracing out the exact song by just a sample of it?” Quora. Accessed September 21, 2018. https://www.quora.com/How-does-Shazam-work-What-is-the-logic-behind-Shazam-tracing-out-the-exact-song-by-just-a-sample-of-it
  • Wang, Avery Li-Chun. 2003. “An Industrial-Strength Audio Search Algorithm.” Shazam Entertainment Ltd.: 1-7.