Audio Analytic's Alexandria dataset now contains over half-a-billion metadata points

At the heart of Audio Analytic's Alexandria™ data platform is the Alexandria dataset, the world's largest commercially exploitable dataset for audio ML tasks. It now features 40 million labelled recordings across 1,200 sound classes and includes over half-a-billion metadata points.

As you’ll frequently read on the team’s research blog, the quality, diversity and range of Audio Analytic's data are integral to their ability to give consumer products a greater sense of hearing. The data covers a wide range of hardware, environments, sounds (both target and non-target) and acoustic scenes, and can be used to train and evaluate models so that they hit accuracy expectations and work robustly in real-world environments when deployed on consumer products.

Importantly, they know everything about the data they use to develop their models. This full data provenance means that Audio Analytic has all of the permissions required to use the data commercially. They also capture essential information about the recorded subject, distance, microphone sensitivity, hardware, location and more. This forensic approach to data yields exceptionally rich metadata, which plays a key role as models progress through their ML pipeline. It also protects customers from the technical, legal and reputational risks associated with scraping data from unreliable sources such as YouTube.
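
To make this concrete, a metadata record for a single recording might look something like the sketch below. The field names and values are illustrative assumptions only, not Audio Analytic's actual schema:

```python
from dataclasses import dataclass

@dataclass
class RecordingMetadata:
    # Hypothetical fields for illustration -- not Audio Analytic's actual schema.
    recording_id: str
    sound_class: str            # e.g. "car_horn", "glass_break"
    subject: str                # description of the recorded subject/source
    distance_m: float           # source-to-microphone distance in metres
    mic_sensitivity_dbv: float  # microphone sensitivity (dBV/Pa)
    hardware: str               # capture device / microphone model
    location: str               # recording environment, e.g. "urban street"
    licence_cleared: bool       # commercial-use permission confirmed

example = RecordingMetadata(
    recording_id="rec_000123",
    sound_class="car_horn",
    subject="passenger car horn, single blast",
    distance_m=5.0,
    mic_sensitivity_dbv=-38.0,
    hardware="MEMS microphone, smart-speaker dev board",
    location="urban street",
    licence_cleared=True,
)
```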

Machine Learning Engineer Cosmin Frăteanu recently discussed the importance of augmentation, which relies heavily on data quality and metadata. In his blog post, he said: “This is yet another reason why you can’t just use any audio dataset found on the internet. Without the metadata, it is simply impossible to simulate in a scientific manner an accurate acoustic scenario with specific hardware and software effects.”
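
As a minimal sketch of what metadata-driven augmentation can mean in practice, the snippet below scales a clip to simulate a greater source distance and mixes in background noise at a chosen signal-to-noise ratio. The function names and the simple free-field 1/r gain model are illustrative assumptions, not Audio Analytic's actual pipeline, which would also need to model reverberation and device-specific microphone and processing effects:

```python
import numpy as np

def simulate_distance(audio: np.ndarray, recorded_at_m: float, target_m: float) -> np.ndarray:
    """Approximate a change in source distance with a free-field 1/r gain
    (a deliberate simplification for illustration)."""
    gain = recorded_at_m / target_m
    return audio * gain

def mix_at_snr(signal: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix background noise into the signal at a chosen SNR (in dB)."""
    noise = noise[: len(signal)]
    sig_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(sig_power / (noise_power * 10 ** (snr_db / 10)))
    return signal + scale * noise

# Example: a horn recorded at 5 m (known from metadata) is re-rendered as if
# heard from 20 m, then mixed with street noise at 10 dB SNR.
rng = np.random.default_rng(0)
horn = rng.standard_normal(16000)          # stand-in for a 1 s clip at 16 kHz
street_noise = rng.standard_normal(16000)  # stand-in for a background recording
augmented = mix_at_snr(
    simulate_distance(horn, recorded_at_m=5.0, target_m=20.0),
    street_noise,
    snr_db=10.0,
)
```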

In addition, machine learning is an iterative process. When Audio Analytic evaluates model performance, the metadata shows where a model performs well and where it needs further training. If, for example, engineers conclude that a ‘car horn’ model is underperforming at a certain distance in a particular type of location, they know where to focus their efforts.
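
A hypothetical sketch of that kind of metadata-sliced evaluation is shown below: per-recording results are grouped by distance band and location so that under-performing conditions stand out. The column names and data are invented for illustration:

```python
import pandas as pd

# Hypothetical per-recording evaluation results joined with recording metadata.
results = pd.DataFrame({
    "sound_class": ["car_horn"] * 6,
    "distance_m":  [2, 5, 10, 20, 20, 30],
    "location":    ["urban street", "urban street", "car park",
                    "car park", "urban street", "motorway"],
    "correct":     [True, True, True, False, False, False],
})

# Bucket distances and compute accuracy per (distance band, location) slice.
results["distance_band"] = pd.cut(
    results["distance_m"], bins=[0, 5, 15, 50], labels=["near", "mid", "far"]
)
accuracy = (
    results
    .groupby(["distance_band", "location"], observed=True)["correct"]
    .mean()
)
print(accuracy)  # low-accuracy slices show where to gather or augment more data
```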

In ancient Egypt, the Great Library of Alexandria was established between 283 and 246 BCE with the ambition of housing every text ever written. The Greek scholar Callimachus catalogued its texts by genre, and later writers such as the physician Galen preserved vital metadata about each text, including the title, author, origin and length.

These pioneers understood the importance not only of breadth but also of depth of information. When Audio Analytic gave their dataset its name, it was important to capture the same lofty ambitions for audio.


