A useful tag schema for audio archives
What to put in your audio metadata so downstream tools can actually use it.
Most audio archives die from the same problem: bad metadata. Files are uploaded with names like final_v2_FINAL.wav into a vaguely named folder, and a year later nobody can find anything.
A tag schema that survives is one with a few firm rules and a lot of forgiveness on everything else. Here’s the schema SignalLab emits for every indexed file, and why each tag earns its place.
The core tags
Every tag follows the pattern namespace:value so downstream tools can filter and group cleanly.
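Here is a minimal sketch of what that pattern buys a consumer, assuming tags arrive as a flat list of strings. `parse_tags` is a hypothetical helper, not a SignalLab API. It splits on the first colon only and keeps every value, because a family like issue:* can appear more than once on the same file.

```python
from collections import defaultdict


def parse_tags(tags: list[str]) -> dict[str, list[str]]:
    """Group namespace:value tags by namespace (hypothetical helper).

    Splits on the first colon only, so a value may itself contain a colon.
    Repeated namespaces (several issue:* tags, say) keep every value in order.
    """
    grouped = defaultdict(list)
    for tag in tags:
        namespace, sep, value = tag.partition(":")
        if not sep or not namespace or not value:
            raise ValueError(f"not a namespace:value tag: {tag!r}")
        grouped[namespace].append(value)
    return dict(grouped)


print(parse_tags(["content:voice", "issue:clipping", "issue:noisy", "sample-rate:48000"]))
# {'content': ['voice'], 'issue': ['clipping', 'noisy'], 'sample-rate': ['48000']}
```

Rejecting malformed tags at parse time, rather than guessing a namespace, keeps the "firm rules" part of the schema enforceable.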
content:*
The big one. voice, music, mixed, noise, or silence. This is the tag your editor needs first, before anything else. If a file is content:silence, nothing downstream needs to look at it.
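Because content:* is read first, a triage step can bucket a batch by it and drop silent files before any other tag is consulted. This is a sketch under stated assumptions: `triage_by_content` is a hypothetical name, and putting files with no content:* tag into an "unknown" bucket is a choice made here, not something the schema specifies.

```python
def triage_by_content(files: dict[str, set[str]]) -> dict[str, list[str]]:
    """Bucket file names by their content:* tag, dropping silent files.

    Files with no content:* tag land in an "unknown" bucket (an assumption,
    not part of the schema).
    """
    buckets: dict[str, list[str]] = {}
    for name, tags in files.items():
        # startswith("content:") skips sibling tags such as content-confidence:0.82
        content = next(
            (tag.split(":", 1)[1] for tag in tags if tag.startswith("content:")),
            "unknown",
        )
        if content == "silence":
            continue  # nothing downstream needs to look at a silent file
        buckets.setdefault(content, []).append(name)
    return buckets


batch = {
    "intro.wav": {"content:voice", "content-confidence:0.82"},
    "room-tone.wav": {"content:silence"},
    "theme.wav": {"content:music"},
}
print(triage_by_content(batch))
# {'voice': ['intro.wav'], 'music': ['theme.wav']}
```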
format:*, sample-rate:*, channels:*, bit-depth:*
Technical facts. Useful for filtering (“show me only 48kHz/24-bit voiceovers”) and for routing files to the right pipeline (“compressed files go through a re-encode step first”).
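Both uses in that paragraph reduce to set operations when technical tags take fixed, exact values (sample-rate:48000, not "48k"). The sketch below assumes other formats follow the same MIME-style pattern as format:audio/wav; the `COMPRESSED_FORMATS` set and the pipeline names "main" and "re-encode" are hypothetical and should match whatever your pipeline actually uses.

```python
# Hypothetical: which format:* values this pipeline treats as compressed.
COMPRESSED_FORMATS = {"audio/mpeg", "audio/aac", "audio/ogg"}


def has_all(tags: set[str], required: set[str]) -> bool:
    """True if every required tag is present (a plain subset test)."""
    return required <= tags


def route(tags: set[str]) -> str:
    """Pick a pipeline from the format:* tag; compressed files get re-encoded first."""
    formats = {tag.split(":", 1)[1] for tag in tags if tag.startswith("format:")}
    return "re-encode" if formats & COMPRESSED_FORMATS else "main"


# "Show me only 48kHz/24-bit voiceovers" as a set of required tags.
voiceover_48k_24bit = {"content:voice", "sample-rate:48000", "bit-depth:24-bit"}

wav = {"content:voice", "format:audio/wav", "sample-rate:48000", "bit-depth:24-bit"}
mp3 = {"content:voice", "format:audio/mpeg", "sample-rate:44100"}

print(has_all(wav, voiceover_48k_24bit), route(wav))  # True main
print(has_all(mp3, voiceover_48k_24bit), route(mp3))  # False re-encode
```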
brightness:*, dynamics:*
Spectral and dynamic character buckets — dark/balanced/bright/very-bright and flat/compressed/modern/dynamic. These are useful for search and for downstream processing decisions. A voiceover tagged brightness:very-bright probably needs a de-esser.
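Since the brightness buckets have a natural order, storing them as an ordered tuple lets a consumer ask "at least bright" instead of listing buckets by hand, and rejects values outside the schema. A minimal sketch with hypothetical function names; only the de-esser rule comes from the text, and any further rules would be yours to add.

```python
from typing import Optional

# Brightness buckets from the schema, darkest first, so the index works as a rank.
BRIGHTNESS = ("dark", "balanced", "bright", "very-bright")


def brightness(tags: set[str]) -> Optional[str]:
    """Return the brightness:* bucket, rejecting values outside the schema."""
    for tag in tags:
        namespace, _, value = tag.partition(":")
        if namespace == "brightness":
            if value not in BRIGHTNESS:
                raise ValueError(f"unknown brightness bucket: {value!r}")
            return value
    return None


def at_least(tags: set[str], floor: str) -> bool:
    """True if the file is at least as bright as `floor`, e.g. 'bright'."""
    value = brightness(tags)
    return value is not None and BRIGHTNESS.index(value) >= BRIGHTNESS.index(floor)


def suggest_processing(tags: set[str]) -> list[str]:
    """Hypothetical rule table; only the de-esser rule comes from the docs."""
    suggestions = []
    if "content:voice" in tags and brightness(tags) == "very-bright":
        suggestions.append("de-esser")
    return suggestions


vo = {"content:voice", "brightness:very-bright", "dynamics:compressed"}
print(at_least(vo, "bright"), suggest_processing(vo))  # True ['de-esser']
```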
issue:*
The most actionable family. issue:clipping, issue:noisy, issue:dc-offset, issue:lots-of-silence. These map directly to QA workflows — “fail any file with issue:clipping at ingest” is a one-line rule.
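The one-line rule is literally `"issue:clipping" in tags`. The sketch below keeps that rule as data, so a team can later add other blocking issues without touching code, and reports every issue:* tag alongside the verdict. `FAIL_AT_INGEST`, `ingest_check`, and `issues` are hypothetical names.

```python
# The ingest rule from the docs, kept as data so the policy is easy to extend.
FAIL_AT_INGEST = {"issue:clipping"}


def ingest_check(tags: set[str]) -> tuple[bool, list[str]]:
    """Return (accepted, blocking issues); other issue:* tags pass through."""
    blocking = sorted(tags & FAIL_AT_INGEST)
    return (not blocking, blocking)


def issues(tags: set[str]) -> list[str]:
    """All issue:* tags, sorted, for reporting alongside the verdict."""
    return sorted(tag for tag in tags if tag.startswith("issue:"))


clipped = {"content:voice", "issue:clipping", "issue:dc-offset"}
print(ingest_check(clipped), issues(clipped))
# (False, ['issue:clipping']) ['issue:clipping', 'issue:dc-offset']
```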
hint:*
Softer signals. hint:has-headroom, hint:loud, hint:mono-content-in-stereo. These are for tools that want the context but shouldn’t fail a file because of it.
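One way to keep hints advisory is to route them to logs or reviewer notes and never let them feed a pass/fail decision. A minimal sketch, assuming Python's standard `logging` module as the sink; the logger name "ingest" and the helper `advisory_notes` are hypothetical.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("ingest")


def advisory_notes(name: str, tags: set[str]) -> list[str]:
    """Log hint:* tags as context; this function never rejects a file."""
    notes = sorted(tag.split(":", 1)[1] for tag in tags if tag.startswith("hint:"))
    for note in notes:
        log.info("%s: %s", name, note)
    return notes


notes = advisory_notes(
    "ep42.wav", {"content:voice", "hint:has-headroom", "hint:mono-content-in-stereo"}
)
print(notes)  # ['has-headroom', 'mono-content-in-stereo']
```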
What the schema deliberately doesn’t include
- Mood tags (happy, melancholic, etc.). These are unreliable from signal alone and lock you into a controlled vocabulary that ages badly.
- Genre tags. Same problem, worse. Genre classification is a user-facing job, not a metadata-layer one.
- Speaker identity. Personally identifying. Should be opt-in and live in a separate namespace.
- Free-form descriptions. Lovely for humans, terrible for downstream automation.
The discipline is: only tag what can be reliably computed and reliably consumed. The rest belongs in a human-curated layer or in the user-facing app, not in the schema.
A worked example
For a 12-minute podcast interview at 48k/16-bit stereo:
content:voice
content-confidence:0.82
format:audio/wav
sample-rate:48000
channels:stereo
bit-depth:16-bit
brightness:balanced
dynamics:modern
duration:720
peak:-2.1dB
rms:-19.4dB
noise-floor:-52.1dB
hint:has-headroom
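Consumers usually want the measurement tags as numbers rather than strings. The sketch below converts the example's tags under a few assumptions: duration is in seconds (720 matches the 12-minute length), "dB" and "-bit" are unit suffixes to strip, and each namespace appears once here, so a flat dict is enough. Multi-valued families such as issue:* would need a list instead. `typed` is a hypothetical helper.

```python
# The worked example's tags, verbatim.
EXAMPLE = [
    "content:voice", "content-confidence:0.82", "format:audio/wav",
    "sample-rate:48000", "channels:stereo", "bit-depth:16-bit",
    "brightness:balanced", "dynamics:modern", "duration:720",
    "peak:-2.1dB", "rms:-19.4dB", "noise-floor:-52.1dB", "hint:has-headroom",
]

INT_NAMESPACES = {"sample-rate", "duration"}


def typed(namespace: str, value: str):
    """Convert a tag value to a number where the schema implies one."""
    if value.endswith("dB"):
        return float(value.removesuffix("dB"))  # peak, rms, noise-floor
    if namespace == "bit-depth":
        return int(value.removesuffix("-bit"))
    if namespace in INT_NAMESPACES:
        return int(value)
    if namespace == "content-confidence":
        return float(value)
    return value  # categorical values stay strings


record = {}
for tag in EXAMPLE:
    namespace, _, value = tag.partition(":")
    record[namespace] = typed(namespace, value)

print(record["duration"] / 60)  # 12.0 (minutes)
print(round(record["peak"] - record["rms"], 1))  # 17.3 (dB from RMS up to peak)
```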
Thirteen tags. Every one can be queried, every one can be filtered on, every one was computed automatically. No “interview”, no “Alex_and_Sara_episode_42”, no “great episode but room sounds off”. Those belong in your CMS.
Why this matters
Audio is a wretched format to grep through. The only reason an audio archive scales past a few thousand files is the metadata layer. A schema like this is the thing that lets a downstream editor say: “Show me every voice-content file under five minutes, no clipping, brightness balanced, recorded since March.” Without that, you’re asking someone to listen.
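The editor's query maps almost one-to-one onto tag checks. The schema above has no recording-date tag, so this sketch assumes the date comes from elsewhere (a CMS or file record) and carries it as a separate, hypothetical `recorded` field. The `Indexed` class, the helper names, and the sample files and dates are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Indexed:
    name: str
    tags: set[str]
    recorded: date  # hypothetical: not part of the tag schema


def duration_seconds(tags: set[str]) -> Optional[int]:
    """Read duration:* as seconds, or None if the tag is missing."""
    for tag in tags:
        if tag.startswith("duration:"):
            return int(tag.split(":", 1)[1])
    return None


def matches(f: Indexed, since: date) -> bool:
    """Voice, under five minutes, no clipping, balanced brightness, recent."""
    seconds = duration_seconds(f.tags)
    return (
        "content:voice" in f.tags
        and seconds is not None
        and seconds < 5 * 60
        and "issue:clipping" not in f.tags
        and "brightness:balanced" in f.tags
        and f.recorded >= since
    )


files = [
    Indexed("promo.wav", {"content:voice", "duration:95", "brightness:balanced"}, date(2024, 4, 2)),
    Indexed("ep42.wav", {"content:voice", "duration:720", "brightness:balanced"}, date(2024, 5, 1)),
    Indexed("take3.wav", {"content:voice", "duration:60", "brightness:balanced", "issue:clipping"}, date(2024, 3, 20)),
    Indexed("old-vo.wav", {"content:voice", "duration:120", "brightness:balanced"}, date(2024, 1, 10)),
]
print([f.name for f in files if matches(f, since=date(2024, 3, 1))])  # ['promo.wav']
```

Each clause of the spoken query is one boolean test, which is the point: nothing here requires listening to the audio.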