Skip to content

/glossary

Terms, plainly defined.

A working glossary of audio, AI, DSP, accessibility, and live-audio terms used across the labs. Updated whenever a doc surfaces a new term that needs one.

01 76 terms
AAudio
Engineering
Android’s native low-latency audio API, introduced in Android 8.0 and matured significantly in 12+. The recommended path for realtime audio on modern Android.
ABX test
SkillLab
A double-blind listening test: the subject is presented with reference A, reference B, and unknown X — and must identify whether X = A or X = B. The gold standard for proving an audible difference exists.
AGC
VoiceLab
Automatic Gain Control — adaptive amplification that keeps signal levels consistent without operator intervention. Standard in conferencing apps; used carefully in podcast post-production.
Aliasing
Engineering
The spurious low-frequency artefacts produced when a signal contains frequencies above half the sample rate (the Nyquist frequency). Prevented by anti-aliasing filters before sampling and by oversampling in nonlinear DSP.
ASHA
HearLab
Audio Streaming for Hearing Aids — the Bluetooth profile used by Android to stream low-latency audio directly to compatible hearing aids.
ASR
VoiceLab
Automatic Speech Recognition — the speech-to-text layer. Whisper, Deepgram, AWS Transcribe, Google STT. Mostly a commodity in 2026 at high-90s % accuracy on clean material.
Audiogram
HearLab
A graph of hearing threshold (dB HL) versus frequency (Hz), per ear. The standard diagnostic output of an audiology test. Drives the prescription of any modern hearing aid.
AudioWorklet
MixLab
The browser API for running audio processing code on a dedicated thread, separate from the main thread. The foundation for any serious WebAudio analysis or synthesis.
Auracast
HearLab
Bluetooth LE Audio broadcast mode. A single source streams audio that any nearby compatible device — including hearing aids — can tune to without pairing. The basis for next-generation assisted-listening systems.
Bit depth
General
The number of bits per sample. 16-bit covers most consumer use; 24-bit is the production standard; 32-bit float is the working format inside DAWs and analysis tools.
Brickwall limiter
MixLab
A look-ahead peak limiter with an absolute ceiling — set the threshold and no sample passes above it. The last stage of nearly every loudness-normalised master.
BS.1770-4
MixLab
The ITU recommendation defining how to measure loudness in audio. The basis for LUFS metering, EBU R128, and streaming-platform normalisation targets.
Buffer size
Engineering
The number of audio samples processed per callback. Smaller = lower latency but more CPU per second; larger = higher latency but more efficient. 128 or 256 samples is the modern realtime sweet spot.
Codec
General
A coder-decoder pair that compresses and decompresses audio. Lossy: MP3, AAC, Opus, LC3. Lossless: FLAC, ALAC, WAV/PCM (uncompressed). The choice of codec sets the floor for what any downstream tool can achieve.
Crest factor
MixLab
The ratio between peak and RMS levels in a signal, expressed in dB. Higher numbers indicate more dynamic range. See the crest factor doc.
Critical listening
SkillLab
Trained, focused listening for specific audio attributes — frequency balance, dynamics, distortion, stereo width. The skill SkillLab drills systematically.
Cue list
CueLab
An ordered sequence of show-runtime triggers — audio, lighting, video, scenes. The data structure underneath every live-event production tool. See CueLab’s cuelist schema doc.
De-esser
VoiceLab
A frequency-selective compressor that targets the 5–9 kHz sibilance band. Reduces harsh "s" and "t" sounds in vocal recordings without dulling the overall top end.
Diarisation
SignalLab
Speaker turn segmentation — identifying who spoke when in an audio file. A core capability for the SignalLab indexer.
Dither
MixLab
Low-level noise added before bit-depth reduction to randomise quantisation error. Required when converting from 24/32-bit to 16-bit for distribution.
Ducking
VoiceLab
Reducing the level of one signal whenever a higher-priority signal is present. The classic example: music ducking under voice in a podcast or radio show.
Dynamic EQ
MixLab
An EQ band whose gain reacts to level. Conceptually a single-band compressor with a parametric frequency control. Less destructive than multiband for surgical tonal balancing.
Embedding
SignalLab
A learned dense vector representation of audio (or a chunk of audio). The core of modern audio retrieval, similarity search, and zero-shot classification. AudioLab signal indexing uses embeddings under the hood.
FFT
MixLab
Fast Fourier Transform — the algorithm at the heart of every frequency-domain operation in audio software, from spectrum analysers to convolution reverbs.
Formant
VoiceLab
A resonant frequency band in the human vocal tract. Formant patterns distinguish vowels and reveal speaker identity. Preserved or shifted intentionally by pitch-correction tools.
Hann window
MixLab
A common windowing function applied before FFT to reduce spectral leakage. The default windowing choice for most analysis-oriented audio code.
Headroom
MixLab
The dB margin between the loudest sample in a mix and 0 dBFS (or true peak). Streaming masters typically target –1 dBTP / –14 LUFS; broadcast targets are more conservative. See the methodology page for AudioLab’s defaults.
HRTF
HearLab
Head-Related Transfer Function — the frequency-domain filter describing how a sound from a specific direction is altered by the listener’s head, torso, and outer ear. The basis for binaural spatial audio.
Inference
SignalLab
Running a trained model to produce outputs. Distinct from training. The phase that needs to be fast, predictable, and small enough to fit on-device. The user-visible part of any audio AI product.
Integrated LUFS
MixLab
A perceptually-weighted, time-gated measure of loudness across an entire piece of audio. The number streaming platforms use to normalise playback.
JND
SkillLab
Just Noticeable Difference. The smallest change in a stimulus that a listener can reliably detect. Drives the calibration of every audio-perception threshold in SkillLab’s drills.
JUCE
Engineering
Cross-platform C++ audio framework. The path of least resistance for shipping the same DSP code on Android, iOS, macOS, Windows, and as VST/AU/AAX plugins.
K-weighting
MixLab
The pre-filter chain (high-shelf at ~1.5 kHz + high-pass at ~38 Hz) defined in BS.1770-4. Applied before RMS energy summation so the resulting LUFS value tracks perceived loudness rather than raw signal energy.
LC3
HearLab
Low Complexity Communication Codec. The default codec for Bluetooth LE Audio. Materially better than SBC and competitive with AAC at lower bitrates.
LE Audio
HearLab
The Bluetooth audio specification introduced in 5.2, including LC3 codec and multi-stream profiles. Material improvement for hearing devices and low-power audio.
Lissajous figure
MixLab
A plot of one signal against another, producing a curve whose shape reveals their phase relationship. The classical visualisation behind every "phase scope" in pro audio.
LRA (Loudness Range)
MixLab
EBU 3342 metric expressing the difference between the loudest and quietest sections of a piece of audio. Distinct from peak-to-RMS dynamics; closer to perceived "how much the music breathes".
Mel spectrogram
SignalLab
A spectrogram on the mel frequency scale — perceptually-weighted bins that more closely match human pitch perception than linear frequency. The standard input representation for audio neural networks.
MFCC
SignalLab
Mel-Frequency Cepstral Coefficients. The classical compact representation of timbral content — 13–40 coefficients summarising the spectral envelope. Still useful as features even in the deep-learning era.
Mid/Side
MixLab
A stereo representation where Mid = (L+R)/2 (centre content) and Side = (L−R)/2 (stereo difference). The basis for measuring true stereo width.
Mix-minus
CueLab
A feed sent to a remote guest containing the full programme audio except their own voice — preventing the “talking-into-a-tunnel” feedback experience. Standard in broadcast and remote-podcast routing.
Momentary LUFS
MixLab
A 400 ms sliding-window loudness measurement. The fast-twitch metric — used to catch transient loud passages. Calibrated against the same K-weighting filter as integrated LUFS but without gating.
Multiband compression
MixLab
Splitting the signal into 2–6 frequency bands and compressing each independently. The standard tool for taming a single problematic band (boomy lows, harsh upper-mids) without dulling the whole mix.
MUSHRA
SignalLab
MUlti Stimulus test with Hidden Reference and Anchor. The standard listening-test methodology for evaluating perceived audio quality at intermediate-to-high quality levels.
N-1 feed
CueLab
Another name for mix-minus. Literally: a mix of all N sources minus the one going to that destination. Built natively into proper broadcast consoles; emulated via routing matrices in software.
Neural vocoder
VoiceLab
A model that converts an intermediate audio representation (typically a mel spectrogram) into a waveform. WaveNet, HiFi-GAN, BigVGAN. The “last mile” of any modern TTS pipeline.
Nyquist frequency
Engineering
Half the sample rate — the theoretical upper bound of representable frequencies in a digital signal. 22.05 kHz at 44.1 kHz; 24 kHz at 48 kHz. The reason mastering chains often work at 96 kHz or higher.
Oboe
Engineering
Google’s C++ wrapper around AAudio (with OpenSL ES fallback for legacy devices). The recommended starting point for Android audio code in 2026.
OfflineAudioContext
MixLab
WebAudio API variant for offline, faster-than-realtime audio processing. The right choice for batch analysis and rendering. Powers the AudioLab sample-synthesis pipeline.
Onset detection
SignalLab
Identifying the start times of musical events — drum hits, note attacks, percussive transients. Foundational for beat tracking, tempo estimation, and segment-level audio analysis.
Pearson correlation
MixLab
The statistical measure of linear correlation between two signals, returning a value from −1 to +1. Used in MixLab to compute stereo correlation between L and R channels.
Phantom power
VoiceLab
+48 V DC supplied to condenser microphones via the same XLR cable that carries audio. Required to operate the mic capsule; supplied by audio interfaces and mixers.
Phase / polarity
Engineering
Polarity is binary (signal inverted or not). Phase is continuous and frequency-dependent. Conflating the two is the most common audio-engineering vocabulary error — and the cause of many comb-filter mysteries.
Phoneme
VoiceLab
The smallest unit of speech sound that distinguishes one word from another in a given language. English has roughly 44; phoneme-level alignment underpins every modern TTS and ASR system.
Plosive
VoiceLab
Low-frequency burst from "p" and "b" consonants hitting the microphone. Mitigated by pop filters in capture and high-pass filtering or transient editing in post.
Pre-show checklist
CueLab
The reusable set of pre-broadcast / pre-event verifications that catches preventable failures. Hardware-routing-levels-stream-backup, in that order. See the pre-show checklist doc.
Prosody
VoiceLab
The rhythm, stress, and intonation of speech. The hardest thing for synthetic voices to get right — and the most common giveaway when they don’t. Drives intelligibility and perceived emotion.
Round-trip latency
Engineering
The total time from sound capture to sound output through a processing chain. The headline number for any realtime audio app — under 15 ms is the threshold for monitoring-while-tracking.
RT60
VoiceLab
The reverberation time required for an impulse to decay 60 dB. The standard metric for room acoustics. Practical voice-QA tools estimate it implicitly via envelope-decay analysis.
Sample rate
General
The number of samples per second. 44.1 kHz for CD; 48 kHz for broadcast and most modern production; 96 kHz / 192 kHz for high-resolution mastering.
Short-term LUFS
MixLab
A 3-second sliding-window loudness measurement. The middle layer between momentary (400 ms) and integrated (whole-program). Drives the “LRA” range calculation in EBU 3342.
Sibilance
VoiceLab
High-frequency consonant energy from "s", "sh", "t", and "z" sounds. Concentrated around 5–9 kHz. A common source of listener fatigue when over-emphasised.
Sidechain
MixLab
An external control signal that triggers a dynamics processor (typically a compressor). The technique behind pumping kick-drum mixes, voice-ducked music beds, and modern de-essing.
SNR
General
Signal-to-Noise Ratio. The dB margin between desired signal and background noise. The single number that predicts intelligibility, recording quality, and ML model performance better than any other.
Source separation
SignalLab
Splitting a mixed audio signal back into individual stems (vocals, drums, bass, other). Demucs, Spleeter, Open-Unmix. State of the art in 2026 produces clean isolation on most pop music.
Speaker verification
VoiceLab
Confirming a claimed speaker identity from a voice sample. The defensive counterpart to voice cloning. Modern systems use d-vector or x-vector embeddings compared via cosine similarity.
Spectral centroid
MixLab
The "centre of mass" of the spectrum — a weighted average frequency that corresponds roughly to perceived brightness.
Speech-in-noise
HearLab
A class of metrics and tests (SIN, QuickSIN, HINT) measuring speech intelligibility against varying noise floors. The real-world predictor of hearing-aid benefit, more so than pure-tone audiometry.
Streaks
SkillLab
Practice-consistency markers used in SkillLab. Designed to survive a missed day to avoid slot-machine progression dynamics. See progression design.
Tag schema
SignalLab
The controlled vocabulary used to describe audio content for downstream retrieval. SignalLab’s schema covers content type, brightness, dynamics, and issue flags. See the tag schema doc.
True peak
MixLab
The actual analog peak of a reconstructed signal, which can exceed the sample-domain peak due to inter-sample peaks. Measured via oversampling.
TTS (Text-to-Speech)
VoiceLab
Generating speech audio from text. State of the art in 2026 is neural — typically a text encoder + acoustic model + vocoder pipeline, often distilled into a single model for realtime inference.
VAD (Voice Activity Detection)
VoiceLab
A classifier that returns 1 for speech and 0 for non-speech, frame by frame. The first stage of nearly every voice pipeline — from conferencing apps to podcast editors to the VoiceLab QA demo.
Voice cloning
VoiceLab
Synthesising speech in the voice of a target speaker from a reference recording — modern models work from 3–30 seconds of reference audio. The basis for both legitimate dubbing/accessibility tools and modern impersonation attacks.
WDRC
HearLab
Wide Dynamic Range Compression — the multiband, level-dependent gain processing inside every modern hearing aid. Maps the speaker’s dynamic range into the user’s residual hearing range.
WebAudio API
MixLab
The W3C standard for audio processing in browsers. Includes AudioContext, AudioNode, AudioBuffer, AnalyserNode, AudioWorklet, and OfflineAudioContext. The foundation under every browser-side audio tool.