Why does dBFS matter for clinical audio and voice analysis?

Clinical voice metrics like AVQI, jitter, shimmer, and HNR are computed from recorded audio. If the recording level (in dBFS) is too low, the signal-to-noise ratio suffers and acoustic biomarkers become unreliable. If it clips (exceeds 0 dBFS), the waveform is distorted and all metrics are invalid. For calibrated measurements — such as reporting absolute sound levels or comparing across clinics — converting dBFS to dB SPL via a calibration offset is essential.

dB SPL vs dBFS: Understanding Audio Levels for Engineers and Clinicians

Q: What is the difference between dB SPL and dBFS?

dB SPL (Sound Pressure Level) measures the physical acoustic pressure of a sound in air, referenced to 20 µPa (the nominal threshold of human hearing at 1 kHz). dBFS (decibels relative to Full Scale) is a digital scale that measures the amplitude of an audio signal relative to the maximum level a digital system can represent. dB SPL is absolute: 94 dB SPL is a real, measurable pressure level (about 1 Pa RMS). dBFS is always relative to the specific recording system and, in fixed-point formats, is always negative or zero: 0 dBFS is full scale, and any signal that would exceed it clips.

Q: How do you convert dBFS to dB SPL?

The conversion formula is: dB SPL = dBFS + calibration_offset. The calibration offset is determined by playing a known reference tone (typically 94 dB SPL or 114 dB SPL from a pistonphone or acoustic calibrator) into your microphone and measuring the resulting dBFS level. The difference between the known SPL and the measured dBFS is your offset. A typical offset ranges from 94 dB to 130 dB depending on the microphone and recording system.

Q: What recording level in dBFS is recommended for voice analysis?

For voice analysis and acoustic biomarker extraction, aim for a peak level between -12 dBFS and -6 dBFS during sustained phonation. This provides enough headroom to avoid clipping while keeping the signal well above the noise floor. For Wav2Vec2 and other deep learning speech models, the audio should be normalised the same way it was during model pre-training. A fixed RMS target of around -23 dBFS is one common choice; note that the EBU R128 target of -23 LUFS is a K-weighted, gated loudness measure and is not identical to plain RMS in dBFS.

What you'll learn: the precise definition of dB SPL and dBFS, why they are not interchangeable, the calibration formula that bridges them, and why getting this right is essential for clinical voice analysis, acoustic biomarkers, and audio ML pipelines.

Two recordings can both show -18 dBFS on a meter and yet represent completely different physical sound levels — one recorded with a studio condenser at arm's length, the other with a clinical-grade microphone 30 cm from a patient's mouth. Without understanding the distinction between dB SPL and dBFS, you cannot compare them, calibrate them, or trust any absolute measurement derived from them.

This article cuts through the confusion. We define both scales precisely, derive the conversion formula, and explain the practical consequences for audio engineers, ML practitioners, and clinical researchers working with voice data.

1. What Is dB SPL?

Sound Pressure Level (SPL) is the physical measure of the pressure fluctuations a sound wave creates in air. It is defined as:

dB SPL = 20 × log10(p / p_ref)

where:
  p      = measured RMS sound pressure (in Pascals)
  p_ref  = 20 µPa  (20 × 10⁻⁶ Pa — the threshold of human hearing)

The reference 20 µPa is not arbitrary — it corresponds to the quietest sound a young, healthy human ear can detect at 1 kHz. This makes 0 dB SPL the absolute threshold of human hearing, not silence.
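The definition above can be checked numerically. The following is a minimal sketch using only the standard library; the function names are illustrative and not part of any particular package. It shows why 94 dB SPL is a convenient calibration level: it corresponds to almost exactly 1 Pa RMS.

```python
import math

P_REF = 20e-6  # reference pressure in pascals (20 µPa)

def pressure_to_db_spl(p_rms: float) -> float:
    """Convert an RMS sound pressure in pascals to dB SPL."""
    if p_rms <= 0:
        raise ValueError("RMS pressure must be positive")
    return 20 * math.log10(p_rms / P_REF)

def db_spl_to_pressure(db_spl: float) -> float:
    """Convert dB SPL back to an RMS sound pressure in pascals."""
    return P_REF * 10 ** (db_spl / 20)

print(f"{pressure_to_db_spl(1.0):.2f} dB SPL")  # 1 Pa RMS, roughly 94 dB SPL
print(f"{db_spl_to_pressure(0.0):.1e} Pa")      # 0 dB SPL is the 20 µPa reference
```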

dB SPL is an absolute physical scale. A measurement of 94 dB SPL describes a real pressure in the world, independent of any microphone or recording system. Here are common reference points:

Sound source	Approximate dB SPL
Threshold of hearing	0 dB SPL
Quiet library	~30 dB SPL
Normal conversation (1 m)	~60–65 dB SPL
Sustained vowel /a/ in speech therapy (30 cm)	~70–80 dB SPL
Pistonphone calibration reference	94 dB SPL (or 114 dB SPL)
Threshold of pain	~130 dB SPL

Key property of dB SPL:

dB SPL is a physical, system-independent measurement. Two calibrated sound level meters placed side by side in the same sound field will read identical dB SPL values, regardless of their electronics or sensitivity.

2. What Is dBFS?

dBFS (decibels relative to Full Scale) is the digital equivalent — a relative scale that describes the amplitude of a digital audio signal with respect to the maximum amplitude a digital system can represent without clipping:

dBFS = 20 × log10(A / A_max)

where:
  A      = measured amplitude (peak or RMS)
  A_max  = maximum representable digital amplitude (full scale)

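Because A cannot exceed A_max in a fixed-point system, dBFS values are always negative or zero. The moment a signal hits 0 dBFS, it has reached the ceiling of the digital system; any louder and it clips. Floating-point formats can store sample values above full scale, but those values still clip as soon as the audio is converted to fixed-point or sent to a converter.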
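As an illustration of the formula, here is a short sketch that computes the peak level of 16-bit PCM samples. It assumes NumPy is available and treats 32768 as full scale for int16; the helper names are hypothetical.

```python
import numpy as np

def to_float(samples_int16: np.ndarray) -> np.ndarray:
    """Scale 16-bit PCM samples to floats where full scale is 1.0."""
    return samples_int16.astype(np.float64) / 32768.0

def peak_dbfs(signal: np.ndarray) -> float:
    """Peak level in dBFS for a float signal with full scale = 1.0."""
    peak = np.max(np.abs(signal))
    return float(20 * np.log10(peak)) if peak > 0 else float("-inf")

pcm = np.array([0, 8192, -16384, 4096], dtype=np.int16)
# The largest sample is half of full scale, which is about -6.02 dBFS.
print(f"Peak level: {peak_dbfs(to_float(pcm)):.2f} dBFS")
```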

This scale is used everywhere digital audio is processed: DAWs (Pro Tools, Logic, Ableton), audio APIs (PyDub, librosa, soundfile), WAV files, and mobile recording apps all report levels in dBFS. When Python's librosa computes RMS energy from a float32 audio array normalised to [-1, 1], the resulting value in dB is a dBFS value — not a physical sound level.

Key property of dBFS:

dBFS is system-relative. The same physical 70 dB SPL voice can produce -20 dBFS on one interface and -10 dBFS on another, depending on the microphone sensitivity and preamp gain. dBFS tells you nothing about physical sound pressure unless you know the calibration of your system.
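To make the point concrete, the sketch below predicts the meter reading for the same 70 dB SPL voice on two systems. It assumes each system is described by a single calibration offset (the dB SPL that corresponds to 0 dBFS); the two offset values are hypothetical, chosen only to reproduce the readings quoted above.

```python
def spl_to_dbfs(db_spl: float, calibration_offset: float) -> float:
    """Predict the dBFS reading for a physical level (dBFS = dB SPL - offset)."""
    return db_spl - calibration_offset

VOICE_SPL = 70.0
# Hypothetical offsets for two recording chains (not measured values).
HYPOTHETICAL_OFFSETS = {"interface_a": 90.0, "interface_b": 80.0}

readings = {
    name: spl_to_dbfs(VOICE_SPL, offset)
    for name, offset in HYPOTHETICAL_OFFSETS.items()
}
for name, dbfs in readings.items():
    print(f"{name}: {dbfs:.1f} dBFS")
```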

3. The Key Difference: Absolute vs. Relative

The confusion between dB SPL and dBFS is understandable — both use decibels, both involve logarithms of amplitude ratios, and both can describe audio. But they answer fundamentally different questions:

Property	dB SPL	dBFS
Domain	Physical (acoustic)	Digital (electronic)
Reference	20 µPa (absolute)	Full scale of the ADC (relative)
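Range	No lower bound (negative values are possible) → ~194 dB SPL for undistorted sound in air at sea level	-∞ dBFS → 0 dBFS
Maximum value	No limit imposed by the scale itself	0 dBFS (full scale; clipping beyond it)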
Instrument	Sound level meter, pistonphone	DAW meter, audio API
System-independent?	Yes	No

A practical analogy: dB SPL is like measuring temperature in Kelvin — it references a true physical zero. dBFS is like measuring temperature relative to the highest temperature your thermometer can read. Without knowing where "full scale" sits on the Kelvin axis, the reading is meaningless in absolute terms.

4. Converting Between dB SPL and dBFS

The bridge between the two scales is a single constant called the calibration offset. It represents the dB SPL that corresponds to 0 dBFS in your recording system — in other words, how loud a sound would have to be in the physical world to drive your microphone/ADC chain to full scale:

dB_SPL = dBFS + calibration_offset

Equivalently:
  calibration_offset = dB_SPL_known − dBFS_measured

Example:
  A pistonphone outputs 94 dB SPL.
  You record it and measure -28.4 dBFS (RMS).
  calibration_offset = 94 − (−28.4) = 122.4 dB

  A subsequent recording reads -18.0 dBFS (RMS).
  dB_SPL = -18.0 + 122.4 = 104.4 dB SPL

Typical calibration offsets:

For most professional measurement microphones paired with an audio interface, calibration offsets fall in the range 94 dB – 130 dB. Consumer microphones built into smartphones or laptops have much higher offsets (often 120–140 dB) because their analog sensitivity is low. The exact value must always be measured — never assumed.

Python: Calibrated SPL from a WAV File

import numpy as np
import soundfile as sf

def rms_dbfs(signal: np.ndarray) -> float:
    """Compute RMS level in dBFS (full scale = 1.0)."""
    rms = np.sqrt(np.mean(signal ** 2))
    if rms == 0:
        return -np.inf
    return 20 * np.log10(rms)

def dbfs_to_spl(dbfs: float, calibration_offset: float) -> float:
    """Convert dBFS to dB SPL using a known calibration offset.

    calibration_offset = dB_SPL_reference - dBFS_measured_at_reference
    (typically 94 dB SPL pistonphone measured at the recording chain)
    """
    return dbfs + calibration_offset

# --- Usage ---
audio, sr = sf.read("voice_sample.wav")

# Compute level from the full recording
level_dbfs = rms_dbfs(audio)
print(f"Recording level: {level_dbfs:.1f} dBFS")

# Convert to physical SPL (requires prior calibration)
CALIBRATION_OFFSET = 122.4  # dB, determined from pistonphone session
level_spl = dbfs_to_spl(level_dbfs, CALIBRATION_OFFSET)
print(f"Estimated SPL:   {level_spl:.1f} dB SPL")

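Note that soundfile returns samples as float64 by default (pass dtype="float32" to sf.read if you need float32), scaled to the range [-1, 1], so the RMS of a full-scale sine wave is 1/√2 ≈ 0.707, giving -3.01 dBFS. This is consistent with the dBFS definition above. Some meters instead follow the AES17 convention, in which a full-scale sine reads 0 dBFS RMS (3.01 dB higher), so use the same convention for the calibration tone and for every later recording.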
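The offset itself can be derived the same way. The sketch below is an assumption-laden illustration rather than a complete calibration procedure: it synthesises a 1 kHz tone at -28.4 dBFS RMS to stand in for a recorded 94 dB SPL reference, so that it runs without an audio file. The function name calibration_offset is hypothetical.

```python
import numpy as np

def rms_dbfs(signal: np.ndarray) -> float:
    """RMS level in dBFS (full scale = 1.0)."""
    rms = np.sqrt(np.mean(np.square(signal)))
    return float(20 * np.log10(rms)) if rms > 0 else float("-inf")

def calibration_offset(reference_signal: np.ndarray, reference_spl: float = 94.0) -> float:
    """Offset = known SPL of the reference tone minus its measured dBFS."""
    return reference_spl - rms_dbfs(reference_signal)

# Stand-in for a recorded calibrator tone: 1 s of 1 kHz sine at -28.4 dBFS RMS.
sr = 48000
t = np.arange(sr) / sr
target_rms = 10 ** (-28.4 / 20)
tone = np.sqrt(2) * target_rms * np.sin(2 * np.pi * 1000 * t)

offset = calibration_offset(tone, reference_spl=94.0)
print(f"Calibration offset: {offset:.1f} dB")
```

In practice the synthetic tone would be replaced by the recording of the reference source, captured with the same gain settings used for the patient recordings.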

5. Why It Matters for Clinical Audio

Clinical audio applications — voice pathology assessment, speech therapy, acoustic biomarker research — have stricter requirements than most consumer audio use cases. The reason: clinical decisions depend on the accuracy of computed metrics, and those metrics are highly sensitive to recording level.

Acoustic Biomarkers: AVQI, Jitter, Shimmer, HNR

Metrics like the Acoustic Voice Quality Index (AVQI), jitter (cycle-to-cycle frequency variation), shimmer (cycle-to-cycle amplitude variation), and Harmonics-to-Noise Ratio (HNR) are extracted from the raw waveform of a sustained phonation. Their validity depends on:

No clipping: A signal exceeding 0 dBFS introduces nonlinear distortion. The resulting waveform is no longer a faithful reproduction of the patient's voice — shimmer and jitter values computed from it are meaningless.
Adequate SNR: A recording whose peak level is below about -40 dBFS is often close to the noise floor of the recording chain. HNR measurements may then reflect microphone and preamp noise rather than the patient's voice quality.
Calibrated absolute level: Some normative AVQI ranges are established at specific vocal effort levels (e.g., comfortable loudness at 30 cm). Comparing a patient against normative data requires that your recording represents the same physical SPL, not just the same dBFS.
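The first two conditions lend themselves to an automated check before any metric is computed. The following is a rough sketch, assuming float audio with full scale at 1.0; the -0.1 dBFS clipping threshold is an assumption for illustration, and the -40 dBFS limit is taken from the text above. Neither is a validated clinical criterion.

```python
import numpy as np

CLIP_THRESHOLD_DBFS = -0.1  # assumption: peaks this close to full scale count as clipped
MIN_PEAK_DBFS = -40.0       # low-level limit mentioned in the text

def peak_dbfs(signal: np.ndarray) -> float:
    peak = np.max(np.abs(signal))
    return float(20 * np.log10(peak)) if peak > 0 else float("-inf")

def check_level(signal: np.ndarray) -> dict:
    """Flag recordings that are likely clipped or too quiet for analysis."""
    peak = peak_dbfs(signal)
    return {
        "peak_dbfs": peak,
        "clipped": peak >= CLIP_THRESHOLD_DBFS,
        "too_quiet": peak < MIN_PEAK_DBFS,
    }

# Synthetic stand-ins for three recordings of a sustained tone.
t = np.arange(4800) / 48000
wave = np.sin(2 * np.pi * 200 * t)
good = 0.3 * wave
clipped = np.clip(2.0 * wave, -1.0, 1.0)
quiet = 0.001 * wave

for name, sig in [("good", good), ("clipped", clipped), ("quiet", quiet)]:
    print(name, check_level(sig))
```

A peak-based test only catches clipping that reaches full scale in the file; clipping that occurred earlier in the analog chain needs a different detector.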

Example from clinical practice:

Two clinics both record the sustained vowel /a/ at "comfortable loudness." Clinic A uses a Shure MV51 USB microphone with a calibration offset of 120.5 dB. Clinic B uses a Sennheiser MKH 8040 with a preamp calibrated to a 112.3 dB offset. A recording that reads -20 dBFS in both clinics represents 100.5 dB SPL in Clinic A and 92.3 dB SPL in Clinic B, an 8.2 dB difference in physical level and therefore in vocal effort. Without calibration, pooling these datasets for a machine learning model will introduce systematic bias.
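The arithmetic in this example can be reproduced with a per-site lookup, which is one plausible way to harmonise levels before pooling data. The dictionary layout and site keys below are hypothetical; the offsets are the ones quoted in the example.

```python
# Per-site calibration offsets in dB, as quoted in the example above.
SITE_OFFSETS = {"clinic_a": 120.5, "clinic_b": 112.3}

def dbfs_to_spl(dbfs: float, calibration_offset: float) -> float:
    return dbfs + calibration_offset

reading_dbfs = -20.0  # identical meter reading at both sites
spl = {site: dbfs_to_spl(reading_dbfs, off) for site, off in SITE_OFFSETS.items()}
difference = spl["clinic_a"] - spl["clinic_b"]

for site, level in spl.items():
    print(f"{site}: {level:.1f} dB SPL")
print(f"Difference: {difference:.1f} dB")
```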

Deep Learning Models: Wav2Vec2 and Input Normalisation

Models like Wav2Vec2 and HuBERT were pre-trained on waveforms in a specific amplitude range, typically floating-point samples in [-1, 1], often with per-utterance normalisation (the Hugging Face Wav2Vec2 feature extractor, for example, can rescale each input to zero mean and unit variance). When you pass a poorly levelled recording (very quiet, or at an unusual gain stage) without matching that preprocessing, the model's learned representations may not generalise correctly because the activation statistics at the input convolutional layers differ from the training distribution.

Best practice for inference with these models: normalise your audio to a consistent level before passing it to the model, then apply your calibration offset separately if you need absolute SPL. A fixed RMS target such as -23 dBFS is one option; it is numerically similar to, but not the same as, the -23 LUFS target in EBU R128, which is a K-weighted, gated loudness measure. Never conflate the normalisation step with calibration: normalisation is a preprocessing choice, whereas calibration maps your system to the physical world.
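One way to keep the two steps separate is to record the gain that normalisation applies, so the physical level can still be recovered afterwards. The sketch below assumes plain RMS normalisation to -23 dBFS (not a true LUFS measurement) and reuses the 122.4 dB offset from the worked example; the function names are illustrative.

```python
import numpy as np

def rms_dbfs(signal: np.ndarray) -> float:
    rms = np.sqrt(np.mean(np.square(signal)))
    return float(20 * np.log10(rms)) if rms > 0 else float("-inf")

def normalise_rms(signal: np.ndarray, target_dbfs: float = -23.0):
    """Scale the signal to a target RMS level; return (signal, gain in dB)."""
    current = rms_dbfs(signal)
    if not np.isfinite(current):
        raise ValueError("cannot normalise a silent signal")
    gain_db = target_dbfs - current
    return signal * 10 ** (gain_db / 20), gain_db

CALIBRATION_OFFSET = 122.4  # dB, from the worked calibration example

# Synthetic stand-in for a quiet recording.
t = np.arange(16000) / 16000
recording = 0.05 * np.sin(2 * np.pi * 220 * t)

normalised, gain_db = normalise_rms(recording)
original_spl = rms_dbfs(recording) + CALIBRATION_OFFSET
# Undo the normalisation gain before applying the calibration offset.
recovered_spl = rms_dbfs(normalised) - gain_db + CALIBRATION_OFFSET

print(f"Model input level: {rms_dbfs(normalised):.1f} dBFS")
print(f"Physical level:    {recovered_spl:.1f} dB SPL")
```

The normalised array goes to the model, while the SPL estimate is always computed from the original level (or, equivalently, from the normalised level minus the stored gain).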

6. Practical Rules of Thumb

Scenario	Target level	Rationale
Voice recording (clinical)	Peak: -12 to -6 dBFS	Headroom against clipping; SNR >40 dB
Broadcast / podcast	Integrated loudness: -23 LUFS (EBU R128)	Platform loudness normalisation target
ML model input (Wav2Vec2)	Float32, normalised consistently (e.g., RMS at -23 dBFS)	Keeps inputs close to the pre-training distribution
Calibrated SPL measurement	Record reference tone first	94 or 114 dB SPL pistonphone → derive offset
Multi-site clinical study	Calibrate each site independently	Same dBFS ≠ same SPL across hardware

The three-step mental model:

dBFS is what your software sees — a number between -∞ and 0.
Calibration offset is the knowledge of your hardware — derived once per setup.
dB SPL is the physical reality — what a sound level meter would show.

Any clinical comparison that requires absolute levels must go through all three steps.

Try the Hidacs dBFS Converter

Need to convert between dBFS and dB SPL for your recordings? Our free online converter lets you enter a dBFS value and your system's calibration offset to get the physical sound pressure level instantly — no formulas to remember.


Frequently Asked Questions

What is the difference between dB SPL and dBFS?

dB SPL measures physical acoustic pressure relative to the threshold of human hearing (20 µPa) — it is an absolute, system-independent quantity. dBFS measures digital amplitude relative to the maximum level a recording system can handle — it is always zero or negative and is specific to each hardware/software setup. The same physical sound will produce different dBFS readings on different microphones.

How do you convert dBFS to dB SPL?

Use the formula: dB SPL = dBFS + calibration_offset. The calibration offset is determined by recording a known reference level (typically a 94 dB SPL or 114 dB SPL pistonphone tone) and subtracting the measured dBFS from the known SPL. Typical offsets range from 94 dB to 130 dB depending on the microphone and preamp.

Why does dBFS matter for clinical voice analysis?

Clinical voice metrics (AVQI, jitter, shimmer, HNR) require recordings that are neither clipped nor too quiet. Clipping (above 0 dBFS) distorts the waveform and invalidates all waveform-based measurements. An insufficient level buries the voice in the noise floor and degrades SNR-sensitive metrics like HNR. For multi-site studies or normative comparisons, calibration to dB SPL is necessary to ensure recordings represent the same physical vocal effort across different hardware setups.

What recording level in dBFS is recommended for voice analysis?

For voice analysis and acoustic biomarker extraction, target a peak level between -12 dBFS and -6 dBFS during sustained phonation. This provides adequate headroom to avoid clipping while keeping the signal well above the microphone noise floor. For deep learning speech models such as Wav2Vec2 or HuBERT, normalise the audio to a consistent level before inference (for example, RMS at approximately -23 dBFS, which is similar to but not identical to the -23 LUFS target in EBU R128), separately from any SPL calibration step.
