MelCap: A Unified Single-Codebook Neural Codec for High-Fidelity Audio Compression

This is the demo page for MelCap, a high-fidelity single-codebook neural audio codec.

Research Overview

Neural audio codecs have recently emerged as powerful tools for high-quality, low-bitrate audio compression, leveraging deep generative models to learn latent representations of audio signals. However, existing approaches either rely on a single quantizer that handles only speech, or on multiple quantizers that are poorly suited to downstream tasks. To address this issue, we propose MelCap, a high-fidelity neural codec with a single codebook. By decomposing audio reconstruction into two stages, our method preserves more acoustic detail than previous single-codebook approaches while achieving performance comparable to mainstream multi-codebook methods. In the first stage, audio is transformed into mel-spectrograms, which are compressed in the image domain and quantized into a compact sequence of single-codebook tokens by a 2D tokenizer. A perceptual loss is further applied to mitigate the over-smoothing artifacts observed in spectrogram reconstruction. In the second stage, a vocoder recovers waveforms from the discrete mel tokens in a single forward pass, enabling real-time decoding. Both objective and subjective evaluations demonstrate that MelCap achieves quality comparable to state-of-the-art multi-codebook codecs while retaining the computational simplicity of a single-codebook design, thereby providing an effective representation for downstream tasks.
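The first stage can be illustrated with a minimal NumPy sketch of what "a mel-spectrogram quantized as an image with a single codebook" means for the token layout. It is not MelCap's implementation: the learned 2D tokenizer's encoder and decoder are replaced by raw 8x8 patch flattening and a random, untrained codebook (both hypothetical stand-ins), and the perceptual loss and the vocoder are omitted entirely. All parameters (16 kHz sample rate, 1024-point FFT, hop of 256, 64 mel bands, patch size 8, 512 codewords) are illustrative assumptions, as are the helper names `log_mel_spectrogram`, `patchify`, `unpatchify` and `quantize`.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (one common convention; an assumption here)
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters with shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(wave, sr=16000, n_fft=1024, hop=256, n_mels=64):
    """Waveform -> (n_mels, n_frames) log-mel 'image' (no padding; needs >= n_fft samples)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack([wave[t * hop:t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2     # (n_frames, n_bins)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T     # (n_frames, n_mels)
    return np.log(mel + 1e-6).T                           # (n_mels, n_frames)


def patchify(img, p):
    """Split a 2D array into non-overlapping p x p patches in row-major order."""
    h, w = img.shape
    h, w = h - h % p, w - w % p                           # drop ragged edges
    gh, gw = h // p, w // p
    patches = img[:h, :w].reshape(gh, p, gw, p).transpose(0, 2, 1, 3)
    return patches.reshape(-1, p * p), (gh, gw)


def unpatchify(patches, grid, p):
    """Inverse of patchify: (gh * gw, p * p) -> (gh * p, gw * p)."""
    gh, gw = grid
    return patches.reshape(gh, gw, p, p).transpose(0, 2, 1, 3).reshape(gh * p, gw * p)


def quantize(vectors, codebook):
    """Nearest-neighbour lookup in ONE codebook: one integer token per vector."""
    dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr                                    # 1 s, 440 Hz test tone
wave = 0.5 * np.sin(2 * np.pi * 440.0 * t)

mel = log_mel_spectrogram(wave, sr=sr)
mel = (mel - mel.mean()) / mel.std()                      # simple global normalisation
codebook = rng.normal(size=(512, 8 * 8))                  # hypothetical: random, untrained
patches, grid = patchify(mel, p=8)
tokens = quantize(patches, codebook)                      # the single token stream
recon = unpatchify(codebook[tokens], grid, p=8)           # coarse mel estimate; a vocoder would start here
print(mel.shape, grid, tokens.shape)                      # → (64, 59) (8, 7) (56,)
```

The point of the sketch is the output shape: every patch of the mel "image" maps to exactly one integer, so one second of audio becomes one flat sequence of tokens. Multi-quantizer codecs, such as the 4- and 9-quantizer baselines listed below, instead emit several parallel token streams (residual vector quantization being one common scheme), which a downstream model has to handle jointly.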

Below, you can listen to audio samples from our codec in comparison with existing approaches.

Our Method:
MelCap

Number of quantizers: 1

Baseline 1:
WavTokenizer

Number of quantizers: 1

Baseline 2:
DAC(s)

Number of quantizers: 4

Baseline 3:
SNAC

Number of quantizers: 4

Baseline 4:
DAC

Number of quantizers: 9

Baseline 5:
Nvidia Codec

Number of quantizers: 9

Baseline 6:
Spectral Codec

Number of quantizers: 9

General Sound

1 Whistle Sound

Ground Truth

Ground Truth spectrogram visualization

MelCap

MelCap spectrogram visualization

WavTokenizer

WavTokenizer spectrogram visualization

DAC(s)

DAC(s) spectrogram visualization

SNAC

SNAC spectrogram visualization

DAC

DAC spectrogram visualization

Nvidia Codec

Nvidia Codec spectrogram visualization

Spectral Codec

Spectral Codec spectrogram visualization

2 Tapping Sound + Bell Sound

Ground Truth

Ground Truth spectrogram visualization

MelCap

MelCap spectrogram visualization

WavTokenizer

WavTokenizer spectrogram visualization

DAC(s)

DAC(s) spectrogram visualization

SNAC

SNAC spectrogram visualization

DAC

DAC spectrogram visualization

Nvidia Codec

Nvidia Codec spectrogram visualization

Spectral Codec

Spectral Codec spectrogram visualization

3 Flute Sound

Ground Truth

Ground Truth spectrogram visualization

MelCap

MelCap spectrogram visualization

WavTokenizer

WavTokenizer spectrogram visualization

DAC(s)

DAC(s) spectrogram visualization

SNAC

SNAC spectrogram visualization

DAC

DAC spectrogram visualization

Nvidia Codec

Nvidia Codec spectrogram visualization

Spectral Codec

Spectral Codec spectrogram visualization

Speech

1 Spanish

Ground Truth

Ground Truth spectrogram visualization

MelCap

MelCap spectrogram visualization

WavTokenizer

WavTokenizer spectrogram visualization

DAC(s)

DAC(s) spectrogram visualization

SNAC

SNAC spectrogram visualization

DAC

DAC spectrogram visualization

Nvidia Codec

Nvidia Codec spectrogram visualization

Spectral Codec

Spectral Codec spectrogram visualization

2 Chinese

Ground Truth

Ground Truth spectrogram visualization

MelCap

MelCap spectrogram visualization

WavTokenizer

WavTokenizer spectrogram visualization

DAC(s)

DAC(s) spectrogram visualization

SNAC

SNAC spectrogram visualization

DAC

DAC spectrogram visualization

Nvidia Codec

Nvidia Codec spectrogram visualization

Spectral Codec

Spectral Codec spectrogram visualization

3 Other Languages

Ground Truth

Ground Truth spectrogram visualization

MelCap

MelCap spectrogram visualization

WavTokenizer

WavTokenizer spectrogram visualization

DAC(s)

DAC(s) spectrogram visualization

SNAC

SNAC spectrogram visualization

DAC

DAC spectrogram visualization

Nvidia Codec

Nvidia Codec spectrogram visualization

Spectral Codec

Spectral Codec spectrogram visualization

Music

1

Ground Truth

Ground Truth spectrogram visualization

MelCap

MelCap spectrogram visualization

WavTokenizer

WavTokenizer spectrogram visualization

DAC(s)

DAC(s) spectrogram visualization

SNAC

SNAC spectrogram visualization

DAC

DAC spectrogram visualization

Nvidia Codec

Nvidia Codec spectrogram visualization

Spectral Codec

Spectral Codec spectrogram visualization

2

Ground Truth

Ground Truth spectrogram visualization

MelCap

MelCap spectrogram visualization

WavTokenizer

WavTokenizer spectrogram visualization

DAC(s)

DAC(s) spectrogram visualization

SNAC

SNAC spectrogram visualization

DAC

DAC spectrogram visualization

Nvidia Codec

Nvidia Codec spectrogram visualization

Spectral Codec

Spectral Codec spectrogram visualization

3

Ground Truth

Ground Truth spectrogram visualization

MelCap

MelCap spectrogram visualization

WavTokenizer

WavTokenizer spectrogram visualization

DAC(s)

DAC(s) spectrogram visualization

SNAC

SNAC spectrogram visualization

DAC

DAC spectrogram visualization

Nvidia Codec

Nvidia Codec spectrogram visualization

Spectral Codec

Spectral Codec spectrogram visualization