
Cepstrum Analysis Simulator — Speech & Vibration Analysis

Compute the real cepstrum by inverse Fourier transform of the log magnitude spectrum. Change fundamental frequency, sample rate, and harmonic decay to learn pitch detection on the quefrency axis.

Parameters
Fundamental frequency f_0 (Hz)
Sample rate F_s (Hz)
Harmonic decay α
Noise σ

The simulator analyzes N = 1024 samples with a direct DFT. The signal model is a sum of 5 harmonics plus additive white noise.
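As a rough sketch of the signal model described above, the following Python snippet sums 5 harmonics of f_0 and adds white Gaussian noise. The function name `synth_signal`, the amplitude law exp(−α(h−1)), and the fixed random seed are assumptions made for illustration; the simulator's exact decay law is not stated in the text.

```python
import math
import random

def synth_signal(f0, fs, alpha, sigma, n_samples=1024, n_harmonics=5, seed=0):
    """Sum of harmonics of f0 plus white Gaussian noise.

    Assumption: harmonic h has amplitude exp(-alpha * (h - 1)).
    The simulator may use a different decay law.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    x = []
    for n in range(n_samples):
        t = n / fs
        s = sum(math.exp(-alpha * (h - 1)) * math.sin(2 * math.pi * h * f0 * t)
                for h in range(1, n_harmonics + 1))
        x.append(s + rng.gauss(0.0, sigma))
    return x

# Noise-free example: with f0 = 200 Hz and fs = 8000 Hz the waveform
# repeats every fs / f0 = 40 samples.
x = synth_signal(f0=200.0, fs=8000.0, alpha=0.3, sigma=0.0)
```

With σ = 0 the output is exactly periodic, which makes it easy to check the period by eye before adding noise.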

Results
Cepstrum peak quefrency
Corresponding time τ
Estimated fundamental f̂_0
Estimation error |f̂_0 − f_0| / f_0
Signal · Spectrum · Cepstrum

Top: time series x[n] (blue) / Middle: log magnitude log|X[k]| (green) / Bottom: cepstrum c[n] (red, yellow line = peak)

Theory & Key Formulas

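The real cepstrum is the inverse Fourier transform of the log magnitude spectrum of a signal. When a signal is the convolution of an excitation with a transfer system, the logarithm turns the product of their spectra into a sum, so the two components end up in different regions of the quefrency axis, which has the dimension of time.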

DFT. x[n] is the input of length N and X[k] is its frequency component:

$$X[k] = \sum_{n=0}^{N-1} x[n]\,e^{-j 2\pi k n / N}$$
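The DFT sum above translates almost line for line into code. This is a minimal sketch of a direct O(N²) DFT in pure Python; the function name `dft` and the precomputed twiddle table are choices made here, not necessarily how the simulator is written.

```python
import cmath

def dft(x):
    """Direct DFT: X[k] = sum_n x[n] * exp(-j 2 pi k n / N)."""
    n_samples = len(x)
    # Precompute the N twiddle factors exp(-j 2 pi m / N).
    # The exponent k*n is reduced modulo N because the factor is N-periodic.
    w = [cmath.exp(-2j * cmath.pi * m / n_samples) for m in range(n_samples)]
    return [sum(x[n] * w[(k * n) % n_samples] for n in range(n_samples))
            for k in range(n_samples)]

# Small check: x = [1, 0, -1, 0] gives X[k] = 1 - (-1)^k, i.e. 0, 2, 0, 2.
X = dft([1.0, 0.0, -1.0, 0.0])
```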

Real cepstrum. Inverse DFT of the log magnitude spectrum (real part only):

$$c[n] = \frac{1}{N}\sum_{k=0}^{N-1} \log|X[k]|\,\cos\!\left(\frac{2\pi k n}{N}\right)$$
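A possible implementation of the cosine-sum formula above is sketched below. It uses a direct DFT and adds a small ε before the logarithm (the value 1e-10 is the one quoted in the FAQ); the function name `real_cepstrum` is hypothetical. The two-tap test signal is chosen because its cepstrum is known in closed form: for x = [1, a, 0, …] the series log|1 + a e^{−jθ}| = a cos θ − (a²/2) cos 2θ + … gives c[1] ≈ a/2 and c[2] ≈ −a²/4.

```python
import cmath
import math

def real_cepstrum(x, eps=1e-10):
    """Real cepstrum via a direct DFT and a cosine-sum inverse transform."""
    n_samples = len(x)
    # Twiddle factors exp(-j 2 pi m / N); products k*n are reduced modulo N.
    w = [cmath.exp(-2j * cmath.pi * m / n_samples) for m in range(n_samples)]
    log_mag = []
    for k in range(n_samples):
        X_k = sum(x[n] * w[(k * n) % n_samples] for n in range(n_samples))
        log_mag.append(math.log(abs(X_k) + eps))  # eps guards against log(0)
    # The real part of the twiddle factor is cos(2 pi m / N).
    return [sum(log_mag[k] * w[(k * n) % n_samples].real
                for k in range(n_samples)) / n_samples
            for n in range(n_samples)]

a = 0.5
c = real_cepstrum([1.0, a] + [0.0] * 62)  # N = 64
print(round(c[1], 4), round(c[2], 4))     # approximately a/2 and -a^2/4
```

The result is symmetric, c[n] = c[N − n], because log|X[k]| is real and even for a real input, which is why only the cosine terms are needed.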

Pitch quefrency (in samples) and estimated fundamental frequency, where τ_peak is the quefrency index of the cepstrum peak:

$$\tau_\text{pitch} = \frac{F_s}{f_0}, \qquad \hat{f}_0 = \frac{F_s}{\tau_\text{peak}}$$

For speech, the cepstrum peak is searched for in the 2–20 ms quefrency range, which corresponds to pitch from 500 Hz down to 50 Hz, and pitch is estimated from the quefrency of that peak.
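The windowed peak search and the conversion to Hz can be sketched as follows. The function `estimate_pitch` and the hand-made cepstrum values are hypothetical; the demo array only mimics the typical shape (large low-quefrency values, a pitch peak at 40 samples, a smaller peak at 80) so that the effect of the search window is visible.

```python
def estimate_pitch(cepstrum, fs, tau_min=0.002, tau_max=0.020):
    """Return (peak index in samples, peak time in s, estimated f0 in Hz)."""
    n_min = int(round(tau_min * fs))
    n_max = min(int(round(tau_max * fs)), len(cepstrum) - 1)
    # Largest cepstrum value inside the plausible pitch window only.
    peak = max(range(n_min, n_max + 1), key=lambda n: cepstrum[n])
    return peak, peak / fs, fs / peak

# Hand-made cepstrum (illustrative values, not simulator output).
cep = [0.0] * 512
cep[0], cep[3] = 2.0, 0.9    # low-quefrency (envelope) components
cep[40], cep[80] = 0.4, 0.1  # pitch peak and a smaller peak at twice the period

true_f0 = 200.0
peak, tau, f0_hat = estimate_pitch(cep, fs=8000.0)
rel_error = abs(f0_hat - true_f0) / true_f0
print(peak, tau, f0_hat, rel_error)
```

A naive maximum over the whole array would land on index 0 or 3; restricting the search to 16–160 samples (2–20 ms at 8 kHz) selects the peak at 40 samples, that is 5 ms and 200 Hz.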

What is the Cepstrum Analysis Simulator

🙋
"Cepstrum" looks like a typo of "spectrum". Is that really a thing, and what is it for?
🎓
Good catch. Reversing the first four letters of "spectrum" gives "cepstrum", and swapping the first syllables of "frequency" gives "quefrency". Roughly speaking, you apply a Fourier transform a second time, this time to the log magnitude spectrum, so that you can read off the "period" of the signal on a time axis. With f_0 = 200 Hz and F_s = 8000 Hz in the simulator above, the bottom panel has a sharp peak around quefrency = 40 samples. That is 40/8000 s = 5 ms, exactly the period 1/200 s.
🙋
A regular FFT seems enough to find a fundamental frequency though — why bother with two stages?
🎓
Sharp question. On the spectrum the fundamental and its harmonics appear side by side, and for human voices the harmonics can be larger than the fundamental, which fools naive peak-pickers. In the cepstrum, all of those evenly spaced harmonic peaks are merged into a single point at "quefrency = period". Try moving α from 0 to 0.8 — the shape of the harmonics in the spectrum changes, yet the cepstrum peak stays put at the same place.
🙋
Indeed, the peak does not move. So why the logarithm in the middle?
🎓
That is the real magic. Speech is the convolution of vocal-fold excitation and the vocal-tract filter. On the spectrum it becomes a product; the logarithm turns it into a sum. Apply an inverse Fourier transform and the slow vocal-tract part (low quefrency) is separated from the sharp pitch part (high quefrency) along the time axis. Slicing them apart by "liftering" gives the features used in speech recognition.
🙋
When I push σ up to add noise, the cepstrum peak gets a bit smaller.
🎓
Right. Noise spreads thinly across the whole spectrum, and after the logarithm and the inverse transform it stays spread thinly across all quefrencies, so the peak at 40 samples still survives. The same trick works for machine vibration diagnosis: the evenly spaced sidebands caused by a damaged gear are compressed into a single cepstrum peak, so the damage period can be picked out of the noise. Speech and rotating-machine monitoring look different on the surface, but they are the same mathematics underneath.
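The product-to-sum step described in the dialogue above can be written out as a short derivation. The symbols e[n] (excitation) and h[n] (vocal-tract impulse response) are introduced here only for illustration, and windowing and edge effects are ignored:

$$x[n] = (e * h)[n] \;\Rightarrow\; X[k] = E[k]\,H[k] \;\Rightarrow\; \log|X[k]| = \log|E[k]| + \log|H[k]| \;\Rightarrow\; c_x[n] = c_e[n] + c_h[n]$$

The last step follows because the inverse DFT is linear. The cepstrum of the signal is therefore the sum of the two component cepstra, and they can be separated whenever they occupy different quefrency ranges.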

Frequently Asked Questions

Why is the cepstrum peak searched only in the 2–20 ms range?
That range matches human voice pitch. 20 ms corresponds to 50 Hz (around the lower bound of adult male voice), and 2 ms to 500 Hz (around the upper bound of female and children's voices). At very low quefrency there are large components from the vocal-tract filter, and a naive maximum would mistake them for pitch. Limiting the search to a physiologically plausible window avoids that. For machine vibration diagnosis the range is chosen from the rotational speed of the equipment.
What is the difference between the real cepstrum and the complex cepstrum?
The real cepstrum uses only log|X[k]| and discards the phase information; that is also what this simulator computes, and it is enough for pitch detection and harmonic-structure analysis. The complex cepstrum uses log X[k] (the complex logarithm), preserving phase, and it can be used to fully reconstruct the signal (homomorphic decomposition). However, it requires phase unwrapping, which is harder to implement and limits the practical applications.
What happens to the logarithm when |X[k]| is zero?
In this tool we compute log(|X[k]| + ε) with a small ε = 1e-10. With measured signals, noise usually keeps |X[k]| from being exactly zero, but for synthetic signals with spectral nulls the logarithm can diverge. In practice it is common to add a tiny amount of white noise to the signal, or to clip the magnitude to a floor value (for example 1e-6 of the maximum) before taking the log.
Is a direct DFT good enough, or is an FFT needed?
For teaching purposes and one-off computations at N = 1024 the direct DFT is plenty (O(N²) ≈ 1 M operations, instantaneous). For real-time processing, long signals, or sliding-window frame-by-frame analysis, the FFT (O(N log N)) is essential. The Web Audio API and DSP libraries ship an FFT, and swapping it into the computation here would change only the speed, not the results.
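The point that an FFT changes only the speed, not the result, can be checked numerically. The sketch below compares a direct DFT with a textbook recursive radix-2 FFT written in pure Python; both function names are hypothetical, the input length must be a power of two, and a production system would normally call a library FFT instead.

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) DFT."""
    n_samples = len(x)
    w = [cmath.exp(-2j * cmath.pi * m / n_samples) for m in range(n_samples)]
    return [sum(x[n] * w[(k * n) % n_samples] for n in range(n_samples))
            for k in range(n_samples)]

def fft(x):
    """Recursive radix-2 decimation-in-time FFT (length must be a power of 2)."""
    n_samples = len(x)
    if n_samples == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    half = n_samples // 2
    out = [0j] * n_samples
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n_samples) * odd[k]
        out[k] = even[k] + t         # butterfly, upper half
        out[k + half] = even[k] - t  # butterfly, lower half
    return out

x = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(64)]
max_diff = max(abs(a - b) for a, b in zip(dft(x), fft(x)))
print(max_diff)  # differs only by floating-point rounding
```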

Real-World Applications

Speech analysis and recognition: The cepstrum is central to speech processing. Pitch estimation, voicing decision, and formant extraction are all carried out by separating the low- and high-quefrency parts of the cepstrum with "liftering". MFCC (mel-frequency cepstral coefficients) is its extension and the de facto standard feature for speech recognition, speaker recognition, and speech synthesis.
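Liftering can be illustrated with a small sketch: keep only the low-quefrency cepstral coefficients and transform back to obtain a smoothed log spectrum. The function names, the cutoff of 10 samples, and the synthetic log spectrum (a slow ripple standing in for the vocal-tract envelope plus a fast ripple standing in for pitch harmonics) are all assumptions made for this example.

```python
import math

def cepstrum_from_log_spectrum(log_mag):
    """c[n] = (1/N) * sum_k log_mag[k] * cos(2 pi k n / N)."""
    N = len(log_mag)
    return [sum(log_mag[k] * math.cos(2 * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def low_time_lifter(cepstrum, cutoff):
    """Keep quefrencies 0..cutoff and their mirror images, zero the rest."""
    N = len(cepstrum)
    return [c if (n <= cutoff or n >= N - cutoff) else 0.0
            for n, c in enumerate(cepstrum)]

def log_spectrum_from_cepstrum(cepstrum):
    """Forward transform of a symmetric cepstrum (cosine terms only)."""
    N = len(cepstrum)
    return [sum(cepstrum[n] * math.cos(2 * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

N = 64
slow = [1.0 + 0.5 * math.cos(2 * math.pi * 2 * k / N) for k in range(N)]  # envelope
fast = [0.3 * math.cos(2 * math.pi * 20 * k / N) for k in range(N)]       # harmonics
log_mag = [s + f for s, f in zip(slow, fast)]

c = cepstrum_from_log_spectrum(log_mag)
envelope = log_spectrum_from_cepstrum(low_time_lifter(c, cutoff=10))
max_err = max(abs(e - s) for e, s in zip(envelope, slow))
```

Here the fast ripple sits at quefrency 20 and is removed by the lifter, so the recovered envelope matches the slow component up to rounding error. Keeping the high-quefrency part instead would isolate the pitch information.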

Machine vibration diagnosis and predictive maintenance: Damage in gear teeth or rolling-element bearings produces many evenly spaced sidebands in the spectrum. They are hard to read directly, but the cepstrum compresses them into a single peak, making damage presence and period obvious at a glance. The technique is widely used for condition monitoring of wind turbines, gas turbines, and large motors, and it underpins predictive maintenance in the IoT era.

Seismology and reflection analysis: In seismic exploration and seismology, multiple reflections from subsurface interfaces produce the same waveform repeated at fixed delay times. The cepstrum places a peak at the corresponding quefrency, helping infer subsurface structure. The same idea is used in acoustic echo analysis to measure room reverberation times and distances to reflecting walls.
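The echo case can be reproduced with a toy signal. In this sketch a decaying pulse 0.5^n is added to a copy of itself delayed by 50 samples and scaled by 0.5; all of these values, the sample rate of 8000 Hz, and the lower search limit of 5 samples are arbitrary choices for illustration. For an echo of gain a, theory predicts a real-cepstrum peak of about a/2 at the delay.

```python
import cmath
import math

def real_cepstrum(x, eps=1e-10):
    """Real cepstrum via a direct DFT and a cosine-sum inverse transform."""
    N = len(x)
    w = [cmath.exp(-2j * cmath.pi * m / N) for m in range(N)]
    log_mag = [math.log(abs(sum(x[n] * w[(k * n) % N] for n in range(N))) + eps)
               for k in range(N)]
    return [sum(log_mag[k] * w[(k * n) % N].real for k in range(N)) / N
            for n in range(N)]

n_samples, delay, gain, fs = 256, 50, 0.5, 8000.0
source = [0.5 ** n for n in range(n_samples)]  # short decaying pulse
x = [source[n] + (gain * source[n - delay] if n >= delay else 0.0)
     for n in range(n_samples)]

c = real_cepstrum(x)
# Skip the lowest quefrencies, which are dominated by the pulse shape itself.
peak = max(range(5, n_samples // 2 + 1), key=lambda n: c[n])
print(peak, peak / fs, round(c[peak], 3))  # delay in samples, in seconds, peak height
```

The peak lands at the echo delay (50 samples, 6.25 ms at the assumed sample rate) with height close to gain/2 = 0.25.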

Biomedical signal analysis: Periodic physiological signals such as heart sounds, breath sounds, and electromyograms can be analyzed with the cepstrum. The S1–S2 heart-sound period, hidden periodicities in heart-rate variability, and pattern classification of swallowing sounds all benefit from cepstral features that are invisible in either the time or frequency domain alone. The approach is being adopted in clinical decision support and wearable biosensor pipelines.

Common Misconceptions and Cautions

The most common misconception is to assume that the cepstrum's horizontal axis is frequency. It is the quefrency (samples or seconds), with the dimension of time, so a larger value means a longer-period component — that is, a lower frequency. In the simulator, increasing f_0 from 200 Hz to 400 Hz moves the cepstrum peak to the left, not to the right, because the period T_0 = F_s/f_0 becomes shorter. The intuition "right on the cepstrum = higher frequency" simply does not work.
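The direction of the shift is easy to confirm with the relation T_0 = F_s/f_0. The helper name below is hypothetical, and F_s = 8000 Hz is taken from the example used earlier on this page.

```python
def pitch_period_samples(fs, f0):
    """Quefrency of the pitch peak in samples: T0 = Fs / f0."""
    return fs / f0

fs = 8000.0
for f0 in (100.0, 200.0, 400.0):
    t0 = pitch_period_samples(fs, f0)
    print(f"f0 = {f0:.0f} Hz -> peak at {t0:.0f} samples ({1000.0 * t0 / fs:.1f} ms)")
# f0 = 100 Hz -> peak at 80 samples (10.0 ms)
# f0 = 200 Hz -> peak at 40 samples (5.0 ms)
# f0 = 400 Hz -> peak at 20 samples (2.5 ms)
```

Doubling f_0 halves the peak position, so higher pitch sits further to the left on the quefrency axis.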

A second pitfall is to overlook integer multiples of the peak location. Smaller peaks, often called rahmonics, appear at n = 2T_0, 3T_0, … in addition to the main peak at n = T_0. These are not "harmonics of the period" but a natural by-product of Fourier-transforming a periodic structure in the log spectrum (harmonics at multiples of the fundamental frequency). In the simulator, lowering α (strengthening the harmonics) makes the secondary peaks at two and three times the location more visible. A robust algorithm has to pick the fundamental peak rather than just the largest peak: choosing the peak at 2T_0 by mistake halves the estimated pitch, which is the "octave error" problem.
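One simple guard against octave errors is to take the earliest local peak that comes close to the window maximum instead of the maximum itself. The sketch below is one such heuristic, not a standard algorithm: the function name, the 0.8 threshold ratio, and the hand-made cepstrum (where the peak at 2T_0 happens to be slightly larger than the one at T_0) are all assumptions for illustration.

```python
def pick_fundamental_peak(cepstrum, n_min, n_max, ratio=0.8):
    """Smallest-quefrency local peak reaching `ratio` times the window maximum."""
    window_max = max(cepstrum[n_min:n_max + 1])
    for n in range(n_min, n_max + 1):
        left = cepstrum[n - 1] if n > 0 else float("-inf")
        right = cepstrum[n + 1] if n + 1 < len(cepstrum) else float("-inf")
        is_local_peak = cepstrum[n] >= left and cepstrum[n] >= right
        if is_local_peak and cepstrum[n] >= ratio * window_max:
            return n
    return None

# Illustrative cepstrum: peaks at T0 = 40, 2*T0 = 80 and 3*T0 = 120 samples.
cep = [0.0] * 200
cep[40], cep[80], cep[120] = 0.30, 0.32, 0.10

naive = max(range(16, 161), key=lambda n: cep[n])  # plain argmax in the window
robust = pick_fundamental_peak(cep, 16, 160)
print(naive, robust)  # 80 40
```

At F_s = 8000 Hz the naive pick of 80 samples would report 100 Hz instead of 200 Hz. The threshold ratio trades octave errors against sensitivity to small spurious peaks and would need tuning on real data.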

Finally, the cepstrum is not a universal period detector. For non-stationary signals whose pitch changes with time, a long analysis window mixes several pitches and blurs the peak. In practice short windows (about 20–40 ms) are used and a cepstrum is computed per frame to track the pitch contour. Signals without periodicity (plosives, impulsive sounds) produce no pitch peak at all. Treat the cepstrum as a tool that shines on strongly periodic signals, and choose your window accordingly.
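Frame-by-frame analysis starts by cutting the signal into short overlapping frames. The sketch below shows one way to do that; the function name, the 32 ms frame length, the 10 ms hop, and the Hann window are assumptions (the text only specifies windows of about 20–40 ms). A cepstrum and a peak search would then be run on each frame to obtain a pitch contour.

```python
import math

def frame_signal(x, fs, frame_ms=32.0, hop_ms=10.0):
    """Split x into overlapping Hann-windowed frames."""
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    # Hann window tapers the frame edges to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frames.append([x[start + i] * window[i] for i in range(frame_len)])
    return frames

fs = 8000.0
x = [math.sin(2 * math.pi * 200.0 * n / fs) for n in range(8000)]  # 1 s of signal
frames = frame_signal(x, fs)
print(len(frames), len(frames[0]))  # 97 frames of 256 samples
```

At 8 kHz a 32 ms frame is 256 samples, which covers about six periods of a 200 Hz voice while staying short enough to follow pitch changes.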