Why take the logarithm?

When a signal is the convolution of an excitation source and a transfer system (a filter), the two become a product in the magnitude spectrum. Taking the logarithm turns that product into a sum. Applying the inverse Fourier transform then separates the slowly varying transfer-system component (low quefrency) from the sharp periodic excitation component (high quefrency) along the quefrency axis. That separation is why the cepstrum is a common starting point for speech analysis.
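Written out, the chain of steps looks like this. The symbols e[n] and h[n] (excitation and transfer system), their DFTs E[k] and H[k], and the cepstra c_e and c_h are names introduced here for illustration. The product form is exact for circular convolution, or for linear convolution when the frame is long enough to hold the whole result; the last step uses the linearity of the inverse DFT:

$$x[n] = (e * h)[n] \;\Rightarrow\; X[k] = E[k]\,H[k] \;\Rightarrow\; \log|X[k]| = \log|E[k]| + \log|H[k]| \;\Rightarrow\; c_x[n] = c_e[n] + c_h[n]$$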

How does it differ from MFCC?

MFCC (mel-frequency cepstral coefficients) is an extension of the cepstrum. Before taking the logarithm, the spectrum is passed through a mel filter bank, which warps the frequency axis to match human auditory perception. A DCT is then applied and only the low-order coefficients are kept; these describe the spectral envelope shaped by the vocal tract (the formants). Speech recognition discards the pitch and uses these features, which is roughly equivalent to keeping only the low-quefrency part of the cepstrum (low-time liftering).
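The pipeline described above (mel filter bank, logarithm, DCT, keep the low-order coefficients) can be sketched in plain Python. This is a minimal illustration, not the implementation of any particular toolkit: the mel formula with the constants 2595 and 700 is one common convention, and the function names, the triangular filter shape, the 20 filters, and the 12 coefficients are all assumptions chosen for the example.

```python
import math


def hz_to_mel(f_hz):
    """Hz to mel, using one common convention (an assumption, not from the page)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_bins, fs):
    """Triangular filters whose centers are equally spaced on the mel axis.

    n_bins is the number of spectrum bins covering 0 Hz .. fs/2 inclusive.
    """
    mel_max = hz_to_mel(fs / 2.0)
    # n_filters + 2 edge frequencies, converted to fractional bin positions.
    edges = [
        mel_to_hz(mel_max * i / (n_filters + 1)) / (fs / 2.0) * (n_bins - 1)
        for i in range(n_filters + 2)
    ]
    bank = []
    for j in range(1, n_filters + 1):
        lo, center, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for k in range(n_bins):
            if lo < k <= center:
                row.append((k - lo) / (center - lo))   # rising slope
            elif center < k < hi:
                row.append((hi - k) / (hi - center))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def dct2(values, n_coeffs):
    """Unnormalised DCT-II, returning only the first n_coeffs outputs."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * q * (i + 0.5) / n) for i, v in enumerate(values))
        for q in range(n_coeffs)
    ]


def mfcc(power_spectrum, fs, n_filters=20, n_coeffs=12, eps=1e-10):
    """Mel filter bank -> log -> DCT -> keep the low-order coefficients."""
    bank = mel_filterbank(n_filters, len(power_spectrum), fs)
    log_energies = [
        math.log(sum(w * p for w, p in zip(row, power_spectrum)) + eps)
        for row in bank
    ]
    return dct2(log_energies, n_coeffs)
```

Here `power_spectrum` would be the one-sided |X[k]|² of a single frame (for example 513 bins for a 1024-point frame). Truncating the DCT output plays the role of the low-time lifter: the higher-order coefficients, which would carry the fine harmonic structure, are simply not computed.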

How is it used in machine vibration diagnosis?

Damage in gears or rolling-element bearings shows up as many sidebands at uniform spacing in the spectrum. The cepstrum compresses those evenly spaced sidebands into a single sharp peak, so the presence and period of damage can be read at a glance. Mathematically this is the same principle as pitch detection in speech: a periodic harmonic structure is compressed into a single point on the quefrency axis. The technique is widely used in predictive maintenance of rotating machinery.

Cepstrum Analysis Simulator — Free Online Calculator

Q: What is the cepstrum?

The cepstrum is obtained by taking the logarithm of the magnitude spectrum of a signal and Fourier-transforming the result again (inverse DFT for the real cepstrum): c[n] = IDFT(log|DFT(x[n])|). Its horizontal axis has the dimension of time and is called the quefrency (in samples or seconds). A sharp peak appears at the quefrency that corresponds to the fundamental period, making it a powerful tool for detecting periodicity.

Parameters

Fundamental frequency f_0, sample rate F_s, harmonic decay α, noise σ.

N = 1024 with a direct DFT. The signal model has 5 harmonics plus additive white noise.

Results

Cepstrum peak quefrency, corresponding time τ, estimated fundamental f_0, and relative estimation error |estimated f_0 − true f_0| / true f_0.

Signal · Spectrum · Cepstrum

Top: time series x[n] (blue) / Middle: log magnitude log|X[k]| (green) / Bottom: cepstrum c[n] (red, yellow line = peak)

Theory & Key Formulas

The real cepstrum is the inverse Fourier transform of the log magnitude spectrum of a signal. It separates a convolved excitation and transfer system along the time (quefrency) axis.

DFT. x[n] is the input of length N and X[k] is its frequency component:

$$X[k] = \sum_{n=0}^{N-1} x[n]\,e^{-j 2\pi k n / N}$$

Real cepstrum. Inverse DFT of the log magnitude spectrum (real part only):

$$c[n] = \frac{1}{N}\sum_{k=0}^{N-1} \log|X[k]|\,\cos\!\left(\frac{2\pi k n}{N}\right)$$
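The two formulas above translate almost line by line into code. The following is a minimal sketch using direct O(N²) sums and only the standard library; the function names and the small `eps` guard inside the logarithm are choices made for this example.

```python
import cmath
import math


def dft(x):
    """Direct DFT: X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N)."""
    n_samples = len(x)
    # Only N distinct twiddle factors exist; (k * n) % N indexes into them.
    twiddle = [cmath.exp(-2j * math.pi * m / n_samples) for m in range(n_samples)]
    return [
        sum(x[n] * twiddle[(k * n) % n_samples] for n in range(n_samples))
        for k in range(n_samples)
    ]


def real_cepstrum(x, eps=1e-10):
    """c[n] = (1/N) * sum_k log(|X[k]| + eps) * cos(2*pi*k*n/N)."""
    n_samples = len(x)
    log_mag = [math.log(abs(value) + eps) for value in dft(x)]
    cos_table = [math.cos(2 * math.pi * m / n_samples) for m in range(n_samples)]
    return [
        sum(log_mag[k] * cos_table[(k * n) % n_samples] for k in range(n_samples))
        / n_samples
        for n in range(n_samples)
    ]
```

A quick sanity check: for a scaled unit impulse such as `[2.0, 0.0, 0.0, ...]`, every |X[k]| equals 2, so the log spectrum is flat and the cepstrum is log 2 at n = 0 and zero everywhere else.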

Pitch quefrency and estimated fundamental frequency (τ measured in samples; divide by F_s to get seconds):

$$\tau_\text{pitch} = \frac{F_s}{f_0}, \qquad \hat{f}_0 = \frac{F_s}{\tau_\text{peak}}$$

For speech, the cepstrum peak is searched in the 2–20 ms range (500–50 Hz), and pitch is estimated from that quefrency.
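A pitch estimator along these lines can be sketched as follows. The signal model follows the description on this page (N = 1024, 5 harmonics, additive white noise), but the exact decay law `exp(-alpha * (h - 1))`, the function names, and the fixed random seed are assumptions made for this example; the simulator's own model may differ in detail. Only the quefrencies inside the 2–20 ms search window are evaluated.

```python
import cmath
import math
import random


def synth_harmonic(f0, fs, n_samples=1024, n_harmonics=5, alpha=0.0, sigma=0.0, seed=0):
    """Harmonics of f0 with amplitude exp(-alpha*(h-1)) (assumed law) plus white noise."""
    rng = random.Random(seed)
    return [
        sum(
            math.exp(-alpha * (h - 1)) * math.sin(2 * math.pi * h * f0 * n / fs)
            for h in range(1, n_harmonics + 1)
        )
        + rng.gauss(0.0, sigma)
        for n in range(n_samples)
    ]


def cepstrum_at(x, quefrencies, eps=1e-10):
    """Real cepstrum c[q] for the requested quefrencies only, via a direct DFT."""
    n_samples = len(x)
    twiddle = [cmath.exp(-2j * math.pi * m / n_samples) for m in range(n_samples)]
    log_mag = [
        math.log(abs(sum(x[n] * twiddle[(k * n) % n_samples] for n in range(n_samples))) + eps)
        for k in range(n_samples)
    ]
    # The real part of the twiddle factor is cos(2*pi*m/N).
    return {
        q: sum(log_mag[k] * twiddle[(k * q) % n_samples].real for k in range(n_samples))
        / n_samples
        for q in quefrencies
    }


def estimate_f0(x, fs, t_min=0.002, t_max=0.020):
    """Pick the largest cepstrum value between t_min and t_max (seconds)."""
    q_min = int(round(t_min * fs))
    q_max = int(round(t_max * fs))
    c = cepstrum_at(x, range(q_min, q_max + 1))
    q_peak = max(c, key=c.get)          # peak quefrency in samples
    return q_peak, fs / q_peak          # f0_hat = Fs / tau_peak


signal = synth_harmonic(f0=200.0, fs=8000.0, alpha=0.0, sigma=0.01)
q_peak, f0_hat = estimate_f0(signal, fs=8000.0)
print(q_peak, f0_hat)
```

With f_0 = 200 Hz and F_s = 8000 Hz the window is 16 to 160 samples, and the peak is expected close to 8000/200 = 40 samples. The direct DFT takes a moment in pure Python; it is kept here only to mirror the formulas.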

What is the Cepstrum Analysis Simulator

🙋

"Cepstrum" looks like a typo of "spectrum". Is that really a thing, and what is it for?

🎓

Good catch. Reversing the first four letters of "spectrum" gives "cepstrum", and "frequency" is scrambled in the same spirit into "quefrency". Roughly speaking, you take the logarithm of a spectrum and apply a Fourier transform a second time so that you can read off the "period" of the signal on a time axis. With f_0 = 200 Hz and F_s = 8000 Hz in the simulator above, the bottom panel has a sharp peak around quefrency = 40 samples. That is 40/8000 = 5 ms, exactly the period 1/200 s.

🙋

A regular FFT seems enough to find a fundamental frequency though — why bother with two stages?

🎓

Sharp question. On the spectrum the fundamental and its harmonics appear side by side, and for human voices the harmonics can be larger than the fundamental, which fools naive peak-pickers. In the cepstrum, all of those evenly spaced harmonic peaks are merged into a single point at "quefrency = period". Try moving α from 0 to 0.8 — the shape of the harmonics in the spectrum changes, yet the cepstrum peak stays put at the same place.

🙋

Indeed, the peak does not move. So why the logarithm in the middle?

🎓

That is the real magic. Speech is the convolution of vocal-fold excitation and the vocal-tract filter. On the spectrum it becomes a product; the logarithm turns it into a sum. Apply an inverse Fourier transform and the slow vocal-tract part (low quefrency) is separated from the sharp pitch part (high quefrency) along the time axis. Slicing them apart by "liftering" gives the features used in speech recognition.

🙋

When I push σ up to add noise, the cepstrum peak gets a bit smaller.

🎓

Right: noise spreads thinly across the whole spectrum, so after the logarithm and the inverse transform it ends up spread thinly across all quefrencies. The peak at 40 samples still survives, doesn't it? The same trick works for machine vibration diagnosis: the evenly spaced sidebands caused by a damaged gear are compressed into a single cepstrum peak, so the damage period can be picked out of the noise. Speech and rotating-machine monitoring look different on the surface, but they are the same mathematics underneath.

Frequently Asked Questions

The 2–20 ms search range matches human voice pitch. 20 ms corresponds to 50 Hz (around the lower bound of adult male voice), and 2 ms to 500 Hz (around the upper bound of female and children's voices). At very low quefrency there are large components from the vocal-tract filter, and a naive maximum would mistake them for pitch. Limiting the search to a physiologically plausible window avoids that. For machine vibration diagnosis the range is chosen from the rotational speed of the equipment.

The real cepstrum uses only log|X[k]| and discards the phase information; that is also what this simulator computes, and it is enough for pitch detection and harmonic-structure analysis. The complex cepstrum uses log X[k] (the complex logarithm), preserving phase, and it can be used to fully reconstruct the signal (homomorphic decomposition). However, it requires phase unwrapping, which is harder to implement and limits the practical applications.

In this tool we compute log(|X[k]| + ε) with a small ε = 1e-10. In measured signals, noise keeps |X[k]| from being exactly zero, but synthetic signals can have exact spectral nulls at which the logarithm diverges. In practice it is common to add a tiny amount of white noise to the signal, or to clip the magnitude to a floor value (for example 1e-6 of the maximum) before taking the log.
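Both guards mentioned above fit in a few lines. This is a minimal sketch; the function name and the `rel_floor` parameter are hypothetical names chosen for the example.

```python
import math


def log_magnitude(magnitudes, eps=1e-10, rel_floor=None):
    """Logarithm of a magnitude spectrum, guarded against log(0).

    rel_floor=None : log(|X| + eps), the additive guard described for this tool.
    rel_floor=r    : clip |X| to r * max(|X|) before the log (relative floor).
    """
    if rel_floor is None:
        return [math.log(m + eps) for m in magnitudes]
    # Fall back to eps so an all-zero spectrum still has a finite floor.
    floor = max(rel_floor * max(magnitudes), eps)
    return [math.log(max(m, floor)) for m in magnitudes]
```

The additive guard leaves a null at about log(1e-10) ≈ −23, far below the rest of the spectrum, whereas a relative floor of 1e-6 limits the dynamic range to about 120 dB regardless of the signal's scale.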

For teaching purposes and one-off computations at N = 1024 the direct DFT is plenty (O(N²) ≈ 1 M operations, instantaneous). For real-time processing, long signals, or sliding-window frame-by-frame analysis, the FFT (O(N log N)) is essential. The WebAudio API and DSP libraries ship an FFT, and swapping it into the computation here would not change the results — only the speed.
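To illustrate that the two routes give the same numbers, here is a minimal sketch of a recursive radix-2 FFT next to the direct DFT. It is written for clarity rather than speed, and it is not the FFT of any particular library; the function names are chosen for this example.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) DFT, as used for the formulas on this page."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])   # DFT of even-indexed samples
    odd = fft(x[1::2])    # DFT of odd-indexed samples
    half = n // 2
    out = [0j] * n
    for k in range(half):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out
```

For any power-of-two length the two functions agree to within floating-point rounding, so replacing `dft` with `fft` inside a cepstrum computation changes only the running time.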

Real-World Applications

Speech analysis and recognition: The cepstrum is central to speech processing. Pitch estimation, voicing decision, and formant extraction are all carried out by separating the low- and high-quefrency parts of the cepstrum with "liftering". MFCC (mel-frequency cepstral coefficients) is its extension and the de facto standard feature for speech recognition, speaker recognition, and speech synthesis.
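Liftering itself is a small operation: zero part of the cepstrum and transform back. The sketch below demonstrates it on a toy signal rather than real speech. A two-tap filter stands in for the vocal tract and a two-pulse excitation stands in for the periodic source; these signals, the frame length of 256, the cutoff of 20 samples, and all function names are assumptions made for the example.

```python
import cmath
import math


def log_magnitude(x, eps=1e-10):
    """log(|DFT(x)| + eps) with a direct O(N^2) DFT."""
    n = len(x)
    tw = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]
    return [math.log(abs(sum(x[i] * tw[(k * i) % n] for i in range(n))) + eps)
            for k in range(n)]


def real_cepstrum(x, eps=1e-10):
    n = len(x)
    log_mag = log_magnitude(x, eps)
    cos_t = [math.cos(2 * math.pi * m / n) for m in range(n)]
    return [sum(log_mag[k] * cos_t[(k * q) % n] for k in range(n)) / n
            for q in range(n)]


def lifter_low(c, cutoff):
    """Low-time lifter: keep |n| <= cutoff at both ends of the symmetric cepstrum."""
    n = len(c)
    return [v if (q <= cutoff or q >= n - cutoff) else 0.0 for q, v in enumerate(c)]


def cepstrum_to_log_mag(c):
    """Forward DFT of a symmetric real cepstrum gives back a log-magnitude spectrum."""
    n = len(c)
    cos_t = [math.cos(2 * math.pi * m / n) for m in range(n)]
    return [sum(c[q] * cos_t[(k * q) % n] for q in range(n)) for k in range(n)]


def convolve_into_frame(a, b, n_samples):
    """Linear convolution of a and b, zero-padded to n_samples."""
    out = [0.0] * n_samples
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out


N = 256
filt = [1.0, -0.5]                       # toy "transfer system"
excitation = [0.0] * 51
excitation[0], excitation[50] = 1.0, 0.5  # toy "source": two pulses 50 samples apart
x = convolve_into_frame(excitation, filt, N)

envelope = cepstrum_to_log_mag(lifter_low(real_cepstrum(x), cutoff=20))
target = log_magnitude(filt + [0.0] * (N - len(filt)))
raw_err = max(abs(a - b) for a, b in zip(log_magnitude(x), target))
lift_err = max(abs(a - b) for a, b in zip(envelope, target))
print(raw_err, lift_err)
```

The raw log spectrum of `x` carries a ripple from the excitation, while the liftered version stays close to the log spectrum of the filter alone. Keeping the high-quefrency part instead would isolate the excitation.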

Machine vibration diagnosis and predictive maintenance: Damage in gear teeth or rolling-element bearings produces many evenly spaced sidebands in the spectrum. They are hard to read directly, but the cepstrum compresses them into a single peak, making damage presence and period obvious at a glance. The technique is widely used for condition monitoring of wind turbines, gas turbines, and large motors, and it underpins predictive maintenance in the IoT era.

Seismology and reflection analysis: In seismic exploration and seismology, multiple reflections from subsurface interfaces produce the same waveform repeated at fixed delay times. The cepstrum places a peak at the corresponding quefrency, helping infer subsurface structure. The same idea is used in acoustic echo analysis to measure room reverberation times and distances to reflecting walls.
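Echo detection is the easiest case to check by hand, because the cepstrum of a single reflection is known in closed form. The sketch below builds a short wavelet plus one delayed, attenuated copy and looks for the delay in the cepstrum. The wavelet, the delay of 50 samples, the gain of 0.5, and the function names are all assumptions made for this example.

```python
import cmath
import math


def real_cepstrum(x, eps=1e-10):
    """c[n] = IDFT(log(|DFT(x)| + eps)), computed with direct O(N^2) sums."""
    n_samples = len(x)
    twiddle = [cmath.exp(-2j * math.pi * m / n_samples) for m in range(n_samples)]
    log_mag = [
        math.log(abs(sum(x[n] * twiddle[(k * n) % n_samples] for n in range(n_samples))) + eps)
        for k in range(n_samples)
    ]
    return [
        sum(log_mag[k] * twiddle[(k * n) % n_samples].real for k in range(n_samples))
        / n_samples
        for n in range(n_samples)
    ]


def add_echo(wavelet, delay, gain, n_samples):
    """Direct arrival plus one reflection: x[n] = w[n] + gain * w[n - delay]."""
    if len(wavelet) + delay > n_samples:
        raise ValueError("frame too short for the delayed copy")
    x = [0.0] * n_samples
    for i, value in enumerate(wavelet):
        x[i] += value
        x[i + delay] += gain * value
    return x


wavelet = [1.0, -0.5]
x = add_echo(wavelet, delay=50, gain=0.5, n_samples=256)
c = real_cepstrum(x)
# Skip the lowest quefrencies, where the wavelet's own cepstrum lives.
peak = max(range(10, 129), key=lambda n: c[n])
print(peak, c[peak])
```

The echo multiplies the spectrum by 1 + g·e^{-jθ}, whose log magnitude expands as g·cos θ − (g²/2)·cos 2θ + …, so the cepstrum has a peak of height about g/2 at the delay, followed by smaller alternating-sign terms at its multiples. Here that means a peak of roughly 0.25 at n = 50.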

Biomedical signal analysis: Periodic physiological signals such as heart sounds, breath sounds, and electromyograms can be analyzed with the cepstrum. The S1–S2 heart-sound period, hidden periodicities in heart-rate variability, and pattern classification of swallowing sounds all benefit from cepstral features that are invisible in either the time or frequency domain alone. The approach is being adopted in clinical decision support and wearable biosensor pipelines.

Common Misconceptions and Cautions

The most common misconception is to assume that the cepstrum's horizontal axis is frequency. It is the quefrency (samples or seconds), with the dimension of time, so a larger value means a longer-period component — that is, a lower frequency. In the simulator, increasing f_0 from 200 Hz to 400 Hz moves the cepstrum peak to the left, not to the right, because the period T_0 = F_s/f_0 becomes shorter. The intuition "right on the cepstrum = higher frequency" simply does not work.
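The inverse relationship between peak position and frequency is a one-line conversion in each direction. The helper names below are hypothetical and chosen for this example.

```python
def pitch_quefrency_samples(f0_hz, fs_hz):
    """Expected cepstrum peak position T_0 = F_s / f_0, in samples."""
    return fs_hz / f0_hz


def quefrency_to_hz(q_samples, fs_hz):
    """Frequency corresponding to a quefrency given in samples."""
    return fs_hz / q_samples


for f0 in (200.0, 400.0):
    print(f0, pitch_quefrency_samples(f0, 8000.0))
```

Doubling f_0 from 200 Hz to 400 Hz at F_s = 8000 Hz halves the peak position from 40 to 20 samples, which is the leftward move described above.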

A second pitfall is to overlook integer multiples of the peak location. Smaller peaks, known as rahmonics, appear at n = 2T_0, 3T_0, … in addition to the main peak at n = T_0. They are not separate periodicities in the signal but a natural by-product of Fourier-transforming a periodic structure in the log spectrum (harmonics at multiples of the fundamental frequency). In the simulator, lowering α (strengthening the harmonics) makes the secondary peaks at two and three times the main location more visible. A robust algorithm has to pick the fundamental peak rather than just the largest one: choosing 2T_0 instead of T_0 halves the estimated frequency, which is the "octave error" problem.
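One simple heuristic against octave errors is to take the earliest local maximum that is nearly as high as the global one, instead of the global maximum itself. The sketch below is only one of many possible strategies; the function name and the 0.6 threshold are assumptions made for this example, not something the simulator is stated to use.

```python
def pick_fundamental(cepstrum, q_min, q_max, ratio=0.6):
    """Return the smallest quefrency in [q_min, q_max] that is a local maximum
    and reaches at least ratio * (largest value in the window)."""
    window = range(q_min, q_max + 1)
    c_max = max(cepstrum[q] for q in window)
    for q in window:
        left = cepstrum[q - 1] if q > 0 else float("-inf")
        right = cepstrum[q + 1] if q + 1 < len(cepstrum) else float("-inf")
        is_local_max = cepstrum[q] >= left and cepstrum[q] >= right
        if is_local_max and cepstrum[q] >= ratio * c_max:
            return q
    # Fallback: plain argmax over the window.
    return max(window, key=lambda q: cepstrum[q])


# Toy cepstrum in which noise has pushed the peak at 2*T_0 above the one at T_0.
toy = [0.0] * 200
toy[40] = 0.09
toy[80] = 0.10
naive = max(range(16, 161), key=lambda q: toy[q])
robust = pick_fundamental(toy, 16, 160)
print(naive, robust)
```

On this toy input the plain argmax lands on 80 (an octave too low in frequency), while the heuristic returns 40. The threshold trades octave errors against the opposite mistake of locking onto a spurious early peak.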

Finally, the cepstrum is not a universal period detector. For non-stationary signals whose pitch changes with time, a long analysis window mixes several pitches and blurs the peak. In practice short windows (about 20–40 ms) are used and a cepstrum is computed per frame to track the pitch contour. Signals without periodicity (plosives, impulsive responses) have no peak at all. Treat the cepstrum as a tool that shines on strongly periodic signals, and choose your window accordingly.
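Frame-by-frame analysis starts with cutting the signal into short overlapping windows. The sketch below shows only that step; the function name, the 30 ms frame length (inside the 20–40 ms range mentioned above), and the 10 ms hop are assumptions made for this example.

```python
def frame_signal(x, fs, frame_ms=30.0, hop_ms=10.0):
    """Split x into overlapping frames; trailing samples that do not fill a frame are dropped."""
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    return [
        x[start:start + frame_len]
        for start in range(0, len(x) - frame_len + 1, hop)
    ]


# One second of placeholder samples at 8 kHz.
frames = frame_signal(list(range(8000)), fs=8000)
print(len(frames), len(frames[0]))
```

Each frame would then go through the cepstrum and peak search on its own, giving one pitch estimate per hop and hence a pitch contour over time.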
