STI Speech Intelligibility Simulator

Q: How is STI different from RASTI and %ALcons?

STI (Speech Transmission Index) is the full IEC 60268-16 metric and combines 7 octave bands × 14 modulation frequencies, i.e. 98 MTF values. RASTI is a simplified two-band approximation intended for room acoustics and is no longer recommended by the current standard. %ALcons (Articulation Loss of Consonants) is a percentage-loss metric long used in the PA industry, with STI ≈ 0.5 corresponding roughly to %ALcons = 10%. STI is the international standard in building acoustics; this tool computes a simplified single-band estimate of it.
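The rough STI to %ALcons correspondence can be sanity-checked with a short sketch. It assumes the commonly quoted Farrell Becker empirical fit, which is not part of this page or of the tool; the function names are illustrative.

```python
import math


def alcons_from_sti(sti: float) -> float:
    """Approximate %ALcons from STI (assumed Farrell Becker empirical fit)."""
    return 170.5405 * math.exp(-5.419 * sti)


def sti_from_alcons(alcons_percent: float) -> float:
    """Inverse of the same empirical fit: approximate STI from %ALcons."""
    return 0.9482 - 0.1845 * math.log(alcons_percent)


if __name__ == "__main__":
    for sti in (0.30, 0.45, 0.50, 0.60, 0.75):
        print(f"STI {sti:.2f} -> %ALcons ~ {alcons_from_sti(sti):.1f}")
```

Under this fit, STI = 0.5 lands a little above 10 %ALcons, which is consistent with the "roughly 10%" rule of thumb.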

Q: What STI value should I target for a classroom or auditorium?

Typical targets are: offices and meeting rooms ≥ 0.60 (Good), classrooms and lecture halls ≥ 0.65 (children have roughly half the attention reserve of adults, so the bar is higher), theatres and concert halls for spoken passages ≥ 0.55, and emergency evacuation (EVAC) announcements (IEC 60849) ≥ 0.50 as a legal requirement. Below STI = 0.45, word recognition drops sharply: listeners can tell that someone is speaking but cannot extract the message.

Q: Apart from lowering T60, how can I raise the STI?

Reducing reverberation by adding absorption is the strongest single lever, but four other actions help: (1) increase signal level to gain SNR (above ≈ 80 dB it becomes intrusive), (2) move loudspeakers closer to listeners to boost the direct field (distributed ceiling arrays), (3) use directional column speakers to suppress reflections, and (4) lower the background noise from HVAC and so on. Sliding the directivity Q or the distance in this tool shows how the critical distance and the direct/reverberant balance change.
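How Q, room volume and T60 move the critical distance can be sketched in a few lines. This assumes the common Sabine-based approximation r_c ≈ 0.057·√(Q·V/T60) and a diffuse reverberant field; the page does not state which formula the tool uses. The example values Q = 2 and V = 500 m³ are illustrative assumptions, not the tool's documented defaults.

```python
import math


def critical_distance_m(q: float, volume_m3: float, t60_s: float) -> float:
    """Distance at which direct and reverberant levels are equal.

    Assumed approximation: r_c ~ 0.057 * sqrt(Q * V / T60).
    """
    return 0.057 * math.sqrt(q * volume_m3 / t60_s)


def direct_to_reverberant_db(distance_m: float, r_c_m: float) -> float:
    """Direct-to-reverberant ratio in dB: 0 dB at r_c, -6 dB per doubling."""
    return 20.0 * math.log10(r_c_m / distance_m)


if __name__ == "__main__":
    for q in (1, 2, 10, 20):
        r_c = critical_distance_m(q, volume_m3=500.0, t60_s=1.2)
        dr = direct_to_reverberant_db(8.0, r_c)
        print(f"Q={q:>2}: r_c = {r_c:.2f} m, D/R at 8 m = {dr:+.1f} dB")
```

With these illustrative values, a listener at 8 m sits well beyond the critical distance for a voice-like source (Q = 2), and a column-like Q = 20 roughly triples r_c.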

Q: Why does STI weight the modulation around 1-4 Hz so heavily?

Speech carries its meaning in roughly 4-8 Hz syllabic modulation and 1-2 Hz stress patterns. Steeneken and Houtgast showed that measuring the MTF at fourteen modulation frequencies from 0.63 Hz to 12.5 Hz reproduces the perceived intelligibility of vowels and consonants very well. A long T60 attenuates the MTF from the high-modulation-frequency end first; as T60 grows, the roll-off moves down into the 2-4 Hz range, and it is that loss that damages comprehension the most.

The Speech Transmission Index (STI) is the international standard score that quantifies, via the modulation transfer function (MTF), how reverberation and background noise damage speech intelligibility. Adjust T60, signal level, background noise, distance, room volume and directivity to evaluate classrooms, theatres, PA and EVAC announcements in real time.

Parameters

Reverberation time T60

Time for the sound pressure level to decay by 60 dB after the source stops

Signal level

Source level at 1 m (talker or loudspeaker)

Background noise

HVAC, audience murmur, etc. (A-weighted, indicative)

Source-listener distance

Room volume V (m³)

Classroom ≈ 200, auditorium ≈ 3,000, gymnasium ≈ 10,000, airport hall ≈ 30,000

Source directivity Q

Omni = 1, human voice ≈ 2, horn ≈ 10, column ≈ 20

Results

Reverb time T60 (s)

Critical distance (m)

Listener SNR (dB)

Reverb modulation m_rev

STI value

Intelligibility rating

Room sound field — direct, reflected and SNR zones

Yellow = source, white = listener. The direct ray and ceiling/floor reflections superimpose, and the background is shaded by SNR zone: green (≥ 10 dB), yellow (0-10 dB), red (< 0 dB).

MTF — modulation frequency vs m (reverb + SNR)

STI vs reverberation time T60 (current SNR)

Theory & Key Formulas

$$m_{rev}(F) = \frac{1}{\sqrt{1+\bigl(\frac{2\pi F\,T_{60}}{13.8}\bigr)^{2}}},\quad m_{snr} = \frac{1}{1+10^{-\mathrm{SNR}/10}}$$

Modulation transfer function (MTF). F is modulation frequency [Hz], T60 is reverberation time [s], SNR is the listener-position signal-to-noise ratio [dB]. Reverberation and SNR are combined as independent modulation-reduction factors.
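The two modulation-reduction factors can be evaluated directly. The sketch below is a plain transcription of the formulas above; the function names are illustrative and the listener-position SNR is passed in rather than derived from level and distance.

```python
import math


def m_rev(f_mod_hz: float, t60_s: float) -> float:
    """Modulation reduction due to reverberation at modulation frequency F."""
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_mod_hz * t60_s / 13.8) ** 2)


def m_snr(snr_db: float) -> float:
    """Modulation reduction due to steady background noise."""
    return 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))


if __name__ == "__main__":
    t60, snr = 1.2, 20.0
    for f in (0.63, 2.0, 4.0, 12.5):
        combined = m_rev(f, t60) * m_snr(snr)
        print(f"F = {f:5.2f} Hz: m_rev = {m_rev(f, t60):.3f}, m = {combined:.3f}")
```

Note that m_rev falls as F rises, so reverberation removes the fast modulations first, while m_snr scales every modulation frequency by the same factor.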

$$m_{k} = m_{rev,k}\cdot m_{snr,k},\quad \mathrm{SNR}_{eff} = 10\log_{10}\!\frac{\bar{m}}{1-\bar{m}}$$

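Combined m at each modulation frequency k and the apparent effective SNR, where m̄ is the mean of the m_k values. This tool averages the 14 modulation frequencies from 0.63 to 12.5 Hz in a single band, as an approximation of the full seven-octave-band procedure.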

$$\mathrm{STI} = \frac{\mathrm{SNR}_{eff,\,clip} + 15}{30}\in[0,1]$$

STI is a linear mapping of SNR_eff clipped to ±15 dB. Rating bands: below 0.45 = Poor, 0.45-0.60 = Fair, 0.60-0.75 = Good, ≥ 0.75 = Excellent (IEC 60268-16).
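Putting the three formulas together gives a compact single-band estimator. This is a minimal sketch under stated assumptions, not the tool's actual code: it takes the listener SNR as an input (the tool derives it from level, noise, distance, volume and Q, so its readouts may differ), it uses the standard one-third-octave modulation series for the 14 frequencies, and it labels the 0.45-0.60 band "Fair" following the usual IEC 60268-16 qualification scale.

```python
import math

# Standard one-third-octave modulation frequencies (assumed series; the page
# only states "14 modulation frequencies from 0.63 to 12.5 Hz").
MOD_FREQS_HZ = (0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5)


def sti_estimate(t60_s: float, snr_db: float) -> float:
    """Single-band STI estimate from T60 [s] and listener SNR [dB]."""
    m_snr = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    m_values = [
        m_snr / math.sqrt(1.0 + (2.0 * math.pi * f * t60_s / 13.8) ** 2)
        for f in MOD_FREQS_HZ
    ]
    m_bar = sum(m_values) / len(m_values)
    # Keep the ratio finite when m_bar reaches 0 or 1.
    m_bar = min(max(m_bar, 1e-9), 1.0 - 1e-9)
    snr_eff = 10.0 * math.log10(m_bar / (1.0 - m_bar))
    snr_clip = max(-15.0, min(15.0, snr_eff))
    return (snr_clip + 15.0) / 30.0


def rating(sti: float) -> str:
    """Map an STI value to a rating band ("Fair" is an assumed label)."""
    if sti >= 0.75:
        return "Excellent"
    if sti >= 0.60:
        return "Good"
    if sti >= 0.45:
        return "Fair"
    return "Poor"


if __name__ == "__main__":
    for t60 in (0.4, 0.7, 1.2, 2.5):
        sti = sti_estimate(t60, snr_db=20.0)
        print(f"T60 = {t60:.1f} s, SNR = 20 dB -> STI = {sti:.2f} ({rating(sti)})")
```

Averaging m before converting to SNR_eff follows the formula given here; the full standard converts each m value to an SNR first and then averages across modulation frequencies and bands.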

Speech Intelligibility Evaluation with STI

🙋

In a big hall, the announcement just sounds like mush — does turning the volume up actually help?

🎓

Often it doesn't. Cranking the level raises the reflections just as much as the direct sound, so the reverberation still smears each syllable into the next and the consonants pile up. The metric that captures this is the STI (Speech Transmission Index), a 0-1 score defined by IEC 60268-16. Below 0.45 word recognition collapses, and emergency evacuation (EVAC) announcements are legally required to reach STI ≥ 0.50.

🙋

How is STI actually computed? It's not the same as SNR, right?

🎓

The core idea is the modulation transfer function (MTF). Speech carries its meaning in slow envelope variations (about 4 Hz for syllables and 1-2 Hz for stress patterns), and reverb plus noise erode those modulation peaks. In formula form, m_rev = 1/√(1+(2πF·T60/13.8)²) tells you how a long T60 pulls the MTF down, hitting the faster modulations first. Multiply that by m_snr = 1/(1+10^(-SNR/10)), back out an apparent SNR, and STI = (SNR_eff+15)/30.

🙋

So both reverb and noise matter. The defaults (T60=1.2 s, signal 70 dB, noise 50 dB) give STI = 0.45 → "Poor". Is that what a typical classroom gets?

🎓

Exactly the issue. Japanese primary schools are supposed to be designed for T60 ≈ 0.7 s, but field surveys show many rooms reach 1.0-1.5 s, so children at the back are listening at a "Poor" intelligibility level. ANSI S12.60 calls for T60 ≤ 0.6 s and background noise ≤ 35 dB(A) in school spaces, which corresponds roughly to STI ≥ 0.65. Try setting T60 = 0.7 s and noise = 40 dB on the left: the STI jumps into the 0.7 range.

🙋

Just got STI 0.73 — "Good". So adding absorbers to kill reverb is the silver bullet?

🎓

Mostly yes — but absorption has limits. Pull T60 below about 0.3 s and the room feels dead and music dies. Multipurpose halls solve it with movable absorbers that switch the reverb on demand. A subtler trap is the critical distance shown in the stat cards: the distance at which reverberant sound equals the direct sound. At the defaults it's only 1.65 m, so a listener at 8 m is essentially hearing reflections. Raising Q (a column speaker) extends the critical distance so direct sound dominates farther into the room.

🙋

So that's why airport announcements use column speakers! Evacuation acoustics is deeper than it looks.

🎓

Right. Japan's Building Standard Law and Fire Service Act mandate emergency PA, and IEC 60849 / EN 54-16 go further by requiring STI ≥ 0.50 at every seat. In a big terminal that needs distributed loudspeakers, delay alignment and tight directivity control. STI is not just a number — for life-safety systems it is the metric that decides whether the evacuation message gets through when it counts.

Real-World Applications

Schools and educational facilities: ANSI S12.60 and the Japanese JIS Z 8731 recommend T60 ≤ 0.6 s and background noise ≤ 35 dB(A) in classrooms for children, which roughly maps to STI ≥ 0.65. Rooms that fall short widen the achievement gap for early-grade pupils, hearing-impaired children and second-language learners. Designers run STI estimates with tools like this one early in the project to size absorbing ceilings, carpet and bookshelf placement.

EVAC and emergency public address: IEC 60849 / EN 54-16 and Japan's Fire Service Act all require an STI ≥ 0.50 at every seat for emergency announcements. In reverberant spaces like airport terminals, underground malls, large shopping centres and stadiums, distributed column-speaker layouts are combined with DSP delay alignment so that the sound from neighbouring loudspeakers reaches each listener closely enough in time to fuse rather than be heard as an echo.

Theatres, concert halls and worship spaces: Music benefits from moderate reverberation (T60 = 1.6-2.2 s), while dialogue needs STI ≥ 0.55. Bridging the gap is hard. The modern answer is movable absorbing banners, switchable reverberation chambers and electronic acoustic enhancement so the room can be retuned per programme. Mosques and churches face the same trade-off between musical prayer and intelligible sermon.

Open-plan offices and call centres: The WELL Building Standard requires STI ≥ 0.65 in meeting rooms and phone booths. The opposite is true in open areas, where "sound masking" deliberately injects pink-spectrum noise to bring STI down to about 0.30 so neighbouring conversations cannot be understood — the same metric used in two directions.

Common Misconceptions and Pitfalls

The biggest trap is using only a single average T60 for STI. STI is strictly a weighted sum of MTF values across seven octave bands (125 Hz to 8 kHz). If only the low-frequency T60 (125-250 Hz) is long, the room sounds "boomy" and consonants are masked. This tool reports a single representative T60 for clarity, but real measurements often show very different per-band T60 values, so do not ignore low-frequency absorption. Detailed evaluation needs a measurement system (B&K 2270 and similar) or a ray-tracing tool such as ODEON or OPENSTAGES.

Next is the belief that "if I raise the SNR enough, a long reverb won't matter". SNR and reverberation enter the MTF as independent multiplicative factors, so a poor one caps the other. With T60 = 2.5 s, the mean m_rev is only about 0.35, which on its own corresponds to SNR_eff ≈ −2.7 dB, so even at 30 dB SNR the STI given by the formulas above tops out near 0.4. In reverberant spaces the correct order of investment is absorption first, then SNR, then directivity.
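The ceiling that reverberation puts on STI is easy to see by sweeping the SNR at a fixed T60. The sketch below reuses the simplified single-band formulas from the theory section; the modulation-frequency list is the standard one-third-octave series (an assumption, since the page only gives the range), and the listener SNR is supplied directly.

```python
import math

MOD_FREQS_HZ = (0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5)


def mean_m_rev(t60_s: float) -> float:
    """Mean reverberation modulation factor over the 14 frequencies."""
    values = [
        1.0 / math.sqrt(1.0 + (2.0 * math.pi * f * t60_s / 13.8) ** 2)
        for f in MOD_FREQS_HZ
    ]
    return sum(values) / len(values)


def sti_estimate(t60_s: float, snr_db: float) -> float:
    """Single-band STI estimate: mean m -> SNR_eff -> clip -> linear map."""
    m_snr = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    m_bar = min(max(mean_m_rev(t60_s) * m_snr, 1e-9), 1.0 - 1e-9)
    snr_eff = 10.0 * math.log10(m_bar / (1.0 - m_bar))
    return (max(-15.0, min(15.0, snr_eff)) + 15.0) / 30.0


if __name__ == "__main__":
    t60 = 2.5
    print(f"mean m_rev at T60 = {t60} s: {mean_m_rev(t60):.2f}")
    for snr in (0, 10, 20, 30, 60):
        print(f"SNR = {snr:>2} dB -> STI = {sti_estimate(t60, snr):.3f}")
```

Beyond roughly 20 dB the extra SNR changes the result only in the third decimal place, because m_snr is already close to 1 and m_rev is the limiting factor.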

Finally there is the assumption that "more loudspeakers always means more intelligibility". Adding loudspeakers does increase direct sound, but each takes a different path to the listener and the resulting time-delayed copies can create echoes that drop the STI. The fix is to compute the arrival-time difference at the listener between adjacent speakers, then compensate for it with DSP delay. The reason airports use closely spaced ceiling speakers is precisely so that each listener is dominated by the nearest one and the delay misalignment with the next speaker stays inside the ear's fusion window.
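The arrival-time check described above amounts to a path-length difference divided by the speed of sound. The sketch below is illustrative only: the speed of sound (343 m/s, air at about 20 °C) and the 30 ms fusion window are assumed rule-of-thumb values, and the function names are hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0   # assumed: air at about 20 degrees C
FUSION_WINDOW_MS = 30.0      # assumed rule-of-thumb limit, not from the page


def arrival_time_ms(distance_m: float) -> float:
    """Propagation time from a loudspeaker to the listener, in milliseconds."""
    return distance_m / SPEED_OF_SOUND_M_S * 1000.0


def arrival_gap_ms(near_m: float, far_m: float) -> float:
    """How much later the farther loudspeaker arrives than the nearer one."""
    return arrival_time_ms(far_m) - arrival_time_ms(near_m)


def fuses(near_m: float, far_m: float) -> bool:
    """True if the later arrival stays inside the assumed fusion window."""
    return arrival_gap_ms(near_m, far_m) <= FUSION_WINDOW_MS


if __name__ == "__main__":
    for near, far in ((3.0, 6.0), (3.0, 12.0), (3.0, 20.0)):
        gap = arrival_gap_ms(near, far)
        verdict = "fuses" if fuses(near, far) else "risk of audible echo"
        print(f"near {near} m, far {far} m: gap = {gap:.1f} ms ({verdict})")
```

The gap is also the delay a DSP would add to the nearer loudspeaker's feed if the two arrivals are to coincide at that listener position.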
