
Hypothesis Test Calculator

Run a one-sample z-test, a one-sample t-test, or a two-sample t-test interactively, with a real-time p-value, critical regions shown on the distribution chart, and Cohen's d effect size.

Test Setup: Test Type, Alternative Hypothesis, Significance Level α
Sample 1: Sample Mean x̄₁, Sample Std Dev s (or σ), Sample Size n₁, Null Hypothesis Mean μ₀
Sample 2: Sample Mean x̄₂, Sample Std Dev s₂, Sample Size n₂
Results: Test Statistic t, p-value, Critical Value, Degrees of Freedom df, Cohen's d
Chart: Distribution with Critical Region
CAE & Quality Control Applications
Statistical hypothesis testing is used to verify the significance of differences between material batches, compare simulation results with experimental results, quantify before/after improvement effects, and validate process changes. The two-sample t-test is the standard method for comparing two design variants or manufacturing processes.
Theory & Key Formulas

One-sample z-test: $z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$

One-sample t-test: $t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$, degrees of freedom $df = n-1$

Two-sample t-test: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1+1/n_2}}$, $s_p^2 = \dfrac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}$

Effect size: Cohen's $d = \dfrac{|\bar{x} - \mu_0|}{s}$ (one-sample) or $d = \dfrac{|\bar{x}_1 - \bar{x}_2|}{s_p}$ (two-sample). Conventional benchmarks: small 0.2, medium 0.5, large 0.8.
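The formulas above translate directly into a few lines of arithmetic. The sketch below is a minimal, standard-library-only Python illustration; the function names are hypothetical and are not taken from the calculator's source.

```python
import math


def one_sample_z(xbar, mu0, sigma, n):
    """z statistic when the population standard deviation sigma is known."""
    return (xbar - mu0) / (sigma / math.sqrt(n))


def one_sample_t(xbar, mu0, s, n):
    """t statistic and degrees of freedom (n - 1) using the sample std dev s."""
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1


def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation s_p (assumes equal population variances)."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def two_sample_pooled_t(x1, s1, n1, x2, s2, n2):
    """Pooled two-sample t statistic and df = n1 + n2 - 2."""
    sp = pooled_sd(s1, n1, s2, n2)
    t = (x1 - x2) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def cohens_d_one_sample(xbar, mu0, s):
    """Cohen's d for a one-sample test: |xbar - mu0| / s."""
    return abs(xbar - mu0) / s


if __name__ == "__main__":
    # Illustrative inputs: standard error = 10 / sqrt(25) = 2, so t = 5 / 2 = 2.5
    t, df = one_sample_t(xbar=105.0, mu0=100.0, s=10.0, n=25)
    print(f"t = {t:.3f}, df = {df}")
    print(f"d = {cohens_d_one_sample(105.0, 100.0, 10.0):.2f}")
```

With the same inputs, `one_sample_z` also returns 2.5: the z and t statistics are computed identically, and the difference lies in the reference distribution (standard normal vs. t with df = 24) used to turn the statistic into a p-value.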

What is Hypothesis Testing?

🙋
What exactly is a p-value? I see it in the simulator results, but I'm not sure what it's telling me.
🎓
Basically, the p-value is the probability of seeing data at least as extreme as your sample if the null hypothesis were true. In practice, a small p-value (say, below 0.05) counts as evidence against the null. Try moving the "Sample Mean x̄₁" slider away from the "Null Hypothesis Mean μ₀" in the simulator. The p-value drops instantly, because a deviation that large would be increasingly unlikely if μ₀ were the true mean.
🙋
Wait, really? So when should I use a z-test versus a t-test? The simulator has both.
🎓
Great question. It boils down to whether you know the true population standard deviation (σ). Use a z-test if you know σ (rare in real life). Use a t-test when you only have the sample standard deviation (s). A common case is testing a new material's strength: you have sample data, not the full population data. In the simulator, if you enter a number for "σ", it runs a z-test. If you leave σ blank and use "s", it automatically switches to a t-test.
🙋
That makes sense. What about the two-sample test? When I enter data for Sample 2, the chart changes completely.
🎓
Exactly! The two-sample test compares the means of two independent groups, for instance the crash performance of a standard car door vs. a newly reinforced one. When you fill in x̄₂, s₂, and n₂, the tool calculates whether the difference between the two sample means is statistically significant. The p-value then answers: "If the two population means were actually equal, how likely would a difference at least this large be from sampling variation alone?"
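To see the slider behavior from the dialog in numbers, here is a minimal sketch of a two-sided one-sample z-test p-value using only Python's standard library (`statistics.NormalDist`). It assumes σ is known; the values μ₀ = 100, σ = 5, and n = 25 are made up for illustration.

```python
import math
from statistics import NormalDist


def z_test_two_sided_p(xbar, mu0, sigma, n):
    """Return (z, two-sided p-value) for a one-sample z-test with known sigma."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    # P(|Z| >= |z|) under the null: twice the upper-tail area of the standard normal
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


if __name__ == "__main__":
    # Illustrative numbers: sigma / sqrt(n) = 5 / 5 = 1, so z equals xbar - mu0
    for xbar in (100.0, 101.0, 102.0, 103.0):
        z, p = z_test_two_sided_p(xbar, mu0=100.0, sigma=5.0, n=25)
        print(f"xbar = {xbar:5.1f}   z = {z:4.1f}   p = {p:.4f}")
```

Moving x̄ from 100 to 103 raises z from 0 to 3, and the p-value falls from 1 to roughly 0.003.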

Physical Model & Key Equations

The core of a one-sample test is standardizing the difference between your sample mean and the hypothesized mean. This creates a test statistic (z or t) that tells you how many standard errors apart they are.

$$z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}\quad \text{or}\quad t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

$\bar{x}$: Your sample mean.
$\mu_0$: The null hypothesis mean you're testing against.
$\sigma$ or $s$: Population or sample standard deviation.
$n$: Sample size. Because the standard error $\sigma/\sqrt{n}$ (or $s/\sqrt{n}$) shrinks as $n$ grows, larger samples give more precise tests.
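The effect of $\sqrt{n}$ is easiest to see with a complete one-sample t-test. Python's standard library has no Student's t distribution, so this sketch integrates the t density numerically with Simpson's rule; that is a convenience assumption, not necessarily how the calculator works, and in practice a library routine such as SciPy's `scipy.stats.t` would normally be used. The 2-unit difference and s = 5 are illustrative values.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) via Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    if b == 0.0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central)


def one_sample_t_test(xbar, mu0, s, n):
    """Return (t, df, two-sided p) for a one-sample t-test."""
    t = (xbar - mu0) / (s / math.sqrt(n))
    df = n - 1
    return t, df, t_two_sided_p(t, df)


if __name__ == "__main__":
    # Same 2-unit difference and s = 5; only n changes
    for n in (5, 20, 80):
        t, df, p = one_sample_t_test(xbar=102.0, mu0=100.0, s=5.0, n=n)
        print(f"n = {n:3d}   t = {t:5.2f}   df = {df:2d}   p = {p:.4f}")
```

Quadrupling n halves the standard error and doubles t, so the same raw difference moves from clearly non-significant (n = 5) to highly significant (n = 80).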

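For comparing two independent samples, the formula changes to account for the variance in both groups. The test statistic below measures the standardized difference between the two sample means. This is Welch's form, which does not assume equal variances; the pooled form in Theory & Key Formulas does assume them.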

$$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1}+ \dfrac{s_2^2}{n_2}}}$$

$\bar{x}_1, \bar{x}_2$: Means of sample 1 and 2.
$s_1, s_2$: Standard deviations of the samples.
$n_1, n_2$: Respective sample sizes. The denominator is the standard error of the difference between means, which gets larger if either sample is very variable or small.
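The formula above is usually paired with the Welch-Satterthwaite approximation for the degrees of freedom, which the equation itself does not show. This page does not say exactly how the calculator computes df for this case, so the sketch below is an assumption; the input values are illustrative.

```python
import math


def welch_t(x1, s1, n1, x2, s2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = s1**2 / n1  # squared standard error of mean 1
    v2 = s2**2 / n2  # squared standard error of mean 2
    t = (x1 - x2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


if __name__ == "__main__":
    # Illustrative inputs: a small precise sample vs. a large noisy one
    t, df = welch_t(x1=52.0, s1=2.0, n1=10, x2=50.0, s2=6.0, n2=40)
    print(f"t = {t:.3f}, df = {df:.1f}")
```

The resulting df is generally not an integer. When both samples have the same size and standard deviation, it reduces to the pooled value n₁ + n₂ - 2, and the t statistic matches the pooled one.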

Frequently Asked Questions

Use a z-test when the population standard deviation is known and either the data are normally distributed or the sample is large (typically n ≥ 30). Use a t-test when the population standard deviation is unknown. In real data analysis the population standard deviation is rarely known, so the t-test is the right choice in most practical situations. This tool lets you run both and compare the results.
On the distribution graph, the rejection region is shaded in the tail(s) corresponding to the significance level α (e.g., 5%): both tails for a two-sided test, one tail for a one-sided test. When the calculated test statistic falls within this region, the result is judged "statistically significant." The p-value is the tail area beyond the observed test statistic (in both tails for a two-sided test); a smaller p-value means data like yours would be less likely if the null hypothesis were true.
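The critical values that bound the shaded regions come from the quantile function of the reference distribution. The minimal sketch below uses the standard normal (z-test) via Python's `statistics.NormalDist`; the alternative names "two-sided", "greater", and "less" are our own labels, and a t-test would need t quantiles instead.

```python
from statistics import NormalDist


def z_critical(alpha, alternative="two-sided"):
    """Critical value of the standard normal for significance level alpha."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "two-sided":
        return std_normal.inv_cdf(1 - alpha / 2)  # alpha split over two tails
    if alternative in ("greater", "less"):
        return std_normal.inv_cdf(1 - alpha)      # all of alpha in one tail
    raise ValueError(f"unknown alternative: {alternative!r}")


def in_rejection_region(z, alpha, alternative="two-sided"):
    """True if the test statistic z falls in the shaded rejection region."""
    c = z_critical(alpha, alternative)
    if alternative == "two-sided":
        return abs(z) > c
    if alternative == "greater":
        return z > c
    return z < -c  # "less"


if __name__ == "__main__":
    for alt in ("two-sided", "greater"):
        print(f"{alt:>9}: critical value at alpha = 0.05 is {z_critical(0.05, alt):.3f}")
```

With α = 0.05 this gives the familiar 1.960 (two-sided) and 1.645 (one-sided), so z = 1.8 is significant only under a one-sided alternative in the matching direction.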
As a general guideline, d = 0.2 is interpreted as a "small" effect, 0.5 as "medium," and 0.8 or above as "large." However, these benchmarks are only conventions, and what counts as a practically important effect varies by field and application. Do not rely on the p-value alone; consider the effect size as well to assess the practical significance of the results.
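A small helper makes the benchmarks concrete. This sketch assumes the two-sample d uses the pooled standard deviation $s_p$ from the formula section, and it labels values below 0.2 as "negligible," which is our own wording rather than part of the 0.2/0.5/0.8 convention. The example inputs are illustrative.

```python
import math


def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation, as in the two-sample t-test formula."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def cohens_d_two_sample(x1, s1, n1, x2, s2, n2):
    """Cohen's d for two independent samples: |x1 - x2| / s_p."""
    return abs(x1 - x2) / pooled_sd(s1, n1, s2, n2)


def label_effect(d):
    """Map |d| onto the conventional 0.2 / 0.5 / 0.8 benchmarks."""
    d = abs(d)
    if d < 0.2:
        return "negligible"
    if d < 0.5:
        return "small"
    if d < 0.8:
        return "medium"
    return "large"


if __name__ == "__main__":
    d = cohens_d_two_sample(x1=52.0, s1=4.0, n1=30, x2=50.0, s2=4.0, n2=30)
    print(f"d = {d:.2f} ({label_effect(d)})")  # 2 / 4 = 0.5, labeled medium
```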
This tool also offers Welch's t-test, which does not assume equal variances. If equal variances are questionable or the sample sizes differ greatly, Welch's test is the safer choice. Try running both the pooled and Welch versions on the same inputs and check whether the interpretation changes.

Real-World Applications

Material Batch Quality Control: A manufacturer receives a new batch of aluminum alloy. Engineers test the yield strength of a sample (n₁, x̄₁, s) and compare it to the standard specification mean (μ₀) using a one-sample t-test. A low p-value flags a potentially defective batch before it goes into production.

Simulation vs. Physical Test Validation: After a finite element analysis predicts a component's failure load, physical tests are conducted. A two-sample t-test compares the mean failure load from simulation data (x̄₁) against the mean from experimental data (x̄₂) to validate the accuracy of the CAE model.

Process Improvement Analysis: A factory implements a new welding robot. To quantify the effect, engineers measure weld strength before (x̄₁, s₁, n₁) and after (x̄₂, s₂, n₂) the change. A two-sample t-test determines if the observed increase in mean strength is statistically significant or just random variation.

A/B Testing in Design: Two different website layouts (A and B) for engineering software are shown to separate groups of users, and the average time to complete a task is recorded for each group. A two-sample t-test is used to decide which design genuinely leads to faster task completion, guiding the final design choice.

Common Misconceptions and Points to Note

When you experiment with this simulator, you'll likely encounter a few "Huh?" moments. A major misconception is that "a smaller p-value means a larger effect." This is wrong. The p-value only tells you how likely a difference at least this large would be by chance if the null hypothesis were true; it says nothing directly about how big the difference is. For example, try setting the sample size "n₁" to a large value like 1000. Even a tiny difference between the sample mean and the null hypothesis mean (e.g., 100 MPa vs. 100.5 MPa with s = 5 MPa) yields a p-value below 0.05, making it "significant." This indicates the difference is likely real, but whether a 0.5 MPa difference matters in practice is another question. This is where you should get into the habit of looking at Cohen's d. In this example, d = 0.5/5 = 0.1, indicating a negligible practical effect.
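The arithmetic behind this example is short enough to check. The sketch assumes s = 5 MPa (an illustrative value) and, because df = 999 is large, approximates the t distribution with the standard normal.

```python
import math
from statistics import NormalDist

n, xbar, mu0, s = 1000, 100.5, 100.0, 5.0  # MPa; s = 5 MPa is an assumed value

se = s / math.sqrt(n)                       # standard error, about 0.158 MPa
t = (xbar - mu0) / se                       # about 3.16 standard errors
p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))  # normal approximation, df = 999
d = abs(xbar - mu0) / s                     # Cohen's d

print(f"t = {t:.2f}, p = {p:.4f}, d = {d:.2f}")
```

The p-value lands well below 0.01, yet d = 0.1 falls under even the "small" benchmark of 0.2.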

Next, be careful with your choice between a one-tailed and a two-tailed test. You can select this via "Alternative Hypothesis" in the simulator, and whether you test for "the mean is different" or "the mean is greater (or smaller)" changes both the rejection region and the p-value. For instance, in a quality inspection where the only concern is whether a material's strength has dropped below specification, a one-tailed test with the "less than" alternative (H₁: μ < μ₀) is appropriate. Defaulting to a two-tailed test without thinking reduces your statistical power and risks missing a real difference.
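How much the alternative matters is easy to see for a single z value. In this sketch (standard-normal reference; z = -1.8 is an illustrative value for a strength drop), the one-tailed p-value is half the two-tailed one when the deviation points in the hypothesized direction.

```python
from statistics import NormalDist


def z_p_value(z, alternative):
    """p-value of a z statistic for the given alternative hypothesis."""
    phi = NormalDist().cdf
    if alternative == "two-sided":
        return 2.0 * (1.0 - phi(abs(z)))
    if alternative == "greater":  # H1: mu > mu0
        return 1.0 - phi(z)
    if alternative == "less":     # H1: mu < mu0
        return phi(z)
    raise ValueError(f"unknown alternative: {alternative!r}")


z = -1.8  # sample strength about 1.8 standard errors below spec
for alt in ("two-sided", "less", "greater"):
    print(f"{alt:>9}: p = {z_p_value(z, alt):.4f}")
```

At α = 0.05 the "less" test rejects (p ≈ 0.036) while the two-sided test does not (p ≈ 0.072); the "greater" alternative, pointing the wrong way, gives p ≈ 0.96.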

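Finally, don't forget the assumption of normality. The t-test assumes your data are approximately normally distributed; it becomes more forgiving with large samples, because the sample mean itself tends toward normality. Real-world engineering data, especially wear amounts or failure lifetimes, are often skewed and follow something like a log-normal distribution. Applying a t-test directly to such data, particularly with small samples, can lead to incorrect conclusions. The golden rule is to check your data's distribution first with a histogram or a Q-Q plot; for log-normal data, running the test on the logarithms of the values is a common remedy.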
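A quick numerical check can complement the histogram or Q-Q plot. This sketch generates synthetic log-normal data (the parameters and seed are arbitrary illustrative choices, not measured data) and compares the sample skewness before and after taking logarithms. A skewness near 0 is consistent with symmetry, though it does not prove normality.

```python
import math
import random
import statistics


def skewness(xs):
    """Sample skewness: mean cubed deviation divided by the cubed standard deviation."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs, m)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * sd**3)


rng = random.Random(42)
lifetimes = [rng.lognormvariate(3.0, 0.8) for _ in range(500)]  # synthetic data
log_lifetimes = [math.log(x) for x in lifetimes]

print(f"skewness, raw data: {skewness(lifetimes):.2f}")
print(f"skewness, log data: {skewness(log_lifetimes):.2f}")
```

If the log data look roughly normal, run the t-test on the log values; the conclusions then concern the geometric mean rather than the arithmetic mean.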