Design of Experiments (DOE) Simulator

Parameters

Number of factors k

Independent input variables

Design type

From screening to optimization

Center point replicates n_c

Center-level reps for pure error and curvature

Estimated main effect Δ

Magnitude of effect you want to detect

Residual standard deviation σ_n

Pure error from repeated measurements

Target power 1-β

Convention: at least 0.80 (=80%)

Results

Total runs N

Design resolution

Main effects

2-factor interactions

Power (%)

Savings vs OFAT (%)

Experimental points — cube plot

Cube vertices are the 2^k factorial points, points along the axes are the CCD axial points (±α), and the centre is the replicated center point. Switching design type changes the layout.

Run count by design type

Power vs total runs

Theory & Key Formulas

$$\text{Full } 2^k = 2^k \text{ runs},\quad \text{CCD} = 2^k + 2k + n_c,\quad \text{Power} = 1 - \beta$$

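k = number of factors, n_c = center-point replicates. Resolution III: main effects are not aliased with each other but are aliased with 2-factor interactions. IV: main effects are clear of 2-factor interactions, but some 2-factor interactions are aliased with each other. V or higher: all main effects and 2-factor interactions are clear.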

$$\text{Fractional }2^{k-p} = \frac{2^k}{2^p},\quad \text{PB} \ge 4\lceil(k+1)/4\rceil \text{ runs}$$

Fractional designs cut runs by a factor of 2^p at the cost of aliasing (some effects can no longer be estimated independently). Plackett-Burman estimates only main effects with the minimum number of runs (a multiple of 4).
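
The run-count formulas above can be sketched in a few lines of Python. The function names are made up for this illustration and are not taken from the tool; center points are only added for the CCD, exactly as in the formulas.

```python
import math

def runs_full(k: int) -> int:
    """Full 2^k factorial."""
    return 2 ** k

def runs_fractional(k: int, p: int) -> int:
    """2^(k-p) fractional factorial: the full count divided by 2^p."""
    if not 0 <= p < k:
        raise ValueError("p must satisfy 0 <= p < k")
    return 2 ** (k - p)

def runs_ccd(k: int, n_c: int) -> int:
    """Central composite design: factorial + 2k axial + n_c center points."""
    return 2 ** k + 2 * k + n_c

def runs_pb(k: int) -> int:
    """Plackett-Burman: smallest multiple of 4 that is at least k + 1."""
    return 4 * math.ceil((k + 1) / 4)

for k in (3, 5, 8, 11):
    print(k, runs_full(k), runs_fractional(k, 1), runs_ccd(k, 3), runs_pb(k))
```

For k = 11 this gives 2048 runs for the full factorial against 12 for Plackett-Burman, which is the gap the run-count chart is meant to show.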

$$t = \frac{\Delta}{\sigma_n}\cdot\frac{\sqrt{N}}{2},\quad \text{Power} \approx 1 - \exp\!\left(-\frac{(t-z_{\alpha/2})^2}{2}\right)$$

From the estimated main effect Δ, residual standard deviation σ_n and total runs N, build a t-like statistic and approximate the power against the two-sided test threshold z_{α/2} = 1.96 (α = 0.05). This is a simple planning approximation and is only meaningful for t > z_{α/2}.
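
A minimal sketch of this power approximation, assuming the formula is used exactly as written. The function name and the clamp to zero for t at or below the threshold are assumptions of this sketch; the clamp is there because the exponential expression is not monotone below z_{α/2}.

```python
import math

Z_ALPHA_2 = 1.96  # two-sided threshold for alpha = 0.05

def approx_power(delta: float, sigma_n: float, n_runs: int,
                 z: float = Z_ALPHA_2) -> float:
    """Approximate power from t = (delta / sigma_n) * sqrt(N) / 2."""
    t = (delta / sigma_n) * math.sqrt(n_runs) / 2
    if t <= z:
        # Assumption of this sketch: report zero power below the threshold.
        return 0.0
    return 1 - math.exp(-((t - z) ** 2) / 2)

# Effect twice the noise level, for a growing number of runs
for n in (4, 8, 16, 32):
    print(n, round(approx_power(delta=2.0, sigma_n=1.0, n_runs=n), 3))
```

Production tools would use the noncentral t distribution instead, so treat these numbers as planning estimates only.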

Design of Experiments (DOE) — Orthogonal and Factorial Designs

🙋

I know the name "Design of Experiments", but what is the point? "Designing experiments" sounds like an obvious thing engineers already do.

🎓

Right, that's how it sounds, but "design" here means something specific. Most people do OFAT (one factor at a time): fix everything else, change one factor, look at the result, then move on to the next. It feels intuitive, but the information per run is poor. For three factors at three levels OFAT needs about 10 runs and never sees any interaction between factors. A 2^3 full factorial takes 8 runs and gives you all 3 main effects plus all 3 two-factor interactions, each estimated independently. Similar workload, completely different amount of insight.
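
To make the 2^3 claim concrete, here is a small sketch that builds the 8 runs in coded units (-1 / +1) and recovers every main effect and two-factor interaction from one set of responses. The response function is a hypothetical, noise-free model invented for the illustration, so the estimates can be checked by eye.

```python
from itertools import combinations, product
from math import prod

def full_factorial(k):
    """All 2^k combinations of -1 / +1 levels."""
    return [tuple(run) for run in product((-1, 1), repeat=k)]

def effect(design, y, cols):
    """Mean response where the contrast is +1 minus mean where it is -1."""
    contrast = [prod(run[c] for c in cols) for run in design]
    return 2 * sum(s * yi for s, yi in zip(contrast, y)) / len(design)

design = full_factorial(3)

# Hypothetical response: coefficients 2, -1.5, 0.5 and an A*B interaction of 1
y = [10 + 2 * a - 1.5 * b + 0.5 * c + 1.0 * a * b for a, b, c in design]

names = "ABC"
for i in range(3):
    print(names[i], effect(design, y, (i,)))
for i, j in combinations(range(3), 2):
    print(names[i] + names[j], effect(design, y, (i, j)))
```

Each effect comes out as twice the corresponding coefficient, because moving a factor from -1 to +1 is a change of two coded units.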

🙋

OK, so I should always go full factorial. But when I bump k up to 5 the run count jumps to 32. That gets impractical fast, doesn't it?

🎓

Exactly, that is the "curse of dimensionality" for 2^k: k=2 gives 4 runs, k=5 gives 32, k=8 gives 256. That's where fractional factorial designs come in. Switch the slider to "2^(k-1) 1/2 fractional" and you cut the runs in half. The price is "confounding": some effects become aliased with one another and can no longer be separated. Resolution III/IV/V measures how severe that is. At III, main effects are aliased with 2-factor interactions; at IV, main effects are clean but 2-factor interactions are aliased with each other; at V and above, every main effect and 2-factor interaction is clean. For screening, III/IV is plenty; for optimisation you want V or better.
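
Aliasing is easy to see directly in the design columns. The sketch below builds a 2^(4-1) half fraction using the generator D = ABC (a common textbook choice, assumed here and not necessarily what the tool uses) and lists which two-factor interaction columns are identical.

```python
from itertools import combinations, product

# 8 runs: full factorial in A, B, C, with D set to the product A*B*C
design = [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]
names = "ABCD"

def column(cols):
    """Contrast column for the product of the given factor columns."""
    out = []
    for run in design:
        value = 1
        for c in cols:
            value *= run[c]
        out.append(value)
    return tuple(out)

two_fi = {names[i] + names[j]: column((i, j))
          for i, j in combinations(range(4), 2)}

# Pairs of interactions whose columns are identical cannot be told apart
aliased_pairs = [(u, v) for u, v in combinations(two_fi, 2)
                 if two_fi[u] == two_fi[v]]
print(aliased_pairs)

# No main-effect column coincides with a two-factor interaction column
main_clear = all(column((i,)) not in two_fi.values() for i in range(4))
print(main_clear)
```

Main effects stay clear of two-factor interactions while the interactions pair up with each other, which is the Resolution IV pattern.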

🙋

What about the mysterious CCD and Plackett-Burman options?

🎓

Different jobs. Plackett-Burman is screening: "I just need to know which of these 10-20 factors matters, and quickly." It can sift through 11 factors in 12 runs. CCD (Central Composite Design) is the next stage: once you have your 3-5 important factors, CCD adds axial points (±α) and centre replicates to a 2^k factorial so you can fit a full quadratic surface y = b0 + Σbi·xi + Σbij·xi·xj + Σbii·xi². The classic industrial workflow is PB to screen, then CCD to optimise.

🙋

"Power" is one of the outputs and it jumps around a lot when I touch the sliders. What does it really mean?

🎓

Statistical power (1-β) is "the probability that, if a real effect exists, your experiment will actually flag it as significant." 0.80 is the conventional minimum. Power goes up with (1) larger effect size, (2) smaller residual noise, (3) more runs. So if you raise σ_n the power crashes; if you add more centre-point replicates n_c the total N grows and it recovers. This tool approximates power from t = Δ/σ_n × √N/2. Production tools use the noncentral t distribution, but for a planning estimate this is plenty.
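
The same approximation can be turned around to ask how many runs a target power needs. This is a sketch under the assumption that the tool's formula, Power ≈ 1 - exp(-(t - z)²/2) with t = Δ/σ_n × √N/2, is solved for N on the branch t > z; the function name is invented for the example.

```python
import math

def required_runs(delta: float, sigma_n: float,
                  target_power: float = 0.80, z: float = 1.96) -> int:
    """Smallest N for which the approximate power reaches target_power."""
    if not 0 < target_power < 1:
        raise ValueError("target_power must be strictly between 0 and 1")
    # Invert 1 - exp(-(t - z)^2 / 2) >= target_power for t
    t_needed = z + math.sqrt(-2 * math.log(1 - target_power))
    # Invert t = (delta / sigma_n) * sqrt(N) / 2 for N
    return math.ceil((2 * sigma_n * t_needed / delta) ** 2)

for ratio in (0.5, 1.0, 2.0):
    print(ratio, required_runs(delta=ratio, sigma_n=1.0))
```

Halving the effect-to-noise ratio roughly quadruples the required runs, which is why the power readout reacts so strongly to σ_n.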

🙋

Last thing — the "Savings vs OFAT" output sometimes goes negative. Does that mean DOE actually needs more runs than OFAT?

🎓

Yes, it can. A full factorial plus centre points easily exceeds the apparent OFAT count (about 3k+1 runs for k factors at 3 levels), for example 8 + 3 = 11 runs against 10 at k=3, and because 2^k grows exponentially the gap widens as k increases. The key point, though, is that DOE gets far more information out of its runs: interactions, curvature, the lot. Even when the "% savings" is negative, the cost per estimated parameter is lower for DOE. At k of 5 or more, a Plackett-Burman or highly fractionated design turns the savings positive.

Frequently Asked Questions

OFAT (One Factor At a Time) is the naive approach of fixing every other factor and moving just one at a time. For k factors at 3 levels each it needs about 3k+1 runs and reveals only main effects. A 2^k full factorial DOE uses 2^k runs and estimates main effects and all two-factor interactions independently. For k=5, OFAT needs roughly 16 runs and shows only main effects, while a full factorial uses 32 runs and gives all 5 main effects plus all 10 two-factor interactions. Per piece of information collected, DOE is dramatically cheaper.
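
The comparison can be sketched numerically. Two things here are assumptions rather than the tool's documented behaviour: the OFAT count is taken as 3k+1 (the figure quoted above), and "savings" is computed as the percentage reduction in runs relative to OFAT.

```python
from math import comb

def ofat_runs(k: int, levels: int = 3) -> int:
    """Assumed OFAT run count: about levels * k + 1."""
    return levels * k + 1

def savings_vs_ofat(doe_runs: int, k: int) -> float:
    """Percentage of runs saved relative to OFAT (negative = more runs)."""
    return 100 * (1 - doe_runs / ofat_runs(k))

def effects_full_factorial(k: int) -> int:
    """Main effects plus two-factor interactions: k + C(k, 2)."""
    return k + comb(k, 2)

k = 5
full = 2 ** k
print("OFAT runs:", ofat_runs(k))
print("Full factorial runs:", full)
print("Savings (%):", savings_vs_ofat(full, k))
print("Runs per effect, OFAT:", ofat_runs(k) / k)
print("Runs per effect, DOE:", full / effects_full_factorial(k))
```

At k = 5 the run savings are negative, yet the full factorial spends fewer runs per estimated effect, which is the sense in which DOE is cheaper.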

Up to k=4 factors a full factorial (2^k=16 runs or fewer) is perfectly practical. From k=5 onward the run count doubles with every added factor (32, 64, 128...), so you switch to a 1/2 fraction (2^(k-1)) or 1/4 fraction (2^(k-2)) to keep the workload manageable. The cost is that some effects become 'confounded' and cannot be separated individually. Resolution III keeps main effects clear of each other but aliases them with 2-factor interactions, IV keeps main effects clear of 2-factor interactions but aliases some 2-factor interactions with each other, and V or higher keeps all main effects and 2-factor interactions clear. Use III/IV for screening and V or higher for optimization.

CCD is the workhorse of Response Surface Methodology (RSM), used when you want to fit a second-order model y = b0 + Σbi·xi + Σbij·xi·xj + Σbii·xi² to your response. It consists of (1) a 2^k factorial portion, (2) 2k axial points at ±α, and (3) nc replicated center points — totalling 2^k + 2k + nc runs. This lets you estimate curvature and locate an optimum. The standard industrial workflow is: screen with PB or Fractional Factorial, then refine the 3-5 important factors with CCD.

Plackett-Burman (PB) designs estimate only main effects with the minimum number of runs. The run count is always a multiple of 4 and can handle up to (runs-1) factors. So 12 runs can screen 11 factors, 20 runs can screen 19 factors. Two-factor interactions are aliased with main effects (Resolution III); in the 12- and 20-run designs that aliasing is partial and spread across many main effects rather than complete. For the 'which factors actually matter' stage that is an acceptable price. PB designs are standard in pharma, food science and bioprocess work where 10-20 input factors are common. Choose PB in this tool and you will see the run count collapse.
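
As an illustration, the 12-run design can be built by cyclically shifting a single generator row and appending a row of all low levels. The generator below is the standard published first row for N = 12; the helper names are invented, and the sketch covers only this one design size.

```python
# First row of the 12-run Plackett-Burman design (+1 = high, -1 = low)
GENERATOR_12 = [1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1]

def plackett_burman_12():
    """12 x 11 design: 11 cyclic shifts of the generator plus a row of -1."""
    n = len(GENERATOR_12)
    rows = [[GENERATOR_12[(j - i) % n] for j in range(n)] for i in range(n)]
    rows.append([-1] * n)
    return rows

def column_dot(design, a, b):
    """Inner product of two factor columns (0 means orthogonal)."""
    return sum(row[a] * row[b] for row in design)

design = plackett_burman_12()
n_cols = len(design[0])

# Every pair of main-effect columns is orthogonal
orthogonal = all(column_dot(design, a, b) == 0
                 for a in range(n_cols) for b in range(a + 1, n_cols))

# The interaction of factors 0 and 1 overlaps partly with every other column
interaction_01 = [row[0] * row[1] for row in design]
overlaps = [sum(x * row[c] for x, row in zip(interaction_01, design))
            for c in range(2, n_cols)]
print(orthogonal, overlaps)
```

An overlap of 12 in absolute value would mean complete aliasing; the smaller values here show that the interaction leaks a fraction of its size into each of the other main-effect estimates.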

Real-World Applications

Manufacturing process optimization: Injection moulding, machining and semiconductor processes routinely use DOE to optimise temperature, speed, pressure and time at the same time. A typical injection moulding study screens mould temperature, injection speed, holding pressure and cooling time with a full 2^4 (16 runs) or 1/2 fractional (8 runs) and minimises sink, warp and dimensional error together. DOE is the analytical backbone of Six Sigma's "Analyze" and "Improve" phases.

Product development and formulation: Plastic compounding, pharmaceutical formulations and food recipes combine mixture designs with CCD. Stat-Ease Design-Expert, JMP and Minitab are standard. A three-component pharmaceutical formulation, for example, can be optimised for dissolution profile by varying the ratios of active ingredient, excipient and disintegrant — exactly where DOE shines.

CAE simulations and virtual experiments: A single simulation can cost hours to days on HPC, so DOE is the obvious sampling strategy. Deterministic simulations have no measurement error, so replicated centre points give no pure-error estimate and the focus shifts to building metamodels (Kriging, RBF, polynomial regression). Latin Hypercube Sampling (a space-filling design) and CCD are among the most common sampling choices and feed directly into surrogate-based optimisation (including Bayesian Optimisation) for CFD, structural and topology problems.

Quality engineering and the Taguchi method: Japan's quality-engineering tradition uses orthogonal arrays L8, L16, L18 with an inner × outer array structure, evaluating robustness via the S/N ratio. The Factorial and Fractional designs in this tool are the Box-Hunter tradition from the West; Taguchi is the Japanese variant — but both share the same underlying idea of "designing the experiment". This thinking is part of why Toyota, Panasonic and many Japanese manufacturers have stayed competitive for decades.

Common Misconceptions and Pitfalls

The most common trap is "taking a main effect at face value even though the design is only Resolution III". With a Resolution III fractional factorial, main effects are fully confounded with one or more two-factor interactions. The value displayed as "Effect of A" might actually be A + BC + DE + …. If you read it as "the main effect is real", you can easily conclude the opposite of what is actually happening. This tool reports the resolution alongside the run count, but before interpreting any result you must check the aliasing structure to see which effects are confounded with which.
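
Checking the aliasing structure can be automated. The sketch below works on effect "words": multiplying two words cancels any letter that appears twice. The 2^(5-2) design with generators D = AB and E = AC is an assumed textbook example, so the specific aliases differ from the A + BC + DE illustration above, which depends on the generators chosen.

```python
from itertools import combinations

def multiply(w1: str, w2: str) -> str:
    """Product of two effect words; repeated letters cancel (X * X = I)."""
    return "".join(sorted(set(w1) ^ set(w2)))

def defining_relation(generators):
    """All words obtained by multiplying any non-empty subset of generators."""
    words = set()
    for r in range(1, len(generators) + 1):
        for combo in combinations(generators, r):
            word = ""
            for g in combo:
                word = multiply(word, g)
            words.add(word)
    return words

def aliases(effect: str, words) -> list:
    """Effects that share a contrast column with the given effect."""
    return sorted((multiply(effect, w) for w in words),
                  key=lambda s: (len(s), s))

# D = AB and E = AC correspond to the generator words ABD and ACE
words = defining_relation(["ABD", "ACE"])
resolution = min(len(w) for w in words)

print("Defining words:", sorted(words))
print("Resolution:", resolution)
for factor in "ABCDE":
    print(factor, "is aliased with", aliases(factor, words))
```

The resolution is the length of the shortest defining word, so a word of length three immediately signals that main effects are tangled with two-factor interactions.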

Next, "running zero centre points". The replicates n_c at the centre give you a clean estimate of pure error and a test for curvature. A 2^k design without centre points can only fit a linear model — if the true response is non-linear (which is exactly when an optimum exists), the prediction misses the target. Use at least n_c = 3-5. If the centre value differs significantly from the average of the cube vertices, that is your signal that curvature is present and you should extend to a CCD. That is also why power drops when you take n_c to zero in this tool.
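
The centre-point curvature check described above can be sketched as a single-degree-of-freedom F ratio: the curvature sum of squares divided by the pure-error variance from the centre replicates. The response values are hypothetical numbers chosen only to make the arithmetic easy to follow.

```python
from statistics import mean, variance

def curvature_check(y_factorial, y_center):
    """Return (mean difference, F ratio) for the curvature test."""
    n_f, n_c = len(y_factorial), len(y_center)
    diff = mean(y_factorial) - mean(y_center)
    # Single-degree-of-freedom sum of squares for curvature
    ss_curvature = n_f * n_c * diff ** 2 / (n_f + n_c)
    # Pure error: sample variance of the centre replicates (n_c - 1 dof)
    ms_pure_error = variance(y_center)
    return diff, ss_curvature / ms_pure_error

# Hypothetical responses: four cube vertices and four centre replicates
y_factorial = [52.0, 58.0, 61.0, 69.0]
y_center = [66.0, 67.0, 65.0, 66.0]

diff, f_ratio = curvature_check(y_factorial, y_center)
print("vertex mean minus centre mean:", diff)
print("F ratio:", f_ratio)
```

The ratio is compared with the F critical value for 1 and n_c - 1 degrees of freedom; a large value suggests extending the design to a CCD. With n_c below 2 there is no pure-error estimate and the test cannot be run at all.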

Finally, "forgetting to randomise the run order". DOE statistics assume random ordering. In practice it is tempting to do all the low-temperature runs first because changing the oven is a pain — and then any temperature drift, operator learning curve or equipment ageing gets confounded with the factor effects, even in a beautiful Resolution V design. If you genuinely cannot randomise, build "Blocking" into the design from day one. Most "the design was clean but the results are noise" stories trace back to a missing randomisation step.
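
A short sketch of both options: full randomisation of the run order, and a two-block layout with randomisation inside each block. Splitting the blocks on the sign of the highest-order interaction is a standard textbook choice assumed here; the function names and the seed are illustrative only.

```python
import random
from itertools import product
from math import prod

def randomized_order(design, seed=None):
    """Return (run number, standard-order index, settings) in random order."""
    rng = random.Random(seed)
    indices = list(range(len(design)))
    rng.shuffle(indices)
    return [(run_no, std, design[std])
            for run_no, std in enumerate(indices, start=1)]

def blocked_order(design, seed=None):
    """Two blocks split on the sign of the highest-order interaction,
    with the run order randomised inside each block."""
    rng = random.Random(seed)
    blocks = {-1: [], 1: []}
    for run in design:
        blocks[prod(run)].append(run)
    for runs in blocks.values():
        rng.shuffle(runs)
    return blocks

design = list(product((-1, 1), repeat=3))

plan = randomized_order(design, seed=1)
for run_no, std, settings in plan:
    print(run_no, std, settings)

blocks = blocked_order(design, seed=1)
print({label: len(runs) for label, runs in blocks.items()})
```

Blocking this way sacrifices the three-factor interaction, which becomes indistinguishable from the block difference, in exchange for protecting the main effects and two-factor interactions from drift between blocks.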
