I’ve seen quality managers spend three days investigating a common cause variation signal — ringing suppliers, pulling batch records, convening a review — and find nothing, because there was nothing to find. The variation was random. It was doing what random variation does. Meanwhile, a genuine special cause signal on a different chart sat unremarked for six weeks because everyone was busy with the first investigation. Walter Shewhart built control charts in the 1920s to prevent exactly this. They work. The problem is that reading them correctly requires a discipline that most organisations haven’t built.
What Is Common Cause Variation?
Common cause variation is the noise a stable process produces on its own — without any equipment failure, without any supplier change, without anything going wrong. Shewhart called it chance cause variation; Deming renamed it common cause. The terminology shifted but the concept didn’t: it’s what a well-run process looks like in the data.
Every process has it. A delivery operation averaging 4.2 days will produce 3.8 days one week, 4.6 the next, 4.1 the week after. If those movements are random and within the control limits, nothing specific caused them. They’re the process doing what it does. Asking why Tuesday’s delivery took 4.6 days when Monday’s took 3.9 is asking a question that has no useful answer.
The aggregate is predictable, even when individual points aren’t. You can’t forecast whether tomorrow’s delivery will take 3.9 or 4.5 days. You can say it’ll probably land between 3.2 and 5.2. That’s the historical range. A stable process isn’t necessarily a good one — 4.2 days average might be unacceptably slow for the customer. Stability and adequacy are different questions. But a stable process is at least a known quantity. You can plan around it, budget for it, set customer expectations against it. An unstable process — one with active special causes — gives you none of that.
What Is Special Cause Variation?
Special cause variation (also called assignable cause variation) comes from a specific, identifiable source outside the normal operating conditions of the process. A machine begins to wear. A new supplier starts. A different operator runs the line on night shift. A batch of raw material arrives off-spec. Training hasn’t been completed before someone takes over a task. What these have in common is that they’re not inherent to the process design — they’re things that changed, or varied from the standard, or went wrong in a way that isn’t supposed to happen. Special causes produce a signal in the data that’s distinguishable from the natural variation of a stable system.
Most special causes are mundane. A machine that needs recalibration. A process that runs differently on night shifts than days — same operators, same equipment, but different ambient temperature or different supervisory attention. What makes them special isn’t drama, it’s identifiability. They have a cause that can, in principle, be found and removed. Fix it and the process returns to its stable baseline.
Special cause variation is non-random. It shows up on a control chart as a point outside the control limits, a run of consecutive points on one side of the centreline, a trend, or another non-random pattern. The chart identifies the signal. It doesn’t explain it — that’s what the investigation is for, and the investigation should start while the event is still recent enough to have witnesses and records.
| | Common Cause Variation | Special Cause Variation |
|---|---|---|
| Origin | Inherent in the process design | Specific, identifiable external factor |
| Pattern on chart | Random, within control limits | Non-random — point outside limits, run, trend |
| Predictability | Predictable range; unpredictable individual points | Unpredictable — breaks the expected pattern |
| Response | Improve the system (management’s job) | Find and remove the specific cause |
| Wrong response | Investigating individual points adds variability | Ignoring it lets the root cause persist |
| Terminology | Chance cause, noise, natural variation | Assignable cause, signal, special variation |
Deming’s estimate, that around 94% of process problems are common cause and need systemic rather than reactive responses, is often cited without the important caveat that it’s a rough heuristic, not a universal constant. The proportion varies significantly by industry, by how mature the process is, and by how aggressively special causes have already been eliminated. A new production line with poorly trained operators and inconsistent suppliers might see the ratio flip. The point that survives context-stripping is the direction: most reactive investigation of individual results in stable processes is misdirected effort, and most sustainable improvement comes from addressing the system rather than individual events. For background on Shewhart’s original framework, the ASQ control chart resource is a reliable starting point.
Reading a Control Chart: Identifying Common Cause Variation and Special Cause Signals
A control chart has three reference lines: the Upper Control Limit (UCL), Centre Line (CL), and Lower Control Limit (LCL). Control limits come from the process data, typically ±3 standard deviations from the mean. Not from the customer. Not from the engineer’s target. From what the process actually does. That’s the distinction people consistently blur, and it matters more than almost anything else in reading the chart correctly. Minitab’s guide on using control charts to detect variation explains the practical setup in detail.
Control limits describe what the process actually does. Specification limits describe what it needs to do. A process can be in statistical control — all points within control limits, no special causes — while still failing to meet specification if the process average is off-target or the natural variation is too wide. These are two separate problems requiring different responses. Confusing them is common and leads to either over-adjusting a stable process or accepting process performance that doesn’t meet requirements.
Control chart signals for special cause variation
The Western Electric rules, along with later extensions such as the Nelson rules, identify non-random patterns. The most applied in practice: a single point beyond the 3-sigma control limit is the most obvious and catches the most dramatic events. Eight or more consecutive points on the same side of the centreline is subtler but often more important: it suggests the process mean has shifted, which can happen gradually enough that no individual point triggers the first rule. Six consecutive points trending steadily up or down (a trend rule usually credited to the Nelson set rather than the original Western Electric four) indicates drift, a gradual deterioration that often has a physical cause (tool wear, gradual contamination, seasonal temperature change). Two of three consecutive points beyond 2 sigma on the same side of the centreline is a more sensitive rule and generates more false alarms; many organisations apply it selectively.
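The four rules described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `find_signals` is hypothetical, the centre line and sigma are assumed to have been estimated already from a stable baseline period, and the example figures reuse the defect-rate numbers from later in this article (mean 4.3%, limits 2.1% to 6.5%).

```python
from typing import List, Tuple


def find_signals(values: List[float], centre: float, sigma: float) -> List[Tuple[int, str]]:
    """Return (index, rule) pairs for the four run rules described in the text.

    `centre` and `sigma` are assumed to come from a stable baseline period.
    """
    signals = []
    for i, x in enumerate(values):
        # Rule 1: a single point beyond the 3-sigma control limits.
        if abs(x - centre) > 3 * sigma:
            signals.append((i, "beyond 3 sigma"))

        # Rule 2: eight consecutive points on the same side of the centre line.
        if i >= 7:
            window = values[i - 7 : i + 1]
            if all(v > centre for v in window) or all(v < centre for v in window):
                signals.append((i, "8 on one side"))

        # Rule 3: six consecutive points steadily rising or steadily falling.
        if i >= 5:
            window = values[i - 5 : i + 1]
            steps = [b - a for a, b in zip(window, window[1:])]
            if all(s > 0 for s in steps) or all(s < 0 for s in steps):
                signals.append((i, "6-point trend"))

        # Rule 4: two of three consecutive points beyond 2 sigma on the same side.
        if i >= 2:
            window = values[i - 2 : i + 1]
            high = sum(v > centre + 2 * sigma for v in window)
            low = sum(v < centre - 2 * sigma for v in window)
            if high >= 2 or low >= 2:
                signals.append((i, "2 of 3 beyond 2 sigma"))
    return signals


# Illustrative figures: centre 4.3%, 3-sigma limits at 2.1% and 6.5%.
centre, sigma = 4.3, (6.5 - 4.3) / 3
noise_only = find_signals([4.1, 6.0, 2.8, 5.5, 3.3], centre, sigma)
with_jump = find_signals([4.3, 11.2], centre, sigma)
```

On those figures `noise_only` comes back empty and `with_jump` flags the 11.2% batch, which is the distinction the chart exists to make. Which rules to switch on is a judgement call; every extra rule adds sensitivity and false alarms together.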
On a control chart showing only common cause variation, you’d expect roughly 99.73% of points to fall within ±3 sigma control limits. A point outside those limits has a probability of about 0.27% under normal variation — rare enough to treat as a signal worth investigating. But the rules are not infallible. With enough data points, you’ll eventually see patterns that look like signals but are still random. This is why context — knowing what was happening in the process when the signal appeared — matters as much as the chart itself.
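To put a number on "you’ll eventually see patterns that look like signals", here is a small sketch of the arithmetic. It assumes the points are independent and approximately normally distributed, which real process data only approximates, and the function names are my own.

```python
import math


def tail_probability(k_sigma: float = 3.0) -> float:
    """Two-sided probability that a normal value falls beyond +/- k_sigma."""
    return 1.0 - math.erf(k_sigma / math.sqrt(2.0))


def false_alarm_chance(n_points: int, k_sigma: float = 3.0) -> float:
    """Chance of at least one point beyond the limits in n_points of pure noise."""
    p = tail_probability(k_sigma)
    return 1.0 - (1.0 - p) ** n_points


per_point = tail_probability()            # about 0.0027, i.e. 0.27%
over_100_points = false_alarm_chance(100)  # somewhere between 0.2 and 0.3
```

Under those assumptions a chart with 100 points of pure common cause variation has roughly a one-in-four chance of showing at least one point outside the limits. That is not an argument for ignoring signals; it is an argument for checking what was happening in the process before concluding anything.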
The Two Misidentification Errors: Common Cause vs Special Cause
Misidentifying variation type is the root cause of a huge amount of wasted quality management effort. Both directions of error are damaging.
Treating common cause as special cause (over-control)
A production line averages 4.3% defects with control limits between 2.1% and 6.5%. One batch comes in at 5.8%. Within the limits. Higher than usual, though. A manager adjusts a machine setting, speaks to the operator, makes a note on the supplier record. The next batch is 3.1%. The adjustment worked, apparently. The batch after that is 5.4%. Another adjustment.
By week three the process is producing 4.1%, 6.0%, 2.8%, 5.5%, 3.3% — more variable than before any of this started. The manager is working harder than ever and making things worse. This is tampering. Each intervention is itself a source of variation. The defect rate gets worse and the manager genuinely can’t see why, because from where they’re standing each individual decision was justified by the data they had at the time.
Treating special cause as common cause (under-control)
The opposite failure is quieter but can be more expensive. A new batch of raw material arrives from a different supplier. The defect rate jumps from 4.3% to 11.2% — well outside the UCL. The quality team notes it, decides it’s “just one of those things,” and waits to see if it settles. The next three batches from the same supplier all show similar defect rates. By the time the supplier is identified as the cause, four production runs of product with elevated defect rates have gone out the door.
The cost of under-control is proportional to how long the special cause runs before it’s identified. A special cause that’s caught in one production cycle costs the rework and waste from that cycle. A special cause that runs for three months — because nobody investigated the signals — can cost substantially more, including customer-facing quality failures and the reputational damage that follows.
How to Respond to Each Type
What to do about common cause variation
Reducing common cause variation means changing the process — not investigating individual results. Deming was insistent that this is management’s responsibility, not the operators’. The people running the process didn’t design it. They inherited it. If the process produces 4.3% defects when nothing is wrong, getting to 2% means better equipment, tighter incoming specifications, more consistent materials, or redesigned training. Something structural has to change. Asking the team to “try harder” in response to common cause variation is asking the wrong people to solve the wrong problem.
Investigating individual common cause data points — asking why yesterday’s batch had 5.1% rather than 3.8% — is almost always unproductive. The answer is usually “random variation,” and the investigation itself diverts attention from the systemic improvement work that could actually make a difference.
What to do when a special cause signal appears
Start investigating immediately. The further from the event, the harder it is — witnesses move on, batch records get archived, equipment gets serviced and the state that produced the signal is gone. Ask what was different: machine state, operator, raw materials, procedure version, shift, time of day. Something changed. Document what you find even if it’s inconclusive. And if the signal is an unexpectedly good result rather than a bad one, investigate that too — a positive outlier has a cause just as much as a negative one, and it’s usually more actionable.
Not every special cause investigation succeeds. Sometimes the signal was real but the cause can’t be identified from available records. That’s a process documentation problem — it means the right data wasn’t being captured to support root cause analysis. The appropriate response is to improve what’s recorded rather than to conclude that the variation was random.
Tampering: What Over-Control Does to a Process
Deming had a name for the damage caused by treating common cause variation as special: tampering. He demonstrated it with the funnel experiment — a marble is dropped through a funnel onto a target on paper. Leave the funnel in place and the marbles cluster around the target with natural variation. Adjust the funnel after each drop based on where the last marble landed and the marbles spread progressively further from the target. The adjustments create more variability. Not less.
The experiment is counterintuitive because each individual adjustment is rational. The last marble landed 3cm to the left, so you move the funnel 3cm to the right. Sensible for that specific drop. The problem is the next drop won’t land in the same place for the same reason — it’s random variation — so the correction compounds rather than cancels.
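The effect is easy to simulate. The sketch below is an assumption-laden toy, not Deming’s own procedure: it models one dimension only, uses normally distributed scatter with a standard deviation of 1, and implements just the compensation rule described above (move the funnel by the size of the last miss, in the opposite direction).

```python
import random
import statistics


def funnel_spread(n_drops: int, adjust: bool, seed: int = 1) -> float:
    """Standard deviation of landing positions around a target at 0."""
    rng = random.Random(seed)
    funnel = 0.0  # current funnel position; the target is at 0
    landings = []
    for _ in range(n_drops):
        # Each drop lands at the funnel position plus random scatter.
        landing = funnel + rng.gauss(0.0, 1.0)
        landings.append(landing)
        if adjust:
            # Tampering: shift the funnel to "cancel" the last miss.
            funnel -= landing
    return statistics.pstdev(landings)


left_alone = funnel_spread(10_000, adjust=False)
tampered = funnel_spread(10_000, adjust=True)
```

In this version the tampered spread settles at roughly 1.4 times the untouched one, because each landing now carries two doses of random scatter instead of one. Other adjustment rules from the funnel experiment behave worse and drift away from the target altogether; this sketch does not cover them.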
Every organisation that adjusts a process based on individual results rather than demonstrated special cause signals is running a version of this. A call centre changing staffing based on yesterday’s volume. A warehouse adjusting procedures based on last week’s error rate. A plant recalibrating equipment after every reading that isn’t exactly on-target. All of these are tampering if the underlying process is in statistical control. All of them increase the variability they’re trying to reduce.
How to know if you’re tampering
If adjustments are being made based on individual data points rather than control chart signals, tampering is the most likely explanation. The absence of a documented control chart is the most obvious indicator. The next most common: adjustments made by whoever happens to be on shift rather than by a structured response process with defined trigger points. When the trigger is “the supervisor noticed” rather than “the chart shows a signal,” you’re in tampering territory.
Inconsistent adjustment decisions, with different operators responding differently to the same data, are themselves a source of special cause variation that the control chart will eventually flag. One of the most consistent findings when organisations start charting their processes properly is that a significant proportion of their variation was introduced by their own adjustment behaviour rather than by the underlying process. That’s a difficult finding to present to management.
Examples Across Industries
Construction and project management
A civil engineering contractor tracking daily concrete pour volumes across a major programme found that output varied between 280 and 340 cubic metres per day over a 6-month period. Random, symmetrically distributed around a mean of 312 cubic metres, within control limits. Common cause variation — normal weather fluctuation, crew fatigue cycles, minor equipment variation, the inherent complexity of coordinating multiple trades. No single day warranted investigation.
During month 7, output dropped to 198 cubic metres on three consecutive days — well below the lower control limit. Investigation identified a new admixture batch from a different supplier affecting setting time. Special cause. The batch was quarantined, the supplier was contacted, and output returned to normal within two days. Without the chart, the site team would likely have attributed it to “a bad week” and investigated nothing. Three more batches from the same supplier were already on order. They were put on hold.
Healthcare
A&E departments tracking door-to-treatment times sit in common cause variation most of the time. Seasonal demand, staffing mix, case complexity, time of day: all of these contribute to natural variation in wait times. A department averaging 47 minutes might see days ranging from 38 to 61 minutes without anything having gone wrong or right in particular. I’ve reviewed A&E performance data where a team had spent two hours in a debrief trying to explain why Friday’s times were worse than Thursday’s. The control chart made clear within about 30 seconds that both days were well within normal variation. The debrief was two hours of explaining random noise to people who needed to believe it meant something.
A day landing above the UCL — say, 78 minutes on a Tuesday afternoon — is different. A staffing gap, a major incident, a systems failure. The control chart makes a distinction between background noise and actual signal that target-versus-actual reporting never can. Several NHS trusts now use SPC charts for this reason. Don Berwick’s 2013 report on NHS patient safety called out the misuse of targets explicitly. Not subtle. Targets applied to common cause variation produce tampering. Tampering produces harm.
Software development
Sprint velocity in agile teams shows common cause variation that managers routinely misinterpret. A team delivering 34, 41, 28, 38, 36, 43, and 31 story points across seven sprints has a mean of around 36 and a range of 15 points. That variation is normal given estimation complexity, task type range, and minor capacity fluctuations. Demanding an explanation for the 28-point sprint — or celebrating the 43-point sprint — is tampering with a stable process.
A sprint delivering 12 points is different. Six consecutive sprints declining is different. Those warrant investigation: technical debt deferred long enough to start costing velocity, a key team member who’s emotionally checked out before their formal resignation date, a scope definition quietly expanding. I’ve sat in retrospectives where a team spent 45 minutes trying to explain a 28-point sprint sitting comfortably within their normal variation range. The next sprint was 42 points. Nobody mentioned it because there was nothing to say. Nobody told the team that, either.
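For the seven sprints quoted above, the limits can be worked out with an Individuals and Moving Range (I-MR) calculation. This is a sketch under assumptions: seven points is far fewer than you would want for dependable limits, the helper name `imr_limits` is mine, and 2.66 is the standard I-MR constant (3 divided by 1.128).

```python
from statistics import mean


def imr_limits(values):
    """Return (LCL, centre line, UCL) for an individuals chart."""
    centre = mean(values)
    # Moving range: absolute difference between each pair of neighbouring points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    # 2.66 = 3 / d2, with d2 = 1.128 for moving ranges of two points.
    half_width = 2.66 * mean(moving_ranges)
    return centre - half_width, centre, centre + half_width


velocities = [34, 41, 28, 38, 36, 43, 31]  # story points from the example
lcl, centre, ucl = imr_limits(velocities)

sprint_28_is_signal = not (lcl <= 28 <= ucl)
sprint_12_is_signal = not (lcl <= 12 <= ucl)
```

The limits come out at roughly 13 and 58 around a centre of about 36. The 28-point and 43-point sprints sit well inside them; a 12-point sprint would fall just below the lower limit, which is why it deserves a conversation and the 28-point sprint does not.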
Manufacturing
A food manufacturer tracking product fill weights expects common cause variation within ±3 grams of target. Natural variation from minor pressure fluctuations in the filling head, density variation in the product, and vibration during line operation. Points falling outside ±8 grams — control limits from historical data — represent special causes: a worn seal, a temperature excursion in the hopper, a calibration drift.
A common error: confusing specification limits with control limits. A product specified at 500g ±15g might have a process operating at 500g ±4g normally. When a point lands at 509g — within specification but outside the control limit — it’s still a special cause signal. Ignoring it because it passed specification means leaving an assignable cause unidentified. That cause may deteriorate over time until the point lands outside specification, at which point the investigation is more expensive and the product impact is larger.
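The two questions can be kept apart in code just as they should be in a review meeting. The sketch below uses the figures from this example and assumes the "±4g" describes the control limits; the function name and the dictionary keys are hypothetical.

```python
def classify_point(value, target, control_half_width, spec_half_width):
    """Judge one reading against control limits and specification limits separately."""
    deviation = abs(value - target)
    return {
        # Voice of the process: is this reading consistent with past behaviour?
        "within_control_limits": deviation <= control_half_width,
        # Voice of the customer: is this unit acceptable to ship?
        "within_specification": deviation <= spec_half_width,
    }


# 509 g reading, 500 g target, control limits +/-4 g (assumed), specification +/-15 g.
result = classify_point(509, target=500, control_half_width=4, spec_half_width=15)
needs_investigation = not result["within_control_limits"]
can_ship = result["within_specification"]
```

Both flags come back true for the 509g reading: the product can ship and the process still needs investigating. Collapsing the two checks into a single pass/fail is how that signal gets lost.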
Common and Special Cause Variation in Six Sigma
Six Sigma methodology is, at its core, a structured response to variation. DMAIC — Define, Measure, Analyse, Improve, Control — frames improvement projects around characterising and reducing variation, and the common/special cause distinction shows up in every phase. Get it wrong in Measure and the entire analysis that follows it is working from bad data.
Measure phase
During Measure, baseline process performance is established and variation is characterised. Control charts are built to determine whether the process is in statistical control before improvement work begins. A process with active special causes can’t be reliably characterised — its Cp and Cpk values are meaningless if special causes are inflating the apparent spread. Special causes need to be found and removed before capability analysis produces interpretable numbers.
Analyse and Improve phases
In Analyse, root cause analysis tools — fishbone diagrams, 5-Why, regression — are applied to variation sources. Common causes point toward process design, equipment selection, material specifications, or training gaps: systemic issues. Special causes point toward specific events, machines, operators, or batches. Improvement for common causes means redesigning the process to reduce inherent variability. Improvement for special causes means eliminating the assignable cause so it doesn’t recur.
Control phase
The Control phase establishes charts and response plans to sustain improvement. A project that reaches Control without a clear mechanism for distinguishing common from special cause variation in future monitoring hasn’t completed the work. The control plan needs to define what constitutes a signal, who responds, what the steps are, and within what timeframe. Without this, the improvement erodes: tampering re-enters, special causes go uninvestigated, and within 18 months the process is back where it started.
Tools for Identifying Variation Type
Control charts
The X-bar and R chart is the most widely used for continuous data where subgroups are taken. The Individuals and Moving Range (I-MR) chart is used when data arrives one point at a time. P charts and NP charts handle attribute data (proportion or count defective). C charts and U charts handle count data (defects per unit). The chart type affects the sensitivity of control limits and which detection rules apply.
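As one concrete case, the limits for a P chart follow directly from the average proportion defective and the subgroup size. The sketch below assumes constant subgroup sizes; the subgroup size of 765 is a hypothetical value I picked because it roughly reproduces the 2.1% to 6.5% limits used in the defect-rate example earlier, not a figure from any real line.

```python
import math


def p_chart_limits(p_bar: float, subgroup_size: int):
    """Return (LCL, centre line, UCL) for a P chart with constant subgroup size."""
    # Binomial standard deviation of a proportion based on subgroup_size units.
    sigma_p = math.sqrt(p_bar * (1.0 - p_bar) / subgroup_size)
    # Proportions cannot go below 0 or above 1, so clamp the limits.
    lcl = max(0.0, p_bar - 3.0 * sigma_p)
    ucl = min(1.0, p_bar + 3.0 * sigma_p)
    return lcl, p_bar, ucl


# 4.3% average defect rate; 765 units per batch is an assumed, illustrative size.
lcl, centre, ucl = p_chart_limits(0.043, 765)
```

Smaller subgroups widen the limits and larger ones tighten them, so the same 5.8% batch can be noise on one chart and a signal on another. That is part of what "the chart type affects the sensitivity" means in practice.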
All control charts share the same logic: calculate limits from the process data, plot over time, apply rules to identify non-random patterns. The discipline is in updating limits when the process genuinely changes — an improvement that shifts the mean should update the centre line — without updating them simply to make current performance look acceptable. Retroactively adjusting control limits to eliminate signals is the data equivalent of tampering, and it’s more common than quality professionals usually acknowledge.
Histograms and run charts
A histogram shows the distribution shape. A bimodal distribution often indicates two distinct populations — two machines, two operators, two shifts whose results have been combined. That’s frequently a special cause. A run chart shows trends and patterns but lacks statistical rigour to distinguish signal from noise reliably. It’s a starting point, not a substitute for a control chart.
Process capability indices
Cp and Cpk measure how well a process performs relative to specification limits, but only when calculated from a process in statistical control. A Cpk below 1.0 means the process produces output outside specification under normal conditions — a common cause problem requiring process redesign. A Cpk that was 1.4 last month and is 0.8 this month with no process change suggests a special cause has entered and the figure is now unreliable. Reporting capability numbers from an out-of-control process is meaningless at best. At worst it’s actively misleading. Both happen regularly.
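The arithmetic behind the two indices is short. This sketch uses the usual textbook definitions and assumes the mean and standard deviation were estimated from a process already shown to be in statistical control; the example numbers borrow the 500g ±15g specification from the fill-weight example and treat the ±4g process band as three standard deviations, which is my assumption.

```python
def capability(mean: float, sigma: float, lsl: float, usl: float):
    """Return (Cp, Cpk) for a process measured against two-sided specification limits."""
    # Cp compares the specification width with the natural 6-sigma process width.
    cp = (usl - lsl) / (6.0 * sigma)
    # Cpk also penalises a mean that sits off-centre, using the nearer limit.
    cpk = min(usl - mean, mean - lsl) / (3.0 * sigma)
    return cp, cpk


sigma = 4.0 / 3.0  # assumes the +/-4 g band is 3 standard deviations wide
centred = capability(mean=500.0, sigma=sigma, lsl=485.0, usl=515.0)
shifted = capability(mean=509.0, sigma=sigma, lsl=485.0, usl=515.0)
```

With the mean on target both indices are 3.75. Shift the mean to 509g and Cp stays put while Cpk drops to 1.5, which is why Cpk is the number to watch when the average drifts. Neither figure means anything if the chart behind it shows unresolved signals.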
Measurement system analysis
Before analysing process variation, confirm that the measurement system isn’t contributing significantly to the apparent variation. Gauge R&R studies quantify how much of total observed variation comes from measurement error versus genuine process variation — separating repeatability (does the same person get the same result twice?) from reproducibility (do different people get the same result?).
A measurement system contributing more than 10–30% of total variation (the threshold varies by application) will distort the control chart and make the signal-noise distinction unreliable. This is often the unglamorous finding when organisations first build control charts properly: a substantial proportion of the variation they assumed was in the process is actually in how they’re measuring it. The measurement device is inconsistent, or different operators interpret the measurement procedure differently, or the sampling method introduces bias. That finding is uncomfortable — it means the previous improvement efforts were based on noise — but it’s necessary before any reliable improvement work can begin. A control chart built on a measurement system with 40% gauge R&R error is telling you more about the measurement process than the production process.
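A rough sketch of how the percentage is put together, assuming the three standard deviations have already been estimated from a gauge study. It follows one common convention, expressing gauge R&R as a share of total standard deviation rather than of variance; the function name and the example inputs are hypothetical.

```python
import math


def percent_grr(sd_repeatability: float, sd_reproducibility: float, sd_part: float) -> float:
    """Gauge R&R as a percentage of total observed standard deviation."""
    # Independent sources combine as variances, so standard deviations
    # add in quadrature (square root of the sum of squares).
    sd_grr = math.hypot(sd_repeatability, sd_reproducibility)
    sd_total = math.hypot(sd_grr, sd_part)
    return 100.0 * sd_grr / sd_total


# Hypothetical study results, in the units of the measurement.
grr = percent_grr(sd_repeatability=0.3, sd_reproducibility=0.4, sd_part=1.2)
```

These made-up inputs give about 38%, which would sit above the upper end of the range quoted above. Be careful when comparing figures across reports: the same study expressed as a share of variance gives a smaller number, so check which convention is being used before judging a measurement system.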
See Also
Control Chart versus Run Chart
Since 2004 I have worked for ICT Management, which provides worldwide quality management services. Passionate about new technologies, I have had the privilege of implementing many new systems and applications for different departments of my company. I hold a Six Sigma Green Belt.
