What is FMEA?
FMEA stands for Failure Mode and Effects Analysis. It was initially introduced by the US military in the 1940s but is now used in a wide variety of industries, including electronics, automotive (FMEA is an integral part of the TS 16949 standard), and many others. In recent years, the concept has been improved in response to limitations found in the standard FMEA procedure. These limitations will be discussed in another article. For now, we’ll focus on basic FMEA principles.
FMEA is part of a risk analysis exercise in which potential failure modes are outlined (typically on a spreadsheet) and ranked according to severity. “Failure modes” are errors found in a product or process, and the “effects analysis” outlines the impact of each failure mode. Because the potential risks are ranked, FMEA is, at its core, a risk prioritization tool that quality, engineering and other departments can use to brainstorm and identify potential risks and to try to prevent new ones from occurring. FMEA techniques can be used at any stage of the process, from design conception to final shipment.
Basic FMEA Principles
FMEA is a method that attempts to forecast what can go wrong and to look for ways to address each possible failure. FMEA documents are typically produced in spreadsheet format and reflect what the company has learned from previous projects (previous errors and preventative methods are listed as possible problems for new projects) and possible new errors that may occur due to “holes in the process.”
For example, a “hole in the process” could exist wherever human error can be introduced, where automation is not possible, or at a point where a decision is necessary. These “process flaws” can certainly be identified but may not be feasible to correct, due to costs or other considerations.
FMEA is based on a logical method that ranks risks according to set criteria and helps focus time and energy on the ones that are most costly or that have a direct customer impact.
The three relevant elements of risk for proper FMEA implementation are Severity, Occurrence and Detection. Each is given a number from 1 to 10, with 10 representing the highest risk. The three numbers are multiplied together to give the Risk Priority Number (RPN), which can therefore range from 1 to 1,000. The RPN gives the project team a clear picture of perceived risks at any moment in time so that it can react accordingly to urgent issues. A real-life example is provided in another article and can be referred to for clarification.
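The RPN multiplication can be sketched in a few lines of Python. This is a minimal illustration, not a standard implementation: the function name `risk_priority_number` and the decision to reject values outside 1 to 10 are assumptions made for this example.

```python
def risk_priority_number(severity: int, occurrence: int, detection: int) -> int:
    """Multiply the three 1-10 rankings into a Risk Priority Number (RPN)."""
    rankings = (
        ("severity", severity),
        ("occurrence", occurrence),
        ("detection", detection),
    )
    for name, value in rankings:
        # Assumption for this sketch: anything outside the 1-10 scale is an error.
        if not isinstance(value, int) or not 1 <= value <= 10:
            raise ValueError(f"{name} must be an integer from 1 to 10, got {value!r}")
    return severity * occurrence * detection


if __name__ == "__main__":
    print(risk_priority_number(1, 1, 1))     # lowest possible RPN
    print(risk_priority_number(10, 10, 10))  # highest possible RPN
```

Validating the inputs up front keeps a mistyped ranking (for example, 0 or 11) from silently producing a misleading RPN.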
When implementing FMEA, the following assumptions are made:
- A process flow chart is available that clearly outlines the entire process (and sub-processes) for each project.
- An FMEA worksheet has been created that reflects the process flow, listing failure modes for each step in both technical and functional terms.
- A new product family (with new features) requires a new FMEA process. One FMEA worksheet is not acceptable for multiple projects, although a large percentage of failure modes will obviously be common to them.
- FMEA is used for the entire life of the product (until the end of life is reached and customer returns/RMAs are no longer accepted).
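The first two assumptions, a process flow and a worksheet that mirrors it, can be checked mechanically. The sketch below is one possible way to do so; the `WorksheetRow` fields, the function name, and the process steps are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorksheetRow:
    process_step: str      # step from the process flow chart
    failure_mode: str      # functional description of the failure
    technical_detail: str  # technical description of the failure


def steps_without_failure_modes(process_flow, worksheet):
    """Return the process steps that have no failure mode listed yet."""
    covered = {row.process_step for row in worksheet}
    return [step for step in process_flow if step not in covered]


if __name__ == "__main__":
    # Hypothetical process flow and a worksheet that is only partly filled in.
    process_flow = ["manual soldering", "rework", "inspection", "final audit"]
    worksheet = [
        WorksheetRow("manual soldering", "missing component", "heatsink HS1 is missing"),
    ]
    print(steps_without_failure_modes(process_flow, worksheet))
```

Any step the function returns is a gap in the worksheet that should be filled before the first meeting.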
FMEA Kick-Off Meeting - Calculating an RPN
Now, we’ll take a look at an example of how to calculate an RPN in an FMEA meeting.
Your FMEA worksheet has been created and, based on existing and previous failure information (from historical projects), you have identified all possible failure modes for every step in the process. You are now ready to hold your first FMEA meeting, which should include knowledgeable representatives (at least one from each department involved). Involved departments will have failure modes directed to them for corrective action. If your department has no pending failure modes for action, you are very lucky indeed and can skip this meeting.
1. The first FMEA step in a new project is to complete the severity information.
For example, the failure mode is “missing component” and the technical information is that “heatsink HS1 is missing.” The failure effect (or end-user issue) is that “the unit overheats” or “user burns fingers.” (This one is not too typical, although a number of laptop manufacturers have had to recall batteries due to risk of explosion, so it can happen.) Obviously this is a critical error, and it is given the highest ranking, 10, since injury is possible.
2. The next step is determining the occurrence frequency.
Continuing with the previous example, let us assume that this heatsink is manually soldered on, making the root cause a human error at two points in the process: the rework area and inspection. How often does this occur? If it occurs less than 1% of the time, assign it a 1, making a note that the issue has not been fully removed from the process.
3. Finally, how easy is it to detect this problem?
Because there are a number of steps where this failure can be detected (by automated or manual inspection after rework, and also during final audit), this particular failure can almost certainly be detected before shipment. We therefore assign it a 1.
In our example above, multiplying the three numbers (10 × 1 × 1) gives a Risk Priority Number (RPN) of 10. This indicates low risk, but the issue must still be checked for. Issues of this nature, where automated methods are impossible, are always difficult to fully resolve.
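The heatsink example can be written out as a small Python sketch that also shows how a worksheet might be sorted by RPN. The `FailureMode` class and the second, cosmetic failure mode are hypothetical and added only so there is something to rank against; they are not taken from a real worksheet.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    description: str
    severity: int    # 1-10, 10 is the highest risk
    occurrence: int  # 1-10, 10 is the highest risk
    detection: int   # 1-10, 10 is the highest risk

    @property
    def rpn(self) -> int:
        """Risk Priority Number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


# Rankings from the heatsink example in the text.
heatsink = FailureMode("heatsink HS1 is missing", severity=10, occurrence=1, detection=1)

# Hypothetical entry, invented purely for comparison.
label = FailureMode("label misprinted (hypothetical)", severity=3, occurrence=4, detection=2)

# Highest RPN first, as a prioritized action list.
ranked = sorted([heatsink, label], key=lambda fm: fm.rpn, reverse=True)

if __name__ == "__main__":
    for fm in ranked:
        print(f"{fm.rpn:>4}  {fm.description}")
```

With these rankings the hypothetical label issue sorts above the heatsink issue even though the heatsink failure has the maximum severity, which is one reason a low RPN should not be read as permission to stop checking for the issue.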
Your first FMEA meeting will be the longest, as every failure mode needs to be assigned an RPN. Follow-up meetings will only deal with new failure modes or with solutions to existing ones. FMEA meetings should typically be held at least once a week.
While FMEA definitely has its benefits, it is mainly a reactive process: apart from offering team members a chance to forecast future issues, it cannot aid in preventative tasks. It is, to put it simply, a growing list of what can go wrong, and it does not force corrective actions. Within the automotive industry, an improved version is implemented that defines what you must do correctly and error-proof according to these criteria: strategic error proofing (SET).