A Journey to Reliability Excellence: The Effects of Making Work Go Away

Digitally Managed Assets

By Bruce Hawkins, Senior Maintenance and Reliability Consultant, PCA

As I mentioned in the first blog in this series, we achieved some very significant business results. My plant reduced maintenance spend by $22 million per year just by making work go away. This blog describes how we did that.

One of the things we did right was to create inter-site Reliability teams, in which the rotating equipment folks and the instrument and electrical folks got together periodically to share lessons learned and failure history. So that we would continue to learn, we maintained corporate reliability support. We had a Rotating Equipment Engineer who provided support as needed to the site Reliability Engineers. We also had a corporate E&I engineer to provide support as needed. Our ongoing SAP support was a critical element as well.

We were able to find some low-hanging fruit early in the journey using RCM. For example, we had a chronic issue with a spray condensing system that was supposed to help pull a vacuum on a process vessel. We had trouble getting down to the right level of vacuum. During the RCM analysis, our experienced operators told us they used to clean out the spray nozzles on that spray condenser periodically, but no longer did so. The result was poor spray efficiency, and vapor was carrying over to the steam ejectors, which were sized to remove only the non-condensables, not excess vapor. We put the nozzle cleaning practice back in place and the problem went away.

During our journey, four of our plants were purchased by another company that brought the mentality that no failure is acceptable. They challenged us to do some level of root cause analysis on every single failure. We developed a three-level process in which the rigor put into the analysis was based on the business impact of the failure. The high-impact failures would have a cross-functional team assigned to do the analysis. Reliability engineers were assigned to perform root cause analysis on the next tier of failures. Finally, we got the craft workforce engaged in the routine failures that happened day to day.
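The three-level idea can be sketched in a few lines of Python. This is only an illustration: the dollar thresholds, the `RcaLevel` names, and the `assign_rca_level` function are hypothetical and are not taken from the process described above, which only says that rigor scaled with business impact.

```python
from enum import Enum


class RcaLevel(Enum):
    """Who performs the root cause analysis (names are illustrative)."""
    CROSS_FUNCTIONAL_TEAM = 1   # high-impact failures
    RELIABILITY_ENGINEER = 2    # next tier of failures
    CRAFT = 3                   # routine, day-to-day failures


# Hypothetical cut-offs; a real site would set its own criteria.
HIGH_IMPACT_USD = 100_000
MEDIUM_IMPACT_USD = 10_000


def assign_rca_level(business_impact_usd: float) -> RcaLevel:
    """Pick the analysis tier from the estimated business impact."""
    if business_impact_usd >= HIGH_IMPACT_USD:
        return RcaLevel.CROSS_FUNCTIONAL_TEAM
    if business_impact_usd >= MEDIUM_IMPACT_USD:
        return RcaLevel.RELIABILITY_ENGINEER
    return RcaLevel.CRAFT


if __name__ == "__main__":
    for impact in (250_000, 40_000, 1_500):
        print(impact, assign_rca_level(impact).name)
```

The point of the sketch is that every failure gets some analysis; only the depth and the people involved change with impact.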

We implemented the root cause analysis at the craft level, adapted from the “seven cause category” approach described in the book Machinery Failure Analysis and Troubleshooting by Heinz Bloch and Fred Geitner (Bloch, H. and Geitner, F., 1999, Machinery Failure Analysis and Troubleshooting, 3rd Edition, Elsevier, Houston, TX, Ch 10). Their philosophy was that there are only seven ways that equipment will fail unexpectedly. By using a process of elimination (looking at the failed components, talking to the operator, and looking at equipment history) and spending only about 20 to 30 minutes of craft resource time, you can generally rule out five or six of those cause categories and zero in on the most likely one. We also set up all our cause codes in SAP to fit those cause categories. Then we trained the crafts at each one of the sites in using that method.

One of the sites was a chemical plant that had dozens of distillation columns, tanks, heat exchangers, fired heaters, miles of piping and a whole bunch of centrifugal pumps, over 1,100 of them. When the Reengineering project began, they started measuring mean time between failures for these pumps and found that the average life was nine months. That means they were going through about 1,400 pump replacements a year, which is about four a day. Day shift was installing the spare pumps and night shift was rebuilding them (and neither was being done very well).
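As a back-of-the-envelope check on those numbers, the replacement rate follows from the population divided by the average life. The sketch below assumes exactly 1,100 pumps and a steady failure rate, both simplifications; the function name is illustrative only.

```python
def replacements_per_year(pump_count: int, mtbf_months: float) -> float:
    """Expected replacements per year if each pump lasts mtbf_months on average."""
    return pump_count * 12 / mtbf_months


if __name__ == "__main__":
    per_year = replacements_per_year(pump_count=1100, mtbf_months=9)
    per_day = per_year / 365
    print(f"{per_year:.0f} replacements per year, {per_day:.1f} per day")
```

With these inputs the estimate comes out a little under 1,500 replacements a year, or about four a day, which is in the same ballpark as the figures quoted above.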

They studied the failure cause category data in SAP and found some patterns. They found that they had some design issues, such as the wrong mechanical seal for the service. Some pumps were of a flimsy design when the service required a more robust one. They also found assembly and installation defects. With that many pump replacements a day, it’s likely that some mistakes are going to be made in both assembly and installation. They set up a centralized pump repair facility, where they did a precision job on repairing the pumps. They also ratcheted up expectations on field installation, attacking things like corroded foundations, correcting bolt-bound situations, eliminating pipe strain and doing precision installations in the field.
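Finding patterns like these is essentially a Pareto count of failures by cause category. The sketch below shows the idea in Python on a handful of made-up records; the pump tags, the counts, and the `pareto_by_cause` function are invented for illustration and are not data from the site or an SAP interface.

```python
from collections import Counter

# Made-up failure records: (pump tag, cause category assigned by the crafts).
FAILURES = [
    ("P-101", "design"),
    ("P-102", "assembly/installation"),
    ("P-103", "assembly/installation"),
    ("P-104", "improper operation"),
    ("P-105", "assembly/installation"),
    ("P-106", "design"),
]


def pareto_by_cause(records):
    """Return (cause, count, share of total) tuples, most frequent cause first."""
    counts = Counter(cause for _tag, cause in records)
    total = sum(counts.values())
    return [(cause, n, n / total) for cause, n in counts.most_common()]


if __name__ == "__main__":
    for cause, n, share in pareto_by_cause(FAILURES):
        print(f"{cause:<24} {n:>3}  {share:.0%}")
```

Ranking the categories this way points toward fixes that apply across the whole pump population rather than to one installation at a time.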

We also found that a lot of failures were caused by improper operation. In some cases, the operators were doing things wrong around startup and shutdown that resulted in pump deaths. The site implemented an operator training program to teach operators how pumps work and how to properly care for them.

To summarize, Reliability personnel at the site analyzed the data provided by the crafts and designed and implemented a programmatic solution for each one of those issues. These solutions affected the entire pump population rather than a single installation. The chart below illustrates the dramatic improvement in pump reliability that was experienced.

As I mentioned above, we drove a lot of maintenance cost savings by making work go away. Our success with pump failures illustrates how much improved reliability can impact work. Pump failures and repairs decreased by more than 83 percent as a result of the program. Our ability to operate reliably also improved dramatically.
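To put that reduction in terms of pump life: if the pump population stays the same and failures drop by a given fraction, mean time between failures grows by the reciprocal of what remains. The sketch below works that out in Python. It assumes a constant population and uses the nine-month starting MTBF and the 83 percent figure as inputs; the result is an implied estimate, not a number reported by the site.

```python
def implied_mtbf_months(mtbf_before_months: float, failure_reduction: float) -> float:
    """MTBF after failures fall by failure_reduction (0 to 1), same pump count.

    MTBF is proportional to 1 / (failures per year), so cutting failures
    to (1 - failure_reduction) of the old rate scales MTBF by the inverse.
    """
    if not 0 <= failure_reduction < 1:
        raise ValueError("failure_reduction must be in [0, 1)")
    return mtbf_before_months / (1 - failure_reduction)


if __name__ == "__main__":
    after = implied_mtbf_months(mtbf_before_months=9, failure_reduction=0.83)
    print(f"Implied MTBF: about {after:.0f} months ({after / 12:.1f} years)")
```

Under those assumptions, an 83 percent drop in failures corresponds to roughly a sixfold increase in average pump life.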

Dramatic Improvement in Mean Time Between Failures for Pumps

Corporate leadership from the new owners also drove increased accountability, especially at the management level. We conducted regular Site Reliability audits. By this time, I was in a corporate reliability leadership role, and I reported to the same guy that the plant managers reported to. He was a bit impatient with the progress some of the sites were making in improving business performance. He had me dig into their SAP systems and find areas where they had known deficiencies. We would both go to the sites, do the audit, and highlight those deficiencies. That gave him the opportunity to set expectations, and that is exactly what he did.