Translation of a post published on our French-speaking blog in September 2018.
More transparency on IT incidents
Since August 15, 2018, British banks have been required to publish on their websites the number of IT incidents that have disrupted payment services. According to an article in Les Echos of August 17, 2018, the five largest banks in the United Kingdom reported 67 incidents that disrupted their payment services. For its part, since the summer of 2017, the European Central Bank has required the 130 banks in the EU to make major IT incidents public. These measures are primarily aimed at protecting customers from critical incidents that could have significant financial consequences for individuals and businesses. This transparency obligation also directly affects the banks’ reputation. Improving the quality of applications and information systems is therefore becoming a major strategic challenge for banks.
The point of view of lean management
This challenge is reflected in the following operational objectives:
- Reduce the impact of incidents on quality and service rates to improve customer satisfaction and reduce risks.
- Accelerate the resolution of incidents to devote more time to value-creating projects.
An effective way to reduce incidents
I have been working on this subject for several years. Recently, I worked with an IT team of around thirty people specialised in a key banking process at a major European bank. The situation was quite delicate: despite the strong commitment of the employees, projects were being delivered very slowly, and team morale was low.
Thanks to a lean approach, the team reduced its stock of incidents by 80% and the volume of new incidents by 50% in 2 months. The non-quality of corrections, i.e. the share of incidents that had to go through the resolution process more than once, fell from 45% to 5%. The number of deployments to production increased fivefold. Customer satisfaction gradually rose from 6/10 to 8/10.
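The non-quality indicator above is a simple ratio. The sketch below shows one way it could be computed, assuming each correction record stores how many times the incident went through the resolution process; the `Correction` class, its fields and the sample data are hypothetical illustrations, not the team's actual tooling or figures.

```python
from dataclasses import dataclass


@dataclass
class Correction:
    incident_id: str
    passes: int  # hypothetical field: times the incident went through the resolution process


def rework_rate(corrections: list[Correction]) -> float:
    """Share of incidents that needed more than one pass through the process."""
    if not corrections:
        return 0.0
    reworked = sum(1 for c in corrections if c.passes > 1)
    return reworked / len(corrections)


# Hypothetical sample: 20 corrections, 9 of which needed a second pass.
sample = [Correction(f"INC-{i}", 2 if i < 9 else 1) for i in range(20)]
print(f"{rework_rate(sample):.0%}")  # prints "45%"
```

Tracking the ratio over time, rather than a one-off count, is what makes a drop such as 45% to 5% visible on the board.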
The initial situation
The team was confronted every day with new incidents that had a critical and direct impact on the client.
On a regular basis, the production manager had to set up a task force to resolve a major incident, explain the situation to senior management and apologise to the bank’s customers. Everyone was constantly on alert, waiting for the next major incident. I remember him telling me that working conditions had become untenable for his teams, who were constantly fighting fires and unable to meet their project commitments.
The situation was indeed very difficult. The team faced 2 new incidents a day. Under project pressure, with the same people handling both change and run, everyone did their best to repair the effects of incidents without fixing them permanently. Because the root causes were not addressed, the stock of incidents kept growing and had reached 68. The team was putting 3 incident corrections into production each month, and a correction took approximately 6 months.
This was the situation I found when I started coaching the team.
How did I go about it?
1 – Build a stable team
The first thing we set up with the production manager was a small, dedicated team with all the necessary technical skills. It consisted of 7 people: one manager, two analysts, two developers and two testers. This reorganisation, a genuine managerial decision, had no impact on the budget.
2 – Develop collaboration by seeing and understanding together
First, with this team, we built a visual management board to reveal quality problems and waiting times in the resolution process. It gave everyone the same understanding of the situation: we could see the total number of incidents to be resolved, identify at a glance which stage of correction each incident had reached, and understand the difficulties in resolving them.
3 – Accelerate production
Together with the team leader and the business, we then identified the 10 incidents in the stock that mattered most to the client. With the team, we chose the order in which to deal with them, giving priority to the simplest ones to resolve. Once this new organisation was in place, we met every morning in front of the visual management board to understand what was blocking the corrections chosen the day before and to agree on the day’s commitment: which incidents we would deliver to production.
Having a tangible list of 2 or 3 corrections to deliver each day, shared by everyone, allowed us to focus collectively on the obstacles that could prevent us from reaching this objective. Each obstacle was clearly identified and posted in a “red bin” located next to the process step concerned. As soon as an obstacle was identified, if the employee responsible for the operation could not find a solution, the team leader took over the problem.
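The selection rule described above (keep the 10 incidents that matter most to the client, then work on the simplest first, committing to a few per day) can be sketched as two small functions. This is only an illustration under stated assumptions: the `client_impact` and `effort` scores are hypothetical numbers the business and the team would agree on, and the daily capacity of 3 is taken from the "2 or 3 corrections" per day mentioned in the next paragraph.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    incident_id: str
    client_impact: int  # hypothetical score agreed with the business: higher = more important
    effort: int         # hypothetical team estimate: lower = simpler to resolve


def select_priorities(stock: list[Incident], top_n: int = 10) -> list[Incident]:
    """Keep the top_n incidents by client impact, then order them simplest first."""
    most_important = sorted(stock, key=lambda i: i.client_impact, reverse=True)[:top_n]
    return sorted(most_important, key=lambda i: i.effort)


def daily_commitment(queue: list[Incident], capacity: int = 3) -> list[Incident]:
    """Incidents the team commits to delivering to production today."""
    return queue[:capacity]


stock = [
    Incident("INC-1", client_impact=9, effort=5),
    Incident("INC-2", client_impact=7, effort=1),
    Incident("INC-3", client_impact=3, effort=1),
    Incident("INC-4", client_impact=8, effort=2),
]
queue = select_priorities(stock, top_n=3)
print([i.incident_id for i in daily_commitment(queue, capacity=2)])  # ['INC-2', 'INC-4']
```

Note that INC-3 is simple but low-impact, so it stays out of the queue: impact decides what enters the list, and simplicity only decides the order within it.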
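The red-bin rule can be modelled as one list of obstacles per process step, with every obstacle the owner could not solve handed over to the team leader. The sketch below is a hypothetical model for illustration only: the `Obstacle` and `RedBins` names, the step labels and the example obstacles are assumptions, not a description of the team's physical board.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Obstacle:
    incident_id: str
    step: str          # hypothetical label of the process step next to the red bin
    description: str
    owner: str         # employee responsible for the operation
    solved_by_owner: bool = False


class RedBins:
    """One 'red bin' per process step on the visual management board (hypothetical model)."""

    def __init__(self, team_leader: str) -> None:
        self.team_leader = team_leader
        self.bins: dict[str, list[Obstacle]] = defaultdict(list)

    def post(self, obstacle: Obstacle) -> None:
        """Place the obstacle in the red bin of the step it blocks."""
        self.bins[obstacle.step].append(obstacle)

    def escalate_unsolved(self) -> list[Obstacle]:
        """Hand every obstacle the owner could not solve over to the team leader."""
        escalated = []
        for obstacles in self.bins.values():
            for obstacle in obstacles:
                if not obstacle.solved_by_owner:
                    obstacle.owner = self.team_leader
                    escalated.append(obstacle)
        return escalated


bins = RedBins(team_leader="team leader")
bins.post(Obstacle("INC-2", "test", "test environment unavailable", "tester"))
bins.post(Obstacle("INC-4", "analysis", "business rule unclear", "analyst", solved_by_owner=True))
escalated = bins.escalate_unsolved()
print([(o.incident_id, o.owner) for o in escalated])  # [('INC-2', 'team leader')]
```

Keeping the obstacles grouped by step mirrors the physical layout: a bin that keeps filling up points directly at the part of the process that needs problem solving.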
It must be understood that lean tackles the real causes of the difficulties observed; it has nothing to do with brainstorming about the process. Problem solving addresses what actually happens, not preconceived ideas about what is dysfunctional.
4 – Improve operational efficiency
In most cases, the manager was able to help the team member or find a workaround. This considerably sped up production by removing a large part of the waiting time in the process.
Together with the manager and the team, we then systematically analysed each of the obstacles collected in the red bins. For each one, we started a problem-solving process using the lean PDCA cycle (Plan, Do, Check, Act), a scientific approach to problem solving. This enabled each employee to deepen his or her business expertise or to develop entirely new skills: analysts developed tester skills and testers developed analyst skills. The testers learned how to automate certain tests with PowerShell, and the analysts wrote standards describing the work to be done by the developers.
5 – Become aware of the effect of incidents on the customer
To better understand the impact of non-quality on customer satisfaction, we set up two rituals with the team leader:
- As long as there is a stock, define with the business the 10 incidents to be corrected first and understand why they matter from the customer’s point of view.
- After each incident resolution, call the customer to ask for their level of satisfaction and monitor its progress on an ongoing basis.
In this way we have developed a very strong customer culture within the team.
A transformed team that continues to improve
Three months after my intervention, the team was able to switch to change activity: it had completely eliminated its stock of incidents and stabilised the system, reducing the volume of new incidents to only 2 per month. The team kept applying the lean principles discovered while handling incidents and continued to accelerate, tackling first the small changes that could be implemented quickly. It is now focusing on project functionalities and has become a reference within the bank.
The production manager was delighted:
- The culture of firefighter mode has completely disappeared.
- His office is no longer the complaints desk.
- He is no longer woken up during weekends to manage crisis situations with his teams, senior management and clients.
- He can now devote himself to the development of the information system and to innovation.
This approach is radically different from the ITIL approach and rapidly improves the quality of service of information systems. In concrete terms, it addresses the requirements of European bodies that aim to improve service to banking customers.
In conclusion, a lean approach to reducing production incidents consistently delivers exceptional results and allows CIOs to free up capacity previously allocated to RUN in order to strengthen CHANGE activities.
In doing so, the “firefighting” culture disappears and working conditions improve significantly. The teams are more serene and can devote themselves to developing value for more satisfied customers.
The key point: incidents are not inevitable. In 100% of cases, using lean management, we work with the teams to reduce the volume of incidents by a factor of 2, 3 or 5 and to eliminate the stock.
This post was written by Pierre Jannez and translated into English by Paul Gette.
To get a better understanding of how lean can help you reduce your IT incidents, feel free to get in touch with Pierre: firstname.lastname@example.org – Phone: +33 619 053 493