For O2, a key player in the Czech telecommunications market, online is the main channel for reaching its end customers. With competitors constantly trying to lure away any dissatisfied customer, customers are more demanding than ever, both in terms of the services themselves and the way they are delivered. The seamless operation of the Moje O2 (My O2) online self-service, where customers can check the status of their services, including billing, is therefore a basic prerequisite for success.
We talked to Petr Venera, Head of Supervision over the Online Systems Operations Division at PPF IT Services, about how the Dynatrace application monitoring tool, deployed in 2014, helps ensure it.
What were your primary expectations in deciding to deploy APM for online self-service monitoring?
We were facing a communication mismatch between the IT operations department and the marketing and business departments. The departments often argued about what exactly constituted a downtime, what did not, and why news that something was wrong did not reach the customer care department in time. A common, but far from ideal, scenario in such cases is that individual departments, in an effort to have a perfect overview of the system status, procure their own monitoring tools and try to resolve critical situations entirely on their own. This approach often results in departments reporting and comparing inconsistent data from different monitoring systems, which are not necessarily compatible with one another. Instead of discussing the nature of the issue at hand, the whole discussion shifts to arguing about data compatibility and comparing individual results. A typical example: a business monitoring tool reports a downtime. The business turns to the IT department, which looks at its logs and declares that it was not a downtime but a failure of the monitoring tool, and so it goes on and on, wasting time and breeding hostility between departments.
Therefore, our main expectation was to get a tool that would let everyone look at a single set of data. Another condition was shared monitoring, where every alert is seen simultaneously by IT, which can immediately start addressing the issue, and by the business and customer care departments, which learn about it at the same moment and can respond accordingly.
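The requirement described here, one alert fanned out to every department at the same time from a single source of truth, can be sketched as a simple publish-subscribe pattern. This is an illustrative sketch, not O2's or Dynatrace's actual implementation; the `AlertBus` class and department handlers are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Alert:
    system: str
    message: str


class AlertBus:
    """A single alert source: every subscriber sees the same alert at once."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Alert], None]] = []

    def subscribe(self, handler: Callable[[Alert], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, alert: Alert) -> None:
        # Fan the identical alert out to all departments simultaneously,
        # so there is never a second, conflicting data set to argue about.
        for handler in self._subscribers:
            handler(alert)


received = []
bus = AlertBus()
bus.subscribe(lambda a: received.append(("IT operations", a.message)))
bus.subscribe(lambda a: received.append(("customer care", a.message)))
bus.publish(Alert(system="Moje O2", message="self-service login degraded"))
print(received)
```

Because both departments consume the same published alert, any later discussion starts from identical data rather than from two tools that disagree.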
In the end, you decided to pick a tool that can do more than you just described. What made you do that?
When we started to get our bearings in the APM software market, we compared the various options and realized it would be a shame to settle for an application that offers nothing but monitoring. With the Moje O2 (My O2) self-service, it was imperative to find a tool that would allow us to analyse immediately what had actually happened, one that would point us to the issue without our having to call in database experts, operating system administrators, and application administrators. Without Dynatrace, we could quickly find out, for example, that an application wasn't running, but not why. Analysing the cause of an issue can by itself tie up four to five people, often for many hours. Dynatrace, on the other hand, shortcuts this phase, so we know immediately where the issue is and where to start looking. At the beginning we did not expect the APM tool to come with such functionality, and it turned out to be a really significant benefit in the final round of the tender.

So, was rapid analysis and pinpointing of issues the crucial criterion for the implementation?

Let's say it was an optimal mix of multiple decision-making factors.
Price was another important factor. Of the three tools we finally shortlisted, Dynatrace clearly offered the most while not being the most expensive. After IT had more or less decided to go for it, we still had to defend the proposal before the decision-making committee with a convincing use case. That is a bit difficult to do without real-operation data, but in the end we succeeded.
What were the main benefits of the deployment?
The business has gained much more control over the functionality of the Moje O2 (My O2) application, and issue analysis has become easier for us. Looking at it from a distance, I see another great benefit for developers. Places that used to be very difficult to explore are now easily accessible via Dynatrace dashboards. For example, if querying the SQL database slows down significantly, you just look at the dashboard and the whole issue can be pinpointed in literally two clicks. Previously, the same work meant reviewing a lot of code: just analysing the logs involved selecting all the database queries, sorting them by response time, averaging them, and then drawing a conclusion.
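The manual workflow described here, pulling database queries out of logs, averaging their response times, and sorting the result, can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical log format (`SQL [<statement>] took <n> ms`); it is not O2's actual log format or tooling.

```python
import re
from collections import defaultdict

# Hypothetical log line format, e.g.:
# "2014-03-01 12:00:01 SQL [SELECT * FROM invoices] took 120 ms"
LOG_LINE = re.compile(r"SQL \[(?P<query>.+?)\] took (?P<ms>\d+) ms")


def slowest_queries(log_lines):
    """Group log lines by SQL statement and return average response
    times, slowest first -- the conclusion once drawn by hand."""
    timings = defaultdict(list)
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match:
            timings[match.group("query")].append(int(match.group("ms")))
    averages = {q: sum(ms) / len(ms) for q, ms in timings.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


sample = [
    "2014-03-01 12:00:01 SQL [SELECT * FROM invoices] took 120 ms",
    "2014-03-01 12:00:02 SQL [SELECT * FROM invoices] took 180 ms",
    "2014-03-01 12:00:03 SQL [SELECT id FROM users] took 15 ms",
]
print(slowest_queries(sample))
# -> [('SELECT * FROM invoices', 150.0), ('SELECT id FROM users', 15.0)]
```

An APM dashboard surfaces exactly this ranking continuously, which is why the "two clicks" replace an ad-hoc script like this one.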
Were there any implementation issues?
On the contrary, we were surprised at how seamless the whole implementation was. One day a consultant from Adastra appeared, and the next day we were already looking at the first data. What we did not initially anticipate was the need to purchase new server hardware; that requirement, however, applied to all the solutions under consideration.

So no other issues occurred at all?

In the post-implementation phase, of course, it took a while before we realized exactly what we wanted to measure and where the potential of Dynatrace could be put to further use. We also spent some time getting everyone, especially operations staff and developers, used to actively solving issues with the help of Dynatrace. But these are psychological nuances rather than a software issue.
Are you really using Dynatrace today as you originally imagined?
Yes and no. The original plan was that by saving time on the analysis of application errors, we would free up the people who handle the analysis and subsequent resolution. In practice it does not work that way. Instead, Dynatrace keeps revealing new critical situations that we would otherwise detect much later, usually as a serious failure in front of a customer. So the volume of work ends up the same. The difference is that we resolve individual failures faster, and often well before they grow into real problems. It may not be obvious at first glance, but this significantly improves the quality of the entire application, and thus the service for the end customer.