Craig Risi
- Aug 12, 2022
- 6 min read

Monitoring Applications At Scale

Developing and maintaining software applications is only a part of an application’s life cycle.

Applications also need to be kept operational, and are often updated and optimized to better cater to changing usage patterns and the demands of scale. The only way to fully respond to potential system issues and outages, or to identify ways to improve the software and user experience, is through effective monitoring.

More than merely a buzzword, monitoring is a vital aspect of software operation that tells us a lot about application behavior and plays a key role in a company’s ability to respond to both failures and changing usage patterns.

With software becoming increasingly complex and needing to operate at a global scale, monitoring applications effectively is also becoming more difficult, and organizations cannot simply rely on sticking some tool on a server and trying to feed it the right information. Instead, with applications gathering increasing amounts of data, operating across multiple servers both on-premises and in the cloud, and running across different browsers and devices, companies need to get smart about how they not only collect the required information but also shape it to provide monitoring that is both purposeful and meaningful. This allows teams to respond quickly and to measure the specific application behaviors the company is targeting.

And while monitoring is a complex topic that we will be exploring in more detail, it's perhaps best to start by highlighting a few best practices that will help you better monitor applications at scale:

Select the appropriate monitoring tools for you

The more things that need to be monitored across an application, the more tools will often be required to get the right information. It’s important that you as a company analyze the different toolsets available to you and understand what information they can provide, how customizable they are, and how best they can be integrated into the rest of your application infrastructure.

Have a data management strategy in place

Monitoring requires data to be collected from multiple sources and applications, so it is vital that a plan is put in place for how this data is to be collected and retained. Data can either be collected into a central repository that the different monitoring services then trawl through for the information they need, or be distributed across multiple locations based on relevance, with each monitoring service reading from the appropriate location and security controlled more strictly for each one. Whatever strategy is chosen, data required for monitoring should always be kept separate from actual customer data or data required for application usage.

It’s also important to understand how long data is needed. Some data might be worth hanging onto to identify trends and understand application behavior over a specific period of time, while other data does not need to be kept for long and should be discarded quickly and appropriately.
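As a rough illustration of a retention rule, the sketch below assigns each kind of monitoring data a retention period and splits records into those to keep and those to discard. The category names and durations are hypothetical placeholders, not recommendations, and a real system would usually rely on the retention features of its storage or monitoring tool.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention periods per data category. The categories and
# durations are illustrative assumptions only.
RETENTION = {
    "debug_logs": timedelta(days=7),
    "error_events": timedelta(days=90),
    "aggregated_metrics": timedelta(days=365),
}


def is_expired(category, recorded_at, now):
    """Return True if a record has outlived its category's retention period."""
    return now - recorded_at > RETENTION[category]


def partition_records(records, now):
    """Split (category, recorded_at) tuples into records to keep and to discard."""
    keep, discard = [], []
    for record in records:
        category, recorded_at = record
        if is_expired(category, recorded_at, now):
            discard.append(record)
        else:
            keep.append(record)
    return keep, discard


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        ("debug_logs", now - timedelta(days=10)),
        ("error_events", now - timedelta(days=10)),
    ]
    keep, discard = partition_records(sample, now)
    print("keep:", [c for c, _ in keep])
    print("discard:", [c for c, _ in discard])
```

Keeping the periods in one table makes the policy easy to review and adjust as the business learns which data is actually worth holding onto.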

Data should also be appropriately backed up and archived to ensure that failures do not lead to data loss.

Set proper metrics for monitoring

A key part of monitoring is knowing which metrics to track to get the information you need. While different parts of the business will probably want to measure different things to gain insight into what is important to them, below are a few metrics that all companies should ensure they have in place:

CPU and memory usage – Monitor CPU and memory usage to evaluate their effect on performance. This can be done across both the internal servers that run your application software and cloud VMs, to identify when they are coming under strain so that outages can be prevented, to better understand how your application behaves on them, and to find possible opportunities for optimization.
Error rates – Track how often the app fails or its performance degrades, and where these issues occur, so that effort can be made to reduce them over time.
Uptime – Track the availability of the application to verify its overall reliability and compliance with service level agreements. This should not be limited to the critical aspects of the application but should include all APIs, databases, and serverless functions, so that services can be restored whenever failures occur.
Response times – Measure the average response time between different parts of the application to check whether slow responses are affecting its performance. This helps identify performance bottlenecks and allows for further optimization.
Request rates – Monitor your application’s traffic for spikes, periods of inactivity, and the number of active users. This gives you useful information about usage patterns and allows you to better plan your load and scaling needs.
Costs – This is especially important for companies operating in the cloud, where resource usage comes at an expense. The purpose is not just to prevent unnecessary spending or to catch scaling mistakes that drive up costs, but also to better measure the ROI a particular application has for the organization.
Number of app instances – Monitor the number of app or server instances running simultaneously. You can use auto-scaling to match actual user demand based on that number, or find ways of better distributing the load.
User experience – How users experience an app is vital information. You can measure customer satisfaction or tolerance using a combination of SLA thresholds and Apdex scores. These can be used to improve aspects of design, tailor the application to what users mostly do in it, and better understand what matters to your customers.
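Several of the metrics above reduce to simple arithmetic once the raw counts are available. The sketch below shows one way to compute an error rate, an availability percentage, and an Apdex score. The function names are made up for illustration; the Apdex calculation follows the commonly used definition, in which a response at or under a target time T counts as satisfied, one between T and 4T counts as tolerating, and the score is (satisfied + tolerating / 2) / total.

```python
def error_rate(failed_requests, total_requests):
    """Fraction of requests that failed (0.0 when there was no traffic)."""
    if total_requests == 0:
        return 0.0
    return failed_requests / total_requests


def availability_percent(uptime_seconds, total_seconds):
    """Share of the measured period during which the service was up."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return 100.0 * uptime_seconds / total_seconds


def apdex(response_times, target):
    """Apdex score for a list of response times and a target time T.

    Satisfied: t <= T. Tolerating: T < t <= 4T. Anything slower is frustrated.
    """
    if not response_times:
        raise ValueError("at least one response time is required")
    satisfied = sum(1 for t in response_times if t <= target)
    tolerating = sum(1 for t in response_times if target < t <= 4 * target)
    return (satisfied + tolerating / 2) / len(response_times)


if __name__ == "__main__":
    # Sample values are invented for the example.
    print(error_rate(5, 200))
    print(availability_percent(99, 100))
    print(apdex([0.2, 0.4, 1.0, 3.0], target=0.5))
```

In practice most monitoring tools calculate these for you; working through the arithmetic is mainly useful for agreeing on what each number means before setting targets against it.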

Set up custom alerts and notifications

With so many metrics and so much data to keep track of, you can’t rely on simply watching the data to react to certain application behavior. Making use of alerts and notifications is therefore vital in helping teams respond appropriately. Many monitoring tools offer a range of preset alerts and notifications to make things easier, but it's also important to create custom alerts and notifications that are relevant to you as an organization.

Applications differ in various aspects, and different organizations place more importance on some things than others, so it’s important to understand these needs and then create and adjust alerts accordingly. When performance issues arise, team members should be notified and understand the impact before the end user notices anything. Additionally, certain APM tools enable you to create custom alerts by defining thresholds on different metrics such as application response time, error rates, and Apdex scores.
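The threshold idea can be sketched in a few lines. The example below is not the API of any particular APM tool; the `AlertRule` class, metric names, and threshold values are all hypothetical and only show how rules on response time, error rate, and Apdex might be evaluated against a snapshot of current metrics.

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    """A hypothetical threshold rule on a single named metric."""
    metric: str
    threshold: float
    breach_when: str  # "above" or "below"

    def is_breached(self, value):
        compare = operator.gt if self.breach_when == "above" else operator.lt
        return compare(value, self.threshold)


def evaluate(rules, metrics):
    """Return one message for every rule whose metric breaches its threshold."""
    alerts = []
    for rule in rules:
        value = metrics.get(rule.metric)
        # Metrics missing from the snapshot are skipped rather than alerted on.
        if value is not None and rule.is_breached(value):
            alerts.append(
                f"{rule.metric} is {value} "
                f"({rule.breach_when} threshold {rule.threshold})"
            )
    return alerts


if __name__ == "__main__":
    # Illustrative thresholds only; real values depend on the application.
    rules = [
        AlertRule("response_time_ms", 500, "above"),
        AlertRule("error_rate", 0.05, "above"),
        AlertRule("apdex", 0.85, "below"),
    ]
    snapshot = {"response_time_ms": 750, "error_rate": 0.01, "apdex": 0.7}
    for message in evaluate(rules, snapshot):
        print(message)
```

Keeping rules as data rather than hard-coded checks makes it easier to adjust thresholds as the organization learns which alerts are actually useful.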

Some solutions even allow you to integrate these notifications with communication platforms like Slack, which can help teams be more responsive and ensure that issues are addressed as soon as possible.
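Where a monitoring tool has no built-in Slack integration, one option is a Slack incoming webhook, which accepts a JSON payload with a `text` field over HTTP POST. The sketch below assumes such a webhook has already been created; the helper names are made up for illustration and the URL in the usage comment is a placeholder, not a real endpoint.

```python
import json
import urllib.request


def build_slack_request(webhook_url, message):
    """Build the HTTP POST request for a Slack incoming webhook."""
    payload = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_slack_alert(webhook_url, message, timeout=10):
    """Send the alert and return the HTTP status code of the response."""
    request = build_slack_request(webhook_url, message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Example usage (placeholder URL, replace with your own webhook):
# send_slack_alert(
#     "https://hooks.slack.com/services/YOUR/WEBHOOK/PATH",
#     "error_rate is 0.08 (above threshold 0.05)",
# )
```

Separating request construction from sending keeps the payload easy to inspect and test without touching the network.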

Ensure metrics are used, analyzed, and have accountability

This might sound obvious, but many companies invest incredible effort in putting monitoring systems in place and gathering key data with the best of intentions, and then don’t make enough effort to go through the data and learn from it. While alerts and notifications help teams address specific issues and incidents that need immediate attention, there is a lot to be gained from reviewing trends and holding regular sessions where monitoring data is analyzed in more depth. These sessions help identify where applications can be improved, where monitoring should be adjusted, and where monitoring or systems can be abandoned entirely because the company is not getting enough value out of them.
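A regular review session can start from something as simple as comparing this period's averages with the last. The sketch below is a minimal illustration: the function names, metric names, sample values, and the 10% tolerance are all assumptions, and it treats any rise beyond the tolerance as worth discussing, which only makes sense for metrics where lower is better.

```python
from statistics import mean


def percent_change(previous, current):
    """Percentage change in the average value between two periods."""
    previous_avg = mean(previous)
    current_avg = mean(current)
    return 100.0 * (current_avg - previous_avg) / previous_avg


def flag_regressions(periods, tolerance_percent):
    """Return metrics whose average rose by more than the tolerance.

    `periods` maps a metric name to a (previous_samples, current_samples) pair.
    """
    flagged = {}
    for metric, (previous, current) in periods.items():
        change = percent_change(previous, current)
        if change > tolerance_percent:
            flagged[metric] = change
    return flagged


if __name__ == "__main__":
    # Invented sample data for two consecutive weeks.
    weekly = {
        "response_time_ms": ([100, 100], [110, 130]),
        "error_count": ([10, 10], [10, 10]),
    }
    for metric, change in flag_regressions(weekly, tolerance_percent=10).items():
        print(f"{metric} rose by {change:.1f}%")
```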

Having dedicated people responsible for analyzing monitoring data, coming up with strategies and actions based on them, and reporting what is learned through them to the appropriate audiences is an important step in making the best use of your monitoring data and ensuring it is not wasted.

Don’t underestimate the effectiveness of monitoring

Monitoring is crucial to the successful operation of any application and so it's important that you as a company ensure that you have a monitoring strategy and plan in place before you are ready to deploy your software to the world. Otherwise, you will miss out on key learnings and not be in a position to ensure its availability to your client base.
