One of the biggest gaps that I feel is holding many South African companies back in the tech space is a lack of understanding of how to truly build for scale. I’m reminded of this as many online retailers prepare themselves for Black Friday, which will inevitably bring a fair number of online crashes, performance issues and frustrated companies. The reason for this is twofold. One is a clear lack of performance and load testing, which I’ve spoken on before, and the other is an inability to build to scale.
The problem with the latter is that while you can address your performance and load issues relatively easily, re-architecting your solutions and infrastructure to be more responsive to scale is an incredibly complex and time-consuming challenge, which is probably why many companies are particularly slow to respond to it. Yes, some companies are addressing issues here and there to ready themselves for mass consumer hysteria like Black Friday, but even if they can scale for these days, there is still a likelihood that their software will not be able to scale for future growth and development as their customers and business needs change.
So, I guess to start: what makes software scalable? Well, scalable software doesn’t just mean that it can ramp server capacity up or down (most likely through the cloud) to adapt to current production demand, though that is certainly helpful. Just as important is how easy it is to make changes and deploy them, how easily it recovers from failure, and how well it meets the needs of the different systems it has to interact with, in the form of data requirements, APIs and any sort of testing or CI tooling. Essentially, it is the ability of your software to be responsive to change with as little effort as possible.
This is not easy to do without some concerted design effort and upfront thinking. Below are some tips which could assist you in this, and in particular give you some technical ideas of what to think about to achieve a software design that is ready to scale:
1) Avoid single points of failure
The idea is to never have one of anything, but rather to always assume and design for having at least two of everything. This adds cost in terms of additional operational effort and complexity, but provides tremendous gains in availability and performance under load. It also forces the team into a distributed-first mind-set. “If you can’t split it, you can’t scale it” has been said by various people, and it’s very true.
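To make the "at least two of everything" idea a little more concrete, here is a minimal Python sketch of a caller that fails over between replicas. Everything in it is an assumption for illustration: `call_with_failover`, `broken_replica` and `healthy_replica` are hypothetical names, and the replicas are plain functions standing in for real network calls.

```python
from typing import Callable, Sequence


class AllReplicasFailed(Exception):
    """Raised when every replica has been tried and none answered."""


def call_with_failover(replicas: Sequence[Callable[[str], str]], request: str) -> str:
    """Try each replica in turn and return the first successful response."""
    errors = []
    for replica in replicas:
        try:
            return replica(request)
        except ConnectionError as exc:
            # This replica is unavailable; remember why and try the next one.
            errors.append(exc)
    raise AllReplicasFailed(f"{len(errors)} replica(s) failed for {request!r}")


# Hypothetical replicas: plain functions standing in for real servers.
def broken_replica(request: str) -> str:
    raise ConnectionError("replica A is down")


def healthy_replica(request: str) -> str:
    return f"replica B handled {request}"


result = call_with_failover([broken_replica, healthy_replica], "GET /cart")
print(result)
```

With only one replica in the list, the same outage would surface to the user as an error, which is exactly the single point of failure this tip is about.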
2) Scale horizontally, not vertically
The most obvious option here is to use the cloud, but that’s not applicable to every scenario. The same approach applies, however, whether you are utilizing a cloud provider or managing your own servers (most likely both). There is a limit to how large a single server can be, for both physical and virtual machines. There are limits to how well a system can scale horizontally too, but that limit is increasingly being pushed further out, and even databases are moving in that direction. Furthermore, the cost of (vertically) upgrading a server increases exponentially, whereas the cost of (horizontally) adding yet another (commodity) server increases linearly.
3) API first
In addition to pushing work to the clients, view your application as a service with an API first. Clients these days are smartphone apps, web sites and desktop applications. If the API does not make assumptions about which clients will connect to it, it will be able to serve all of them. And you open your service up for automation, as well.
4) Keep the core of the application simple and modularise
Don’t design any software with unnecessary complexity, but rather keep it as light and distributed as possible. If possible there should be no core application at all, but rather several core microservices which work together to form the core service. When you need to add new features to existing software, explore the option of writing new services to achieve this before making large-scale changes to existing ones.
5) Cache everything. Especially your data
Caches are essentially stores of precomputed results that we use to avoid computing the same results over and over again. Find ways of doing this as much as possible. This is something we commonly do with page rendering, but the biggest performance gains are actually found in caching your data. Depending on your application, users might not need the freshest data right away. In this world of wanting things now, I can see few companies wanting slightly stale data, but many would actually be well served by it. While some companies need data that is current to the second, many could probably get away with less regular refreshes and save massively on performance by refreshing it periodically. The other upside is that it reduces the opportunity for data to become corrupted and makes your overall data experience more consistent, which is better for your quality too.
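As a rough illustration of refreshing data periodically rather than on every request, here is a small time-to-live cache sketched in Python. The names (`TTLCache`, `load_stock_level`, the `sku-1` key and the 30-second window) are all hypothetical, and the clock is injected purely so the example can simulate time passing without sleeping.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Keeps computed values for ttl_seconds before recomputing them."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh enough: skip the expensive work
        value = compute()
        self._entries[key] = (now, value)
        return value


# Hypothetical expensive lookup; the counter shows how often it really runs.
calls = {"count": 0}


def load_stock_level() -> int:
    calls["count"] += 1
    return 42


fake_now = [0.0]  # simulated clock so the example needs no sleeping
cache = TTLCache(ttl_seconds=30, clock=lambda: fake_now[0])

first = cache.get_or_compute("sku-1", load_stock_level)   # computed
second = cache.get_or_compute("sku-1", load_stock_level)  # served from cache
fake_now[0] = 31.0                                         # 31 seconds later
third = cache.get_or_compute("sku-1", load_stock_level)   # recomputed
print(calls["count"])
```

The trade-off sits entirely in the TTL: a longer window saves more work but means users may see older data, so it should be chosen per data set rather than globally.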
6) Monitor
Software needs monitoring and updates to ensure proper operation over time. Whether it’s adjusting load balancing or server capacity based on current demand, or being able to quickly identify performance constraints or services that are down, monitoring is crucial to any company’s ability to respond to potential issues and allows them to mitigate these risks before they become incidents.
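One simple form this can take is tracking the error rate over the most recent requests and raising an alert when it crosses a threshold. The sketch below assumes a hypothetical `ErrorRateMonitor` class with an arbitrary window of 10 requests and a 20% threshold; a real setup would more likely rely on a dedicated monitoring tool than on hand-rolled code.

```python
from collections import deque


class ErrorRateMonitor:
    """Tracks the share of failed requests over the last window_size requests."""

    def __init__(self, window_size: int, threshold: float):
        self._outcomes = deque(maxlen=window_size)  # old outcomes drop off automatically
        self._threshold = threshold

    def record(self, success: bool) -> None:
        self._outcomes.append(success)

    def error_rate(self) -> float:
        if not self._outcomes:
            return 0.0
        failures = sum(1 for ok in self._outcomes if not ok)
        return failures / len(self._outcomes)

    def should_alert(self) -> bool:
        return self.error_rate() > self._threshold


# Hypothetical traffic: 7 successful requests followed by 3 failures.
monitor = ErrorRateMonitor(window_size=10, threshold=0.2)
for ok in [True] * 7 + [False] * 3:
    monitor.record(ok)

print(monitor.error_rate(), monitor.should_alert())
```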
7) Design with testing and automation in mind
I’ve written on this before, so will simply reference the topic here. Suffice it to say, if you design your software with testable components, put a high focus on unit testing and ensure that automation is part of what it means to complete the work, your team will be able to test quickly and be more responsive to change.
8) Asynchronous rather than synchronous
Understanding asynchronous communication in real life is easy. Mail a letter, package or document, and sometime later it arrives. Until it does, we trust that it is underway, oblivious to the complexity of the courier system. A similar approach should be taken for our applications. We need to make use of queuing systems which can pass messages between services independently and at scale, but we also need to trust that these messages will be delivered, and our services should appear to the user as if they were synchronous, even when at times they aren’t.
Did a user just hit submit? Tell the user that the submission went well, and then process it in the background. Perhaps show the update as if it is already completely done in the meantime.
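The submit example can be sketched with nothing more than an in-process queue and a background worker. This is only an illustration under some assumptions: `handle_submit` and `worker` are hypothetical names, the "processing" is a trivial upper-casing step, and a production system would typically use a durable message queue rather than Python's in-memory `queue.Queue`.

```python
import queue
import threading

submissions = queue.Queue()  # in-memory stand-in for a real message queue
processed = []


def worker() -> None:
    """Background worker: pulls submissions off the queue and processes them."""
    while True:
        item = submissions.get()
        if item is None:  # sentinel value telling the worker to stop
            submissions.task_done()
            break
        processed.append(item.upper())  # placeholder for the real, slow work
        submissions.task_done()


def handle_submit(payload: str) -> str:
    """Acknowledge the user immediately and leave the work to the worker."""
    submissions.put(payload)
    return "accepted"


thread = threading.Thread(target=worker, daemon=True)
thread.start()

ack = handle_submit("order-123")  # returns straight away
submissions.put(None)             # ask the worker to finish up
submissions.join()                # wait here only so the example can show the result
thread.join()
print(ack, processed)
```

The user-facing call returns before the work is done, which is the point. The cost is that failures now happen after the user has been told "accepted", so the worker needs its own error handling and retry story.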
9) Strive for statelessness
It may seem tempting to avoid inter-component communication by keeping track of certain state information in, for example, your application servers, but don’t. Unless you’re hosting purely static pages, you can never get away from state information. Rather, make sure state information is kept in as few places as possible, and within components made for it. Web and application servers are not made for it, but distributed key-value stores are. Keeping it there lets you treat your web and application servers as completely replaceable instances, which is ideal from a scalability point of view, since your server fleet can be modified much more easily when any server is able to handle any request.
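A small Python sketch of what "any server can handle any request" looks like in practice. The `SessionStore` class here is a hypothetical in-memory stand-in for a distributed key-value store, and `AppServer`, `sess-1` and `sku-9` are made-up names for illustration.

```python
from typing import Dict


class SessionStore:
    """In-memory stand-in for a distributed key-value store (an assumption)."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, int]] = {}

    def load(self, session_id: str) -> Dict[str, int]:
        return dict(self._data.get(session_id, {}))

    def save(self, session_id: str, state: Dict[str, int]) -> None:
        self._data[session_id] = dict(state)


class AppServer:
    """Holds no session state of its own; everything lives in the shared store."""

    def __init__(self, name: str, store: SessionStore) -> None:
        self.name = name
        self._store = store

    def add_to_cart(self, session_id: str, sku: str) -> Dict[str, int]:
        cart = self._store.load(session_id)
        cart[sku] = cart.get(sku, 0) + 1
        self._store.save(session_id, cart)
        return cart


store = SessionStore()
server_a = AppServer("a", store)
server_b = AppServer("b", store)

server_a.add_to_cart("sess-1", "sku-9")         # first request lands on server A
cart = server_b.add_to_cart("sess-1", "sku-9")  # next request lands on server B
print(cart)
```

Because neither server remembers anything between requests, either one could be removed or replaced mid-session without the user losing their cart.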
10) Plan for failure
Computer systems fail. Software fails. Hardware fails. Designs fail. Failure handling fails! Be prepared for failure, but spare end users from witnessing it too obviously. It reflects poorly on you, even if failure is inevitable. The key here is removing single points of failure, but also a modular design that limits the impact when things go wrong. For critical systems which must be up at all times, have solid disaster recovery (DR) systems in place, potentially even running a version behind, so that you can easily switch to a stable system should the need arise. Failure is inevitable; it’s how you respond to it that enables you to stand out.
The only way to scale is to ensure you have as many bases covered as possible, which is why there is no one solution to the problem, but rather many. Even just making changes to existing software to cater for one of these 10 points can be a challenge in its own right, so rather than trying to tackle all of them at once, put together a plan of where the biggest constraints lie and tackle them one at a time. It might take a while, but the whole concept of software running at scale is long-term, and so any investment made now that removes effort later is worth it in the end.
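Sparing users from witnessing a failure often comes down to having something sensible to fall back on. The sketch below is one possible shape for that, with hypothetical names throughout (`fetch_with_fallback`, `recommendations_service`, `bestsellers_snapshot`): it retries a failing dependency a few times and then degrades to an older, stable result instead of showing an error.

```python
from typing import Callable, List


def fetch_with_fallback(
    primary: Callable[[], List[str]],
    fallback: Callable[[], List[str]],
    attempts: int = 3,
) -> List[str]:
    """Try the primary a few times, then degrade to the fallback."""
    for _ in range(attempts):
        try:
            return primary()
        except TimeoutError:
            continue  # a real system would also log and back off here
    return fallback()


primary_calls = {"count": 0}


# Hypothetical dependency that is currently down.
def recommendations_service() -> List[str]:
    primary_calls["count"] += 1
    raise TimeoutError("recommendations service timed out")


# Hypothetical stable fallback, for example an older snapshot of the data.
def bestsellers_snapshot() -> List[str]:
    return ["kettle", "toaster"]


shown = fetch_with_fallback(recommendations_service, bestsellers_snapshot)
print(shown)
```

The user still sees a populated page, just with less personalised content, while the failing service gets time to recover.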