Craig Risi

When To Use Machine Learning (And When Not To)



This article first appeared in Snapt.


In previous articles, we looked at how machine learning (ML) works and how to choose the right learning model for you. However, just because we can use machine learning doesn’t mean it is the best solution for every computational problem. So, in this guide, we will look at some of the limitations of ML, when to use it, and when to avoid it.


Before we look at tips that will help you decide whether or not to use ML in your solution, it’s important to understand the limitations of ML technologies. Understanding where ML is not going to be effective will help us see where we shouldn’t be using it.


Limitations Of Machine Learning


Ethics And Responsibility

Ethics is becoming an increasingly hot topic in the field of ML and one that we are still learning so much about. We are also slowly moving into the stage called “dataism”, where humans trust data and algorithms more than their personal insights.


This raises questions about the ethics of ML algorithms and systems, the trust that we inevitably put in them, and the consequences when those ML systems may fail.


For instance, consider a self-driving car that is the cause of a fatal collision. Who do we blame in this situation? Is it the responsibility of the company that designed the car itself, or is it the driver’s fault because they trusted the car with their decisions and didn’t intervene at the appropriate time?


Many such questions remain unanswered, and depending on the laws and regulations in your industry, ML might not be the right tool for your problem.


These things are evolving, though, and in the future, we might be able to select an ethical framework for our different learning models, which might help with these decisions.

Until then, however, there is an element of ethical ambiguity and liability that you need to factor in before deciding whether ML is for you.


Data

Machine learning systems need data to train, and depending on the model, many of them require large amounts of it to be effective.


However, where that data comes from and whether users are aware of and happy with their data being used for this learning are increasingly important questions. Therefore, before we can solve the ML problems of the world, we first need to determine how to deal with and handle data in an ethical and legal manner.


Data privacy and increasing data regulation are important considerations for companies. These issues need to be addressed through clearer and fairer practices, but they also affect the direction in which the ML industry is moving, given the vast amounts of data that many ML models require.


With governments rightfully looking to lock down access to personal information, many companies may not have the data their ML models need to produce the desired results.


However, it’s not only about access to data but also about the quality of the data. We’ve all heard of examples where ML algorithms produced results that incorrectly used racial profiling to make certain decisions. And yet, it is not necessarily the models themselves that are to blame but often the quality of the data we feed a machine. If that data is limited or contains an inherent bias, then the models will simply replicate those biases.
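One way to look for this kind of bias before training is to compare outcome rates across groups in the labeled data. The sketch below is a minimal illustration, assuming a hypothetical list of records with `group` and `label` fields (neither name comes from the article). A large gap between groups is a prompt for investigation, not proof of bias.

```python
from collections import defaultdict


def positive_rate_by_group(records, group_key="group", label_key="label"):
    """Return the share of positive labels (label == 1) for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for record in records:
        group = record[group_key]
        totals[group] += 1
        if record[label_key] == 1:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


# Hypothetical, hand-made sample: field names and values are illustrative only.
sample = [
    {"group": "A", "label": 1},
    {"group": "A", "label": 1},
    {"group": "A", "label": 0},
    {"group": "A", "label": 1},
    {"group": "B", "label": 0},
    {"group": "B", "label": 0},
    {"group": "B", "label": 1},
    {"group": "B", "label": 0},
]

rates = positive_rate_by_group(sample)
gap = max(rates.values()) - min(rates.values())
print(rates, gap)
```

If a model trained on data like this learns to favor group A, it is reproducing the imbalance in its inputs rather than discovering something new.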


Tackling these data issues is critical, but it’s also important to understand that if your organization doesn't have access to enough data or to data of a high enough quality, then perhaps you should not use ML in the first place.


Interpretation

As much as we like to believe that data is factual, it is not. All datasets require a level of context and interpretation—and the same is true for the results delivered by ML models.

We are increasingly relying on ML to make critical decisions for us, but how can we make sure that a model is interpreting the data and results correctly? We first need to understand how a system interprets data and its own results before we can begin to trust it with certain decisions.


Evaluating the criticality and complexity of the decisions that an ML system needs to make is crucial in considering whether ML is the right approach for a particular problem.


Deterministic Systems

Machines can learn and extrapolate information from data quite quickly, but some systems are deterministic and are not always open to interpretation.


Consider a deep learning system evaluating some complex scientific problems in the domain of physics—problems for which scientists will often spend many months and even years evaluating the data to validate certain findings. An ML system might be able to give results in a short period of time, but it may not understand all the laws of physics and how they may apply to the problem.


So, even if the system evaluates the data and produces correct results, further analysis of the data might still be needed to reach the final outcome.


Reproducibility

Reproducibility is a growing issue in the ML field due to a lack of transparency around the code and testing methodology of the models being developed. New models developed in research labs are being implemented in real-world applications at a fast pace. However, these models can fail to perform in the real world despite their state-of-the-art performance in research papers.


The reasons for this can be biased data, incomplete data, or a failure to account for certain environmental, cultural, and political factors (which are often difficult to quantify in data).

In this context, reproducibility can help different industries and practitioners implement the same model and find any hidden problems sooner. A lack of reproducibility can prevent models from being assessed for bias, safety, and robustness.


Be aware of the environment in which you operate and the likelihood of it being affected by external factors that may prevent reproducibility.


It’s also important that, as a company, you compare a model’s results with those of other organizations that use it, to check that the answers you get are consistent. If they are not, the lack of consistency and reproducibility may be an indication that your environment is not suited to ML.
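At the level of a single experiment, one small contributor to reproducibility is making any randomness explicit. The following is a minimal sketch, assuming a simple shuffled train/test split; the function name and its parameters are hypothetical, and a fixed seed addresses only one narrow source of inconsistency, not the data and environment issues discussed in this section.

```python
import random


def split_dataset(items, test_fraction=0.2, seed=42):
    """Shuffle a copy of items with a fixed seed and split it into train/test."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    shuffled = list(items)
    rng.shuffle(shuffled)
    test_size = int(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]


data = list(range(10))
train_a, test_a = split_dataset(data, seed=7)
train_b, test_b = split_dataset(data, seed=7)

# Two runs with the same seed should produce the same split.
print(train_a == train_b and test_a == test_b)
```

Recording the seed alongside the code and data version gives another team a fair chance of repeating the same experiment.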


When You Should Use Machine Learning

Despite these limitations, under the right conditions machine learning is often the best choice.


Use machine learning when you have data that is sufficient, sorted, and labeled


Before you jump into the ML game, the most important thing is to understand your existing data and to empower your data engineers to shape it, so that you can determine where ML can benefit your organization most.


Too many companies approach ML from a top-down perspective, trying to get ML teams to retrofit ML solutions to the data to achieve corporate outcomes. Decisions about ML are better made bottom-up, with the right engineers first understanding the data available to the organization and then shaping a more effective strategy for using ML.


Data scientists can either provide you with better ways of using your data to leverage ML more effectively or identify the gaps in the data that stand in the way. They then work with your engineering team to see how existing applications can gather or track data better, which in turn allows for the use of ML.


As a company, it’s important to empower your data engineering team to come up with the right strategies and to support it in how it can better prepare your organization for ML.


Use machine learning when you cannot code the rules

Many human tasks (such as recognizing whether an email is spam) cannot be adequately solved using a simple, deterministic, rule-based solution. There are simply too many factors that can influence the outcome, which makes it difficult to find any logical, code-driven way of doing this effectively.


When rules depend on too many factors and many of these rules overlap or need to be tuned very finely, it soon becomes difficult for a human to accurately code the rules. This is when it makes sense to look to ML to solve your problems.
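To make the difficulty concrete, here is a minimal sketch of a hand-coded spam rule. The keyword list and the sample messages are hypothetical and chosen only to show how quickly fixed rules both miss spam and flag legitimate mail.

```python
# Hypothetical keyword list; a real filter would need far more rules than this.
SPAM_KEYWORDS = ("free money", "winner", "click here")


def is_spam_by_rules(message):
    """Flag a message as spam if it contains any hard-coded keyword."""
    text = message.lower()
    return any(keyword in text for keyword in SPAM_KEYWORDS)


messages = [
    "Click here to claim your FREE MONEY",       # plain spam: caught by the rules
    "Cl1ck h3re to claim your fr33 m0ney",       # obfuscated spam: slips through
    "Congrats to the winner of our team quiz!",  # legitimate mail: wrongly flagged
]

results = [is_spam_by_rules(message) for message in messages]
for flagged, message in zip(results, messages):
    print(flagged, "-", message)
```

Each new evasion or false alarm needs another rule, and the rules soon start to interact. A model trained on labeled examples can pick up such patterns without each one being written by hand.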


Use machine learning when you cannot scale

Coded algorithms might be more effective than ML at solving certain problems at small scale but often perform poorly when required to scale to millions of interactions with increased variability between them.


This is where we often need human interaction to help sort through the many variations and provide more customized responses; however, this too has limitations and cannot be scaled without additional human resources.


With ML, you can operate at scale while still catering to the variations that coded solutions fail to cover. Machine learning solutions are well suited to large-scale problems and to decisions that draw on large amounts of data.


When You Should Not Use Machine Learning

It should be clear by now that there are current limits to the types of problems ML can solve for us, though no doubt we may find solutions to many of these problems over time.

There are also instances in which, even where these limitations do not apply, ML is simply not the right solution. In those cases, you should stick to a traditional software development approach to achieve the desired results.


Don't use machine learning to solve simple problems

Machine learning, and deep learning in particular, is useful for finding complex relationships and hidden patterns in data that consist of many interdependent variables. For less complicated problems, if a rule-based system performs comparably to an ML system, then it is advisable to avoid the ML system.
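That comparison can be made explicit by scoring both approaches on the same held-out labels. The sketch below is one possible way to do it; the predictions are made up, and the `min_gain` threshold of 0.02 is an arbitrary assumption that you would set according to what the extra complexity of ML costs you.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def prefer_rules(rule_accuracy, ml_accuracy, min_gain=0.02):
    """Keep the rule-based system unless ML beats it by at least min_gain."""
    return (ml_accuracy - rule_accuracy) < min_gain


# Hypothetical predictions from both systems on the same held-out labels.
labels     = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
rule_preds = [1, 0, 1, 0, 0, 0, 1, 0, 1, 0]  # one mistake
ml_preds   = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]  # one mistake

rule_acc = accuracy(rule_preds, labels)
ml_acc = accuracy(ml_preds, labels)
print(rule_acc, ml_acc, prefer_rules(rule_acc, ml_acc))
```

Accuracy is used here only for simplicity; for imbalanced problems a metric such as precision or recall may be a better basis for the same decision.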


Don't use machine learning without labeled data and in-house expertise

Most deep learning models require labeled data and an expert team to train the models and put them in production. It is advisable not to use deep learning algorithms to deliver projects if you don’t have enough labeled data and a dedicated team.


Many companies are not able to make a success out of their ML ventures because they have not understood the criticality of labeled data and have not put the right people in place to shape their data in such a way that it can be interpreted correctly by the respective ML models.


If you are unwilling to invest in this or if it is not possible to label your data, then you may need to forgo ML and stick to your traditional software development systems instead.


Machine learning is not for everyone, and that is okay

Machine learning might be one of the most exciting technologies around, one that every company either wants to be on board with or fears missing out on.


But the reality is that with the industry still in its infancy in many ways, there are still lots of limitations on what can be achieved by the technology. And until we correctly understand how to counter these limitations, your ML ventures may well prove to be a waste of time and money.


We can continue to solve many problems without using ML, and it is perfectly okay to choose to wait before making the leap to the world of ML.
