When you read about Machine Learning (ML) and what it represents, everything may seem idyllic and simple: you have some data, you pass it to an ML algorithm, and voilà, magic happens. You get a model that can predict the future. Not long ago, I would have said: "Not so fast, there is a LOT of manual work behind this story, and you can't simplify ML just like that." However, today, when Automated Machine Learning (AutoML) is a reality, I would have to think twice before saying this, and maybe even believe that magic does exist.
So, what exactly is AutoML?
AutoML stands for Automated Machine Learning, and represents the process of automating some, or even all, parts of the machine learning pipeline. Such a pipeline consists of many steps: collecting and processing the data; building, training, and evaluating the predictive model; optimizing various parameters; summarizing and visualizing the results; and finally deploying the model. These are very high-level ML steps that are repeated until we are satisfied with the solution, and behind them lies a lot of statistics, mathematics, analysis, experimentation, programming, debugging, brainstorming, documentation, and knowledge about ML. As a data scientist, I was thrilled to find out that some of these steps can be automated with AutoML.
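To make these steps a little more concrete, here is a minimal sketch of such a pipeline in plain Python. Everything in it is an assumption made up for illustration: the synthetic data, the one-feature ridge regression, the candidate `ridge` values, and all function names. It is not how any particular AutoML tool is implemented, and deployment and visualization are left out. It only shows the kind of collect, process, train, evaluate, and optimize loop that a data scientist would otherwise run by hand.

```python
import random


def collect_data(n=200, seed=0):
    """Stand-in for data collection: noisy points around y = 2x + 1."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        y = 2.0 * x + 1.0 + rng.gauss(0, 0.5)
        if rng.random() < 0.05:
            y = None  # simulate a missing label
        rows.append((x, y))
    return rows


def preprocess(rows):
    """Data processing: drop rows with a missing target value."""
    return [(x, y) for x, y in rows if y is not None]


def split(rows, train_fraction=0.8):
    """Hold out the last part of the data for validation."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def train(train_rows, ridge):
    """Fit y = slope * x + intercept, with a ridge penalty on the slope."""
    n = len(train_rows)
    mean_x = sum(x for x, _ in train_rows) / n
    mean_y = sum(y for _, y in train_rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train_rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in train_rows)
    slope = sxy / (sxx + ridge)
    intercept = mean_y - slope * mean_x
    return slope, intercept


def evaluate(model, rows):
    """Mean squared error of the model on the given rows."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in rows) / len(rows)


def run_pipeline(ridge_candidates=(0.0, 0.1, 1.0, 10.0)):
    """Run every step and keep the parameter with the lowest validation error."""
    rows = preprocess(collect_data())
    train_rows, valid_rows = split(rows)
    results = []
    for ridge in ridge_candidates:  # the "optimizing parameters" step
        model = train(train_rows, ridge)
        results.append((evaluate(model, valid_rows), ridge, model))
    best_mse, best_ridge, (slope, intercept) = min(results, key=lambda r: r[0])
    return {
        "best_ridge": best_ridge,
        "validation_mse": best_mse,
        "slope": slope,
        "intercept": intercept,
    }


if __name__ == "__main__":
    # The "summarizing the results" step, reduced to a printout.
    for name, value in run_pipeline().items():
        print(f"{name}: {value:.3f}")
```

Even in this toy version, every function hides a decision (how to handle missing values, how to split, which model, which parameters to try), and those decisions are exactly what AutoML tries to take off our hands.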
AutoML is intelligent thanks to the heuristic or probabilistic approaches behind it, as well as the elements carefully designed by experts. It knows when to start and stop the automation process, how to assess its own performance, and how to learn from experience. Faced with a huge number of possible combinations of choices, recent AutoML solutions are moving away from the brute-force approach: instead, they are designed to be more sophisticated and to pick the most promising combinations.
With AutoML, is there still a need for data scientists?
No worries here (at least for now). By automating repetitive and time-consuming tasks, AutoML allows data scientists to not only increase productivity, but also to focus on tasks related to human creativity and intelligence. Furthermore, it enables them to apply domain knowledge, address ethical issues, and create business value – the necessary parts of any Artificial Intelligence (AI) project, which are more challenging to automate. Data scientists are very fond of automating repetitive tasks, because it allows them to work on new and exciting problems again and again, during which they can unleash their creativity. Therefore, in general, data scientists are still very much needed, but their focus is realigned to more exciting and innovative tasks, and they greatly appreciate such a change of direction.
Apart from professional data scientists, other people also benefit from AutoML. For example, beginners and citizen data scientists can now quickly create high-quality machine learning models without getting into the technical details. In this way, machine learning becomes more accessible to non-experts, which is an enormous advantage given the current global shortage of data scientists. AutoML also benefits decision-makers by reducing the time required to prepare a solution, and we all know how important this is in today's agile world.
Although AutoML is powerful and very helpful, to the best of our knowledge it does not support every scenario that may occur in an AI project. Challenges tend to arise with unsupervised machine learning (when data is not labeled), with complex data, and when specific domain knowledge has to be incorporated. In addition, an AutoML solution may have its own set of parameters that need to be manually adjusted using expert knowledge and experience. In these cases, human intervention may still be required, so it'll be interesting to see how these challenges are tackled in the future.
Which AutoML tools are available?
There are many great AutoML solutions available today, but before we highlight some for consideration, it's important to ask what we’re trying to automate and under what conditions we want to perform our experiments. There is a range of solutions which differ in what they offer:
- Some specialize in particular automation (e.g., optimization of the elements of a neural network), while others try to automate the entire ML pipeline;
- Some are free and open-source, while others are commercial;
- Some take the form of code libraries, while others (also) provide a graphical user interface (GUI).
The following figure shows some options, but note these are only a subset of all available AutoML solutions currently on the market.
With so many options, where is AutoML heading?
After studying this topic and analyzing many AutoML solutions, here are some interesting trends that I think are of significance:
- No brute-force
AutoML tries to automate the ML pipeline and find the optimal set of choices, whether that means choosing the right set of features, the machine learning algorithm, or its parameters. Each step offers different options; when combined, the space of possibilities can become extremely large, and exploring it thoroughly can be costly. The traditional grid search, which brute-forces all combinations to find the best one, has not been forgotten. Yet nowadays many solutions try to sail intelligently through the search space by using, for example, Bayesian search or evolutionary algorithms.
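The difference between exhaustive and guided search can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm of any real AutoML product: the search space, the `toy_score` function (a stand-in for "train a model and measure its validation score"), and the evaluation budget are all invented for this example. Grid search evaluates every combination, while a simple mutation-and-selection evolutionary loop spends a fixed, smaller budget and keeps refining the best candidates it has seen so far.

```python
import itertools
import random

# Hypothetical search space; names and values are illustrative only.
SEARCH_SPACE = {
    "learning_rate": [0.001, 0.003, 0.01, 0.03, 0.1, 0.3],
    "max_depth": [2, 3, 4, 5, 6, 7, 8],
    "n_estimators": [50, 100, 200, 400, 800],
}


def toy_score(config):
    """Stand-in for a validation score; best (0) at lr=0.03, depth=5, n=200."""
    lr_steps = abs(SEARCH_SPACE["learning_rate"].index(config["learning_rate"]) - 3)
    depth_steps = abs(config["max_depth"] - 5)
    n_steps = abs(SEARCH_SPACE["n_estimators"].index(config["n_estimators"]) - 2)
    return -(lr_steps + depth_steps + n_steps)


def grid_search(space, score):
    """Brute force: evaluate every combination in the search space."""
    keys = list(space)
    best, best_score, evaluations = None, float("-inf"), 0
    for values in itertools.product(*(space[k] for k in keys)):
        config = dict(zip(keys, values))
        current = score(config)
        evaluations += 1
        if current > best_score:
            best, best_score = config, current
    return best, best_score, evaluations


def evolutionary_search(space, score, budget=60, population_size=6, seed=0):
    """Keep the better half of the population and mutate it, within a budget."""
    rng = random.Random(seed)
    keys = list(space)

    def mutate(config):
        # Change one randomly chosen parameter to a random value.
        child = dict(config)
        key = rng.choice(keys)
        child[key] = rng.choice(space[key])
        return child

    population = []
    for _ in range(population_size):
        config = {k: rng.choice(space[k]) for k in keys}
        population.append((score(config), config))
    evaluations = population_size

    while evaluations < budget:
        population.sort(key=lambda pair: pair[0], reverse=True)
        parents = population[: population_size // 2]
        children = []
        for _, parent in parents:
            if evaluations >= budget:
                break
            child = mutate(parent)
            children.append((score(child), child))
            evaluations += 1
        population = parents + children  # parents survive, so the best is never lost

    best_score, best = max(population, key=lambda pair: pair[0])
    return best, best_score, evaluations


if __name__ == "__main__":
    print("grid:", grid_search(SEARCH_SPACE, toy_score))
    print("evolutionary:", evolutionary_search(SEARCH_SPACE, toy_score))
```

Here the grid has 6 x 7 x 5 = 210 combinations, which is still cheap. With a handful of additional parameters, or with each evaluation meaning a full model training run, exhaustive search quickly becomes impractical, and that is where budgeted strategies earn their keep. Note that a budgeted search is not guaranteed to find the true optimum; it trades that guarantee for far fewer evaluations.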
- Transparency
In recent years, much attention has been paid to creating AI/ML solutions that people can understand, not only in terms of the connection between the inputs and outputs of a predictive model, but also in terms of what really happens inside it. Naturally, this quest for understanding has extended to AutoML as well, especially by making the entire AutoML process transparent so that the user can obtain information about the experiments after, or even while, AutoML is running.
- Flexibility
The beauty of some AutoML solutions is their ability to adapt like a chameleon to the user's level of knowledge. For non-experts, there is a simpler, no-code, or default path, without many technical details. Meanwhile, professional data scientists can go deeper into the intricacies, control the process, and tweak the parameters (which they absolutely love to do :)).
It's a very exciting time, and the future is even more so…
We live in a world where we witness the fast-paced development and evolution of some of the most amazing and creative technological solutions and products. One of them is undoubtedly the automation of ML with all of its various solutions, approaches, and sub-fields – you only have to look at Neural Architecture Search (NAS) to see one of the emerging and fascinating fields within this area.
AutoML is very powerful and helpful at different levels. However, it's still just an assistant (albeit an amazing one) to a human, who is the captain of a sailing ship on the wavy sea of machine learning. Nevertheless, with the rapid pace of technological progress and scientific research, I believe it's just a matter of time before we overcome many of the current challenges of AutoML. It's very exciting to think about what the future holds for AutoML development, and I look forward to writing more about it. Stay tuned!