AI is not a new term. It is several decades old, dating back to the early 80s, when computer scientists designed algorithms that could learn and mimic human behavior.
On the learning side, the most significant algorithm was the neural network, which was not very successful because of overfitting (the model is too powerful and there is not enough data). However, in some more specific tasks, the idea of using data to approximate a function has achieved significant success, and this forms the basis of machine learning today.
On the imitating side, AI has concentrated heavily on image recognition, speech recognition, and natural language processing. Experts have spent a tremendous amount of time hand-crafting features such as edge detection, color profiles, N-grams, and syntax trees. Despite this effort, the success has been only moderate.
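To make "hand-crafted features" concrete, here is a minimal Python sketch of two of the features named above: word N-grams for text and a crude edge detector for images. The function names `word_ngrams` and `horizontal_edges` and the threshold value are illustrative assumptions; real pipelines used far more elaborate versions (for example, Sobel or Canny edge detectors).

```python
from collections import Counter


def word_ngrams(text, n):
    """Hand-crafted text feature: overlapping sequences of n words."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def horizontal_edges(image, threshold):
    """Hand-crafted image feature: flag pixels whose brightness differs
    sharply from the pixel to their left (a crude edge detector)."""
    edge_map = []
    for row in image:
        flags = [0]  # the first column has no left neighbour
        for col in range(1, len(row)):
            flags.append(1 if abs(row[col] - row[col - 1]) >= threshold else 0)
        edge_map.append(flags)
    return edge_map


# A bag-of-bigrams feature vector for a sentence.
bigram_counts = Counter(word_ngrams("the cat sat on the cat mat", 2))
print(bigram_counts[("the", "cat")])  # 2

# A 2x4 grayscale image with a dark-to-bright boundary in the middle.
image = [[0, 0, 255, 255],
         [0, 0, 255, 255]]
print(horizontal_edges(image, threshold=128))  # [[0, 0, 1, 0], [0, 0, 1, 0]]
```

Every rule here (the word boundaries, the threshold, comparing only left neighbours) is a design decision a human had to make, which is exactly the effort the paragraph describes.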
Traditional Machine Learning
Machine learning (ML) techniques have played an important role in prediction, and ML has gone through multiple generations, producing a rich set of model structures such as:
- Linear regression.
- Logistic regression.
- Decision tree.
- Support vector machine.
- Bayesian model.
- Regularization model.
- Ensemble model.
- Neural network.
Each of these predictive models is based on a certain algorithmic structure, with parameters acting as tunable knobs. Training a predictive model involves the following steps:
- Select a model structure (for example, logistic regression, random forest, etc.).
- Feed the training data (both inputs and outputs) into the learning algorithm, which outputs the best model (i.e. a model with specific parameter values that minimize the training error).
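The training steps can be sketched in Python for the simplest structure in the list, linear regression. This is a minimal illustration assuming plain gradient descent on the mean squared error as the learning algorithm; the function name `train_linear_model`, the learning rate, and the epoch count are hypothetical choices, and practical libraries often solve linear regression in closed form instead.

```python
def train_linear_model(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y ~ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0  # model structure: a line with two tunable knobs
    n = len(xs)
    for _ in range(epochs):
        # Feed the training data and measure the prediction errors.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Nudge each parameter in the direction that reduces the error.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0, 1, 2, 3, 4]
ys = [2 * x + 1 for x in xs]  # training outputs generated by y = 2x + 1
w, b = train_linear_model(xs, ys)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

The returned `(w, b)` pair is the "best model" in the sense of the list above: the parameter values that minimize the training error for this structure.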
Each model has its own characteristics and will perform well on some tasks and poorly on others. Broadly, though, we can group them into low-power (simple) models and high-power (complex) models. Choosing among different models is a very tricky question.
Usually, a low-power/simple model is favored over a high-power/complex model for the following reasons:
- Until we have huge processing power, training a high-power model will take too long.
- Until we have a massive amount of data, training a high-power model will cause overfitting. Because a high-power model has many parameters and can fit a wide range of data shapes, we may end up training a model that fits the current training data too closely and does not generalize well enough to make good predictions on future data.
In other words, if we don’t have sufficient processing power and enough data, we have to use a low-power/simple model, which requires us to spend significant time and effort creating suitable input features.
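The overfitting trade-off can be seen in a small, self-contained Python experiment. The data, noise values, and function names below are made up for illustration: a two-parameter straight line plays the low-power model, and a degree-5 polynomial that passes through all six noisy training points plays the high-power model.

```python
def fit_line(xs, ys):
    """Low-power model: least-squares straight line (two parameters)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_interpolating_polynomial(xs, ys):
    """High-power model: the polynomial of degree len(xs) - 1 that passes
    through every training point exactly (Lagrange form)."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def true_fn(x):
    return 2 * x + 1


# Six noisy training points drawn from the true function y = 2x + 1.
train_x = [0, 1, 2, 3, 4, 5]
noise = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
train_y = [true_fn(x) + e for x, e in zip(train_x, noise)]

# Unseen "future" inputs between the training points, with noise-free targets.
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = [true_fn(x) for x in test_x]

line = fit_line(train_x, train_y)
poly = fit_interpolating_polynomial(train_x, train_y)
for name, model in [("line", line), ("degree-5 polynomial", poly)]:
    print(f"{name}: train MSE={mse(model, train_x, train_y):.3f}, "
          f"test MSE={mse(model, test_x, test_y):.3f}")
```

The polynomial reaches zero training error because it memorizes the noise, yet it oscillates between the training points and does much worse on the unseen inputs than the simple line does.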
Return of the Neural Network
Since the early 2000s, machine processing power has increased tremendously, driven by advances in cloud computing and massively parallel processing infrastructure, and coinciding with the big data era in which massive amounts of fine-grained event data are being collected. This means we are no longer restricted to low-power/simple models. Nevertheless, although the high-power models that became practical are very powerful and provide non-linear model fitting to training data, data scientists still need to carefully create features to achieve good performance.
At the same time, computer scientists revisited the idea of using many layers of neural networks for these human-mimicking tasks. This gave rise to the DNN (deep neural network) and brought significant breakthroughs in image classification and speech recognition. The major difference with a DNN is that you can feed raw signals (for example, RGB pixel values) directly into it without creating any domain-specific input features. Through many layers of neurons (which is why it is called a “deep” neural network), a DNN can “automatically” generate the appropriate features at each layer and finally provide a very good prediction.
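Here is a minimal Python sketch of what "generating features through each layer" means, assuming a tiny fully connected network with hand-picked (not learned) weights. It computes XOR, which no single linear model of `x1` and `x2` can represent: the hidden layer derives two intermediate features and the output layer combines them. In a real DNN the weights would be learned by backpropagation across many more layers; the names `dense` and `tiny_network` are illustrative.

```python
def relu(z):
    return max(0.0, z)


def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: each neuron takes a weighted sum of all inputs."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(z) if activation else z)
    return outputs


def tiny_network(x1, x2):
    # Hidden layer: two derived features built from the raw inputs.
    #   h1 = relu(x1 + x2)      -> "at least one input is on"
    #   h2 = relu(x1 + x2 - 1)  -> "both inputs are on"
    hidden = dense([x1, x2], weights=[[1, 1], [1, 1]], biases=[0, -1], activation=relu)
    # Output layer combines the derived features: h1 - 2 * h2.
    return dense(hidden, weights=[[1, -2]], biases=[0])[0]


for a in (0, 1):
    for b in (0, 1):
        print(a, b, tiny_network(a, b))
```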
DNNs have also evolved into many different network topologies, such as the CNN (Convolutional Neural Network), RNN (Recurrent Neural Network), LSTM (Long Short-Term Memory), and GAN (Generative Adversarial Network), along with related techniques such as Transfer Learning and Attention Models.
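As one example of these topologies, the core operation of a CNN is a small filter whose weights are shared across every position of the input. The Python sketch below is a simplified, hypothetical single-channel version with no padding, stride, or bias; with the right weights it finds the same kind of edge that traditional pipelines had to detect with hand-crafted features.

```python
def conv1d(signal, kernel):
    """'Valid' 1-D convolution as used in CNN layers (strictly a
    cross-correlation): slide the same small kernel across the input."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]


# A row of pixel intensities with a jump from dark (0) to bright (5).
row = [0, 0, 5, 5, 5]
# A kernel that responds to rising intensity; in a trained CNN these
# weights would be learned rather than chosen by hand.
print(conv1d(row, [-1, 1]))  # [0, 5, 0, 0]
```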
Another key component is how to imitate the way a person (or animal) learns. Consider the very natural observe/act/reward cycle of animal behavior. A person or animal first understands the environment by sensing what “state” he or she is in. Based on that, he or she picks an “action” that leads to another “state,” and then receives a “reward.” The cycle repeats until he or she dies. This way of learning (called reinforcement learning) is quite different from the curve-fitting approach of traditional supervised machine learning. Reinforcement learning has achieved tremendous success in areas such as self-driving cars and AlphaGo (a Go-playing program).
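The observe/act/reward cycle can be sketched with tabular Q-learning, one classic reinforcement learning algorithm chosen here purely for illustration. The environment is a hypothetical five-cell corridor in which the agent starts at the left end and receives a reward of 1 for reaching the right end; the learning rate `alpha`, discount `gamma`, and exploration rate `epsilon` are arbitrary assumptions.

```python
import random

N_STATES = 5          # hypothetical corridor of cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)    # index 0 = move left, index 1 = move right


def step(state, move):
    """Environment: apply the move, then return (next_state, reward, done)."""
    next_state = min(max(state + move, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # estimated value of each (state, action)
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Act: explore at random (or on ties), otherwise exploit the best-known action.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            # Observe the new state and the reward it brings.
            next_state, reward, done = step(state, ACTIONS[action])
            # Learn: move the estimate toward reward + discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
print(["right" if q[s][1] > q[s][0] else "left" for s in range(N_STATES - 1)])
```

No one tells the agent which move is correct; it discovers that moving right pays off purely from the rewards it receives, which is the key contrast with curve fitting on labeled data.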
Reinforcement learning also provides a seamless integration of prediction and optimization: it maintains a belief about the current state and the possible transition probabilities when taking different actions, and then decides which action can lead to the best outcome.
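The idea of weighing transition probabilities to pick the best action can be sketched with value iteration, a standard planning method for a known model. This is our choice for illustration, and it simplifies the text by assuming the current state is known exactly rather than held as a belief. The corridor, the 20% slip probability, and the discount factor are hypothetical.

```python
N_STATES = 5    # hypothetical corridor; state 4 is the goal (terminal)
GAMMA = 0.9     # discount applied to future rewards
SLIP = 0.2      # assumed chance that a move fails and the agent stays put


def transitions(state, move):
    """Return (probability, next_state, reward) outcomes for a move of -1 or +1."""
    target = min(max(state + move, 0), N_STATES - 1)
    outcomes = [(1 - SLIP, target), (SLIP, state)]
    return [(p, s2, 1.0 if s2 == N_STATES - 1 else 0.0) for p, s2 in outcomes]


def expected_value(values, state, move):
    """Prediction: expected reward plus discounted future value of a move."""
    return sum(p * (r + GAMMA * values[s2]) for p, s2, r in transitions(state, move))


def value_iteration(tolerance=1e-9):
    values = [0.0] * N_STATES  # the terminal goal state keeps value 0
    while True:
        delta = 0.0
        for s in range(N_STATES - 1):
            best = max(expected_value(values, s, move) for move in (-1, +1))
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tolerance:
            return values


def best_action(values, state):
    """Optimization: pick the move with the highest expected outcome."""
    return max((-1, +1), key=lambda move: expected_value(values, state, move))


values = value_iteration()
print([best_action(values, s) for s in range(N_STATES - 1)])  # [1, 1, 1, 1]
```

The same expected-value computation serves both roles the paragraph mentions: it predicts where each action is likely to lead, and taking its maximum optimizes the choice.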
Deep Learning + Reinforcement Learning = AI
Compared to classic ML techniques, DL provides a more powerful prediction model that usually produces good predictions. Compared to classic optimization models, reinforcement learning provides a much faster learning mechanism and is also more adaptive to changes in the environment.