Our company builds an AutoML framework that allows machines to operate in autonomous mode. Because they can learn and operate on their own, the machines are allowed to collect data and learn during operation, extending, updating, and improving everything they know about the external environment.
The machine does not require human participation, because it can evaluate whether it has seen a specific object (or a part of it) before, and then either attach the new information to existing memory items or create completely new ones.
Just as people have separate brain networks for task-oriented activity and for the default mode, which favors thinking, evaluating options, and free exploration, the machine can decide whether it needs more data about specific objects or locations or should quickly perform an urgent task.
To achieve zero-shot learning (assimilating completely new information without any supervision), the machine needs to use its existing perception models and reason by elimination, as children and even some animals do when they encounter previously unseen entities.
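A minimal sketch of reasoning by elimination, in the spirit of how children map a new word to the one unfamiliar object in view. It assumes a hypothetical `assign_by_elimination` helper, that the existing perception models return `None` for regions they cannot recognize, and that some cue (here, a new name) refers to exactly one object in the scene. It is an illustration, not the framework's actual mechanism.

```python
def assign_by_elimination(detections, new_name, memory):
    """Bind a new name to the only region that existing models could not recognize.

    detections: dict mapping region id -> label predicted by existing perception
                models, or None when no model recognized the region.
    memory:     dict mapping names -> region ids (a stand-in for real memory).
    Returns the chosen region id, or None when elimination is ambiguous.
    """
    unrecognized = [rid for rid, label in detections.items() if label is None]
    if len(unrecognized) != 1:
        return None  # zero or several candidates: elimination cannot decide
    memory[new_name] = unrecognized[0]
    return unrecognized[0]


memory = {}
scene = {"r1": "cup", "r2": "spoon", "r3": None}
print(assign_by_elimination(scene, "whisk", memory))  # r3
```

Everything the models already know is ruled out, so the only remaining candidate receives the new information.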
Perception can therefore be seen as a process with two elements:
· passive perception (assimilation of visual features of known objects) with feature extraction and recognition
· active perception (assimilation of new information through reasoning about the context of the situation) with active perception models that store information about object relations, co-occurrences, and spatiotemporal patterns
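The active side can be sketched, under simplifying assumptions, as a co-occurrence table: objects recognized by passive perception provide context, and the context ranks the likely identity of an unrecognized object. The `CooccurrenceModel` class and its methods are hypothetical, and real active perception models would also cover spatial and temporal patterns, which this toy omits.

```python
from collections import Counter, defaultdict
from itertools import permutations


class CooccurrenceModel:
    """Toy active-perception store: counts how often labels appear in the same scene."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def observe(self, labels):
        # Record every ordered pair of distinct labels seen together.
        for a, b in permutations(set(labels), 2):
            self.counts[a][b] += 1

    def guess(self, context_labels, candidates):
        # Rank candidate identities of an unrecognized object by how often each
        # co-occurred with the objects that passive perception did recognize.
        scores = {c: sum(self.counts[ctx][c] for ctx in context_labels) for c in candidates}
        return max(scores, key=scores.get)


model = CooccurrenceModel()
model.observe(["keyboard", "monitor", "mouse"])
model.observe(["keyboard", "monitor", "mug"])
model.observe(["plate", "fork", "mug"])
print(model.guess(["keyboard", "monitor"], ["mouse", "fork"]))  # mouse
```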
We have emotional signals, driven by neurochemicals, that motivate us to do something or discourage us from doing it, force us to think, or decrease physical activity. Similar signals can be used to drive a machine’s autonomous decisions.
Current Reinforcement Learning techniques mostly rely on a single metric: the reward, inspired by the neurochemical dopamine. Adding several other signals, associated with internal memory activation, inhibition of data processing, focus on sensory input data, data exchange with other agents, or the available battery level, can greatly extend what a machine can decide to do based on its own status and the specific state of the environment.
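As a rough illustration, several internal signals can be combined to choose among predefined actions. This is a sketch with hand-picked, hypothetical weights and signal names (`novelty` stands in for low internal memory activation, `low_battery` for one minus the battery level); it is a simple weighted scoring rule, not a trained reinforcement learning policy, and it leaves out signals such as data processing inhibition.

```python
# Hypothetical internal signals in [0, 1]; the names loosely follow the text.
# "novelty" stands for low internal memory activation (1 - activation).
ACTION_WEIGHTS = {
    "perform_task": {"reward": 1.0, "sensory_focus": 0.5},
    "explore": {"novelty": 1.0},
    "share_data": {"agent_exchange": 1.0},
    "recharge": {"low_battery": 1.5},
}


def choose_action(signals, action_weights=ACTION_WEIGHTS):
    """Pick the predefined action with the highest weighted sum of internal signals."""
    def utility(action):
        return sum(w * signals.get(name, 0.0) for name, w in action_weights[action].items())
    return max(action_weights, key=utility)


signals = {"reward": 0.3, "novelty": 0.6, "sensory_focus": 0.2,
           "agent_exchange": 0.1, "low_battery": 0.9}
print(choose_action(signals))                          # recharge
print(choose_action({**signals, "low_battery": 0.1}))  # explore
```

The same environment leads to different decisions depending on the machine's own status, which a single reward value cannot express.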
The machine should be able to discard most of the irrelevant information. If its current task involves recognizing a single class of objects, it makes much more sense to construct a temporary binary classifier (object/non-object) than to recognize every element in the scene.
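One way such a temporary classifier might be built, assuming the target class is already stored as feature vectors (exemplars), is to average them into a prototype and accept anything within a radius of it. The function name, the `margin` parameter, and the distance rule are illustrative choices, not the framework's method.

```python
import math


def make_binary_classifier(exemplars, margin=1.2):
    """Build a temporary object/non-object test from stored exemplars of one class."""
    dim = len(exemplars[0])
    prototype = [sum(e[i] for e in exemplars) / len(exemplars) for i in range(dim)]
    # Accept inputs no farther from the prototype than the most distant exemplar,
    # widened by `margin`. With a single exemplar the radius is 0 (exact match only).
    radius = margin * max(math.dist(e, prototype) for e in exemplars)

    def is_target(features):
        return math.dist(features, prototype) <= radius

    return is_target


is_cup = make_binary_classifier([[1, 1], [1, 3], [3, 1], [3, 3]])
print(is_cup([2, 2.5]), is_cup([6, 6]))  # True False
```

The check costs one distance computation per candidate region, which is cheaper than scoring every known class.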
Machines should not only model the world but also maintain a self-model that stores how their own physical equipment and software features relate to the external world.
We cannot yet allow machines to reprogram themselves, but we can let them choose from a set of predefined actions, input data sources, and external wireless connections.
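A self-model and a predefined set of options fit together naturally: the self-model filters the options down to those the machine can actually execute. The sketch below assumes hypothetical `Option` and `SelfModel` types, where each option lists the features it requires and its expected battery cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    requires: frozenset   # hardware or software features the option depends on
    energy_cost: float    # fraction of the battery the option is expected to use


@dataclass
class SelfModel:
    features: set         # equipment and software the machine currently has
    battery: float        # remaining battery, 0.0 to 1.0

    def feasible(self, options):
        """Return the predefined options the machine can execute right now."""
        return [o.name for o in options
                if o.requires <= self.features and o.energy_cost <= self.battery]


options = [
    Option("scan_with_lidar", frozenset({"lidar"}), 0.2),
    Option("use_camera", frozenset({"camera"}), 0.05),
    Option("sync_over_wifi", frozenset({"wifi"}), 0.1),
]
robot = SelfModel(features={"camera", "wifi"}, battery=0.08)
print(robot.feasible(options))  # ['use_camera']
```

Keeping the option set fixed and letting the self-model choose within it gives autonomy without self-reprogramming.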
The goal of providing autonomy to machines cannot stop halfway. It is not only about adapting neural networks to new input samples or finding optimal network structures; it is about building complex relations between information, actions, and results that allow machines to perform their jobs as well as possible.
Self-supervised and unsupervised learning
Neural networks are a form of associative memory between input and output, and their variants differ in how they realize this mapping. Feed-forward networks map the current input to an output; recurrent networks also aggregate current and prior inputs to produce the outcome; and generative networks store transformational patterns formed during training and use them to transform the provided input data.
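The difference between feed-forward and recurrent mappings can be shown with single scalar units. The weights below are arbitrary illustrative values, and generative networks are not covered by this sketch.

```python
import math


def feed_forward(x, w=0.8, b=0.1):
    # Output depends only on the current input.
    return math.tanh(w * x + b)


def recurrent(sequence, w=0.8, u=0.5, b=0.1):
    # The hidden state h mixes the current input with a summary of all prior inputs.
    h = 0.0
    for x in sequence:
        h = math.tanh(w * x + u * h + b)
    return h


# With no history the two units agree; with history the recurrent output changes.
print(recurrent([1.0]) == feed_forward(1.0))           # True
print(recurrent([0.0, 1.0]) < recurrent([1.0, 1.0]))   # True
```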
In traditional Machine Learning, training neural networks is supervised: the algorithm has access to both the input data and the correct answers and computes a compressed mapping between them. This mapping, the model, is then used to perform inference based on the stored input/output patterns.
In this setting there is no learning without direct information about what a specific piece of data means (labeled data), even if it is just a new example of an already known item. The outcome depends on the quality of the training dataset, and there is no real-time learning without a separate phase of retraining or fine-tuning the model.
Building an automated, unsupervised or self-supervised version of this process (where the model becomes its own supervisor after receiving a small training dataset) is challenging.
For sensory input processing (e.g. vision or speech), it is necessary to determine where the data of interest is located (object boundaries, word divisions) and how to store it in memory. When no label is provided, the system needs to compare the new data to already known items and then decide whether to attach it to an existing category (class) or create a new one.
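The attach-or-create decision can be sketched as leader-style online clustering: find the nearest stored category, attach the sample if it is close enough, otherwise open a new category. The `IncrementalMemory` class and the fixed distance `threshold` are assumptions made for illustration.

```python
import math


class IncrementalMemory:
    """Hypothetical category store: attach to the nearest category or create a new one."""

    def __init__(self, threshold):
        self.threshold = threshold  # max distance at which an item counts as "seen before"
        self.prototypes = []        # running mean feature vector per category
        self.counts = []            # number of samples attached to each category

    def add(self, features):
        if self.prototypes:
            dists = [math.dist(features, p) for p in self.prototypes]
            best = min(range(len(dists)), key=dists.__getitem__)
            if dists[best] <= self.threshold:
                # Attach: update the running mean of the existing category.
                self.counts[best] += 1
                n = self.counts[best]
                self.prototypes[best] = [p + (f - p) / n
                                         for p, f in zip(self.prototypes[best], features)]
                return best
        # Create: nothing similar enough is stored yet.
        self.prototypes.append(list(features))
        self.counts.append(1)
        return len(self.prototypes) - 1


mem = IncrementalMemory(threshold=1.0)
print(mem.add([0.0, 0.0]), mem.add([0.5, 0.0]), mem.add([5.0, 5.0]))  # 0 0 1
```

Learning happens on every sample, with no separate retraining phase.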
Object boundaries can be predicted from raw visual data based on depth continuity and color consistency, since a single object or its parts are continuous in 3D space and usually look similar. The result can then be refined with edge data that suggests potential object boundaries.
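Depth continuity and color consistency can be combined in a simple region-growing pass over an RGB-D image. The sketch below assumes small nested lists for depth (for example in metres) and RGB color, a user-chosen seed pixel, and hypothetical tolerance values; the edge-based refinement mentioned above is not included.

```python
import math
from collections import deque


def grow_region(depth, color, seed, depth_tol=0.05, color_tol=30.0):
    """Segment one object by growing a region from `seed` over an RGB-D image.

    A 4-connected neighbour joins the region when both its depth and its colour
    are close to the pixel it was reached from (depth continuity + colour consistency).
    """
    rows, cols = len(depth), len(depth[0])
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                if (abs(depth[nr][nc] - depth[r][c]) <= depth_tol
                        and math.dist(color[nr][nc], color[r][c]) <= color_tol):
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region


RED, GRAY = (200, 30, 30), (120, 120, 120)
depth = [[1.0, 1.0, 3.0],
         [1.0, 1.0, 3.0],
         [1.0, 1.0, 3.0]]
color = [[RED, RED, GRAY],
         [RED, RED, GRAY],
         [RED, RED, GRAY]]
print(sorted(grow_region(depth, color, (0, 0))))  # the six pixels of the two left columns
```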
Another challenge is that an object can carry multiple labels. For example, a dog visible in the camera view can be recognized simply as a dog, as a specific breed, or by its individual name.
This situation requires a hierarchical categorization method, and many other examples are similar. The neural architecture needs to gradually build ‘prototype’ representations of high-level (dog) and mid-level (breed) categories and then store ‘exemplars’ of individual entities (the dog’s name).
There may also be multiple hierarchical categories, as in the case of people recognition. A person seen in the camera view may be categorized according to many possible criteria: age, sex, occupation, nationality, race, etc.
The neural architecture needs to build ‘prototypes’ – generalized representations of each category/sub-category and also store individual examples.
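A prototype/exemplar hierarchy might be represented as a tree: each node holds a generalized prototype, leaves may also hold exemplars of individuals, and recognition descends from the most general level to the most specific one that matches. The `Category` type, the nearest-prototype descent, and `exemplar_radius` are illustrative assumptions.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Category:
    name: str
    prototype: list                                # generalized representation
    children: list = field(default_factory=list)   # sub-categories (e.g. breeds)
    exemplars: dict = field(default_factory=dict)  # individual entities: name -> features


def classify(root, features, exemplar_radius=0.5):
    """Descend from the most general category to the most specific matching label."""
    path, node = [], root
    while True:
        path.append(node.name)
        if not node.children:
            break
        node = min(node.children, key=lambda ch: math.dist(features, ch.prototype))
    # At the leaf, check whether this is a specific, previously stored individual.
    if node.exemplars:
        name, vec = min(node.exemplars.items(), key=lambda kv: math.dist(features, kv[1]))
        if math.dist(features, vec) <= exemplar_radius:
            path.append(name)
    return path


dog = Category("dog", [0, 0], children=[
    Category("beagle", [1, 0], exemplars={"Rex": [1.1, 0.1]}),
    Category("poodle", [-1, 0]),
])
print(classify(dog, [1.0, 0.0]))  # ['dog', 'beagle', 'Rex']
```

For people, several such trees (one per criterion, such as age or occupation) could be kept side by side.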
Fortunately, self-organizing and prototype-based networks, which work by comparing the similarity of input and stored data, make exactly that possible. By storing similar data in sets, it is possible to determine which features are common to a specific set (low variance) and which are not (high variance). Data can therefore be categorized automatically and hierarchically, with or without labels, based only on the data values.
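The low-variance/high-variance split can be computed directly per feature over a stored set. The feature names, sample values, and `threshold` below are made up for illustration.

```python
from statistics import pvariance


def split_features(samples, feature_names, threshold):
    """Separate features shared across a set (low variance) from those that vary."""
    common, varying = [], []
    for i, name in enumerate(feature_names):
        var = pvariance([s[i] for s in samples])
        (common if var <= threshold else varying).append(name)
    return common, varying


dogs = [[0.9, 4, 0.1],
        [0.8, 4, 0.9],
        [0.85, 4, 0.5]]
print(split_features(dogs, ["ear_shape", "legs", "fur_color"], threshold=0.01))
# (['ear_shape', 'legs'], ['fur_color'])
```

Repeating the split inside subsets (for example, per breed) would be one way to move down the hierarchy.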
In fact, supervised learning does something very similar (it groups similar data features and discards the rest by compressing useful feature representations), but its most commonly used algorithm, backpropagation, requires an attached label or desired output to work.
When we store representations of ‘prototypes’ and ‘exemplars’, new data can be labeled automatically by matching it to previously labeled items (semi-supervised approach), or the system can simply store data without a label (unsupervised approach) and have it described later if needed.
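Both modes can live in one exemplar store: unlabeled items borrow the label of a close, already labeled exemplar, and items that match nothing are stored without a label until someone describes them. `ExemplarStore`, `radius`, and `describe` are hypothetical names for this sketch.

```python
import math


class ExemplarStore:
    """Hypothetical store of exemplars that may or may not carry labels."""

    def __init__(self, radius):
        self.radius = radius
        self.items = []  # list of [features, label or None]

    def add(self, features, label=None):
        if label is None:
            # Self-supervision: borrow the label of a close, already labeled exemplar.
            labeled = [(math.dist(features, f), l) for f, l in self.items if l is not None]
            if labeled:
                dist, nearest = min(labeled)
                if dist <= self.radius:
                    label = nearest
        self.items.append([list(features), label])
        return label

    def describe(self, index, label):
        # Unsupervised items can be named later, e.g. when a person provides the label.
        self.items[index][1] = label


store = ExemplarStore(radius=1.0)
store.add([0.0, 0.0], "cup")
print(store.add([0.3, 0.2]))  # cup  (labeled automatically)
print(store.add([5.0, 5.0]))  # None (stored without a label)
```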
AGICortex’s technology combines the power of deep networks with sophisticated external memory. It performs quick and reliable inference while also providing additional benefits: real-time learning, transparent memory content, explainable decision models, and more.
Many of these features rely on the Cortex – the memory structure that is heavily inspired by the interactions of multiple components of a biological brain.
Through this design, machines are able to self-correct, hierarchically categorize knowledge, or even show signs of curiosity.
The future of Machine Learning lies on a common path with neuroscience. The most powerful AI systems will probably not be built exactly the way the brain is, but they will perform similar operations.