Social Insects - AI for Big Data
Introduction
The primary goal of Artificial Intelligence (AI) is to transform the
electronic brain depicted in Figure 1 into a human-like brain, as in
Figure 2, that gathers data, processes it, builds models (hypotheses),
and predicts or influences outcomes by relying on huge data sets, with
the aim of improving human life.
The availability of Big Data has accelerated the growth and
evolution of AI and machine learning applications. Data volumes are exploding; more data has been created in the
past two years than in the entire history of the human race.
Here is a quick comparison of AI before and with Big Data:
AI before Big Data | AI with Big Data
Availability of limited data sets (MBs) | Availability of ever-increasing data sets (TBs)
Limited sample sizes | Massive sample sizes resulting in increased model accuracy
Inability to analyze large data in milliseconds | Large data analysis in milliseconds
Batch oriented | Real-time
Slow learning curve | Accelerated learning curve
Limited data sources | Heterogeneous and multiple data sources
Based on mostly structured data sets | Based on structured, unstructured, and semi-structured data sets
The term Big Data represents growing volumes of data. Along with
volume, the term also incorporates three more attributes, velocity,
variety, and value:
- Volume: The ever-increasing, exponentially growing amount of data. For example, a flight traveling from one point to another generates half a terabyte of data
- Velocity: The rate at which data is generated over time, and the need to analyze that data in near real-time for some critical operations
- Variety: The variety of data formats and structures. Not all data is structured in databases, especially social media data
- Value: The data is only as valuable as its use in generating actionable insights
Note:
- In 1992: 100 GB / Day produced
- In 1997: 100 GB / Hour produced
- In 2002: 100 GB / Second produced
- In 2013: 28,000 GB / Second produced
- In 2018: 50,000 GB / Second produced
Swarm Intelligence
Ants Swarm - Photo by Poranimm Athithawatthee from Pexels
Ants work in swarm mode: they move in a coordinated line, one behind another. They can collect and carry food much larger than themselves all the way to their nests, and can form bridges to cover large gaps.
Although their brains are nowhere close to the human brain
in terms of neurons and connections, when they work in a group they
can achieve their goals intelligently; thus, they are called
social insects.
Those social creatures have prominent characteristics:
- they live in colonies
- they have division of labor
- they have strong group interactions
- they are flexible
Collectively, they achieve intelligence, and this phenomenon
has prompted researchers to develop what is called Swarm
Intelligence (SI).
The SI system is composed of a colony of agents (individual ants),
which are also called boids. Each boid interacts with its neighbors and
environment (context) to achieve individual goals, and together they
achieve a larger goal without governance or centralized
authority.
In AI language, "Swarm intelligence is a collection of intelligent systems inspired by the collective intelligence of a group. This collective intelligence is achieved through the direct or indirect interactions of agents that are homogeneous in nature, yet co-operate with each other in their local environment without being aware of global context or pattern."
If you build an SI-based system, there are three fundamental concepts that it should, at a minimum, comply with:
Self-Organization (SO)
SO is the property of SI systems that determines the underlying
cooperation among SI agents to achieve a desired collective
behavior. The agents are not aware of any global pattern or
behavior; instead, the global behavior emerges from the individual
functioning of the agents.
Here's an example of self-organization: The ant colony as a whole
is always striving to construct a nest that is safe from harsh
environments and to organize individual ant activities so as to locate
the nearest of all the available food sources. Once the shelter
(colony) is established, the most important aspect of the colony's
survival is finding the nearest and most abundant source of
food, and the ants apply a distinctive and smart algorithm to
locate it.
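The foraging behavior described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the algorithm real ants use and not code from the cited book: the function name `simulate_foraging`, the update rule (each ant deposits pheromone inversely proportional to path length, and trails evaporate at a fixed rate), and every parameter value are hypothetical choices made for the example. No ant knows which path is shortest; the preference emerges from many local choices.

```python
import random


def simulate_foraging(path_lengths, n_ants=100, n_steps=100,
                      evaporation=0.1, seed=0):
    """Return the final pheromone level on each path to a food source.

    path_lengths: one positive number per candidate path (hypothetical units).
    """
    rng = random.Random(seed)
    pheromone = [1.0] * len(path_lengths)  # every path starts equally attractive
    for _ in range(n_steps):
        deposits = [0.0] * len(path_lengths)
        for _ in range(n_ants):
            # Local rule: pick a path with probability proportional to its pheromone.
            choice = rng.choices(range(len(path_lengths)), weights=pheromone)[0]
            # Assumption: shorter paths receive a stronger deposit per trip.
            deposits[choice] += 1.0 / path_lengths[choice]
        # Evaporation lets the colony forget trails that are no longer reinforced.
        pheromone = [(1.0 - evaporation) * p + d
                     for p, d in zip(pheromone, deposits)]
    return pheromone


if __name__ == "__main__":
    levels = simulate_foraging([1.0, 2.0])  # path 0 is half as long as path 1
    share = levels[0] / sum(levels)
    print(f"share of pheromone on the shorter path: {share:.2f}")
```

Because the deposit on the shorter path is reinforced faster than it evaporates, the trail to the nearer food source ends up dominating, even though each simulated ant follows only the simple proportional rule.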
Stigmergy
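Stigmergy is indirect coordination: agents influence one another only
through the traces they leave in the environment. The rules need to be
reactive to changes in the environmental state, and each agent should be
able to adapt to those changes autonomously and continue to perform its
function.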
Here's an example of stigmergy:
Consider an ant moving on a path to the food source when some
water is poured on the path. As soon as the ant encounters water on
the way, it starts looking for an alternate path based on the
pheromone (ant chemical) signal. It may also make its way back
to the colony and then start over on another path
autonomously (without any central control). At the same time, the
ant leaves traces that tell other ants there is trouble on that
particular path to the food source. Other ants
immediately adapt to the change in the environment based on the
previous ants' experience and modify their trajectories according to
the simple rules. The ants interact with each other without any
explicit communication, only through modifications of the
environmental state.
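The water-on-the-path example can be expressed as a small Python sketch. It is only an illustration under assumptions: the `Environment` class, the `ant_trip` and `run_colony` functions, the halving "penalty" an ant applies to a blocked trail, and the numeric values are all hypothetical. The point it tries to show is that ants never message each other; they only read and modify the shared trail levels.

```python
class Environment:
    """Shared medium: the only channel through which ants influence each other."""

    def __init__(self, trails, blocked=()):
        self.trails = dict(trails)    # path name -> pheromone level
        self.blocked = set(blocked)   # paths currently obstructed (e.g. by water)

    def evaporate(self, rate=0.1):
        # All trails fade a little after every trip.
        for path in self.trails:
            self.trails[path] *= (1.0 - rate)


def ant_trip(env, deposit=1.0, penalty=0.5):
    """One ant tries paths from strongest to weakest trail; return the path used."""
    for path in sorted(env.trails, key=env.trails.get, reverse=True):
        if path in env.blocked:
            # Obstacle met: weaken the trail so later ants are less drawn to it.
            env.trails[path] *= penalty
            continue
        env.trails[path] += deposit   # success: reinforce the working path
        return path
    return None                       # every known path is blocked


def run_colony(env, n_ants):
    """Send ants one after another; return the path each one ended up using."""
    used = []
    for _ in range(n_ants):
        used.append(ant_trip(env))
        env.evaporate()
    return used


if __name__ == "__main__":
    # Path A starts as the established trail, then water blocks it.
    env = Environment({"A": 10.0, "B": 1.0}, blocked={"A"})
    print(run_colony(env, 5), env.trails)
```

With these illustrative numbers, the first ants still head for path A, hit the water, and weaken its trail; once path B carries the stronger trail, later ants go straight to B without ever meeting the obstacle.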
Division of labor
The individual agent within the swarm is extremely limited in its
capability to achieve the goal for the entire swarm. The natural
system applies division of labor with individual agents performing a
set of very specific responsibilities that contribute to the overall
success of the swarm.
For example, not all the bees in a hive do the same thing. There is a clear
division of labor within the beehive based on the type of bee.
The queen bee is responsible for laying eggs, the male drones are
responsible for reproduction, and the worker bees build the hive and
work to get food for the entire population. They also take care of the
queen bee and the drones by feeding them.
Applications in Big Data analytics - Network Load Balancing
In the current form of networks, the big data computing framework is an enormous collection of computation nodes that are
distributed across the globe.
Two types of deployment exist: on-premises and in the cloud.
The cloud is a virtualized infrastructure, geographically distributed across the regions where the nodes (computers) are located, and controlled by a centralized unit that keeps track of all node operations.
Swarms, however, do not have a central command: the agents work autonomously based on their rules and adjust themselves to changes.
SI concepts can be applied to secure the infrastructure and the nodes, and to make sure the nodes are fully balanced.
Without AI, we start by submitting a computation job to the master node, which in turn breaks it down into multiple chunks to be executed independently by the slave nodes. The jobs finish at different times and require different degrees of computation and storage, so the core compute load may not be evenly distributed across the nodes.
With AI, we deploy the Ant Colony Optimization (ACO) model in the distributed computing environment, and the process will be:
Ant Colony Optimization - ACO
- Reproduction: the controller checks the platform periodically and generates artificial ants (agents) based on the load on the cluster nodes. If the nodes are overloaded or underloaded, new ants are generated for carrying the message across.
- Exploration: the agents are independently in charge of finding the nodes that are overloaded. They can trace the network operating parameters and leave a pheromone (incremental counter) for other swarm agents to get notified.
How the ants achieve load balance:
- Each agent calculates and quantifies the load (under or over) of the node to which it is connected
- The agent starts off toward a random node to check its suitability for load balancing
- A backward artificial ant is generated when a candidate node is found. The agent updates the incremental counter for trail tracing
- The collective load balancing requirement is calculated from the candidate nodes found by the agents
- The cluster load is balanced
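The steps above can be sketched as a single balancing round in Python. This is a simplified illustration, not the full ACO algorithm and not the API of MASON or Opt4J: the function name `balance_cluster`, the use of the average load as the target, the single random hop per ant, and the rule of filling the most-marked nodes first are all assumptions made for the example.

```python
import random


def balance_cluster(loads, n_ants=50, seed=0):
    """Return (new_loads, pheromone) after one ant-based balancing round.

    loads: dict of node name -> current load (arbitrary, hypothetical units).
    """
    rng = random.Random(seed)
    nodes = list(loads)
    target = sum(loads.values()) / len(nodes)   # assumed ideal per-node load
    pheromone = {n: 0 for n in nodes}           # incremental counters
    overloaded = [n for n in nodes if loads[n] > target]
    if not overloaded:
        return dict(loads), pheromone

    # Exploration: each forward ant starts at an overloaded node and hops to
    # a random node. Finding spare capacity triggers a "backward ant", which
    # is modeled here as incrementing that node's counter.
    for _ in range(n_ants):
        start = rng.choice(overloaded)
        candidate = rng.choice(nodes)
        if candidate != start and loads[candidate] < target:
            pheromone[candidate] += 1

    # Collective decision: move excess load to the discovered candidates,
    # visiting the most frequently marked nodes first.
    new_loads = dict(loads)
    candidates = sorted((n for n in nodes if pheromone[n] > 0),
                        key=pheromone.get, reverse=True)
    for src in overloaded:
        excess = new_loads[src] - target
        for dst in candidates:
            moved = min(excess, target - new_loads[dst])
            if moved <= 0:
                continue
            new_loads[src] -= moved
            new_loads[dst] += moved
            excess -= moved
    return new_loads, pheromone


if __name__ == "__main__":
    cluster = {"n1": 90, "n2": 10, "n3": 20, "n4": 40}
    balanced, counters = balance_cluster(cluster)
    print(balanced, counters)
```

The total load is preserved, and no node is pushed past the average. In the likely case that the ants discover both underloaded nodes in this example, every node ends at the average load of 40. A real deployment would repeat such rounds periodically, as the Reproduction step describes, and would also account for the cost of moving work between nodes.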
Practical applications:
- MASON library: Java-based multi-agent simulation API library
- Opt4J library: simple and intuitive GUI for loading meta-heuristic optimization models that can be applied to evolutionary algorithms
Resources: Artificial Intelligence for Big Data by Anand Deshpande
and Manish Kumar