Self-Assembling and Self-Organizing AI: The Future of Machine Learning? [Breakdowns]
This can change the way we approach Deep Learning and Artificial Intelligence
Hey, it’s Devansh 👋👋
In my series Breakdowns, I go through complicated literature on Machine Learning to extract the most valuable insights. Expect concise, jargon-free, but still useful analysis aimed at helping you understand the intricacies of Cutting Edge AI Research and the applications of Deep Learning at the highest level.
If you’d like to support my writing, please consider buying and rating my 1 Dollar Ebook on Amazon or becoming a premium subscriber to my sister publication Tech Made Simple using the button below.
p.s. you can learn more about the paid plan here.
With Transformers, Deep Learning, and Large Language Models like PaLM, GPT-4, and LLaMA, it can be hard to find other kinds of Machine Learning to be impressed by. Especially now that the entire world is discussing *that AI Letter signed by Elon Musk*, attention has shifted toward NLP and away from everything else. However, there are many ideas in AI that are often overlooked and that have the potential to be a paradigm shift (don’t forget that Deep Learning was itself overlooked at one point in time).
In this article, I will be covering Self-Organizing AI, a unique and powerful idea with a lot of potential to shape the future of Machine Learning and Tech as a whole. I will primarily be referring to the writeup, The Future of Artificial Intelligence is Self-Organizing and Self-Assembling, by the amazing ML Researcher Sebastian Risi. He also has a great interview on Yannic Kilcher’s YouTube channel, which I would suggest listening to here.
Why Self-Organizing AI? What is the current problem with our Deep Learning Systems?
Before getting into an idea, it is helpful to understand the context surrounding it. What problem is the idea solving, and where are the current solutions failing? This approach can help you develop a deeper appreciation for the concept and will make your learning process more fun.
So what is wrong with our current Deep Learning Systems? One of the biggest problems we see in our current systems is how fragile they can be. One research paper showed that we can fool state-of-the-art image classification networks by changing just one pixel. Deep reinforcement learning agents (Gleave et al. 2019) completely break down when confronted with an unknown player strategy, and even the vaunted Large Language Models can be pushed into hallucination by misleading text inputs.
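To make the fragility point concrete, here is a rough sketch of what such a brittleness test looks like: take a trained classifier, overwrite a single pixel, and check whether the prediction flips. The actual one-pixel attack uses differential evolution to search for the worst pixel; this brute-force toy version (with a stand-in model) just illustrates how small the change can be.

```python
import torch
import torch.nn as nn

# Stand-in classifier: any trained image model would slot in here.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
model.eval()

def one_pixel_flip(image, value=1.0):
    """Brute-force check: does overwriting any single pixel change the prediction?

    `image` is a (3, 32, 32) tensor. The real one-pixel attack searches with
    differential evolution; this exhaustive scan just illustrates the brittleness test.
    """
    with torch.no_grad():
        base_pred = model(image.unsqueeze(0)).argmax(dim=1).item()
    for y in range(image.shape[1]):
        for x in range(image.shape[2]):
            perturbed = image.clone()
            perturbed[:, y, x] = value  # overwrite one pixel in every channel
            with torch.no_grad():
                pred = model(perturbed.unsqueeze(0)).argmax(dim=1).item()
            if pred != base_pred:
                return (y, x), pred  # found a pixel that flips the prediction
    return None, base_pred

# Example usage with a random "image"
coords, pred = one_pixel_flip(torch.rand(3, 32, 32))
print(coords, pred)
```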
Self-Organizing Systems seek to solve this issue. As the writeup introduces it-
… combines ideas from deep learning with ideas from self-organization and collective systems. In this first post, we’ll look at some of the developed approaches and the domains they have been applied to, ranging from growing soft robots and Minecraft machines to self-assembling modular robots, and creating more resilient and adaptive reinforcement learning agents. The merger of these ideas could ultimately allow our AI systems to escape their current limitations such as being brittle, rigid, and not being able to deal with novel situations.
Google’s amazing Pathways system, which handles a variety of tasks on a single network using multi-modal learning, sparse activation, and multi-task training (all powerful ideas), could possibly be used to overcome these problems. However, it relies on over half a trillion parameters. Sebastian presents a solution that is much more cost-effective.
Now that you’re aware of why self-organization can be useful, let’s get into what it is and how it works.
Self-Organizing Systems
Risi’s work takes inspiration from nature (it’s really interesting how many major breakthroughs come from there). He was interested in how, “groups with millions or even trillions of elements can self-assemble into complex forms based only on local interactions and display, what is called, a collective type of intelligence. For example, ants can join to create bridges or rafts to navigate difficult terrain, termites can build nests several meters high without an externally imposed plan, and thousands of bees work together as an integrated whole to make accurate decisions on when to search for food or a new nest.”
What is fascinating is that these large-scale organizations are achieved by following very simple rules, without a grand external blueprint. Contrast this with human-engineered solutions, where we take a top-down approach and build things from rigorously laid-out plans. This makes organization easier but is inherently less flexible and thus more fragile to changing circumstances. This is one of the reasons that Deep Learning networks can be fragile: they will train for the data they are presented with but will not adapt to new kinds of problems. This is also why ensembles outperform singular networks (and why Random Forests are my favorite model).
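If you want to see “simple local rules, no blueprint” in action, the classic toy example is Conway’s Game of Life: every cell follows the same rule based only on its 8 neighbors, yet the grid produces gliders, oscillators, and other surprisingly complex structures. This isn’t from Risi’s writeup, just a minimal illustration of local-rule self-organization:

```python
import numpy as np

def life_step(grid):
    """One step of Conway's Game of Life: every cell applies the same
    local rule, looking only at its 8 immediate neighbors."""
    # Count live neighbors by summing the 8 shifted copies of the grid.
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # A cell lives if it has 3 neighbors, or 2 neighbors and was already alive.
    return ((neighbors == 3) | ((neighbors == 2) & (grid == 1))).astype(int)

grid = (np.random.rand(64, 64) < 0.2).astype(int)  # random initial soup
for _ in range(100):
    grid = life_step(grid)  # global structure emerges from purely local updates
```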
Self-Organizing systems don’t face the same problems. “Self-organizing systems are made out of many components that are highly interconnected. The absence of any centralized control allows them to quickly adjust to new stimuli and changing environmental conditions. Additionally, because these collective intelligence systems are made of many simpler individuals, they have in-built redundancy with a high degree of resilience and robustness. Individuals in this collective system can fail, without the overall system breaking down.” Take a look at the video above, where a robot can regrow an entire lost limb to start moving. Contrast this with our Deep Learning networks, which break when we change one pixel.
Such an approach also leads to much better generalization when dealing with domains never encountered before. “We found that starting from completely random weights, evolved Hebbian rules enable an agent to navigate a dynamic 2D-pixel environment; likewise, the approach also allows a simulated 3D quadruped to learn how to walk while adapting to some morphological damage not seen during training and in the absence of any explicit reward or error signal in less than 100 timesteps.”
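To give you a flavor of what “evolved Hebbian rules” means in practice: instead of learning fixed weights, you learn per-connection plasticity coefficients, and the weights themselves keep rewiring at every timestep while the agent acts. Here is a rough sketch of a generic ABCD-style Hebbian update (the names, sizes, and the random “observations” are placeholders; the actual work evolves the coefficients and uses a different architecture):

```python
import numpy as np

def hebbian_update(w, pre, post, A, B, C, D, eta=0.01):
    """Generic ABCD Hebbian rule: each weight w[i, j] is nudged based only on
    the local pre-synaptic activity pre[i] and post-synaptic activity post[j].
    A, B, C, D are per-connection coefficients (evolved, not hand-tuned)."""
    outer = np.outer(pre, post)                      # correlation term pre_i * post_j
    dw = A * outer + B * pre[:, None] + C * post[None, :] + D
    return w + eta * dw

rng = np.random.default_rng(0)
n_in, n_out = 8, 4
w = rng.standard_normal((n_in, n_out))               # start from completely random weights
A, B, C, D = (rng.standard_normal((n_in, n_out)) for _ in range(4))

for t in range(100):                                 # lifetime of the agent
    obs = rng.standard_normal(n_in)                  # stand-in observation
    act = np.tanh(obs @ w)                           # forward pass
    w = hebbian_update(w, obs, act, A, B, C, D)      # weights self-organize online
```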
One of the most interesting ideas I came across while looking into this was how such simple rules can encode potentially very complex behaviors. These complex behaviors are the result of the many simple interactions between the millions of individual members. Using this kind of encoding could be amazing for all kinds of compression algorithms, and is something I will be looking into. If you know anything interesting, you know how to reach me.
Controlling the Learning of These Methods
One of the biggest challenges of such an approach is actually directing the learning process in the direction you want. Remember, if a system can explore a large search space, it will, and that exploration can be very expensive, especially when we have a lot of organisms interacting. Predicting what outcome we will get from such a system (before running it) is practically impossible.
While you can’t tell your system what to do, you can guide your AI towards certain outcomes using nudges (similar to Evolution or RL).
Guiding a swarm system can only be done as a shepherd would drive a herd: by applying force at crucial leverage points, and by subverting the natural tendencies of the system to new ends.
This is speculation on my part, but I think applying something like the attention mechanism to your data might be useful. As I’ve covered here, attention gives vision networks ‘a global view’ of an image. Something like that might be useful in designing a nudge(r) that guides the learning outcome in a particular direction (like a sheepdog). I’d be interested in discussing this with any of you.
For growing towards specific target structures, researchers have found success integrating gradient-based approaches with Cellular Automata, creating NCAs, or Neural Cellular Automata. In an NCA, a neural network learns local rules by communicating with its immediate neighbors and updating its internal state. This lets us avoid hand-coded CAs, but it leads to our next problem.
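Before we get to that problem, here is a stripped-down sketch of what a single NCA update step can look like, loosely following the “Growing Neural Cellular Automata” recipe: each cell perceives its neighbors through fixed filters, runs the same tiny network, and updates stochastically. The channel counts and layer sizes here are illustrative, not the values used in the actual papers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

CHANNELS = 16  # each cell carries a 16-dim state vector (RGBA + hidden channels)

class NCA(nn.Module):
    def __init__(self):
        super().__init__()
        # Per-cell "update rule": a tiny network applied identically everywhere.
        self.rule = nn.Sequential(
            nn.Conv2d(CHANNELS * 3, 128, kernel_size=1), nn.ReLU(),
            nn.Conv2d(128, CHANNELS, kernel_size=1),
        )

    def perceive(self, state):
        # Each cell senses itself and its neighbors via identity + Sobel filters.
        sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]) / 8
        identity = torch.zeros(3, 3)
        identity[1, 1] = 1.0
        kernels = torch.stack([identity, sobel_x, sobel_x.T])     # (3, 3, 3)
        kernels = kernels.repeat(CHANNELS, 1, 1).unsqueeze(1)     # (3C, 1, 3, 3)
        return F.conv2d(state, kernels, padding=1, groups=CHANNELS)

    def forward(self, state):
        update = self.rule(self.perceive(state))
        # Stochastic update: each cell fires asynchronously with probability 0.5.
        mask = (torch.rand(state.shape[0], 1, *state.shape[2:]) < 0.5).float()
        return state + update * mask

nca = NCA()
state = torch.zeros(1, CHANNELS, 32, 32)
state[:, :, 16, 16] = 1.0            # single "seed" cell in the middle
for _ in range(64):                  # developmental steps: the pattern grows
    state = nca(state)
```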
Training Costs
Gradients are awesome. However, “using gradient descent-based approaches requires backpropagating the gradients through the whole sequence of developmental steps. Thus this process becomes increasingly infeasible in terms of memory requirements with an increase in developmental steps.” These costs can get out of control. Also-
NCA is only trained to grow a given structure and not to discover new structures
To work around this, one alternative is to make the search itself more open-ended. Promising approaches for “searching for self-organized patterns in complex dynamical systems are more open-ended search methods such as quality diversity (QD) (Pugh et al. 2016) and intrinsically-motivated learning approaches (Baranes & Oudeyer 2013). In these methods, the idea is to not search for one particular solution (as is typical in machine learning) but instead try to find a maximally diverse set of outcomes.” This reminds me of how Evolutionary Algorithms evolve populations of solutions to maximize a fitness function. Google AI was even able to create new ML algorithms using EAs.
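Quality diversity methods like MAP-Elites make the “maximally diverse set of outcomes” idea concrete: instead of keeping one best solution, you keep an archive with one elite per behavior niche and keep mutating whatever is already in the archive. A bare-bones sketch (the fitness and behavior-descriptor functions here are toy stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(x):
    return -np.sum(x ** 2)            # toy objective: higher is better

def behavior(x):
    return (x[0], x[1])               # toy behavior descriptor (first 2 dims)

def niche(b, bins=10, low=-2.0, high=2.0):
    # Discretize the behavior descriptor into a grid cell (the "niche").
    idx = ((np.array(b) - low) / (high - low) * bins).astype(int)
    return tuple(np.clip(idx, 0, bins - 1))

archive = {}                          # niche -> (fitness, solution)
for _ in range(5000):
    if archive and rng.random() < 0.9:
        # Mutate a random existing elite most of the time...
        parent = archive[list(archive)[rng.integers(len(archive))]][1]
        x = parent + 0.1 * rng.standard_normal(parent.shape)
    else:
        # ...otherwise sample a brand-new random solution.
        x = rng.uniform(-2, 2, size=5)
    cell = niche(behavior(x))
    f = fitness(x)
    # Keep the solution only if its niche is empty or it beats the current elite.
    if cell not in archive or f > archive[cell][0]:
        archive[cell] = (f, x)

print(f"{len(archive)} niches filled; best fitness:", max(f for f, _ in archive.values()))
```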
Sebastian’s writeup also mentions possibly using RNN-based controllers. The challenge with these is that they tend to get stuck in local optima. Adding an element of randomness (randomly jumping to a new point) or even momentum can help tackle this issue. Furthermore, it has been shown that sparsity can be used to reduce training costs by 8x while maintaining performance. This might be useful going forward. These ideas will be discussed in upcoming editions, so keep your eyes out.
If you like what you read, I am now on the job market. A quick summary of my skill set-
Machine Learning Engineer- I have worked on various tasks such as generative AI + text processing, modeling global supply chains, evaluating government policy (impacting over 200 Million people), and even developing an algorithm to beat Apple on Parkinson's Disease detection.
AI Writer- 30K+ email subscribers, 2M+ impressions on LinkedIn, 600K+ blog post readers over 2022.
If you would like to speak more, you can reach me through my LinkedIn here
That is it for this piece. I appreciate your time. As always, if you’re interested in reaching out to me or checking out my other work, links will be at the end of this email/post. If you like my writing, I would really appreciate an anonymous testimonial. You can drop it here. And if you found value in this write-up, I would appreciate you sharing it with more people. It is word-of-mouth referrals like yours that help me grow.
Reach out to me
Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.
Small Snippets about Tech, AI and Machine Learning over here
Check out my other articles on Medium: https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819