How to take back control of the internet [Investigations]
The internet sold out to corporate interests and governments. How to start fighting back.
Hey, it’s Devansh 👋👋
Some questions require a lot of nuance and research to answer (“Do LLMs understand Languages”, “How Do Batch Sizes Impact DL” etc.). In Investigations, I collate multiple research points to answer one over-arching question. The goal is to give you a great starting point for some of the most important questions in AI and Tech.
I put a lot of effort into creating work that is informative, useful, and independent from undue influence. If you’d like to support my writing, please consider becoming a paid subscriber to this newsletter. Doing so helps me put more effort into writing/research, reach more people, and supports my crippling chocolate milk addiction. Help me democratize the most important ideas in AI Research and Engineering to over 100K readers weekly. You can use the following for an email template.
PS- We follow a “pay what you can” model, which allows you to support within your means, and support my mission of providing high-quality technical education to everyone for less than the price of a cup of coffee. Check out this post for more details and to find a plan that works for you.
Use your eyes, or I will remove them in atonement
-We need to actively strive to take back our power. If you know the source of this quote, you have good taste and we should be besties.
The internet has done great things for accessibility and empowering social mobility. However, developments in Technology, Machine Learning, and data-gathering technologies have enabled large-scale surveillance, hyper-targeting, and mass propaganda. Moreover, governments have started to block sites to restrict access to information and to deploy signal jamming to cause internet blackouts in selected areas, while companies have started mass-scraping data from the internet to build their products without compensating the owners of the data.
Liberal democracies are major users of AI surveillance. The index shows that 51 percent of advanced democracies deploy AI surveillance systems. In contrast, 37 percent of closed autocratic states, 41 percent of electoral autocratic/competitive autocratic states, and 41 percent of electoral democracies/illiberal democracies deploy AI surveillance technology
In this post, I will cover some important techniques that you can use to address these problems and start to regain some control over the internet. Specifically, we will cover how to use AI to achieve the following:
How to get around network disruptions used by governments to block access to information.
How to break signal jamming.
Protecting your privacy and punishing people for stealing your data by poisoning the well.
Let’s get right into it.
How to Break Automated Internet Censors
One of the most common types of suppression involves using middleboxes (middle blockers) to tamper with your traffic. These middle blockers are the Casemiro of internet networks- disrupting your online research by sliding in on your outgoing packets and thus preventing you from accessing restricted sites.
Luckily, there are ways to fight these. My favorite is Project Geneva, which utilizes evolutionary algorithms to modify your network traffic in ways that prevent middle blockers from getting involved. I like Project Geneva for the following 4 major reasons:
Fully automated: you don’t need any expertise to do it.
Geneva runs exclusively on one side of the connection: it does not require a proxy, bridge, or assistance from outside the censoring regime.
It defeats censorship by modifying network traffic on the fly so that middleboxes can’t block traffic.
Since Geneva works at the network layer, it can be used with any application. With Geneva running in the background, any web browser can become a censorship evasion tool.
Put another way, Geneva is something that can be utilized by the masses, even when they have no external network support. Let’s now cover how it works. We first have the 4 basic commands that we can apply to network packets:
duplicate: takes one packet and returns two copies of the packet
drop: takes one packet and returns no packets (drops the packet)
tamper: takes one packet and returns the modified packet
fragment: takes one packet and returns two fragments or two segments
Since duplication and fragmenting create multiple packets, these 4 basic blocks are combined by our Evolutionary Algorithm to create “Action Trees”. The actions are figured out through triggers- “The trigger describes which packets the tree should run on, and the tree describes what should happen to each of those packets when the trigger fires.”
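To make the trigger + action tree idea concrete, here is a minimal sketch in plain Python. It is a toy model, not Geneva's actual code or strategy syntax: the `Packet` class, the `Node` class, and the `apply_strategy` helper are all names I made up for illustration, and the example strategy (duplicate the request, lower the TTL on the first copy) is just one plausible tree.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Packet:
    """Toy stand-in for a TCP packet (hypothetical, not Geneva's packet class)."""
    flags: str
    payload: bytes = b""
    ttl: int = 64


Action = Callable[[Packet], List[Packet]]


# Each primitive maps one packet to a list of zero, one, or two packets.
def duplicate(pkt: Packet) -> List[Packet]:
    return [pkt, pkt]


def drop(pkt: Packet) -> List[Packet]:
    return []


def tamper(field_name: str, value) -> Action:
    """Build a tamper action that overwrites a single field."""
    def action(pkt: Packet) -> List[Packet]:
        return [replace(pkt, **{field_name: value})]
    return action


def fragment(offset: int) -> Action:
    """Build a fragment action that splits the payload at `offset`."""
    def action(pkt: Packet) -> List[Packet]:
        return [replace(pkt, payload=pkt.payload[:offset]),
                replace(pkt, payload=pkt.payload[offset:])]
    return action


@dataclass
class Node:
    """One node of an action tree: an action plus one subtree per output packet."""
    action: Action
    children: List[Optional["Node"]] = field(default_factory=list)

    def run(self, pkt: Packet) -> List[Packet]:
        out: List[Packet] = []
        for i, produced in enumerate(self.action(pkt)):
            child = self.children[i] if i < len(self.children) else None
            # A missing child means "send this packet as-is".
            out.extend(child.run(produced) if child else [produced])
        return out


def apply_strategy(trigger: Callable[[Packet], bool], tree: Node, pkt: Packet) -> List[Packet]:
    """Run the tree only on packets that match the trigger."""
    return tree.run(pkt) if trigger(pkt) else [pkt]


def is_request(pkt: Packet) -> bool:
    """Trigger: fire on packets whose TCP flags are PSH+ACK."""
    return pkt.flags == "PA"


# Tree: duplicate the packet, lower the TTL on the first copy, send the second untouched.
tree = Node(duplicate, [Node(tamper("ttl", 2)), None])
sent = apply_strategy(is_request, tree, Packet(flags="PA", payload=b"GET /"))
```

Because `duplicate` and `fragment` return two packets, each can carry two subtrees, which is why the strategies end up being trees and not flat lists of actions.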
Since this is a Genetic Algorithm, it’s important to have well-defined fitness functions. Given the infinite search space, the authors put a heavy premium on trimming deadweight solutions by ranking strategies in a fitness hierarchy: “This hierarchy accomplishes a significant search space reduction. Instead of Geneva fuzzing the entire space of possible strategies (for which there are many!), it instead quickly eliminates strategies that break the underlying connection and encourages the genetic algorithm to concentrate effort on only those strategies that keep the underlying connection alive.”
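A tiered fitness function of this kind can be sketched in a few lines. To be clear, this is my own illustration of the idea and not Geneva's real scoring code: the `TrialResult` fields, the tier scores, and the small size penalty are all assumptions picked to make the ordering obvious.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TrialResult:
    """Outcome of testing one strategy (hypothetical fields for illustration)."""
    connection_alive: bool  # did the underlying connection survive the strategy?
    censor_evaded: bool     # did the forbidden request get a real response?
    num_actions: int        # size of the action tree


def fitness(result: TrialResult) -> int:
    """Tiered score: broken connection < alive but censored < evaded."""
    if not result.connection_alive:
        return -1000  # worst tier: the strategy breaks the connection
    if not result.censor_evaded:
        return -100   # middle tier: harmless, but still censored
    # Best tier: evasion works. Mildly prefer smaller trees (assumed tie-breaker).
    return 100 - result.num_actions


def select_survivors(results: Dict[str, TrialResult], keep: int) -> List[str]:
    """Keep the `keep` fittest strategies for the next generation."""
    ranked = sorted(results, key=lambda name: fitness(results[name]), reverse=True)
    return ranked[:keep]


results = {
    "drop_everything": TrialResult(connection_alive=False, censor_evaded=False, num_actions=1),
    "do_nothing": TrialResult(connection_alive=True, censor_evaded=False, num_actions=0),
    "dup_low_ttl": TrialResult(connection_alive=True, censor_evaded=True, num_actions=2),
    "bloated_tree": TrialResult(connection_alive=True, censor_evaded=True, num_actions=9),
}
survivors = select_survivors(results, keep=2)
```

The large gap between tiers is the point: a strategy that kills the connection can never outrank one that keeps it alive, so the search stops wasting generations on it.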
To those of you who wish to try building your own Evolutionary Algorithms for this, here is a handy cheat sheet that puts all the elements in one place-
If you want to get involved with Project Geneva, I covered them in more depth here. Now let’s proceed to the next tool of control.
Building Low Resource Anti-Signal Jammers with Machine Learning
Signal Jamming was originally created for military purposes, but it is now a common tactic used to disrupt civilian protests and control the narrative. By employing large-scale signal jamming, one can disrupt communications and cause blackouts- making large-scale coordination difficult and allowing the jammer to paint the picture however they please. Fortunately, there are ways to fight back.
By analyzing the signals received during jamming, we can start to infer the properties of the interfering jammers. This section will explain the basic techniques you need to develop your own lightweight anti-signal jamming tools. We will use AI to analyze the properties of the disrupted signal, and in doing so be able to work around the blockage.
For these purposes, the publication “Jamming Prediction for Radar Signals Using Machine Learning Methods” is a good read. It operates on a simple principle- by analyzing the Signal Data, we can infer the properties of the jamming protocol being used. This can ultimately be used to predict the jamming technique, which we need to know in order to find a workaround.
The authors compare 2 possible ways to figure out the technique- vanilla DNNs that extract features and LSTMs that directly take in the signals.
It’s worth covering the feature extraction process. It relies on signal processing and Math (2 Skills that are extremely useful for any ML Engineer). We first compute the 4 basic statistical descriptors that can be computed from any sequence- the mean, standard deviation, skewness, and kurtosis.
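For reference, the four descriptors can be computed from scratch like this. This is a sketch under my own assumptions: I use population (biased) moments and plain (non-excess) kurtosis, and the function name `basic_descriptors` is made up. The paper may normalize differently.

```python
import math
from typing import Dict, Sequence


def basic_descriptors(x: Sequence[float]) -> Dict[str, float]:
    """Mean, standard deviation, skewness, and kurtosis of a sequence.

    Uses population (divide-by-n) moments. Raises ZeroDivisionError for a
    constant sequence, because skewness and kurtosis are undefined there.
    """
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n  # variance
    m3 = sum((v - mean) ** 3 for v in x) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in x) / n  # fourth central moment
    std = math.sqrt(m2)
    return {
        "mean": mean,
        "std": std,
        "skewness": m3 / std ** 3,
        "kurtosis": m4 / m2 ** 2,  # 3.0 for a Gaussian; subtract 3 for excess kurtosis
    }


features = basic_descriptors([1.0, 2.0, 3.0, 4.0, 5.0])
```

Each received pulse sequence collapses into four numbers, which is what makes this style of feature cheap enough for low-resource devices.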
Next come the sophisticated measures- autocorrelation coefficient(s), framing, Fast Fourier transform (FFT), mel filter bank, log function, and inverse FFT. “In the framing step, the sequence is divided into segments of equal size. FFT is applied to a subsequence of each segment to convert to the frequency domain, and then the power spectrum is obtained. The mel spectrum is obtained by applying mel filters to the power spectrum, and the coefficients are obtained by inverse FFT after applying the log function to the mel spectrum. These coefficients are called MFCCs. All or some of the coefficients obtained in each segment can be used as the feature values.”
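Here is a rough, dependency-free sketch of the framing → FFT → power spectrum → mel filters → log → inverse transform chain from the quote. Several things are my assumptions and not the paper's settings: the frame size, sample rate, filter count, and number of coefficients are placeholders, the naive DFT is used only to avoid external libraries, and the final step uses a DCT-II, which is the common stand-in for the inverse FFT in MFCC implementations.

```python
import cmath
import math
from typing import List


def frames(x: List[float], size: int) -> List[List[float]]:
    """Split the sequence into non-overlapping segments of equal size."""
    return [x[i:i + size] for i in range(0, len(x) - size + 1, size)]


def power_spectrum(frame: List[float]) -> List[float]:
    """Naive O(n^2) DFT of one frame; keeps the non-negative frequency bins."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(s) ** 2 / n)
    return spec


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mel_filterbank(num_filters: int, num_bins: int, sample_rate: float) -> List[List[float]]:
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    nyquist = sample_rate / 2
    top = hz_to_mel(nyquist)
    # num_filters + 2 edge points, expressed as (fractional) bin positions.
    edges = [mel_to_hz(top * i / (num_filters + 1)) / nyquist * (num_bins - 1)
             for i in range(num_filters + 2)]
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for b in range(num_bins):
            if lo < b <= mid:
                row.append((b - lo) / (mid - lo))  # rising edge
            elif mid < b < hi:
                row.append((hi - b) / (hi - mid))  # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def dct2(v: List[float], num_coeffs: int) -> List[float]:
    """Unnormalized DCT-II, used here in place of the inverse FFT."""
    n = len(v)
    return [sum(v[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(num_coeffs)]


def mfcc(x: List[float], frame_size: int = 64, sample_rate: float = 8000.0,
         num_filters: int = 10, num_coeffs: int = 5) -> List[List[float]]:
    """Return one list of `num_coeffs` coefficients per frame."""
    bank = mel_filterbank(num_filters, frame_size // 2 + 1, sample_rate)
    feats = []
    for frame in frames(x, frame_size):
        power = power_spectrum(frame)
        mel = [sum(w * p for w, p in zip(row, power)) for row in bank]
        log_mel = [math.log(e + 1e-10) for e in mel]  # epsilon avoids log(0)
        feats.append(dct2(log_mel, num_coeffs))
    return feats


# A pure tone that completes 4 cycles per 64-sample frame.
tone = [math.cos(2 * math.pi * 4 * t / 64) for t in range(256)]
coeffs = mfcc(tone)
```

In practice you would swap the naive DFT for a real FFT routine. The structure stays the same: every frame becomes a short vector of coefficients, and all or some of them go into the feature set.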
The LSTM used in the paper has the following architecture-
After reading all the steps that the Feature Extraction approach goes through, looking at the LSTM approach made me chuckle. Below is the real clip of how that conversation went-
This difference only becomes funnier when we look at the results- the LSTM outperforms the vanilla approach (although it is slower). While they both do fine on the known types, the LSTMs show better generalization to unknown types (although that’s not always true here).
We are not done with signal jamming yet. Direct your attention to “Towards an AI-Driven Universal Anti-Jamming Solution with Convolutional Interference Cancellation Network”. This approach is great because it is …universal and only requires a double antenna receiver (DAR).
Let’s say we had a DAR about to get funky with the signals, but its night gets ruined by a jammer. If it was a puny single-antenna receiver, it would have no choice but to go back home. But DARs are different. By analyzing the shifts in both signals received, it can figure out how to stop the jammer from third-wheeling (by knowing the shift, we can create a wave to cancel out the jamming signals). To figure out the phase shift (w/o knowing anything about the jammer) would be difficult, but CNNs turn out to be very handy there. “When the communication is interfered by jammer, estimating the phase shift Δ𝜙𝐽 (in Equation (5)) is challenging without explicit information about the jamming signal. Firstly, the received signal introduces an entanglement of legitimate and jamming signals. Secondly, the constructive and destructive effects of multi-path propagation on the signal’s phase is typically unpredictable. Our approach to address this challenge centers around a fast and accurate convolutional neural network (CNN) which can estimate the phase shift precisely as well as recognize the current state of the channel”
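To see why knowing the jammer's phase shift is enough, here is a toy version of the cancellation step. This is a simplified narrowband model that I am assuming for illustration, not the paper's Equation (5): antenna 1 receives `s + j`, antenna 2 receives the same two signals with phase shifts `dphi_s` and `dphi_j`, and there is no noise or multi-path. In the paper the CNN has to estimate the jammer's phase shift; here I simply hand the true value to `cancel_jammer`. All function and variable names are hypothetical.

```python
import cmath
import math
import random
from typing import List, Tuple


def received(signal: List[complex], jam: List[complex],
             dphi_s: float, dphi_j: float) -> Tuple[List[complex], List[complex]]:
    """Simulate what the two antennas see (assumed noiseless narrowband model)."""
    ant1 = [s + j for s, j in zip(signal, jam)]
    ant2 = [s * cmath.exp(1j * dphi_s) + j * cmath.exp(1j * dphi_j)
            for s, j in zip(signal, jam)]
    return ant1, ant2


def cancel_jammer(ant1: List[complex], ant2: List[complex], dphi_j: float) -> List[complex]:
    """Rotate antenna 1 by the jammer's phase shift and subtract it from antenna 2.

    The jamming terms cancel exactly, leaving
    signal * (exp(1j * dphi_s) - exp(1j * dphi_j)).
    """
    rotation = cmath.exp(1j * dphi_j)
    return [b - a * rotation for a, b in zip(ant1, ant2)]


random.seed(0)
# QPSK-like unit-power symbols, drowned out by a jammer roughly 10x stronger.
signal = [cmath.exp(1j * math.pi * random.choice([0.25, 0.75, 1.25, 1.75])) for _ in range(8)]
jam = [10 * complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(8)]

dphi_s, dphi_j = 0.4, 2.1
ant1, ant2 = received(signal, jam, dphi_s, dphi_j)
residual = cancel_jammer(ant1, ant2, dphi_j)

# Undo the known complex gain to get the original symbols back.
gain = cmath.exp(1j * dphi_s) - cmath.exp(1j * dphi_j)
recovered = [r / gain for r in residual]
```

Note the catch: if the legitimate signal and the jammer arrive with the same phase shift, `gain` is zero and the subtraction wipes out both. A single antenna never gets this subtraction at all, which is why the second antenna matters.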
Their results are worth checking out
For both the approaches outlined, I would be curious to see how Hybrid Neural Networks and Complex Wave Functions would hold up. The use of Complex-Valued Functions instead of the lower levels of CNNs did show great results in stability, costs, performance, and robustness. Their performance in feature extraction despite noise is particularly promising here-
Now let’s move on to the final topic. And one that will be directly relevant to all of you. A lot of different groups are trying to use your data. Let’s discuss how you can protect your data and punish people who try to steal it. Specifically, we know that:
Governments/Contractors are scraping the internet to monitor internet activity and conduct mass surveillance.
Social Media Companies use your data to build profiles of you, to sell you more things, and to target your cognitive blindspots.
AI Art Companies often use copyrighted data without paying the artists.
By poisoning the well, you can make life harder for these groups- disincentivizing the development of these systems. If enough people participate in this, it will become infeasible to develop these systems.
Protect your Privacy by Poisoning the Well
To those of you who spend time online and are into AI Ethics, the term “Poison the Well” might be familiar. It has been making the rounds recently. Poisoning the Well is a term given to adversarial perturbations (APs) that artists/people can apply to their images to disrupt any AI Art/Photo Generators training on their data. It’s a good idea, but it has a major problem. To understand it, let’s first talk about how it works.
The excellent MIT paper, “Adversarial Examples Are Not Bugs, They Are Features” (which we broke down here) posited that classifiers tend to extract two kinds of useful features (features that help a model predict)- robust and non-robust features. Robust Features can tank APs with no problems, but Non-Robust Features fall over quicker than Pure-BJJ ‘athletes’ butt-scooching in competitions. APs attack these Non-Robust Features, allowing them to cause misclassification while staying imperceptible to humans. The most extreme is the One Pixel Attack, which breaks classifiers by only changing one pixel.
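To make the mechanics concrete, here is a tiny adversarial perturbation against a hand-built logistic classifier. Two loud caveats: the attack used is the Fast Gradient Sign Method (FGSM), which I am swapping in as the simplest well-known AP and which is not the method from the paper above, and the 8-feature linear model with made-up weights is a stand-in for a real image classifier.

```python
import math
from typing import List


def predict(w: List[float], b: float, x: List[float]) -> float:
    """Probability of class 1 under a logistic model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def fgsm(w: List[float], b: float, x: List[float], y: int, eps: float) -> List[float]:
    """Nudge every feature by +/- eps in the direction that increases the loss."""
    p = predict(w, b, x)
    # For cross-entropy loss on a logistic model, d(loss)/d(x_i) = (p - y) * w_i.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Toy weights and input (assumed values): features 0, 2, 4, 6 vote for class 1.
w = [1.0, -1.0] * 4
b = 0.0
x = [0.6, 0.4] * 4

clean_prob = predict(w, b, x)      # above 0.5, so the model says class 1
x_adv = fgsm(w, b, x, y=1, eps=0.15)
adv_prob = predict(w, b, x_adv)    # below 0.5, so the prediction flips
```

No single feature moves by more than 0.15, yet the small nudges all push in the same direction and add up across features. On images with thousands of pixels, the per-pixel change needed is far smaller, which is how these attacks stay imperceptible.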
Unfortunately, these changes are often customized to a particular architecture and don’t carry over super well. The problem gets worse when we go from CNNs (which at least have similar inductive biases) to ViTs (the attack “Patch-Fool: Are Vision Transformers Always Robust Against Adversarial Perturbations?” attacks ViTs well). And since the ones stealing our data/monitoring our activities aren’t nice enough to tell us what architectures they’re using, we hit a teensy bit of a pickle. Based on my experiments, Universal APs are promising, but not fully there yet. For now, we can rely on attacking CNNs (since they are the most scalable and thus most likely model used by a majority of Vision people), but we need to stay vigilant for the future.
This only becomes worse when we look into GPT-4 and other multi-modal models. Multi-modality can significantly increase the robustness of a model, and this means that it might serve as a great one-stop shop for surveillance needs. Fortunately, they haven’t become available at scale yet, but we need to keep an eye out in the future. I’ve been experimenting with GPT-4, and so far it’s been able to thwart my AP attacks (not sure if I should give kudos to OpenAI for making great strides or curse them for making my life so much harder). But there is some hope there.
As I was looking into exploiting an autoregressive model’s inherent unreliability, I uploaded the following picture of myself to ChatGPT and asked it some questions. It did a great job answering the questions.
When I asked it to describe me, it called me “athletic” (which was nice). What’s interesting is the reasoning it provided- it pointed to my well-defined calves. Not only is only one calf visible, but it’s also not particularly well-defined in the picture. My guess is that it gets confused by additional information (such as the other person’s calf).
I tested this out with a few experiments, by embedding confusing information into the image. The results have been promising, but to be honest, a lot of it feels like I’m grasping at straws. I can’t predict if my inputs will work, nor am I anywhere close to an algorithm for it. This becomes even harder b/c this is not my full-time job and I’m trying to break a model created by a hundred-billion-dollar company (whose biggest backer is a Trillion Dollar Tech Giant older than I am). But these are the kinds of fights I like picking, and I consider myself God's Gift to the Universe, so I’ll eventually work something out.
If any of you would like to work on this, I’ve jotted down the basic requirements below:
The silver lining to this whole project is that we don’t have to completely destroy classifiers. Even bringing them down by 20–30% should reduce the commercial viability a lot. As with all important things in life, the final disrupting protocol doesn’t have to be great, just good enough.
If you liked this article and wish to share it, please refer to the following guidelines.
That is it for this piece. I appreciate your time. As always, if you’re interested in working with me or checking out my other work, my links will be at the end of this email/post. And if you found value in this write-up, I would appreciate you sharing it with more people. It is word-of-mouth referrals like yours that help me grow.
Reach out to me
Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.
Small Snippets about Tech, AI and Machine Learning over here
AI Newsletter- https://artificialintelligencemadesimple.substack.com/
My grandma’s favorite Tech Newsletter- https://codinginterviewsmadesimple.substack.com/
Check out my other articles on Medium. : https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819
This was absolutely fascinating. I have never even considered many of the topics you covered in this post. A brilliant and timely post as the tendrils of censorship seem to be lengthening daily.
Great Quote to open an excellent article. Minor sp. point: atonement [with the first "e"]. I had to search for the quote, I think it comes from Endless Legend, the Amplitude Studios / Iceberg Interactive fantasy-strategy game. "They are weak. They are soft. They writhe in supposed discomfort at minor penances that the faithful would not even notice. Their sloth will be their undoing; without the pain, they cannot touch the power. Use your eyes! Or I will remove them in atonement. See that Auriga herself writhes in anguish -- as we all should. Join me in contrition and mortification such as Auriga has never seen. Together we will embrace the agony, and make it our servant."
[via https://en.wikiquote.org/wiki/Endless_Legend] & Endless Legend - Major Factions - The Ardent Mages (2 July 2014) [https://www.youtube.com/watch?v=b3J7WznDqSQ]