Interesting Content in AI, Software, Business, and Tech- 01/31/2024 [Updates]
Content to help you keep up with Machine Learning, Deep Learning, Data Science, Software Engineering, Finance, Business, and more
Hey, it’s Devansh 👋👋
In issues of Updates, I will share interesting content I came across. While the focus will be on AI and Tech, the ideas might range from business, philosophy, ethics, and much more. The goal is to share interesting content with y’all so that you can get a peek behind the scenes into my research process.
I put a lot of effort into creating work that is informative, useful, and independent from undue influence. If you’d like to support my writing, consider becoming a premium subscriber to my sister publication Tech Made Simple to support my crippling chocolate milk addiction. Use the button below for a lifetime 50% discount (5 USD/month, or 50 USD/year).
A lot of people reach out to me for reading recommendations. I figured I’d start sharing whatever AI Papers/Publications, interesting books, videos, etc. I came across each week. Some will be technical, others not really. I will add whatever content I found really informative (and remembered throughout the week). These won’t always be the most recent publications, just the ones I’m paying attention to this week. Without further ado, here are interesting readings/viewings for 01/31/2024. If you missed last week’s readings, you can find them here.
Reminder- We started an AI Made Simple Subreddit. Come join us over here- https://www.reddit.com/r/AIMadeSimple/. If you’d like to stay on top of community events and updates, join the discord for our cult here: https://discord.com/invite/EgrVtXSjYf.
Community Spotlight: RitvikMath
RitvikMath is one of my all-time favorite sources for developing a mathematical intuition for Data Science concepts. I love that his focus is on telling the story around the equations, helping you understand not just the math, but also why we do things the way we do. The latter is key to developing your judgment for deploying and modifying these tools. Educators like him deserve heaps of recognition for all the good they do.
If you're doing interesting work and would like to be featured in the spotlight section, just drop your introduction in the comments or reach out to me. There are no rules: you could talk about a paper you've written, an interesting project you've worked on, some personal challenge you're working on, ask me to promote your company/product, or anything else you consider important. The goal is to get to know you better, and possibly connect you with interesting people in our chocolate milk cult. No costs/obligations are attached.
Highly Recommended
These are pieces that I feel are particularly well done. If you don't have much time, make sure you at least catch these works.
Propaganda or Science: Open Source AI and Bioterrorism Risk
People claiming that open-source LLMs will lead to bioterrorism might have been exaggerating slightly? Shocking. Thank you Filippo Marino for sharing this. Given the state of AI Regulation, this is not an issue we want to overlook.
I examined all the biorisk-relevant citations from a policy paper arguing that we should ban powerful open source LLMs.
None of them provide good evidence for the paper's conclusion. The best of the set is evidence from statements from Anthropic -- which rest upon data that no one outside of Anthropic can even see, and on Anthropic's interpretation of that data. The rest of the evidence cited in this paper ultimately rests on a single extremely questionable "experiment" without a control group.
In all, citations in the paper provide an illusion of evidence ("look at all these citations") rather than actual evidence ("these experiments are how we know open source LLMs are dangerous and could contribute to biorisk").
A recent further paper on this topic (published after I had started writing this review) continues this pattern of being more advocacy than science.
Depth Anything, Vision Mamba, Self-Extending LLMs, and more...
Sairam Sundaresan writes really good explanations of concepts in Data and AI. He's compiled some really interesting resources, which you should definitely check out.
These are some of the most interesting resources I found over the past week covering a range of topics in computer vision and NLP.
Transfer Learning for Text Diffusion Models
Super interesting implications for alignment and quality control.
In this report, we explore the potential for text diffusion to replace autoregressive (AR) decoding for the training and deployment of large language models (LLMs). We are particularly interested to see whether pretrained AR models can be transformed into text diffusion models through a lightweight adaptation procedure we call "AR2Diff". We begin by establishing a strong baseline setup for training text diffusion models. Comparing across multiple architectures and pretraining objectives, we find that training a decoder-only model with a prefix LM objective is best or near-best across several tasks. Building on this finding, we test various transfer learning setups for text diffusion models. On machine translation, we find that text diffusion underperforms the standard AR approach. However, on code synthesis and extractive QA, we find diffusion models trained from scratch outperform AR models in many cases. We also observe quality gains from AR2Diff -- adapting AR models to use diffusion decoding. These results are promising given that text diffusion is relatively underexplored and can be significantly faster than AR decoding for long text generation.
🦅 Eagle 7B : Soaring past Transformers with 1 Trillion Tokens Across 100+ Languages (RWKV-v5)
It can be easy to forget that there are non-transformer models for LLMs. We covered research by the RWKV group a lil while ago, but they just made a massive splash with a green, multi-lingual model. Did I mention, it's commercially available? Their Wiki will get you very excited: "RWKV (pronounced as RwaKuv) is an RNN with GPT-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable)."
Eagle 7B is a 7.52B parameter model that:
Built on the RWKV-v5 architecture (a linear transformer with 10-100x+ lower inference cost)
Trained on 1.1 Trillion Tokens across 100+ languages
Outperforms all 7B class models in multi-lingual benchmarks
Approaches Falcon (1.5T), LLaMA2 (2T), Mistral (>2T?) level of performance in English evals
Trades blows with MPT-7B (1T) in English evals
Is a foundation model, with a very small instruct tune - further fine-tuning is required for various use cases!
Prevalence of neural collapse during the terminal phase of deep learning training
The Geometry of Deep Learning is something I've been studying more recently. This is a great read (courtesy of Manny Ko).
Modern deep neural networks for image classification have achieved superhuman performance. Yet, the complex details of trained networks have forced most practitioners and researchers to regard them as black boxes with little that could be understood. This paper considers in detail a now-standard training methodology: driving the cross-entropy loss to zero, continuing long after the classification error is already zero. Applying this methodology to an authoritative collection of standard deepnets and datasets, we observe the emergence of a simple and highly symmetric geometry of the deepnet features and of the deepnet classifier, and we document important benefits that the geometry conveys—thereby helping us understand an important component of the modern deep learning training paradigm.
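The abstract above does not spell out the "simple and highly symmetric geometry". As I understand the neural collapse literature, one part of it is that the centered class means end up equally spaced, with the cosine between any two of them approaching -1/(C-1) for C classes (a simplex equiangular tight frame). Here is a minimal sketch that checks this on an idealized configuration. The one-hot construction and the function names are my own illustration, not code from the paper.

```python
import math

def centered_class_means(num_classes):
    """Idealized class means: one-hot vectors minus their global mean.

    Up to scaling, these are the vertices of a simplex equiangular tight frame.
    This is a hypothetical toy construction, not features from a trained network.
    """
    g = 1.0 / num_classes
    return [
        [(1.0 if i == j else 0.0) - g for j in range(num_classes)]
        for i in range(num_classes)
    ]

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

if __name__ == "__main__":
    for c in (3, 4, 10):
        means = centered_class_means(c)
        print(c, cosine(means[0], means[1]), -1.0 / (c - 1))
```

With real networks you would replace the toy means with the per-class averages of last-layer features, minus the global feature mean, and watch how close the pairwise cosines get to this value late in training.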
No Power Grid? This African Village Mines Bitcoin for Electricity
A crypto company provided renewable energy to a village because they needed space to set up their mining operations. A good indication of how aligning stakeholder incentives can lead to great outcomes. Thank you Jean-Pierre Bianchi for the share.
But things changed when Gridless — a Kenyan company that designs, builds, and operates Bitcoin mining sites — installed a micro-hydro mini-grid in the village. The mini-grid harnesses the power of water to generate electricity, which is then used to mine Bitcoin.
By mining Bitcoin, Gridless not only earns income to sustain the mini-grid but also provides affordable electricity to over 1,800 homes in Bondo. Erik Hersman, the CEO of Gridless, said: "We are not just mining Bitcoin, we are mining hope."
Airtel will beat JIO? What no one is Telling you about the Jio vs Airtel vs Tata Telecom Wars!
Even if you have no interest in the Indian Telecom markets, this is a great video on how keeping focus on a core group of customers and never deviating from a solid plan can help you fight back against obscene heaps of money.
In the past 7 years, the Indian telecom space has seen the most competitive business wars in its history!! After Jio came out, 2 giant players merged together to become Vodafone Idea, BSNL is still on the ventilator and players like Telenor and Aircel have just vanished from the market!!! But as we saw, while all these players were struggling to survive, the only player who fought back and is still fighting the battle is Airtel!!! How exactly is Airtel bouncing back after the telecom shock of 2016? What are the business strategies that helped Airtel beat Jio in this fierce competition? And finally, what lessons can we learn from the return of Airtel/business wars?
Computations with p-adic numbers
In my brief introduction to Number Bases in software engineering and computer science, I covered how p-adic numbers have some very interesting uses in software engineering. I found this paper while doing research for that. The more mathematical amongst you might find this interesting. Integrating ideas like this is key to making breakthroughs and discovering new fields of study.
This document contains the notes of a lecture I gave at the "Journées Nationales du Calcul Formel" (JNCF) in January 2017. The aim of the lecture was to discuss low-level algorithmics for p-adic numbers. It is divided into two main parts: first, we present various implementations of p-adic numbers and compare them and second, we introduce a general framework for studying precision issues and apply it in several concrete situations.
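To make "implementations of p-adic numbers" a bit more concrete, here is a minimal sketch of two basic operations on rationals: the p-adic valuation, and the first few digits of a p-adic expansion truncated at a fixed precision. The function names are mine and this is not taken from the lecture notes. It assumes p is prime and Python 3.8+ (for the modular inverse via pow).

```python
from fractions import Fraction

def p_adic_valuation(x, p):
    """Return v_p(x): the exponent of the prime p in a nonzero rational x."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("v_p(0) is +infinity by convention")
    num, den = x.numerator, x.denominator
    v = 0
    while num % p == 0:   # powers of p in the numerator count positively
        num //= p
        v += 1
    while den % p == 0:   # powers of p in the denominator count negatively
        den //= p
        v -= 1
    return v

def p_adic_digits(x, p, n):
    """First n digits (least significant first) of the p-adic expansion of x.

    Assumes x is a p-adic integer, i.e. its denominator is coprime to p.
    """
    x = Fraction(x)
    if x.denominator % p == 0:
        raise ValueError("x is not a p-adic integer")
    digits = []
    for _ in range(n):
        # The next digit is x mod p, computed with a modular inverse.
        d = (x.numerator * pow(x.denominator, -1, p)) % p
        digits.append(d)
        x = (x - d) / p   # strip the digit and shift right by one place
    return digits

if __name__ == "__main__":
    print(p_adic_valuation(Fraction(50, 3), 5))   # 50 = 2 * 5^2
    print(p_adic_digits(Fraction(1, 3), 5, 5))
    print(p_adic_digits(-1, 5, 4))
```

Truncating to n digits is the simplest way to represent a p-adic number on a computer, and it is exactly where the precision questions from the second half of the lecture come from: every operation has to track how many digits of the result can still be trusted.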
AI Content
A Comparison of LSTM and GRU Networks for Learning Symbolic Sequences
We explore the architecture of recurrent neural networks (RNNs) by studying the complexity of string sequences that it is able to memorize. Symbolic sequences of different complexity are generated to simulate RNN training and study parameter configurations with a view to the network’s capability of learning and inference. We compare Long Short-Term Memory (LSTM) networks and gated recurrent units (GRUs). We find that an increase in RNN depth does not necessarily result in better memorization capability when the training time is constrained. Our results also indicate that the learning rate and the number of units per layer are among the most important hyper-parameters to be tuned. Generally, GRUs outperform LSTM networks on low-complexity sequences while on high-complexity sequences LSTMs perform better.
XLGen: Cluster Guided Label Generation in XC
Manish Gupta makes great AI Research content. Check him out.
For extreme multi-label classification (XMC), existing classification-based models perform poorly for tail labels and often ignore the semantic relations among labels, like treating “Wikipedia” and “Wiki” as independent and separate labels. XMC can be cast as a generation task (XLGen), so as to benefit from pre-trained text-to-text models. However, generating labels from the extremely large label space is challenging without any constraints or guidance. Label generation is therefore guided using label cluster information to hierarchically generate lower-level labels. Frequency-based label ordering and using decoding ensemble methods are critical factors for the improvements in XLGen. XLGen with cluster guidance significantly outperforms the classification and generation baselines on tail labels, and also generally improves the overall performance in four popular XMC benchmarks. In human evaluation, XLGen generates unseen but plausible labels. In this video, I will talk about the following: How can we use T5 for generating labels in XMC? How does XLGen perform?
WARM: On the Benefits of Weight Averaged Reward Models
Aligning large language models (LLMs) with human preferences through reinforcement learning (RLHF) can lead to reward hacking, where LLMs exploit failures in the reward model (RM) to achieve seemingly high rewards without meeting the underlying objectives. We identify two primary challenges when designing RMs to mitigate reward hacking: distribution shifts during the RL process and inconsistencies in human preferences. As a solution, we propose Weight Averaged Reward Models (WARM), first fine-tuning multiple RMs, then averaging them in the weight space. This strategy follows the observation that fine-tuned weights remain linearly mode connected when sharing the same pre-training. By averaging weights, WARM improves efficiency compared to the traditional ensembling of predictions, while improving reliability under distribution shifts and robustness to preference inconsistencies. Our experiments on summarization tasks, using best-of-N and RL methods, show that WARM improves the overall quality and alignment of LLM predictions; for example, a policy RL fine-tuned with WARM has a 79.4% win rate against a policy RL fine-tuned with a single RM.
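The core step described above, averaging several fine-tuned reward models in weight space, can be sketched in a few lines. This is a toy illustration under my own assumptions: weights are plain dicts of float lists, the average is uniform, and the "reward model" is a hypothetical linear scorer. It is not the authors' implementation.

```python
def average_weights(models):
    """Uniformly average a list of weight dicts (name -> list of floats), key by key."""
    if not models:
        raise ValueError("need at least one model")
    keys = set(models[0])
    if any(set(m) != keys for m in models):
        raise ValueError("all models must share the same parameter names")
    averaged = {}
    for k in models[0]:
        # zip(*...) lines up the i-th entry of parameter k across all models
        columns = zip(*(m[k] for m in models))
        averaged[k] = [sum(col) / len(models) for col in columns]
    return averaged

def linear_reward(weights, features):
    """Hypothetical toy reward model: a dot product plus a bias."""
    return sum(w * x for w, x in zip(weights["w"], features)) + weights["b"][0]

if __name__ == "__main__":
    # Two toy reward models, imagined as fine-tuned from the same starting point.
    rms = [
        {"w": [1.0, 2.0], "b": [0.0]},
        {"w": [3.0, 4.0], "b": [2.0]},
    ]
    merged = average_weights(rms)
    print(merged)
    print(linear_reward(merged, [1.0, 1.0]))
```

The efficiency point from the abstract shows up even in this toy: a prediction ensemble needs one forward pass per reward model, while the merged model needs only one. For a linear scorer the two give identical rewards; for deep networks they differ, and the abstract's claim is that the weight-averaged version still works because the fine-tuned weights stay linearly mode connected.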
The Step-by-Step Guide to Becoming a Machine Learning Engineer
Machine Learning For Everyone lives in a repository on GitHub, which makes it easily accessible to anyone with an internet connection, practical to keep continually updated, and easy to extend with machine learning practice examples for anyone wanting to train their own model (coming soon). It’s a way for consumers, engineers, and techies to learn whatever they want about ML.
Currently, the guide contains 5 distinct learning paths, which I'll touch on here:
A path to gain the skills necessary to become a machine learning engineer.
A path for anyone interested in AI research who wants to learn about machine learning models.
A path for developers who want to take advantage of the machine learning tools at their disposal in the applications they build.
A path for consumers wanting to understand how machine learning will affect them.
A path for companies who want to use machine learning in their business.
I'm going to touch on each of these paths: who they're for, what they consist of, and where you can find them.
If you liked this article and wish to share it, please refer to the following guidelines.
Reach out to me
Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.
Small Snippets about Tech, AI and Machine Learning over here
AI Newsletter- https://artificialintelligencemadesimple.substack.com/
My grandma’s favorite Tech Newsletter- https://codinginterviewsmadesimple.substack.com/
Check out my other articles on Medium: https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819