Interesting Content in AI, Software, Business, and Tech - 03/27/2024 [Updates]
Content to help you keep up with Machine Learning, Deep Learning, Data Science, Software Engineering, Finance, Business, and more
Hey, it’s Devansh 👋👋
In issues of Updates, I will share interesting content I came across. While the focus will be on AI and Tech, the ideas might range across business, philosophy, ethics, and much more. The goal is to share interesting content with y’all so that you can get a peek behind the scenes into my research process.
I put a lot of effort into creating work that is informative, useful, and independent from undue influence. If you’d like to support my writing, please consider becoming a paid subscriber to this newsletter. Doing so helps me put more effort into writing/research, reach more people, and supports my crippling chocolate milk addiction.
PS- We follow a “pay what you can” model, which allows you to support within your means. Check out this post for more details and to find a plan that works for you.
A lot of people reach out to me for reading recommendations. I figured I’d start sharing whatever AI Papers/Publications, interesting books, videos, etc I came across each week. Some will be technical, others not really. I will add whatever content I found really informative (and remembered) throughout the week. These won’t always be the most recent publications- just the ones I’m paying attention to this week. Without further ado, here are interesting readings/viewings for 03/27/2024. If you missed last week’s readings, you can find them here.
Reminder- We started an AI Made Simple Subreddit. Come join us over here- https://www.reddit.com/r/AIMadeSimple/. If you’d like to stay on top of community events and updates, join the discord for our cult here: https://discord.com/invite/EgrVtXSjYf.
Community Spotlight: Tasmia Ansari
Tasmia is a journalist based in India covering technology. Her work focuses on how technology, especially AI, impacts human rights, politics, and society. She writes about companies that are trying to achieve their pipe dreams through underpaid labour and stolen data.
Selected work:
Email: ansaritasmia1@gmail.com • LinkedIn • Resume:
https://cyan-amata-3.tiiny.site/
If you’re doing interesting work and would like to be featured in the spotlight section, just drop your introduction in the comments/by reaching out to me. There are no rules- you could talk about a paper you’ve written, an interesting project you’ve worked on, some personal challenge you’re working on, ask me to promote your company/product, or anything else you consider important. The goal is to get to know you better, and possibly connect you with interesting people in our chocolate milk cult. No costs/obligations are attached.
Previews
Curious about what articles I’m working on? Here are the previews for the next planned articles-
The UFC recently settled a massive antitrust lawsuit for $335 million.
There is some interesting game theory we can learn from these settlements, including how they relate to labor settlements and wealth inequality.
Cyber Security and AI. That's it.
Highly Recommended
These are pieces that I feel are particularly well done. If you don’t have much time, make sure you at least catch these works.
Prompt Injection Attacks with SVAM's Devansh
Not to pat myself on the back too much, but I recently appeared on the Partially Redacted Podcast to talk about LLM vulnerabilities, DeepMind's Poem attack, and more. Check it out; I'd like to think it came out pretty well.
"In this episode, we dive deep into the world of prompt injection attacks in Large Language Models (LLMs) with Devansh, AI Solutions Lead at SVAM. We discuss the attacks, existing vulnerabilities, real-world examples, and the strategies attackers use. Our conversation sheds light on the thought process behind these attacks, their potential consequences, and methods to mitigate them."
A New Code for AI | Vilas Dhar | TEDxPaloAlto
A great talk on the need to take AI back to the grassroots and ensure that the benefits of AI reach people at all rungs of society. Massive respect to Vilas Dhar. I appreciate the non-rage-baity approach to AI ethics (too many AI ethicists only make the problem worse by focusing on outrage instead of solutions).
"In his thought-provoking talk, AI expert, scholar, and philanthropist Vilas Dhar shares how artificial intelligence is transforming the human experience and presents a compelling vision for a new ethical code guiding its development. Through stories that highlight global majority experiences, he articulates a paradigm shift in the ethical framework that underpins AI technologies. Dhar envisions a future where a new code for AI ensures that these powerful tools benefit humanity as a whole. Vilas Dhar's exploration of A New Code for AI serves as a rallying call for a more humane and equitable integration of artificial intelligence into our rapidly evolving world.

Vilas Dhar is an artificial intelligence policy expert, human rights advocate, and a champion for equity in a tech-enabled world. He has held a lifelong commitment to creating human-centered institutions, and his experiences as a lawyer, technologist, investor, and philanthropist shape his approach to enabling communities to build solutions to the world’s most difficult challenges: climate and disaster impacts, last-mile health delivery, and digital autonomy and dignity. He has been appointed to the United Nations High-Level Advisory Body on AI and serves as the President of the Patrick J. McGovern Foundation, where he oversees one of the largest philanthropic campaigns to build a thriving AI-enabled future.

This talk was given at a TEDx event using the TED conference format but independently organized by a local community. Learn more at https://www.ted.com/tedx "
Devin Has Exposed a Major Issue with Software Engineering
An exceptionally well-written piece on all the empty noise on Devin replacing developers. The analysis presented here will apply to a lot of other fields. Also, now that Logan has started dropping fire memes (and transitioned to writing about ML), my job is in serious trouble.
The supposedly most techno-literate group in society has become the most fear-based regarding AI advancement. There are certain tasks AI is good at, and it will replace those tasks within jobs first. This isn't exclusive to software engineering. There are aspects of all jobs that will be automated soon and many more that will take a longer time for AI to do. By the time software engineers are fully replaced, many other jobs will already have been replaced. Why software engineers are the group panicking most about AI taking jobs, I will never understand.
The release of Devin has uncovered two things about most software engineers that are way more interesting to me than the autonomous engineer:
Most engineers have no idea what their job actually consists of and why they’re paid well.
Most engineers seriously lack an understanding of machine learning.
Mixture-of-Experts (MoE): The Birth and Rise of Conditional Computation
Cameron R. Wolfe, Ph.D. writes something. I put it in my highly recommended. It's tradition at this point.
"Mixture-of-Experts (MoE) layers are simple and allow us to increase the size or capacity of a language model without a corresponding increase in compute. We just replace certain layers of the model with multiple copies of the layer—called “experts”—that have their own parameters. Then, we can use a gating mechanism to (sparsely) select the experts used to process each input. This idea has its roots in research on conditional computation in the early 1990s [15, 30] and allows us to train massive models in a tractable manner, which is helpful in domains—such as language modeling—that benefit from models with extra capacity. Here, we will study the MoE, its origins, and how it has evolved over the last two decades."
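To make the gating idea above concrete, here is a minimal NumPy sketch of a sparse MoE layer. It is illustrative only: the names (`moe_layer`, `gate_weights`, etc.) are my own, and real MoE layers in LLMs route per token inside a transformer block and add load-balancing machinery this toy omits.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_layer(x, expert_weights, gate_weights, k=2):
    """Toy sparse MoE layer: route input x to its top-k experts.

    x:              (d,) input vector
    expert_weights: list of (d, d) matrices, one per "expert"
    gate_weights:   (num_experts, d) gating matrix
    k:              number of experts activated per input
    """
    scores = softmax(gate_weights @ x)      # gating scores over all experts
    top_k = np.argsort(scores)[-k:]         # indices of the k best experts
    gates = scores[top_k] / scores[top_k].sum()  # renormalize selected gates
    # Only the k selected experts actually run, so compute scales with k,
    # not with the total number of experts -- that is the whole point of MoE.
    return sum(g * (expert_weights[i] @ x) for g, i in zip(gates, top_k))
```

With, say, 8 experts and k=2, you get the capacity of 8 expert networks while paying roughly the compute of 2 per input, which is the "more parameters without more FLOPs" trade the excerpt describes.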
767: Open-Source LLM Libraries and Techniques — with Dr. Sebastian Raschka
Everything Sebastian does is super insightful, and this conversation with Jon Krohn was no exception. Just two great contributors in the field going over some pretty interesting questions.
🦅 EagleX 1.7T : Soaring past LLaMA 7B 2T in both English and Multi-lang evals (RWKV-v5)
A few days after I did my look into RWKV, the community pulled through and dropped their upgraded Eagle LLM. One thing I really love about this writeup (even more than the results tbh) is the way that the writers of this blogpost took the time to explain why every benchmark they used is important. Given its inference efficiency and god-tier multi-linguality, Eagle would be my front-runner for inclusive LLMs. Fantastic work by Eugene Cheah.
EagleX 1.7T is an early research release of our 7.52B parameter model training that:
Is part of a larger 2T model training
Is built on the RWKV-v5 architecture (a linear transformer with 10-100x+ lower inference cost)
Trained on 1.7 Trillion tokens across 100+ languages
Outperforms all 7B class models in multi-lingual benchmarks
Passes LLaMA2 (2T) in multiple English evals, approaches Mistral (>2T?)
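For context on the "linear transformer" claim in the list above: in linear attention, the key/value history can be folded into a fixed-size state, so generating each new token costs a constant amount of work regardless of context length. Here is a hedged sketch of that recurrence (my own simplification; it is not RWKV-v5's actual formulation, which adds per-channel time decay and other terms):

```python
import numpy as np

def linear_attention_step(state, k, v, q):
    """One step of un-normalized linear attention as a recurrence.

    `state` accumulates sum_t outer(k_t, v_t), so each new token costs
    O(d^2) work regardless of sequence length -- this constant per-token
    cost is the source of the lower inference cost versus quadratic
    attention, which re-reads the whole history at every step.
    """
    state = state + np.outer(k, v)  # fold the new key/value into the state
    return state, q @ state         # output for the current token

# Tiny demo: process a 3-token "sequence" one step at a time.
d = 4
state = np.zeros((d, d))
rng = np.random.default_rng(0)
for _ in range(3):
    k, v, q = rng.normal(size=d), rng.normal(size=d), rng.normal(size=d)
    state, out = linear_attention_step(state, k, v, q)
```

Because the state never grows, memory is also constant in sequence length, which is a big part of why architectures in this family are attractive for long contexts.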
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
In case you missed it, Apple took a rare W when they shared MM1. Absolute gold-mine for Multimodal LLMs.
In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices. Through careful and comprehensive ablations of the image encoder, the vision language connector, and various pre-training data choices, we identified several crucial design lessons. For example, we demonstrate that for large-scale multimodal pre-training using a careful mix of image-caption, interleaved image-text, and text-only data is crucial for achieving state-of-the-art (SOTA) few-shot results across multiple benchmarks, compared to other published pre-training results. Further, we show that the image encoder together with image resolution and the image token count has substantial impact, while the vision-language connector design is of comparatively negligible importance. By scaling up the presented recipe, we build MM1, a family of multimodal models up to 30B parameters, including both dense models and mixture-of-experts (MoE) variants, that are SOTA in pre-training metrics and achieve competitive performance after supervised fine-tuning on a range of established multimodal benchmarks. Thanks to large-scale pre-training, MM1 enjoys appealing properties such as enhanced in-context learning, and multi-image reasoning, enabling few-shot chain-of-thought prompting.
A.I. Companies Are Losing A LOT Of Money
Great video on the business side of Gen AI. Things are not looking pretty. The numbers do paint an interesting picture about the rise of hardware, though. It had me thinking about how much money efficiency-based services could make (imagine how much Google or MS would pay people who can reduce their training data needs or inference costs).
In this video we take a deep dive into the generative AI industry and question its economic viability.
CoreWeave: The Underdog Powering Generative AI's Explosion
I've been following Ksenia's work for close to 3 years now, and in that time, I've never been disappointed.
From its humble beginnings as a cryptocurrency mining operation, CoreWeave has emerged as a leading player in the world of cloud computing for AI. Last December, their valuation climbed from $2 billion to $7 billion after a minority investment of $642 million led by Fidelity Management and Research Co.
Their strategic pivot and early access to NVIDIA's cutting-edge GPUs made this happen. CoreWeave's ability to provide powerful and cost-effective GPU resources is fueling the current generative AI revolution. But they don’t forget their roots and previous partners: on March 7, 2024, CoreWeave entered into a multi-year contract worth up to $100 million to lease 16MW of data center space from Core Scientific, a Bitcoin mining and digital infrastructure provider.
Let's explore CoreWeave's pivotal history, its strategies, its claims of being the picks and shovels for all AI applications, its vast technical infrastructure offerings, its dependency challenges, and what it means to be dancing between the feet of elephants.
Other Interesting Content
The Most Profitable Industries of the Last 4,000 Years
It doesn’t matter whether your portfolio is filled with real estate, crypto, or Anime girl bathwater; sooner or later a new industry is going to become this century’s economic juggernaut.
In the past, spice, cotton and oil were the big money makers – now it’s silicon, space and AI. So, what were the biggest industries of the past and what secrets do they hold about the future of finance?
Surely something has been a reliable trade since the dawn of the Stone Age!
Warner Brothers Discovery is (kinda) Screwed
Warner Bros. Discovery's stock price recently tanked to its lowest price ever after their Q4 earnings call with investors. So what's been going wrong since the merger two years ago? And why might they continue to struggle in the new age of digital media?
The Disturbing Reality Of Ultra-Processed Food
We all already know that junk food and fast food are addictive and unhealthy for us, but ultra-processed food is causing a lot of problems with our health, our weight, and more. This might be the reason why.
In this video we're going to cover the Levels of Processing as seen in the NOVA Classification, we'll illustrate the difference between food and ultra-processed food using an apple and an apple pie from McDonald's, and we'll learn about starch slurries and how food manufacturers are breaking foods down to their molecular parts before reassembling them into whatever 'food' they can think of. We'll also take a look at one of the worst offenders, Pringles, and discuss the difference between potato chips and whatever this is.
If you liked this article and wish to share it, please refer to the following guidelines.
Reach out to me
Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.
Small Snippets about Tech, AI and Machine Learning over here
AI Newsletter- https://artificialintelligencemadesimple.substack.com/
My grandma’s favorite Tech Newsletter- https://codinginterviewsmadesimple.substack.com/
Check out my other articles on Medium: https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819