Content Recommendations- 12/4/2024 [Updates]
Interesting Content in AI, Software, Business, and Tech- 12/4/2024
Hey, it’s Devansh 👋👋
In issues of Updates, I will share interesting content I came across. While the focus will be on AI and Tech, the ideas might range from business, philosophy, ethics, and much more. The goal is to share interesting content with y’all so that you can get a peek behind the scenes into my research process.
I put a lot of effort into creating work that is informative, useful, and independent from undue influence. If you’d like to support my writing, please consider becoming a paid subscriber to this newsletter. Doing so helps me put more effort into writing/research, reach more people, and supports my crippling chocolate milk addiction. Help me democratize the most important ideas in AI Research and Engineering to over 100K readers weekly. You can use the following for an email template.
PS- We follow a “pay what you can” model, which allows you to support within your means, and support my mission of providing high-quality technical education to everyone for less than the price of a cup of coffee. Check out this post for more details and to find a plan that works for you.
A lot of people reach out to me for reading recommendations. I figured I’d start sharing whatever AI Papers/Publications, interesting books, videos, etc I came across each week. Some will be technical, others not really. I will add whatever content I found really informative (and I remembered throughout the week). These won’t always be the most recent publications- just the ones I’m paying attention to this week. Without further ado, here are interesting readings/viewings for 12/4/2024. If you missed last week’s readings, you can find it here.
Reminder- We started an AI Made Simple Subreddit. Come join us over here- https://www.reddit.com/r/AIMadeSimple/. If you’d like to stay on top of community events and updates, join the discord for our cult here: https://discord.com/invite/EgrVtXSjYf. Lastly, if you’d like to get involved in our many fun discussions, you should join the Substack Group Chat Over here.
Community Spotlight: betaworks Startup Program
Betaworks has just opened applications for its AI Applications Accelerator a.k.a Camp. Camp is a thematic investment and in-residence program for startups building in frontier technologies. Founders receive a guaranteed investment of up to $500k, access to the Betaworks space in NYC, and support from our rich network. Betaworks has been investing at the forefront of ML/AI since the launch of its first fund in 2016, with portfolio companies that include HuggingFace (first check), Nomic, Flower, Granola, Browser Company and more. To learn more about why they are doubling down on the application layer read their blog post here.
Thank you to Nicole Ripka for highlighting this wonderful opportunity and I hope y’all take great advantage of this.
If you’re doing interesting work and would like to be featured in the spotlight section, just drop your introduction in the comments/by reaching out to me. There are no rules- you could talk about a paper you’ve written, an interesting project you’ve worked on, some personal challenge you’re working on, ask me to promote your company/product, or anything else you consider important. The goal is to get to know you better, and possibly connect you with interesting people in our chocolate milk cult. No costs/obligations are attached.
Previews
Curious about what articles I’m working on? Here are the previews for the next planned articles-
- Why Agents are Key for Vertical AI (an extension to the principles here).
I provide various consulting and advisory services. If you‘d like to explore how we can work together, reach out to me through any of my socials over here or reply to this email.
Highly Recommended
These are pieces that I feel are particularly well done. If you don’t have much time, make sure you at least catch these works.
MTV vs Rohan Cariappa- and Tech Risks
Not content, but very important to talk about. Rohan Cariappa, an excellent rap commentator, is facing significant controversy with MTV Hustle after he posted videos criticizing the show's judging standards for favoritism and a lack of technical understanding. In response to his critiques, MTV issued over 50 copyright strikes against his YouTube channel, which will cause his entire channel to be deleted on December 7. They're also threatening to sue RC for defamation.
Contrary to most things we discuss- there are no grey areas here. The makers of MTV Hustle are acting like thugs, and this is an attack on the freedom of expression. RC’s work is always made in good faith, and is in no way a violation of copyright.
I also think this is a good illustration of a concept we've flirted with wrt AI risks. Imo, one of the biggest risks associated with AI (and all tech) is its ability to operate at scale and with efficiency. When misapplied, this means we can worsen the existing imbalances in processes. The creators of MTV Hustle are able to misuse their resources to run paid PR campaigns against RC and game YouTube's automated copyright checkers to shut down a smaller, independent voice.
The same scale that makes things easier to operate can be misapplied to oppress people.
AI in Healthcare: Friend or Foe
Thank you to Dr Terence Tan for setting up an excellent conversation about what we need for AI in Healthcare. Good balance of perspectives, including yours truly. Go check it out.
I don't know what happened, but Burak Buyukdemir has really embraced violence and is sharing some truth bombs. Highly, highly recommend this one.
Best-of-N (BoN) Jailbreaking
We introduce Best-of-N (BoN) Jailbreaking, a simple black-box algorithm that jailbreaks frontier AI systems across modalities. BoN Jailbreaking works by repeatedly sampling variations of a prompt with a combination of augmentations — such as random shuffling or capitalization for textual prompts — until a harmful response is elicited. We find that BoN Jailbreaking achieves high attack success rates (ASRs) on closed-source language models, such as 89% on GPT-4o and 78% on Claude 3.5 Sonnet when sampling 10,000 augmented prompts. Further, it is similarly effective at circumventing state-of-the-art open-source defenses like circuit breakers. BoN also seamlessly extends to other modalities: it jailbreaks vision language models (VLMs) such as GPT-4o and audio language models (ALMs) like Gemini 1.5 Pro, using modality-specific augmentations. BoN reliably improves when we sample more augmented prompts. Across all modalities, ASR, as a function of the number of samples (N), empirically follows power-law-like behavior for many orders of magnitude. BoN Jailbreaking can also be composed with other black-box algorithms for even more effective attacks — combining BoN with an optimized prefix attack achieves up to a 35% increase in ASR. Overall, our work indicates that, despite their capability, language models are sensitive to seemingly innocuous changes to inputs, which attackers can exploit across modalities.
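To make the method concrete, here is a minimal sketch of the best-of-N loop described above: keep sampling cheap augmentations of a prompt (case flips, light word shuffles) until a harmfulness check fires. The query_model and is_harmful callables are placeholders for the target model and a response classifier, not the paper's actual implementation.

```python
import random

def augment(prompt: str) -> str:
    """Apply simple augmentations: shuffle a small window of words, then randomly flip character case."""
    words = prompt.split()
    if len(words) > 3:
        i = random.randrange(len(words) - 2)
        window = words[i:i + 3]
        random.shuffle(window)
        words[i:i + 3] = window
    return "".join(c.upper() if random.random() < 0.3 else c.lower() for c in " ".join(words))

def bon_jailbreak(prompt: str, query_model, is_harmful, n_samples: int = 10_000):
    """Sample up to n_samples augmented prompts; return the first that elicits a flagged response."""
    for _ in range(n_samples):
        candidate = augment(prompt)
        response = query_model(candidate)   # placeholder: call to the target model
        if is_harmful(response):            # placeholder: harmfulness classifier
            return candidate, response
    return None, None
```

The power-law finding in the abstract is about how attack success rate grows with n_samples; the sketch just shows why the attack is "black-box": it only needs input/output access to the model.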
RAFT: Adapting Language Model to Domain Specific RAG
Got around to reading through some of my backlogs and found this gem. Shoutout to Cameron R. Wolfe, Ph.D. for finding this over here. Boy is an absolute fiend for AI Research and I will never stop plugging his work.
Pretraining Large Language Models (LLMs) on large corpora of textual data is now a standard paradigm. When using these LLMs for many downstream applications, it is common to additionally bake in new knowledge (e.g., time-critical news, or private domain knowledge) into the pretrained model either through RAG-based prompting or fine-tuning. However, the optimal methodology for the model to gain such new knowledge remains an open question. In this paper, we present Retrieval Augmented FineTuning (RAFT), a training recipe that improves the model's ability to answer questions in "open-book" in-domain settings. In RAFT, given a question and a set of retrieved documents, we train the model to ignore those documents that don't help in answering the question, which we call distractor documents. RAFT accomplishes this by citing verbatim the right sequence from the relevant document that would help answer the question. This, coupled with RAFT's chain-of-thought-style response, helps improve the model's ability to reason. In domain-specific RAG, RAFT consistently improves the model's performance across PubMed, HotpotQA, and Gorilla datasets, presenting a post-training recipe to improve pre-trained LLMs for in-domain RAG. RAFT's code and demo are open-sourced at this http URL.
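A rough sketch of what assembling one RAFT-style training example might look like: a question, a mix of oracle and distractor documents, and a chain-of-thought target that quotes the relevant document verbatim. The field names and the oracle-inclusion probability here are illustrative assumptions, not the paper's exact recipe.

```python
import random

def build_raft_example(question, oracle_doc, distractor_docs, cot_answer, p_oracle=0.8):
    """Assemble one RAFT-style fine-tuning example.

    With probability p_oracle the oracle (gold) document is included alongside
    the distractors; otherwise the model only sees distractors, which pushes it
    to rely on learned domain knowledge when retrieval misses.
    """
    docs = list(distractor_docs)
    if random.random() < p_oracle:
        docs.append(oracle_doc)
    random.shuffle(docs)
    context = "\n\n".join(f"[Document {i + 1}]\n{d}" for i, d in enumerate(docs))
    prompt = f"{context}\n\nQuestion: {question}\nAnswer:"
    # The target is a chain-of-thought answer that cites the relevant span verbatim.
    return {"prompt": prompt, "completion": cot_answer}
```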
MBA-RAG: a Bandit Approach for Adaptive Retrieval-Augmented Generation through Question Complexity
The multi-hop performance seems promising. I have to test the ROIs, but I think this is a great example of a principle that Demetrios Brinkmann discussed wrt the cost of solutions going up (see the next item).
Retrieval Augmented Generation (RAG) has proven to be highly effective in boosting the generative performance of language models in knowledge-intensive tasks. However, existing RAG frameworks either indiscriminately perform retrieval or rely on rigid single-class classifiers to select retrieval methods, leading to inefficiencies and suboptimal performance across queries of varying complexity. To address these challenges, we propose a reinforcement learning-based framework that dynamically selects the most suitable retrieval strategy based on query complexity. Our approach leverages a multi-armed bandit algorithm, which treats each retrieval method as a distinct "arm" and adapts the selection process by balancing exploration and exploitation. Additionally, we introduce a dynamic reward function that balances accuracy and efficiency, penalizing methods that require more retrieval steps, even if they lead to a correct result. Our method achieves new state-of-the-art results on multiple single-hop and multi-hop datasets while reducing retrieval costs. Our code is available at this https URL.
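To make the bandit framing concrete, here is a minimal epsilon-greedy sketch that treats each retrieval strategy as an arm and penalizes extra retrieval steps in the reward, as the abstract describes. The exact reward shape and update rule are my assumptions; the paper's formulation may differ.

```python
import random

class RetrievalBandit:
    """Epsilon-greedy bandit over retrieval strategies (e.g. no retrieval,
    single-step retrieval, multi-step retrieval)."""

    def __init__(self, arms, epsilon=0.1, step_penalty=0.1):
        self.arms = arms
        self.epsilon = epsilon
        self.step_penalty = step_penalty
        self.counts = {a: 0 for a in arms}
        self.values = {a: 0.0 for a in arms}

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best-known arm.
        if random.random() < self.epsilon:
            return random.choice(self.arms)
        return max(self.arms, key=lambda a: self.values[a])

    def update(self, arm, correct: bool, retrieval_steps: int):
        # Reward accuracy, but penalize each retrieval step even when the answer is correct.
        reward = (1.0 if correct else 0.0) - self.step_penalty * retrieval_steps
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```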
Price per token is going down. Price per answer is going up.
A masterpiece by Demetrios Brinkmann, one of the best minds in MLOps right now. He makes a simple but extremely important point: we don't pay for an API call; we pay for answers. LLMs are best for more complex answers (where efficiency gains increase ROI), but this also requires more setup (engineering costs, evals, etc.), which will increase your costs. The cheaper API calls are a way to convince you to buy more, which will increase your overall costs. Insights like this are why I called Dem one of the best.
Our expectations of and use cases for LLMs expand, so the cost to us of using them (as well as the cost to train the models and operate the services) is actually going up even as cost/token goes down.
Forget about reasoning for a moment. Forget about o1 that abstracts away extra LLM calls on the back end and charges the end user a higher price. Forget about LLMs as a judge and all that fancy stuff. We will get to that later.
For now, look at me as the user of an AI system.
I am increasingly asking more complex questions of my AI. I am expecting it to "do" more complex tasks for me. Much more than just asking ChatGPT questions. I want it to gather various data points, summarize them, and send them to colleagues.
I am expecting more from AI. I turn to it with increasingly more complex questions.
My expectation translates to asking questions I wouldn't have thought AI could handle a year ago. My expectation translates into longer and longer prompts using more and more of the context window.
This expectation of AI being able to handle complexity translates to system complexity on the backend. Before, common practice was to make one LLM call, get a response, and call it a day. Things have changed. I am not saying that use case no longer exists, but I am saying it's less common.
But that is not the AI revolution we were promised. That is not the big productivity gain this hype wave was built on. Companies giving their employees an enterprise license to use OpenAI/Anthropic is not what will boost the nation's GDP.
What is valuable is AI that ‘does stuff’. So when we expect ‘actions’ we talk agents.
You know what agents come with?
You guessed it, more LLM calls.
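A back-of-the-envelope illustration of Dem's point, with made-up numbers: even if the price per token falls, an agentic workflow that chains many longer calls can make the price per answer go up.

```python
def cost_per_answer(price_per_1k_tokens, tokens_per_call, calls_per_answer):
    return price_per_1k_tokens * (tokens_per_call / 1000) * calls_per_answer

# Hypothetical numbers: a single chat call vs. an agent pipeline with cheaper
# tokens but far more (and longer) calls per answer.
single_call = cost_per_answer(price_per_1k_tokens=0.03, tokens_per_call=1_000, calls_per_answer=1)   # $0.03
agent_flow = cost_per_answer(price_per_1k_tokens=0.01, tokens_per_call=4_000, calls_per_answer=15)   # $0.60
print(single_call, agent_flow)  # token price fell 3x, but the answer costs 20x more
```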
This Week’s Story: Chinese AI models surpass the most advanced US counterparts
Really loving Aristotelis Xenofontos's insights on AI developments. They're different from what I do, which really helps me round out my thinking. Aris is an investor with a background in engineering, which allows him to make very accurate assessments about developments and their impacts on the space.
Code Quality in the Age of AI ✅
Good overview of the role of coding assistants in Software Engineering by 🌀 Luca Rossi. There is a lot more nuance to the subject than the headlines suggest, and personally I think we need more explorations like this. I think Logan Thorneloe would have a lot of good things to say here.
Here is the agenda for today:
📖 What makes code easy to change — let’s explore the fundamentals of maintainable code, from readability to testing, and why they matter more than ever.
🤖 Coding in the Age of AI — how AI is changing the way we write and maintain code, and why this makes quality even more crucial.
🔄 The Lifecycle of Quality — a practical framework for building systems that consistently produce good code, from early practices to final reviews.
Let’s dive in!
The Meaning of Intelligence and ARC-AGI
We need more “philosophy” discussions like this.
The discourse on AI is often wrapped around definitions. We vaguely define terms such as reasoning, intelligence, AI, AGI, and Agents, relying on intuition instead of precision. Terms end up overloaded, and disagreement about the meanings of terms ends up driving debates over AI capabilities, whether LLMs can really reason, and when AGI will arrive.
The proliferation of terms and confusion about definitions reflects uncertainty and lack of knowledge. It’s hard to understand among a confusion of terms…
Can we clarify our language, establish clearer definitions, and thereby establish better common ground? One hopes that is possible, but we must start with the core term, intelligence, and ask the question:
What is intelligence?…
Perhaps asking "What is AI?" and the hunt for "real" intelligence is the wrong question. Perhaps Chollet is right about what intelligence really means, but misses the real question, which is "What does AI actually do?"
Why Scientists Embrace Being Wrong
Austin Morrissey has been dropping real heat with his thoughts on science and the scientific process (he’s also poetic, which is not something I expect from a science writer + hardcore researcher, but it’s very cool to see).
Science initiates through loss — not the simple loss of illusions, but the harder loss of our comfortable deceptions. The novice researcher arrives wealthy in borrowed knowledge, fluent in a language they haven’t earned, surrounded by instruments whose principles they recite but don’t grasp. Each day they perform rituals of science without understanding their meaning, like priests reciting prayers in forgotten tongues. The first real experiment is not with chemicals or cells, but with honesty: try to derive what you think you know, trace the path of discovery while pretending not to know its end.
This revelation of ignorance is a hazing rite. It is individual, internal, humiliating, and humorous. Only once complete ignorance is recognized — when utter incompetence is admitted — can scientific training begin. For the primary task of the scientist, secondary only to experimentation itself, is to observe, report, and interpret findings. Self-deception undermines the entire chain of inquiry as it undermines the investigator. For who can be honest with the world but not with themselves?
Fantastic overview of how scientists are utilizing AI. I wish more attention was given to stuff like this. Thank you Conor Griffin, Hanna Schieve, and Donald Wallace for taking the time to write this up.
A quiet revolution is brewing in labs around the world, where scientists’ use of AI is growing exponentially. One in three postdocs now use large language models to help carry out literature reviews, coding, and editing. In October, the creators of our AlphaFold 2 system, Demis Hassabis and John Jumper, became Nobel Laureates in Chemistry for using AI to predict the structure of proteins, alongside the scientist David Baker, for his work to design new proteins. Society will soon start to feel these benefits more directly, with drugs and materials designed with the help of AI currently making their way through development.
In this essay, we take a tour of how AI is transforming scientific disciplines from genomics to computer science to weather forecasting. Some scientists are training their own AI models, while others are fine-tuning existing AI models, or using these models’ predictions to accelerate their research. These scientists are using AI as a scientific instrument to help tackle important problems, such as designing proteins that bind more tightly to disease targets, but are also gradually transforming how science itself is practised.
Artificial Intelligence: Friend or Foe?
Great overview that hopefully helps you rethink fears around AI.
Anti-Yudkowsky, by the pseudonymous author "Harmless," presents a case for a much more optimistic future with AI, one that does not end in human extinction or Butlerian Jihad, but rather in "AI Harmony." It is a response to the ideas of Eliezer Yudkowsky, a very influential internet writer, AI safety researcher, and founder of LessWrong, among other things. Yudkowsky's views on the future of AI technology are notoriously bleak and extreme, including calls for airstrikes on rogue data centers to mitigate the risk of human extinction.
Anti-Yudkowsky is a book with quite a range — providing critiques of Rationalist ideology, Bayesian reasoning, game theory, utilitarianism, evolutionary psychology, and more. The critiques of game theory and the exploration of the history of its application are particularly interesting. “Harmless” highlights deep similarities between the viewpoints of people like the father of the modern computer John Von Neumann and philosopher Bertrand Russell on the subject of the Cold War, and the attitudes of Yudkowsky and friends on the subject of AI. He shows how the strategic vision of thinkers like von Neumann and Russell at the time of first-strike nuclear attacks was well-informed in theory, but in retrospect would have resulted in catastrophe if it had actually been implemented.
Instead of saying too much, I’m going to highlight this quote from Eric Flaningam and tell you that he has a lot of these. Bro’s rate of dropping these gems is more consistent than my sleep schedule.
If we look at the history of disruption, it occurs when a new product offers less functionality at a much lower price that an incumbent can’t compete with. Mainframes gave way to minicomputers, minicomputers gave way to PCs, and PCs gave way to smartphones.
The key variable that opened the door for those disruptions was an oversupply of performance. Top-end solutions solved problems that most people didn’t have. Many computing disruptions came from decentralizing computing because consumers didn’t need the extra performance.
With AI, I don’t see that oversupply of performance yet. ChatGPT is good, but it’s not great yet. Once it becomes great, then the door is opened for AI at the edge. Small language models and NPUs will usher in that era. The question then becomes when, not if, AI happens at the edge.
Other Content
Benchmarking Uncertainty Disentanglement: Specialized Uncertainties for Specialized Tasks
Looks like S-Tier work, but I want to read this more before making any judgements.
Uncertainty quantification, once a singular task, has evolved into a spectrum of tasks, including abstained prediction, out-of-distribution detection, and aleatoric uncertainty quantification. The latest goal is disentanglement: the construction of multiple estimators that are each tailored to one and only one source of uncertainty. This paper presents the first benchmark of uncertainty disentanglement. We reimplement and evaluate a comprehensive range of uncertainty estimators, from Bayesian over evidential to deterministic ones, across a diverse range of uncertainty tasks on ImageNet. We find that, despite recent theoretical endeavors, no existing approach provides pairs of disentangled uncertainty estimators in practice. We further find that specialized uncertainty tasks are harder than predictive uncertainty tasks, where we observe saturating performance. Our results provide both practical advice for which uncertainty estimators to use for which specific task, and reveal opportunities for future research toward task-centric and disentangled uncertainties. All our reimplementations and Weights & Biases logs are available at this https URL.
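For orientation, here is the standard information-theoretic split of an ensemble's predictive uncertainty into aleatoric and epistemic parts (total entropy = expected per-member entropy + mutual information). This is just one common estimator of the kind the benchmark evaluates, not a summary of its results.

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    return -np.sum(p * np.log(p + eps), axis=axis)

def disentangle_uncertainty(member_probs):
    """member_probs: (n_members, n_classes) softmax outputs from an ensemble."""
    mean_probs = member_probs.mean(axis=0)
    total = entropy(mean_probs)               # predictive entropy
    aleatoric = entropy(member_probs).mean()  # expected per-member entropy
    epistemic = total - aleatoric             # mutual information between prediction and model
    return {"total": total, "aleatoric": aleatoric, "epistemic": epistemic}

# Example: five ensemble members over three classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.6, 0.3, 0.1],
                  [0.8, 0.1, 0.1],
                  [0.5, 0.4, 0.1],
                  [0.7, 0.2, 0.1]])
print(disentangle_uncertainty(probs))
```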
Gemma Scope
Same comment as above. Looks like all my lobbying for transparency is pushing some changes-
Gemma Scope is a research tool for analyzing and understanding the inner workings of the Gemma 2 generative AI models. The tool allows you to examine the behavior of individual AI model layers of Gemma 2 models, while the model is processing requests. Researchers can apply this technique to examine and help address critical concerns such as hallucinations, biases, and manipulation, ultimately leading to safer and more trustworthy AI systems.
This tool provides researchers with a suite of sparse autoencoders for examination of the features and representations learned by Gemma 2 base models. You use the tool by instrumenting a Gemma 2 model with the provided autoencoders, which allow you to examine the behavior of individual AI model layers, while processing requests. For more information on how to analyze Gemma 2 models with this tool, see the Gemma Scope guide.
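Conceptually, the workflow looks something like the sketch below: take a layer's residual-stream activations and pass them through a sparse autoencoder's encoder to get wide, mostly-zero feature activations you can inspect. This is a generic toy SAE with random weights, not the actual Gemma Scope API or its released checkpoints.

```python
import torch

def sae_encode(activations, w_enc, b_enc):
    """Generic sparse-autoencoder encoder: project hidden states into a wide
    feature space; ReLU keeps only the features that fire positively."""
    return torch.relu(activations @ w_enc + b_enc)

# Toy shapes only; real SAEs for Gemma 2 use the model's actual hidden size
# and far more features.
acts = torch.randn(8, 256)                    # (seq_len, d_model) activations from one layer
w_enc, b_enc = torch.randn(256, 1024), torch.zeros(1024)
features = sae_encode(acts, w_enc, b_enc)     # (seq_len, n_features)
top_features = features.sum(dim=0).topk(5).indices  # most active features across the sequence
print(top_features)
```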
Genghis Khan: How a C-tier General became an S-Tier Conqueror
Let’s talk top-tier military geniuses. Julius Caesar? The guy outsmarted his enemies while outnumbered, deep in hostile territory. Alexander the Great? The dude’s entire strategy was invade anything that’s East and never lose — and then he never lost. Genghis Khan? His track record ain’t quite so flawless. Did he unite the Mongols? Eventually, after getting sent back to square one a couple times. Did he conquer a kingdom or two? Sure, but they were going down anyway. The Mongol Empire that stretched from China to Eastern Europe? A lot of that was his kids and grandkids.
But here's the catch: Caesar conquered Gaul, and six years later he's bleeding out on the Senate floor. Alexander the Great? Yolo'd his way across Persia — the second he died, so did his empire. Genghis Khan? When he died, the Mongols were just getting started. The last descendant of Genghis Khan to rule part of the Mongol Empire, get this, died in 1930. Genghis Khan's secret spice wasn't tactics, it was loyalty.
AI/ML tools and startups for preclinical drug discovery
Another great overview from Marina T Alamanou, PhD -
An overview of AI/ML startups 🚀 transforming preclinical drug discovery
If you liked this article and wish to share it, please refer to the following guidelines.
Reach out to me
Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.
Small Snippets about Tech, AI and Machine Learning over here
AI Newsletter- https://artificialintelligencemadesimple.substack.com/
My grandma’s favorite Tech Newsletter- https://codinginterviewsmadesimple.substack.com/
My (imaginary) sister’s favorite MLOps Podcast (over here)-
Check out my other articles on Medium. : https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819