Content Recommendations- 11/20/24 [Updates]
Interesting Content in AI, Software, Business, and Tech- 11/20/2024
Hey, it’s Devansh 👋👋
In issues of Updates, I will share interesting content I came across. While the focus will be on AI and Tech, the ideas might range from business, philosophy, ethics, and much more. The goal is to share interesting content with y’all so that you can get a peek behind the scenes into my research process.
I put a lot of effort into creating work that is informative, useful, and independent from undue influence. If you’d like to support my writing, please consider becoming a paid subscriber to this newsletter. Doing so helps me put more effort into writing/research, reach more people, and supports my crippling chocolate milk addiction. Help me democratize the most important ideas in AI Research and Engineering to over 100K readers weekly. You can use the following for an email template.
PS- We follow a “pay what you can” model, which allows you to support within your means, and support my mission of providing high-quality technical education to everyone for less than the price of a cup of coffee. Check out this post for more details and to find a plan that works for you.
A lot of people reach out to me for reading recommendations. I figured I’d start sharing whatever AI Papers/Publications, interesting books, videos, etc. I came across each week. Some will be technical, others not really. I will add whatever content I found really informative (and remembered throughout the week). These won’t always be the most recent publications- just the ones I’m paying attention to this week. Without further ado, here are interesting readings/viewings for 11/20/2024. If you missed last week’s readings, you can find them here.
Reminder- We started an AI Made Simple Subreddit. Come join us over here- https://www.reddit.com/r/AIMadeSimple/. If you’d like to stay on top of community events and updates, join the discord for our cult here: https://discord.com/invite/EgrVtXSjYf. Lastly, if you’d like to get involved in our many fun discussions, you should join the Substack Group Chat over here.
Community Spotlight:
I’ve gotten a few comments on my articles being very long and detailed. Most recently, I got this message- “…Also, more and shorter posts might be helpful, since your (truly exemplary) content takes work and time to swallow and digest. Normally, I don’t favor “bite-sized” content, however I find myself saving your articles “for later” and sometimes not getting back around to them…”
This is something I’ve been thinking about for a while, but I’m not sure what I should do here. My work is long for 2 reasons-
I think there are a lot of nuances and important details in AI that people need to understand (which are pretty much always missed in most online discussions of the topics). In my mind, sharing the nuances will enable all of you to make better AI-related decisions. I’ve seen a lot of “AI Bites” kind of content, which is fine for news- but their analysis is almost always worthless. I don’t want to do that.
I don’t want to email you too often (this works better for both of us- I’ve been told that daily emails can be a bit annoying).
So far, my first instinct has been to use other platforms (LinkedIn, Threads, my other newsletter, etc.) for snippets so that people can keep finding older work. But I am aware that this doesn’t really offer the same experience to a reader. I’ve had some requests to start a podcast so that you can listen while doing something else. What is the best way to make the content better for you? Any ideas would be appreciated.
If you’re doing interesting work and would like to be featured in the spotlight section, just drop your introduction in the comments or by reaching out to me directly. There are no rules- you could talk about a paper you’ve written, an interesting project you’ve worked on, some personal challenge you’re working on, ask me to promote your company/product, or anything else you consider important. The goal is to get to know you better, and possibly connect you with interesting people in our chocolate milk cult. No costs/obligations are attached.
Previews
Curious about what articles I’m working on? Here are the previews for the next planned articles-
- Mixtures of Experts Unlock Parameter Scaling for Deep RL
I provide various consulting and advisory services. If you’d like to explore how we can work together, reach out to me through any of my socials over here or reply to this email.
Highly Recommended
These are pieces that I feel are particularly well done. If you don’t have much time, make sure you at least catch these works.
7 Days of agent framework anatomy from first-principles: Day 1
I love the no-framework approach, the shared code, and the insights. I’ll have to re-read this a bunch of times to fully internalize the insights, but there were lots of great things to think about. I’ll be reading through the series with a lot of excitement.
This is a series of articles on interacting with large language models and building (data-intensive) agent systems from first principles. I’m using several different models for comparison (Claude, GPT, LLama3.1, Gemini) and setting some constraints. The general approach I’m taking is to not use any frameworks and get as close to the APIs as possible, sending only messages and functions. I do not use specific features of APIs at this stage. I’m building in Python. We will build an agentic RAG framework from scratch….
I described what I think is the core part of an agent system by evaluating several language models. I abstracted the message stack and function stack into a Runner and reduced the agent to an “OOG” Markdown entity.
Here are some other takeaways:
To build an agent system using any of the foundation models, you need an executor loop that can manage a dynamic function and message stack
All agents, e.g. for RAG-based systems, consist of an overall “system” prompt, available functions (optional), and structured output (optional) — in funkyprompt this is referred to as OOG
Pydantic and Markdown are two very useful representations of OOG entities and can be used to generate system prompts to guide agents
Dynamic function calling means that functions can be discovered and activated during the execution loop (which we turn to in the next article)
We have tried four models. LLama (80b) did not perform well in the reliable structured output test.
Working with Gemini was a little bit frustrating because the API is too googley and the model is maybe a bit too googley too and requires me to work harder.
My recommendation to all these API developers offering large language models would be to always offer the following, so we can remove a lot of boilerplate code that is used just to talk to the language models:
For tools, a “from_function” implementation that maps Python (or any language) annotated functions to a valid OpenAPI JSON Schema format for their API
A consistent standard for message stacks and roles, with system vs other prompts, since these seem like just semantics.
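To make the executor loop and the “from_function” idea concrete, here is a minimal sketch of the pattern as I understand it. To be clear, this is my own illustration and not funkyprompt’s actual code: the call_model stub, the lookup_weather tool, and the message shapes are all made up for the example, and a real runner would call a provider API and map type annotations to proper JSON Schema types.

```python
import inspect
import json
from typing import Callable

def from_function(fn: Callable) -> dict:
    """Hypothetical helper: build a JSON-Schema-style tool description
    from a Python function's signature and docstring."""
    params = {
        name: {"type": "string"}  # naive: real code would map annotations to types
        for name in inspect.signature(fn).parameters
    }
    return {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "parameters": {"type": "object", "properties": params},
    }

def lookup_weather(city: str) -> str:
    """Return a (fake) weather report for a city."""
    return f"It is sunny in {city}."

def run(call_model: Callable, user_prompt: str, functions: dict, max_turns: int = 5):
    """Executor loop: keep going until the model returns plain text
    instead of asking for a function call."""
    tools = [from_function(fn) for fn in functions.values()]
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_turns):
        reply = call_model(messages, tools)  # provider-agnostic stub
        if "function_call" not in reply:
            return reply["content"]  # final answer: exit the loop
        name = reply["function_call"]["name"]
        args = json.loads(reply["function_call"]["arguments"])
        result = functions[name](**args)  # execute the tool...
        # ...and push the result back onto the message stack for the next turn
        messages.append({"role": "function", "name": name, "content": result})
    raise RuntimeError("agent did not terminate")

# A canned 'model' so the sketch runs without any API key.
def fake_model(messages, tools):
    if messages[-1]["role"] == "user":
        return {"function_call": {"name": "lookup_weather",
                                  "arguments": json.dumps({"city": "Boston"})}}
    return {"content": f"The tool says: {messages[-1]['content']}"}

print(run(fake_model, "What's the weather in Boston?",
          {"lookup_weather": lookup_weather}))
```

Dynamic function calling, as the series describes it, would then just mean mutating the functions dict (and the tools list) from inside that loop.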
What it’s Like to Work in AI and Advice from 10 AI Professionals
An excellent community effort by Logan Thorneloe to poll the AI community on Substack on what they do (of course, I also shared my thoughts). Really loved Sergei Polevikov, ABD, MBA, MS, MA 🇮🇱🇺🇦's comment in particular-
What advice would you give to someone wanting to work in ML/what other important things would you like to share with readers?
My advice to those interested in AI is the same as my general life philosophy: learn from smart people but think independently. Even the most brilliant minds can’t predict the future, so instead of chasing fads hyped on social media, invest in building a broad base of both technical and common-sense skills. It’s crucial to stay curious and learn something new every day, even if it seems unrelated at first. Over time, these experiences will contribute to a rewarding and fulfilling career — and life.
An excellent piece by Michael Woudenberg on Cognitive Blindspots, which hamper our perceptions. Always worth remembering the limits of our perception.
Today’s topic takes a different look at our brain, how it operates, and what this means for us in work, life, and play. Hang tight as we explore the things we see and don’t see, and the fictions we create to make sense of the world around us. Don’t worry though, at the end, you’ll also have some tools on hand to check your blind spots and reframe your reality to ensure a solid foundation.
Throughout this essay are optical illusions to enjoy. Many work because of the blind spot in our eyes due to the optic nerve that our brain ‘smooths’ over. Others work because our brain forces a contextualization that does not exist in the image. These illuminate how what we see is being filtered through our own perceptions.
Steroids Are NOT Functional… They’re LAME
Here is a bit of personal information about me: I used to compete in combat sports (fairly seriously). Mostly in underground promotions w/ no drug testing (and a lot of my fights were also open weight, with multiple bouts in one night). Roids were easily accessible, and it’s always tempting to try them when you hear about how much they can boost recovery, break your Genetic Limits, etc. Never went down that path, but I know the temptation very well. A video like this would have been extremely helpful back then, and hopefully, it helps someone here (roids are a lot more popular than people realize).
In this video, I explain why I think that steroids are lame. That is to say: I don’t just think they’re a bad idea for your health. I also think they expose fundamental insecurities and ultimately do little to make you a more impressive athlete.
We hear all the time that steroids are bad for you. They damage your liver, endanger your heart, lower natural testosterone production, trigger hair loss and gyna… None of these things seem to be enough to deter guys from taking them, though. Why? Because they’re obsessed with the idea of being stronger and more “alpha.”
That’s why I wanted to make this video. To point out that you’re not ACTUALLY more alpha, more of a specimen, or anything if you take steroids. You’re just more lame. They’re bad for your cardio and can put you at risk of collapsing when you exert yourself. They make you dumber and more emotional. They can lead to the imbalanced development of muscle and increase your chances of injury. You’re not a badass if you take steroids… you’re just being a very silly boy.
The Artificial Investor — Issue 37: Is this the end of the current AI wave?
Interesting analysis of the whole scaling debate, from a market/investor perspective. Wonderful work by Aristotelis Xenofontos.
The most interesting story of last week was the leaked news that OpenAI and Google researchers working on the next versions of their respective LLMs have realised that the new models are not better than the latest versions. These could be the first signs that the “AI Scaling Law” is not working anymore.
What does this mean about the latest AI wave? Are we entering another AI winter? Is this the end of the road in our pursuit of superintelligence?
Distinguishing Ignorance from Error in LLM Hallucinations
Large language models (LLMs) are susceptible to hallucinations: outputs that are ungrounded, factually incorrect, or inconsistent with prior generations. We focus on closed-book Question Answering (CBQA), where previous work has not fully addressed the distinction between two possible kinds of hallucinations, namely, whether the model (1) does not hold the correct answer in its parameters or (2) answers incorrectly despite having the required knowledge. We argue that distinguishing these cases is crucial for detecting and mitigating hallucinations. Specifically, case (2) may be mitigated by intervening in the model’s internal computation, as the knowledge resides within the model’s parameters. In contrast, in case (1) there is no parametric knowledge to leverage for mitigation, so it should be addressed by resorting to an external knowledge source or abstaining. To help distinguish between the two cases, we introduce Wrong Answer despite having Correct Knowledge (WACK), an approach for constructing model-specific datasets for the second hallucination type. Our probing experiments indicate that the two kinds of hallucinations are represented differently in the model’s inner states. Next, we show that datasets constructed using WACK exhibit variations across models, demonstrating that even when models share knowledge of certain facts, they still vary in the specific examples that lead to hallucinations. Finally, we show that training a probe on our WACK datasets leads to better hallucination detection of case (2) hallucinations than using the common generic one-size-fits-all datasets. The code is available at this https URL.
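As a mental model for what “training a probe on inner states” means here, the sketch below trains a simple linear probe to separate the two hallucination types. Everything in it is synthetic: the random vectors stand in for real hidden-state activations collected under the paper’s setup, so this is only an illustration of the probing technique, not the authors’ code or data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 256  # assumed hidden-state dimensionality

# Pretend the two hallucination types occupy slightly shifted regions of
# activation space- the kind of separability the paper's probes rely on.
case1 = rng.normal(loc=0.0, scale=1.0, size=(500, d))  # (1) no parametric knowledge
case2 = rng.normal(loc=0.3, scale=1.0, size=(500, d))  # (2) knows, but answers wrong

X = np.vstack([case1, case2])
y = np.array([0] * 500 + [1] * 500)  # 0 = ignorance, 1 = error despite knowledge

probe = LogisticRegression(max_iter=1000).fit(X, y)
print(f"probe accuracy: {probe.score(X, y):.2f}")
```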
AI Data Centers, Part 2: Energy
Meant to share this earlier, but here is an excellent overview of the market by Eric Flaningam.
Among the many bottlenecks for AI data centers, energy might be the most important and the most difficult to address. If estimates of data center energy consumption turn out to be true (or even in the vicinity of truth), our current energy infrastructure will not be able to support those demands.
Before the AI boom, data center power consumption was expected to grow consistently. Compute demands would continue to grow; data centers would grow to meet that demand.
However, with the addition of AI and its power-hungry architectures, estimates are up and to the right!
A thought-provoking piece by Andrew Smith.
When you’re a grownup at work, you’re not supposed to play around, but who among us doesn’t find a way to sneak little games in every now and then? Maybe it’s a little video game, or maybe it’s scrolling through a social media feed. Wait… are you supposed to be working right now? I mean, nothing.
We have this idea ingrained into us at a very early age, and it stays with us all the way through our entire adult lives. Work is something to be praised, while play is to be minimized, the idea goes.
Is that really true, though?
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Large language models (LLMs) are expensive to deploy. Parameter sharing offers a possible path towards reducing their size and cost, but its effectiveness in modern LLMs remains fairly limited. In this work, we revisit “layer tying” as a form of parameter sharing in Transformers, and introduce novel methods for converting existing LLMs into smaller “Recursive Transformers” that share parameters across layers, with minimal loss of performance. Here, our Recursive Transformers are efficiently initialized from standard pretrained Transformers, but only use a single block of unique layers that is then repeated multiple times in a loop. We further improve performance by introducing Relaxed Recursive Transformers that add flexibility to the layer tying constraint via depth-wise low-rank adaptation (LoRA) modules, yet still preserve the compactness of the overall model. We show that our recursive models (e.g., recursive Gemma 1B) outperform both similar-sized vanilla pretrained models (such as TinyLlama 1.1B and Pythia 1B) and knowledge distillation baselines — and can even recover most of the performance of the original “full-size” model (e.g., Gemma 2B with no shared parameters). Finally, we propose Continuous Depth-wise Batching, a promising new inference paradigm enabled by the Recursive Transformer when paired with early exiting. In a theoretical analysis, we show that this has the potential to lead to significant (2–3x) gains in inference throughput.
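To see the mechanism, here is a toy PyTorch sketch of the core idea as I read it: one shared block is looped over the depth, and a separate low-rank (LoRA-style) delta per loop step “relaxes” the tie. The single linear layer stands in for a full Transformer block, and all the dimensions are made up- this is a sketch of the technique, not the paper's code.

```python
import torch
import torch.nn as nn

class RelaxedRecursiveBlock(nn.Module):
    def __init__(self, d_model: int = 64, depth: int = 4, rank: int = 8):
        super().__init__()
        # One shared set of weights, reused at every depth ("layer tying").
        self.shared = nn.Linear(d_model, d_model)  # stands in for a full block
        # One low-rank (A, B) pair per depth: the "relaxation" of the tie.
        self.lora_A = nn.ParameterList(
            [nn.Parameter(torch.randn(d_model, rank) * 0.01) for _ in range(depth)]
        )
        self.lora_B = nn.ParameterList(
            [nn.Parameter(torch.zeros(rank, d_model)) for _ in range(depth)]
        )
        self.depth = depth

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for k in range(self.depth):  # same shared weights, looped depth times
            delta = x @ self.lora_A[k] @ self.lora_B[k]  # depth-specific LoRA path
            x = torch.relu(self.shared(x) + delta)
        return x

model = RelaxedRecursiveBlock()
print(model(torch.randn(2, 64)).shape)  # torch.Size([2, 64])
```

The parameter count is dominated by the single shared block; each depth only adds 2 × d_model × rank LoRA parameters, which is where the compactness comes from.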
FOD#71: Matryoshka against Transformers
We explore the new Matryoshka State Space Model and its advantages over Transformers, and offer a carefully curated list of recent news and papers.
New nuclear clean energy agreement with Kairos Power
Since pioneering the first corporate purchase agreements for renewable electricity over a decade ago, Google has played a pivotal role in accelerating clean energy solutions, including the next generation of advanced clean technologies. Today, we’re building on these efforts by signing the world’s first corporate agreement to purchase nuclear energy from multiple small modular reactors (SMRs) to be developed by Kairos Power. The initial phase of work is intended to bring Kairos Power’s first SMR online quickly and safely by 2030, followed by additional reactor deployments through 2035. Overall, this deal will enable up to 500 MW of new 24/7 carbon-free power to U.S. electricity grids and help more communities benefit from clean and affordable nuclear power.
This agreement is important for two reasons:
The grid needs new electricity sources to support AI technologies that are powering major scientific advances, improving services for businesses and customers, and driving national competitiveness and economic growth. This agreement helps accelerate a new technology to meet energy needs cleanly and reliably, and unlock the full potential of AI for everyone.
Nuclear solutions offer a clean, round-the-clock power source that can help us reliably meet electricity demands with carbon-free energy every hour of every day. Advancing these power sources in close partnership with supportive local communities will rapidly drive the decarbonization of electricity grids around the world.
Short list coz I’ve been traveling this week (thank you to all the cultists in Boston for all the love- even though we had a fairly last-minute announcement on the GC <3).
If you liked this article and wish to share it, please refer to the following guidelines.
Reach out to me
Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.
Small Snippets about Tech, AI and Machine Learning over here
AI Newsletter- https://artificialintelligencemadesimple.substack.com/
My grandma’s favorite Tech Newsletter- https://codinginterviewsmadesimple.substack.com/
Check out my other articles on Medium: https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819
<3