Content Recommendations- 5/21/2025 [Updates]
What you should know in AI, Software, Business, and Tech
It takes time to create work that’s clear, independent, and genuinely useful. If you’ve found value in this newsletter, consider becoming a paid subscriber. It helps me dive deeper into research, reach more people, stay free from ads/hidden agendas, and supports my crippling chocolate milk addiction. We run on a “pay what you can” model—so if you believe in the mission, there’s likely a plan that fits (over here).
Every subscription helps me stay independent, avoid clickbait, and focus on depth over noise, and I deeply appreciate everyone who chooses to support our cult.
PS – Supporting this work doesn’t have to come out of your pocket. If you read this as part of your professional development, you can use this email template to request reimbursement for your subscription.
Every month, the Chocolate Milk Cult reaches over a million Builders, Investors, Policy Makers, Leaders, and more. If you’d like to meet other members of our community, please fill out this contact form here (I will never sell your data nor will I make intros w/o your explicit permission)- https://forms.gle/Pi1pGLuS1FmzXoLr6
A lot of people reach out to me for reading recommendations. I figured I’d start sharing whatever AI papers/publications, interesting books, videos, etc. I came across each week. Some will be technical, others not really. I will add whatever content I found really informative (and remembered throughout the week). These won’t always be the most recent publications- just the ones I’m paying attention to this week. Without further ado, here are interesting readings/viewings for 5/21/2025. If you missed last time’s readings, you can find them here.
Reminder- We started an AI Made Simple Subreddit. Come join us over here- https://www.reddit.com/r/AIMadeSimple/. If you’d like to stay on top of community events and updates, join the discord for our cult here: https://discord.com/invite/EgrVtXSjYf. Lastly, if you’d like to get involved in our many fun discussions, you should join the Substack Group Chat over here. Working on something fun and want to meet people from our cult? Fill out this form.
Community Spotlight: Voyage AI by MongoDB
Voyage AI by MongoDB is a very cool AI startup with the best multi-modal embedding models I’ve used so far. The pricing is solid, they give you lots of free tokens, and they have a strong research team that keeps making the products better. If you’re building retrieval-based products, then it’s a great tool to try out. PS- This recommendation is not sponsored (I don’t do ANY sponsored shoutouts/recommendations on this newsletter) and I have no commercial affiliations with them.
If you’re doing interesting work and would like to be featured in the spotlight section, just drop your introduction in the comments/by reaching out to me. There are no rules- you could talk about a paper you’ve written, an interesting project you’ve worked on, some personal challenge you’re working on, ask me to promote your company/product, or anything else you consider important. The goal is to get to know you better, and possibly connect you with interesting people in our chocolate milk cult. No costs/obligations are attached.
Previews
Curious about what articles I’m working on? Here are the previews for the next planned articles-
---
How I write this newsletter
I provide various consulting and advisory services. If you‘d like to explore how we can work together, reach out to me through any of my socials over here or reply to this email.
Highly Recommended
These are pieces that I feel are particularly well done or important. If you don’t have much time, make sure you at least catch these works.
Gemini Diffusion is our new experimental research model.
Gemini is switching from Auto-Regression to Diffusion? It comes as a shock to most people, but not to the kids who studied our AI Market Research for April 2025- The AI Infrastructure Phase Has Begun. We specifically covered how diffusion models are primed to be the next trend and how businesses should start factoring that in.

“We’re always working on new approaches to improve our models, including making them more efficient and performant. Our latest research model, Gemini Diffusion, is a state-of-the-art text diffusion model that learns to generate outputs by converting random noise into coherent text or code, like how our current state-of-the-art models in image and video generation work.
The experimental demo of Gemini Diffusion released today generates content significantly faster than our fastest model so far, while matching its coding performance. If you’re interested in getting access to the demo, please sign up for the waitlist.
We’ll continue our work on different approaches to lowering latency in all our Gemini models, with a faster 2.5 Flash Lite coming soon.”
Everything Google Announced at I/O 2025
Great coverage by WIRED. A few major comments/concerns I have-
With Gen AI search- what happens to publishers? Isn’t AI ruining itself by removing the incentive for great writers to publish? All the work we put in will be used in an AI summary, with almost no one reading the piece. Fewer clicks, views, and web hits all hurt publishers. Will we see a reshaping of the way publishers monetize their work? What does this do to non-traditional writing?
I’d love to see the reasoning and model layers detached from each other so that we can analyse the ROI of each. We’re likely headed there anyway, but I’d love a speedup.
The AI filmmaker- I wonder how well it will be received when so many artists I know hate AI. I think some of their arguments are valid, others not so much, but this will be an interesting time to see how people stick to their guns vs bow to the march of tech.
How do the next OSs and data-ordering layers get defined when agents do the browsing and interact with these systems on our behalf?
Think Only When You Need with Large Hybrid-Reasoning Models
Really cool share by Aleksa Gordić. The trend of dynamically adjusting compute is a huge one to watch as things proceed.
“Recent Large Reasoning Models (LRMs) have shown substantially improved reasoning capabilities over traditional Large Language Models (LLMs) by incorporating extended thinking processes prior to producing final responses. However, excessively lengthy thinking introduces substantial overhead in terms of token consumption and latency, which is particularly unnecessary for simple queries. In this work, we introduce Large Hybrid-Reasoning Models (LHRMs), the first kind of model capable of adaptively determining whether to perform thinking based on the contextual information of user queries. To achieve this, we propose a two-stage training pipeline comprising Hybrid Fine-Tuning (HFT) as a cold start, followed by online reinforcement learning with the proposed Hybrid Group Policy Optimization (HGPO) to implicitly learn to select the appropriate thinking mode. Furthermore, we introduce a metric called Hybrid Accuracy to quantitatively assess the model’s capability for hybrid thinking. Extensive experimental results show that LHRMs can adaptively perform hybrid thinking on queries of varying difficulty and type. It outperforms existing LRMs and LLMs in reasoning and general capabilities while significantly improving efficiency. Together, our work advocates for a reconsideration of the appropriate use of extended thinking processes and provides a solid starting point for building hybrid thinking systems.”
The $2 Billion Vodka Lie: How Grey Goose Fooled the World
Great video on positioning and communications to establish brands in competitive spaces. Going to be VERY relevant to AI going forward. Apple kind of did it with their devices, curious to see how this keeps advancing.
“Why is Grey Goose seen as the ultimate luxury vodka — even though blind taste tests say otherwise? In this deep dive, we reveal the real story behind the rise of Grey Goose, the marketing mastermind Sidney Frank, and how one French bottle flipped the entire spirits industry on its head.
From psychological pricing to influencer seeding before it had a name, this is the story of how a vodka with no heritage outsold brands with centuries of history. Whether you’re a marketer, entrepreneur, or just a curious drinker — this breakdown will change how you think about branding forever.”
This thread on NVIDIA’s most recent drops
While we wait for Austin Lyons to do a 3-hour breakdown of the 300 angles that we plebs missed from Nvidia’s announcements, please read this great summary by Shruti Mishra.
“BREAKING: NVIDIA JUST announced roadmap for physical AI, robotics and national-scale AI factories.
Here’s a breakdown of the top important announcements: 🧵👇
1. DeepSeek R1 is now 4x faster, setting the standard for AI in inference and reasoning.
2. NVIDIA is building the GPT of humanoid robots. They just launched Isaac GR00T N1.5 — a foundation model for general purpose robotics. Here’s how it works: → A human demos the task once → Cosmos (their physics AI model) generates 1,000s of variations → Omniverse simulates the motions in high fidelity → The robot trains entirely in simulation → Then fine-tunes itself in the real world Robots can now learn general skills across tasks, tools, even body types with just one human demo. AI isn’t just thinking in text anymore. It’s perceiving. Reasoning. Moving. Physical AI is here and it’s training itself.
3. Open-source physics engine for robotics (July launch) Built with Disney + DeepMind. → GPU-accelerated → High-fidelity soft + rigid body simulation → Differentiable → Real-time training Will become the training ground for physical AI — humanoid robotics, drones, self-driving systems, and more.
4. Grace Blackwell is the new industrial brain → 1.5x inference speed → 2x networking bandwidth → 1.5x memory → Fully liquid-cooled rack-scale system → 40 PFLOPS per node — replacing supercomputers like Sierra Already live at xAI, CoreWeave, Oracle, and others.
5. CUDA-X is quietly eating every domain CUDA isn’t just about graphics anymore. It’s powering: → 6G radios (Aerial) → Genomics (Parabricks) → Weather sim (Earth-2) → Quantum computing (cuQuantum) → Chip lithography (cuLitho) → Supply chains (cuOpt) → Medical imaging (MONAI) → Sparse simulation (cuSPARSE) CUDA-X is now the OS layer for accelerated science.
…” (there are a few more; please look at the original thread for the rest).
How to Get Your First 1,000 Players: White Knuckle Edition
Ryan K. Rigney doing Rigney tings by dropping this slept-on banger. Rigs has some of the best insights on gaming, social media, and community. As a reader, you can clearly feel his love for this space, and it makes all his articles brilliant- even to someone like me who has no interest in the fields themselves.
“I talk to a lot of early-stage game developers as part of my day job.
One of the biggest problems they all face early on is “how to build a community” as a new studio. This conversation usually starts around hiring: Who should we hire to run community for us? But if you dig into the question, what a lot of people really want to know is something like: How do I get anybody to care about my game? And the desperate hope is that maybe that’s a problem they can delegate to somebody.
Everybody wants a million players. But how do you get that initial traction?…”
Rethinking Risk in Healthcare AI
A brilliant dissection by Sarah Gebauer, MD. This idea extends way beyond healthcare- it’s not really about AI safety. It’s about shifting the frame from model explainability to systemic fault tolerance.
“I’ve always been a terrible surfer. The handful of times I’ve tried, I ended up mostly slamming into the ocean while all my kids somehow glided effortlessly over the waves. It’s not for lack of trying; I’ve watched videos and even taken lessons. Despite my complete lack of aptitude, I appreciate the sport for its clear analogy to life’s unpredictability. No matter how much you know what you’re supposed to do, you can’t predict exactly when the next swell will hit or how big it will be. All you can do is learn to recognize the patterns, adjust your stance, and be ready for whatever comes.
Healthcare AI is much the same. Hospitals are starting to implement healthcare AI applications. But when it comes time for vendors to answer basic governance questions like “What training data did you use?” or “Is the model explainable?”, the responses are often unsatisfying and vague.
Why?
Because these questions are the wrong ones. In a complex, evolving system like healthcare AI, focusing solely on the technology itself is like trying to predict every single wave in the ocean. Instead, we need to understand the underlying currents — the risk categories and domains that can fundamentally disrupt clinical practice.”
BREAKING: UnitedHealth Bleeds. CEO Witty Steps Down.
It’s Sergei going for blood. What more do you want? Remind me not to piss him off.
Some very good finds here for more paper recs. I particularly liked “Cost-Efficient, Low-Latency Vector Search”
“Vector indexing enables semantic search over diverse corpora and has become an important interface to databases for both users and AI agents. Efficient vector search requires deep optimizations in database systems. This has motivated a new class of specialized vector databases that optimize for vector search quality and cost. Instead, we argue that a scalable, high-performance, and cost-efficient vector search system can be built inside a cloud-native operational database like Azure Cosmos DB while leveraging the benefits of a distributed database such as high availability, durability, and scale. We do this by deeply integrating DiskANN, a state-of-the-art vector indexing library, inside Azure Cosmos DB NoSQL. This system uses a single vector index per partition stored in existing index trees, and kept in sync with underlying data. It supports < 20ms query latency over an index spanning 10 million vectors, has stable recall over updates, and offers nearly 15x and 41x lower query cost compared to Zilliz and Pinecone serverless enterprise products. It also scales out to billions of vectors via automatic partitioning. This convergent design presents a point in favor of integrating vector indices into operational databases in the context of recent debates on specialized vector databases, and offers a template for vector indexing in other databases.”
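If you’ve never looked under the hood of vector search, here’s a toy sketch of the core operation the paper is optimizing- ranking a corpus of embeddings by cosine similarity to a query. This is my own illustrative code (names and data are made up), not Cosmos DB’s or DiskANN’s API; real systems replace this brute-force scan with an approximate graph index to hit millisecond latencies at scale.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query, corpus, k=3):
    """Brute-force top-k nearest vectors by cosine similarity.

    O(n) per query- indexes like DiskANN exist precisely to avoid
    scanning every vector while keeping recall high.
    """
    return heapq.nlargest(k, corpus.items(), key=lambda kv: cosine(query, kv[1]))

# Hypothetical 3-dim embeddings (real ones have hundreds of dims)
docs = {
    "milk":  [0.9, 0.1, 0.0],
    "cocoa": [0.8, 0.2, 0.1],
    "gpu":   [0.0, 0.1, 0.9],
}
print([doc_id for doc_id, _ in top_k([1.0, 0.0, 0.0], docs, k=2)])
# → ['milk', 'cocoa']
```

The paper’s argument is that this lookup, once accelerated by an index, can live inside the operational database itself rather than in a separate specialized vector store.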
Other Content
Earth’s Pettiest Hero | The Life & Times of Cicero
Are Boomers The Most Selfish Generation In History?
“Why is Gen Z stuck in a world drowning in wealth? Homes, families, and financial freedom — milestones older generations nailed, but young folks are falling behind! Are boomers hogging the riches, or is inequality the sneaky villain? From crushing student debt to tech-creating billionaire gaps, we uncover the shocking truth! Could Gen Z still become the richest and poorest generation?”
Turns out we have a gig-economy version of the Yakuza now. Wild wild times.
“The Yakuza is collapsing, destroyed by Gen Z and the cultural and generational tension they caused. I explore how legal crackdowns, economic shifts, and a cultural clash with Japan’s youngest generation have pushed Japan’s most infamous organized crime syndicates to the brink, and how “Tokuryu”, a new form of decentralized, digital-native organized crime, is rising to take its place. This is a case study of generational tension that’s not just applicable to gangs, but work, family, maybe even the future itself.”
LLMs Get Lost In Multi-Turn Conversation
If this were a few years ago, this would be at the top of the reading list and have its own breakdown. But now I think this is just an idea we understand very well. Still an interesting read, just not as surprising anymore.
“Large Language Models (LLMs) are conversational interfaces. As such, LLMs have the potential to assist their users not only when they can fully specify the task at hand, but also to help them define, explore, and refine what they need through multi-turn conversational exchange. Although analysis of LLM conversation logs has confirmed that underspecification occurs frequently in user instructions, LLM evaluation has predominantly focused on the single-turn, fully-specified instruction setting. In this work, we perform large-scale simulation experiments to compare LLM performance in single- and multi-turn settings. Our experiments confirm that all the top open- and closed-weight LLMs we test exhibit significantly lower performance in multi-turn conversations than single-turn, with an average drop of 39% across six generation tasks. Analysis of 200,000+ simulated conversations decomposes the performance degradation into two components: a minor loss in aptitude and a significant increase in unreliability. We find that LLMs often make assumptions in early turns and prematurely attempt to generate final solutions, on which they overly rely. In simpler terms, we discover that when LLMs take a wrong turn in a conversation, they get lost and do not recover.”
If you liked this article and wish to share it, please refer to the following guidelines.
Reach out to me
Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.
Small Snippets about Tech, AI and Machine Learning over here
My grandma’s favorite Tech Newsletter-
My (imaginary) sister’s favorite MLOps Podcast-
Check out my other articles on Medium: https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819