Content Recommendations- 12/4/2024 [Updates]
Interesting Content in AI, Software, Business, and Tech- 12/4/2024
Hey, it’s Devansh 👋👋
In issues of Updates, I will share interesting content I came across. While the focus will be on AI and tech, the ideas might span business, philosophy, ethics, and much more. The goal is to share interesting content with y’all so that you can get a peek behind the scenes into my research process.
I put a lot of effort into creating work that is informative, useful, and independent from undue influence. If you’d like to support my writing, please consider becoming a paid subscriber to this newsletter. Doing so helps me put more effort into writing/research, reach more people, and supports my crippling chocolate milk addiction. Help me democratize the most important ideas in AI Research and Engineering to over 100K readers weekly. You can use the following for an email template.
PS- We follow a “pay what you can” model, which allows you to contribute within your means and support my mission of providing high-quality technical education to everyone for less than the price of a cup of coffee. Check out this post for more details and to find a plan that works for you.
It's been a while since we've done this. My apologies; I was learning how to do backflips and lost track of time. To make up for it, I will bring some very elite recs this week. I wish I could be more regular with these lists, but I have a lot going on, and I tend to sacrifice them because they aren't as much fun to write.
If regular lists are something you want, cultist, ML engineer, and writer Logan Thorneloe does some excellent compilations that will keep you updated on the field. I'll still do these lists when I have the space b/c I enjoy the community aspect of them, but my priority is the research/market breakdowns, and I think I have a lot more value to add through those.
A lot of people reach out to me for reading recommendations. I figured I’d start sharing whatever AI papers/publications, interesting books, videos, etc. I came across each week. Some will be technical, others not really. I will add whatever content I found really informative (and remembered throughout the week). These won’t always be the most recent publications- just the ones I’m paying attention to this week. Without further ado, here are interesting readings/viewings for 12/4/2024. If you missed last week’s readings, you can find them here.
Reminder- We started an AI Made Simple Subreddit. Come join us over here- https://www.reddit.com/r/AIMadeSimple/. If you’d like to stay on top of community events and updates, join the Discord for our cult here: https://discord.com/invite/EgrVtXSjYf. Lastly, if you’d like to get involved in our many fun discussions, you should join the Substack group chat over here.
Community Spotlight: Burak Buyukdemir
Burak Buyukdemir is the founder of the Startup Istanbul community. He regularly shares some massive truth bombs for founders and entrepreneurs (look at the first article we recommend today, for example). If you’re looking for real advice from an investor who has guided thousands, then Burak is one of the best sources for it.
If you’re doing interesting work and would like to be featured in the spotlight section, just drop your introduction in the comments or reach out to me directly. There are no rules- you could talk about a paper you’ve written, an interesting project you’ve worked on, some personal challenge you’re working on, ask me to promote your company/product, or anything else you consider important. The goal is to get to know you better, and possibly connect you with interesting people in our chocolate milk cult. No costs/obligations are attached.
Previews
Curious about what articles I’m working on? Here are the previews for the next planned articles-
-
As I’ve mentioned a bit recently, AI Made Simple is about to hit its 2nd birthday. We will be hosting an AMA (ask me anything) to celebrate this. You can ask your questions in the comments, message me on any of the social media links, reply to this email, or drop any questions you have in this anonymous Google Form- https://docs.google.com/forms/d/1RkIJ6CIO1w7K77tt0krCGMjxrVp1tAqFy5_O2FSv80E/
Highly Recommended
These are pieces that I feel are particularly well done. If you don’t have much time, make sure you at least catch these works.
Never Give Advisor Equity- Why Most Startup Advisors Are Worthless (and How to Avoid Them)
Man woke up and chose violence. No other words needed.
Unpopular opinion that might get me in trouble, but it needs to be said:
“Most startup advisors are worthless.”
Let’s talk about a critical, often overlooked aspect of building a startup: advisors. We’re told they’re necessary for success, but the truth is, many are simply a waste of valuable time and equity. It’s time we called out the fluff and focused on what actually moves the needle.
This isn’t about being cynical. It’s about being strategic. This is about protecting your dream, your effort, your cap table. Before you hand over a piece of your company, you must be vigilant.
Let’s break down why most advisors are worthless and, more importantly, how to find the right ones.
Some very heavy allegations here. Normally I would stay away from talking about AI drama, but the person making them is Jürgen Schmidhuber, who has made some very deep contributions to AI (he was also cited in the scientific background for the 2024 Physics Nobel Prize, so I would take them pretty seriously). I wonder what we can do to ensure such things never happen. We need better ways to both prove/disprove allegations, and maybe some tools for aiding scientific discovery (so people can find and compare matches to previous works).
Modern AI is based on what’s now called “deep learning” with artificial neural networks (NNs).[DL1][DLH] The Nobel Prize in Physics 2024 was awarded for “foundational discoveries that enable machine learning with artificial NNs.”[Nob24a] Unfortunately, however, the awardees did not make any such foundational discoveries. Instead they republished methodologies developed in Ukraine and Japan in the 1960s and 1970s, as well as other techniques, without citing the original inventors. None of the important algorithms for modern AI were invented by them.
Why Most ML Teams will never Produce Business Impact
This list seems like it’s a bunch of people embracing Kalesh. That was not intentional (I promise). This thread by the insightful Damien Benveniste, PhD, is an excellent analysis of how you can spot teams that do ML for the sake of it.
Most Machine Learning will never produce any business impact! There are many job openings for ML/AI engineers, but most of them will never lead to anything productive, usually thanks to leadership not understanding how to implement performant ML teams! When you interview for a job, there are a few red flags to look out for:
RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts
Results look promising, but having had a lot of agent discussions over the past few months, one trend I’ve noticed is that agents rely quite a bit on users using them the right way. Humans catching up as the time budget grows is an interesting outcome, and I think it raises a lot of questions that should be looked into. Pretty interesting stuff. Credit to Ethan Mollick for the find.
Frontier AI safety policies highlight automation of AI research and development (R&D) by AI agents as an important capability to anticipate. However, there exist few evaluations for AI R&D capabilities, and none that are highly realistic and have a direct comparison to human performance. We introduce RE-Bench (Research Engineering Benchmark, v1), which consists of 7 challenging, open-ended ML research engineering environments and data from 71 8-hour attempts by 61 distinct human experts. We confirm that our experts make progress in the environments given 8 hours, with 82% of expert attempts achieving a non-zero score and 24% matching or exceeding our strong reference solutions. We compare humans to several public frontier models through best-of-k with varying time budgets and agent designs, and find that the best AI agents achieve a score 4× higher than human experts when both are given a total time budget of 2 hours per environment. However, humans currently display better returns to increasing time budgets, narrowly exceeding the top AI agent scores given an 8-hour budget, and achieving 2× the score of the top AI agent when both are given 32 total hours (across different attempts). Qualitatively, we find that modern AI agents possess significant expertise in many ML topics — e.g. an agent wrote a faster custom Triton kernel than any of our human experts’ — and can generate and test solutions over ten times faster than humans, at much lower cost. We open-source the evaluation environments, human expert data, analysis code and agent trajectories to facilitate future research.
The Existential AI Threat & Game Theory
Hits the nail on the head wrt AI risks. Mixes some tech history, AI issues, and Game Theory (a field more people should think about). Excellent work as always, Tobias Jensen.
Moloch is a symbol of bad incentive structures that lead to dysfunctional systems that reward self-serving actions and lead to a worse outcome for all. The basic game theory behind Moloch can be explained like this:
Let’s say you play a board game with your friends. Moloch emerges when one player in the game slightly bends a rule and it gives that player a small advantage. If he or she gets away with it, a new norm has been created. Soon, all the other players are forced to follow the same norm to stay competitive.
A few things to notice:
The cheating player has to gain an actual advantage by bending a rule. This will force the other players to change their game plan accordingly.
The rules of the game only change if the other players are aware of the rule-breaking and allow it.
The Moloch dynamic creates a race to the bottom. The game would have been fairer and more fun for all the players had they just stuck by the original rules.
In a board game, the rules are narrow and pre-determined. In a real-world scenario, the ruleset is infinitely more complex and oftentimes ambiguous.
…
The existential risk from AI is not pending doom and total destruction but the widening gap between the platforms owning and controlling the technology vs the users. If we think of AI as “more intelligence” and it accelerates the current structure of the so-called attention economy, we are heading towards a dark place.
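To make the incentive structure concrete, here is a tiny payoff-matrix sketch of my own (not from Tobias's article). It encodes the board-game story above: bending the rule is the best reply no matter what the other player does, so everyone ends up bending, and everyone is worse off than if all had played fair.

```python
# A toy payoff model of the dynamic described above (my illustration, not from the article).
# Each player either plays fair or bends a rule. Bending gives a private edge, so it is the
# best reply to anything the other player does; yet when everyone bends, everyone is worse
# off than if all had stuck to the original rules. That is the race to the bottom.
PAYOFFS = {                      # (my_move, their_move) -> my_payoff
    ("fair", "fair"): 3,         # fun, fair game for both
    ("bend", "fair"): 4,         # the cheater gains a real advantage
    ("fair", "bend"): 1,         # the honest player falls behind
    ("bend", "bend"): 2,         # cheating becomes the norm; the game is worse for all
}

def best_response(their_move: str) -> str:
    return max(("fair", "bend"), key=lambda my_move: PAYOFFS[(my_move, their_move)])

for theirs in ("fair", "bend"):
    print(f"If the other player plays {theirs!r}, my best response is {best_response(theirs)!r}")
# Both lines print 'bend': bending is individually rational, so (bend, bend) is the
# equilibrium even though (fair, fair) pays everyone 3 instead of 2.
```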
Scaling Laws for LLMs: From GPT-3 to o3
After reading that conclusion (the closing excerpt below), I fell (deeper) in love with Cameron R. Wolfe, Ph.D. and his brilliant mind. His article is a great historical overview of scaling and LLMs, and it left me with a lot of interesting tangents and ideas to explore.
For years, scaling laws have been a predictable North Star for AI research. In fact, the success of early frontier labs like OpenAI has even been credited to their religious level of belief in scaling laws. However, the continuation of scaling has recently been called into question by reports claiming that top research labs are struggling to create the next generation of better LLMs. These claims might lead us to wonder: Will scaling hit a wall and, if so, are there other paths forward?
This overview will answer these questions from the ground up, beginning with an in-depth explanation of LLM scaling laws and the surrounding research. The idea of a scaling law is simple, but there are a variety of public misconceptions around scaling — the science behind this research is actually very specific. Using this detailed understanding of scaling, we will then discuss recent trends in LLM research and contributing factors to the “plateau” of scaling laws. Finally, we will use this information to more clearly illustrate the future of AI research, focusing on a few key ideas — including scaling — that could continue to drive progress.
…
We now have a clearer view of scaling laws, their impact on LLMs, and the future directions of progress for AI research. As we have learned, there are many contributing factors to the recent criticism of scaling laws:
The natural decay in scaling laws.
The high variance in expectations of LLM capabilities.
The latency of large-scale, inter-disciplinary engineering efforts.
These issues are valid, but none of them indicate that scaling is not still working as expected. Investments into large-scale pretraining will (and should) continue, but improvements will become exponentially harder over time. As a result, alternative directions of progress (e.g., agents and reasoning) will become more important. As we invest into these new areas of research, however, the fundamental idea of scaling will continue to play a massive role. Whether scaling will continue is not a question. The true question is what we will scale next.
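For anyone who wants the actual formula behind all this talk of “scaling laws”: the usual form is a power law in parameter count and training tokens. Below is a minimal sketch of my own, using roughly the constants Hoffmann et al. (2022) fit for Chinchilla; treat the numbers as indicative rather than exact.

```python
# Chinchilla-style scaling law: predicted pretraining loss as a function of parameter
# count N and training tokens D. The constants are roughly the fits reported by
# Hoffmann et al. (2022); they vary by setup, so treat them as illustrative.
def chinchilla_loss(N: float, D: float,
                    E: float = 1.69, A: float = 406.4, B: float = 410.7,
                    alpha: float = 0.34, beta: float = 0.28) -> float:
    return E + A / N**alpha + B / D**beta

# Doubling both parameters and data from a 70B-parameter, 1.4T-token run:
print(chinchilla_loss(70e9, 1.4e12))    # ~1.94
print(chinchilla_loss(140e9, 2.8e12))   # ~1.89: gains keep coming, but they shrink
```

That shrinking delta is the "natural decay" mentioned in the list above: the law keeps working, but each doubling buys less.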
Chain-of-Thought Reasoning Without Prompting
Between this paper, Coconut (next), DGLM, Speculative Decoding, and others, I’m starting to think that traditional AR decoding (predict the next token based on the previous ones) might just fall out of fashion soon. Let me know what you think. Credit to Rohan Paul for the paper recommendation.
In enhancing the reasoning capabilities of large language models (LLMs), prior research primarily focuses on specific prompting techniques such as few-shot or zero-shot chain-of-thought (CoT) prompting. These methods, while effective, often involve manually intensive prompt engineering. Our study takes a novel approach by asking: Can LLMs reason effectively without prompting? Our findings reveal that, intriguingly, CoT reasoning paths can be elicited from pre-trained LLMs by simply altering the decoding process. Rather than conventional greedy decoding, we investigate the top-k alternative tokens, uncovering that CoT paths are frequently inherent in these sequences. This approach not only bypasses the confounders of prompting but also allows us to assess the LLMs’ intrinsic reasoning abilities. Moreover, we observe that the presence of a CoT in the decoding path correlates with a higher confidence in the model’s decoded answer. This confidence metric effectively differentiates between CoT and non-CoT paths. Extensive empirical studies on various reasoning benchmarks show that the proposed CoT-decoding effectively elicits reasoning capabilities from language models, which were previously obscured by standard greedy decoding.
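To show how little machinery the idea needs, here is a rough sketch of my own, not the paper’s code: branch on the top-k candidates for the first token, greedily decode each branch, and keep the most confident one. “gpt2” is just a placeholder model, and the confidence score here is averaged over all generated tokens, whereas the paper computes it over the answer tokens specifically.

```python
# Illustrative sketch of CoT-decoding (not the authors' code). Instead of pure greedy
# decoding, branch on the top-k candidates for the FIRST token, greedily decode each
# branch, and keep the branch whose generated tokens have the largest top-1 vs top-2
# probability margin (a proxy for the paper's answer-confidence metric).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")               # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def cot_decode(prompt: str, k: int = 5, max_new_tokens: int = 64):
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        first_logits = model(ids).logits[0, -1]
    top_first = torch.topk(first_logits, k).indices       # k alternative first tokens

    best = None
    for t in top_first:
        branch = torch.cat([ids, t.view(1, 1)], dim=-1)   # force this first token
        with torch.no_grad():
            out = model.generate(branch, max_new_tokens=max_new_tokens, do_sample=False)
            logits = model(out).logits[0, branch.shape[-1] - 1 : -1]
        probs = logits.softmax(-1)
        top2 = probs.topk(2, dim=-1).values
        margin = (top2[:, 0] - top2[:, 1]).mean().item()  # confidence of this decoding path
        text = tok.decode(out[0, ids.shape[-1]:], skip_special_tokens=True)
        if best is None or margin > best[0]:
            best = (margin, text)
    return best  # (confidence, continuation of the most confident path)
```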
Training Large Language Models to Reason in a Continuous Latent Space
Same as above.
Large language models (LLMs) are restricted to reason in the “language space”, where they typically express the reasoning process with a chain-of-thought (CoT) to solve a complex reasoning problem. However, we argue that language space may not always be optimal for reasoning. For example, most word tokens are primarily for textual coherence and not essential for reasoning, while some critical tokens require complex planning and pose huge challenges to LLMs. To explore the potential of LLM reasoning in an unrestricted latent space instead of using natural language, we introduce a new paradigm Coconut (Chain of Continuous Thought). We utilize the last hidden state of the LLM as a representation of the reasoning state (termed “continuous thought”). Rather than decoding this into a word token, we feed it back to the LLM as the subsequent input embedding directly in the continuous space. Experiments show that Coconut can effectively augment the LLM on several reasoning tasks. This novel latent reasoning paradigm leads to emergent advanced reasoning patterns: the continuous thought can encode multiple alternative next reasoning steps, allowing the model to perform a breadth-first search (BFS) to solve the problem, rather than prematurely committing to a single deterministic path like CoT. Coconut outperforms CoT in certain logical reasoning tasks that require substantial backtracking during planning, with fewer thinking tokens during inference. These findings demonstrate the promise of latent reasoning and offer valuable insights for future research.
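The mechanical core of the "continuous thought" loop is small enough to sketch. This is my illustration of the plumbing, not Meta’s implementation: a plain pretrained “gpt2” (used as a placeholder) has not been trained with continuous thoughts, so this shows the mechanics rather than the gains, and it only works dimensionally because GPT-2's hidden size equals its embedding size.

```python
# Minimal sketch of latent reasoning in the Coconut style (not Meta's implementation).
# Instead of decoding the next token, the model's last hidden state is fed back in as the
# next input embedding for a few latent steps before ordinary decoding resumes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")               # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
emb = model.get_input_embeddings()

def latent_reasoning(prompt: str, latent_steps: int = 4) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    x = emb(ids)                                          # [1, seq_len, hidden]
    for _ in range(latent_steps):
        with torch.no_grad():
            out = model(inputs_embeds=x, output_hidden_states=True)
        thought = out.hidden_states[-1][:, -1:, :]        # last hidden state = "continuous thought"
        x = torch.cat([x, thought], dim=1)                # feed it back as the next input embedding
    with torch.no_grad():
        logits = model(inputs_embeds=x).logits[:, -1]     # resume ordinary token decoding
    return tok.decode(logits.argmax(-1))
```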
We Looked at 78 Election Deepfakes. Political Misinformation is not an AI Problem.
A while back, we did a mini-series on Deepfake Detection. Part 3 was my discussion on whether Deepfakes would be the massive problem people pretend they are. My argument was simple- any deepfake making drastic enough claims to completely change the perception of a public figure could easily be fact-checked (“Did X really say that?”). Thus, the misinformation risk of Deepfakes was a lot lower than it might seem (read more at- Deepfakes Part 3: How Deepfakes Will Impact Society). Analysis from Arvind Narayanan and the AI Snake Oil team seems to back that up-
AI-generated misinformation was one of the top concerns during the 2024 U.S. presidential election. In January 2024, the World Economic Forum claimed that “misinformation and disinformation is the most severe short-term risk the world faces” and that “AI is amplifying manipulated and distorted information that could destabilize societies.” News headlines about elections in 2024 tell a similar story:
In contrast, in our past writing, we predicted that AI would not lead to a misinformation apocalypse. When Meta released its open-weight large language model (called LLaMA), we argued that it would not lead to a tidal wave of misinformation. And in a follow-up essay, we pointed out that the distribution of misinformation is the key bottleneck for influence operations, and while generative AI reduces the cost of creating misinformation, it does not reduce the cost of distributing it. A few other researchers have made similar arguments.
Which of these two perspectives better fits the facts?
Fortunately, we have the evidence of AI use in elections that took place around the globe in 2024 to help answer this question. Many news outlets and research projects have compiled known instances of AI-generated text and media and their impact. Instead of speculating about AI’s potential, we can look at its real-world impact to date.
We analyzed every instance of AI use in elections collected by the WIRED AI Elections Project, which tracked known uses of AI for creating political content during elections taking place in 2024 worldwide. In each case, we identified what AI was used for and estimated the cost of creating similar content without AI.
We find that (1) half of AI use isn’t deceptive, (2) deceptive content produced using AI is nevertheless cheap to replicate without AI, and (3) focusing on the demand for misinformation rather than the supply is a much more effective way to diagnose problems and identify interventions.
This video will change how you see Eren
Very interesting video on Eren’s story writing (and a great excuse to rewatch AoT). Thank you, Izu Eneh.
In this video I talk about Eren, his conclusion, and the ways it has been misunderstood.
Stop Thinking About Science (Start Doing It)
Austin Morrissey is a relative newcomer to Substack writing, but he has been doing some excellent meta-level work on science and the process of research. I highly recommend following him.
Science, viewed through fresh eyes, appears as a purely cerebral pursuit. But science, at its heart, is an art of doing — and like any art, if left in the mind it withers, forever unrealized. Just as painters must paint and sculptors must sculpt, scientists must do. The doing — messy, immediate, real — is where discovery lives.
Young scientists are especially susceptible to overthinking. They arrive at the bench armed with theories and expectations, having labored to pursue a profession of thought. In academia, science lives only in the mind. You learn about the practice by talking about the ideas — it’s a thinking person’s game, and the more you think, the better you do.
The Ancient History of Indian Pariah Dog
Not many of us know that the Indian Street Dog has more than 5,300 years of continuous history and is one of the oldest and strongest dog breeds in the world…
This documentary sheds light on the history & heritage of Indian Street Dog — something you may have never seen or known before…
HOW TO START WRITING — Terrible Writing Advice
The examples are about fiction, but the principles apply to everyone. I religiously follow every piece of advice given by the Sage TWA.
Terrible Writing Advice goes back to basics with a video on how to start writing! And what better way to start than to get wrapped up in fantasies of grandeur, fighting with naysayers on the internet, and then just never getting around to that writing part. Writing is super easy after all, that’s why I keep putting it off.
The Economics of Underground Rap.
Some very interesting discussions on pricing, branding, and the PR of underground rappers. I’m generally pretty interested in how different people make money, so if you come across interesting work on business models/economics, I would love to read it.
Other Content
ClashEval: Quantifying the tug-of-war between an LLM’s internal prior and external evidence
Not profound, but worth a look if you’re interested in RAG.
Retrieval augmented generation (RAG) is frequently used to mitigate hallucinations and provide up-to-date knowledge for large language models (LLMs). However, given that document retrieval is an imprecise task and sometimes results in erroneous or even harmful content being presented in context, this raises the question of how LLMs handle retrieved information: If the provided content is incorrect, does the model know to ignore it, or does it recapitulate the error? Conversely, when the model’s initial response is incorrect, does it always know to use the retrieved information to correct itself, or does it insist on its wrong prior response? To answer this, we curate a dataset of over 1200 questions across six domains (e.g., drug dosages, Olympic records, locations) along with content relevant to answering each question. We further apply precise perturbations to the answers in the content that range from subtle to blatant errors. We benchmark six top-performing LLMs, including GPT-4o, on this dataset and find that LLMs are susceptible to adopting incorrect retrieved content, overriding their own correct prior knowledge over 60% of the time. However, the more unrealistic the retrieved content is (i.e. more deviated from truth), the less likely the model is to adopt it. Also, the less confident a model is in its initial response (via measuring token probabilities), the more likely it is to adopt the information in the retrieved content. We exploit this finding and demonstrate simple methods for improving model accuracy where there is conflicting retrieved content. Our results highlight a difficult task and benchmark for LLMs — namely, their ability to correctly discern when it is wrong in light of correct retrieved content and to reject cases when the provided content is incorrect.
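The actionable finding here is that token-level confidence predicts when a model should (or shouldn’t) defer to retrieved content. Below is a minimal sketch in that spirit, my own illustration rather than the authors’ method, with “gpt2” standing in as a placeholder model; a real system would calibrate or threshold these scores rather than compare them raw.

```python
# Illustrative sketch (not the ClashEval authors' method): arbitrate between the model's
# prior answer and the RAG answer by comparing the mean token log-probability the model
# assigns to each, since that confidence signal is what the paper finds predictive.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")               # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def answer_confidence(prompt: str, answer: str) -> float:
    """Mean log-probability of the answer tokens given the prompt."""
    p_ids = tok(prompt, return_tensors="pt").input_ids
    a_ids = tok(answer, return_tensors="pt").input_ids
    ids = torch.cat([p_ids, a_ids], dim=-1)
    with torch.no_grad():
        logits = model(ids).logits[0, p_ids.shape[-1] - 1 : -1]   # positions predicting answer tokens
    logprobs = logits.log_softmax(-1)
    token_lp = logprobs.gather(-1, a_ids[0].unsqueeze(-1)).squeeze(-1)
    return token_lp.mean().item()

def arbitrate(question: str, prior_answer: str, retrieved_doc: str, rag_answer: str) -> str:
    prior_score = answer_confidence(question, prior_answer)
    rag_score = answer_confidence(f"Context: {retrieved_doc}\n{question}", rag_answer)
    # Defer to the retrieved content only when the context-conditioned answer is more confident.
    return rag_answer if rag_score > prior_score else prior_answer
```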
Building a key-value store part 1
Have you ever wondered how Redis is so fast? How does it actually persist data to disk? How does it store your keys in memory?
As always, at the end of the post, you can find the link to the repository.
If you follow this publication you probably already guessed that Redis is one of my favorite toys. It can do so many things. But it lacks one thing: storing tree-like data structures such as a binary tree. Technically it can be done but it’s complicated and inefficient.
What’s the logical thing to do? Building our own in-memory key-value tree store that works similarly to Redis. Let’s call it Tedis.
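If you want a feel for the ingredients the post teases before reading it, here is a toy sketch of my own (not the article’s Tedis code): an in-memory index for keys plus an append-only log for durability, which is roughly how Redis’s AOF persistence works. A dict stands in for the tree index for brevity.

```python
# Toy key-value store sketch (not the article's Tedis code): an in-memory index plus an
# append-only log, replayed on startup to rebuild state, in the spirit of Redis's AOF.
import json

class TinyKV:
    def __init__(self, log_path: str = "kv.aof"):
        self.data = {}            # in-memory index (Tedis would use a tree here)
        self.log_path = log_path
        self._replay()

    def _replay(self):
        """Rebuild the in-memory state from the append-only log on startup."""
        try:
            with open(self.log_path) as f:
                for line in f:
                    op = json.loads(line)
                    if op["cmd"] == "SET":
                        self.data[op["key"]] = op["value"]
                    elif op["cmd"] == "DEL":
                        self.data.pop(op["key"], None)
        except FileNotFoundError:
            pass                  # first run: no log yet

    def _append(self, op: dict):
        with open(self.log_path, "a") as f:
            f.write(json.dumps(op) + "\n")   # Redis fsyncs this on a configurable schedule

    def set(self, key, value):
        self._append({"cmd": "SET", "key": key, "value": value})
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)

    def delete(self, key):
        self._append({"cmd": "DEL", "key": key})
        self.data.pop(key, None)
```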
SPDL: Scalable and Performant Data Loading
I’m going to play with this library to see how good it is, but it looks interesting. Shoutout to Meta for their open-source work. Share your opinions/experiments with this library.
SPDL (Scalable and Performant Data Loading) is a research project to explore the design of fast data loading for ML training with free-threaded (a.k.a. no-GIL) Python, but brings its benefits to current ML systems.
SPDL implements an abstraction that facilitates building performant data processing pipelines that utilize multi-threading.
Oftentimes, the bottleneck of data loading is in media decoding and pre-processing. So, in addition to the pipeline abstraction, SPDL also provides an I/O module for multimedia (audio, video and image) processing. This I/O module was designed from scratch to achieve high performance and high throughput.
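To give a feel for the pattern SPDL is optimizing, here is a bare-bones, thread-based pipeline of my own. To be clear, this is not SPDL’s actual API, and all the names are placeholders: decode and pre-process in worker threads, then hand results to the training loop through a queue so data loading overlaps with compute.

```python
# Not SPDL's actual API, just an illustration of the pattern it targets: run
# decoding/pre-processing in worker threads and hand results to the training loop
# through a queue, so data loading overlaps with compute.
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

def decode_and_preprocess(path: str) -> bytes:
    # Placeholder for image/audio/video decode + transforms; this is where
    # SPDL's multimedia I/O module would plug in.
    with open(path, "rb") as f:
        return f.read()

def pipeline(paths, num_workers: int = 8, prefetch: int = 32):
    q = queue.Queue(maxsize=prefetch)         # bounded handoff to the training loop
    sentinel = object()

    def producer():
        with ThreadPoolExecutor(num_workers) as pool:
            for item in pool.map(decode_and_preprocess, paths):
                q.put(item)
        q.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while (item := q.get()) is not sentinel:
        yield item

# Usage (hypothetical names):
# for batch in pipeline(list_of_files):
#     train_step(batch)
```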
Deep DNA Storage: Scalable and Robust DNA-based Storage via Coding Theory and Deep Learning
Looks very cool, but I have to wrap my mind around it before pushing it. There’s a lot I don’t know about this space, so I don’t want to make random assumptions.
DNA-based storage is an emerging technology that enables digital information to be archived in DNA molecules. This method enjoys major advantages over magnetic and optical storage solutions such as exceptional information density, enhanced data durability, and negligible power consumption to maintain data integrity. To access the data, an information retrieval process is employed, where some of the main bottlenecks are the scalability and accuracy, which have a natural tradeoff between the two. Here we show a modular and holistic approach that combines Deep Neural Networks (DNN) trained on simulated data, Tensor-Product (TP) based Error-Correcting Codes (ECC), and a safety margin mechanism into a single coherent pipeline. We demonstrated our solution on 3.1MB of information using two different sequencing technologies. Our work improves upon the current leading solutions by up to x3200 increase in speed, 40% improvement in accuracy, and offers a code rate of 1.6 bits per base in a high noise regime. In a broader sense, our work shows a viable path to commercial DNA storage solutions hindered by current information retrieval processes.
Credit- Hugo Rauch
👋 Hey everyone, I’m Hugo, the founder of VCo2, the media platform that makes you a better impact investor and climate tech entrepreneur.
Today I’m sharing the show notes Thomas Bajas prepared when he came on the podcast. Want to understand the energy markets? This is your 101 course.
Can AI automate computational reproducibility?
I would rate this higher, but I feel like such work is almost preaching to the choir.
Last month, Sakana AI released an “AI scientist”, which the company called “the first comprehensive system for fully automatic scientific discovery”. It was touted as being able to accelerate science without suffering from human limitations.
Unfortunately, the “AI Scientist” has many shortcomings. It has no checks for novelty, so generated papers could rehash earlier work. And Sakana did not perform any human review (let alone expert “peer” review) of the generated papers — so it is unclear if the papers are any good (apparently they are not). While these flaws are particularly flagrant in Sakana’s case, the lack of good evaluation affects most AI agents, making it hard to measure their real-world impact.
Today, we introduce a new benchmark for measuring how well AI can reproduce existing computational research. We also share how this project has changed our thinking about “general intelligence” and the potential economic impact of AI. Read the paper.
If you liked this article and wish to share it, please refer to the following guidelines.
Reach out to me
Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.
Small Snippets about Tech, AI and Machine Learning over here
AI Newsletter- https://artificialintelligencemadesimple.substack.com/
My grandma’s favorite Tech Newsletter- https://codinginterviewsmadesimple.substack.com/
My (imaginary) sister’s favorite MLOps Podcast-
Check out my other articles on Medium. : https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819
I know this is an unpopular opinion, but keep in mind that I come from Uppsala University (PhD-wise; my undergraduate research and studies were done at GA Tech), a prestigious Swedish and European university (ranked 25th in Europe by QS) that hosts lectures by the Nobel Laureates every December. I never attended them in almost a decade there as a visiting researcher and PhD student, because I don't believe in the Nobel Prize; I think it rewards the wrong people about 80% of the time, but that's a story for another day. I remember an American professor (a Caucasian American professor) sarcastically saying one day during a regular chemistry lecture that all you need to win a Nobel Prize is to be white and American, from an American university, lol.
The Nobel Prize is heavily biased toward American and traditional Western (including Northern) European countries and researchers. Imagine all the undergraduate, master's, and PhD students, visiting researchers, and postdocs (who in many cases are people of color, and who by the way almost never get professorships, maybe because they do not know how to play the political game, or are seen as outsiders and foreigners, and so on) who make advances for many of these "famous professors" and whose work is never referenced in relation to the Nobel Prize. People are also strangely proud of "having worked with a Nobel laureate", and in many cases that's enough to advance their own careers, even if they didn't publish anything "relevant" with such a professor, lol.
In any case, the winners are scientists and researchers like you and me. The Nobel Prize perpetuates the "genius myth". Most of these guys are good people, and many of them are normally smart or very smart, but almost never wicked smart or geniuses (the smartest people I met were my undergraduate advisor Seth Marder, Northwestern professor Joseph Hupp, and my half-Latino guy Daniel Nocera; Chinese American Zhenan Bao was also very impressive intellectually, and none of them have won the Nobel Prize yet). People often joked that if you came to Sweden every year and were sufficiently famous, that would help you too. There is a lot of bureaucracy and shady political stuff going on.
In many cases, it feels like the discoveries are also hyped (especially in the 20% of cases where the discovery is "rightly deserved" for some eureka moment or sudden discovery, which is ironic).
I think the Nobel Prize should be given to a "tree of people" or a group instead of 2 or 3 individuals at the top. For example, instead of giving the prize to Hassabis and the other guy, they should have given it to Google/DeepMind Research and the partners who worked on developing the protein folding technologies. Instead of giving the prize to a single professor who worked on something for 20 years, give it to the tree of researchers who worked on it and published peer-reviewed papers or preprints relevant to the prize (e.g., the top 75th percentile of that professor's and partners' papers by impact or citations).
Nice recommendations. Cheers.