Interesting Content in AI, Software, Business, and Tech- 07/24/2024 [Updates]
Content to help you keep up with Machine Learning, Deep Learning, Data Science, Software Engineering, Finance, Business, and more
Hey, it’s Devansh 👋👋
In issues of Updates, I will share interesting content I came across. While the focus will be on AI and Tech, the ideas might range from business, philosophy, ethics, and much more. The goal is to share interesting content with y’all so that you can get a peek behind the scenes into my research process.
I put a lot of effort into creating work that is informative, useful, and independent from undue influence. If you’d like to support my writing, please consider becoming a paid subscriber to this newsletter. Doing so helps me put more effort into writing/research, reach more people, and supports my crippling chocolate milk addiction. Help me democratize the most important ideas in AI Research and Engineering to over 100K readers weekly.
PS- We follow a “pay what you can” model, which allows you to support within your means, and support my mission of providing high-quality technical education to everyone for less than the price of a cup of coffee. Check out this post for more details and to find a plan that works for you.
A lot of people reach out to me for reading recommendations. I figured I’d start sharing whatever AI Papers/Publications, interesting books, videos, etc I came across each week. Some will be technical, others not really. I will add whatever content I found really informative (and remembered) throughout the week. These won’t always be the most recent publications- just the ones I’m paying attention to this week. Without further ado, here are interesting readings/viewings for 07/24/2024. If you missed last week’s readings, you can find them here.
Reminder- We started an AI Made Simple Subreddit. Come join us over here- https://www.reddit.com/r/AIMadeSimple/. If you’d like to stay on top of community events and updates, join the discord for our cult here: https://discord.com/invite/EgrVtXSjYf. Lastly, if you’d like to get involved in our many fun discussions, you should join the Substack Group Chat Over here.
Community Spotlight: Sergei Polevikov, ABD, MBA, MS, MA 🇮🇱🇺🇦
Sergei Polevikov, ABD, MBA, MS, MA 🇮🇱🇺🇦 publishes super insightful and informative reports on AI, Healthcare, and Medicine as a business. His newsletter, AI Health Uncut, is one of my favorite sources for understanding the complicated dynamics of the space, and I think it’s worth reading for anyone interested in the space.
If you’re doing interesting work and would like to be featured in the spotlight section, just drop your introduction in the comments/by reaching out to me. There are no rules- you could talk about a paper you’ve written, an interesting project you’ve worked on, some personal challenge you’re working on, ask me to promote your company/product, or anything else you consider important. The goal is to get to know you better, and possibly connect you with interesting people in our chocolate milk cult. No costs/obligations are attached.
Previews
Curious about what articles I’m working on? Here are the previews for the next planned articles-
Nothing new here. An important announcement is coming soon.
Deepfake Part 3. Exploring the true dangers of AI-generated misinformation.
Highly Recommended
These are pieces that I feel are particularly well done. If you don’t have much time, make sure you at least catch these works.
Writing & Teaching the World about AI | with Devansh
I was featured on the Futuristic Lawyer podcast with Tobias Jensen, where we discussed topics such as what it would take for me to get an Elon Musk tattoo, AI Regulation, my writing, and AI Art. I had a lot of fun, and you guys would appreciate it.
Devansh is an AI consultant and professional newsletter writer on Substack. On a weekly basis, Devansh reaches 100,000+ people with his newsletter “Artificial Intelligence Made Simple”, in which he breaks down complex topics related to AI while promoting a realistic and balanced understanding of the technology. In this conversation, we touch on a wide array of topics, including how Devansh is supporting his crippling chocolate milk addiction, and how he will get an “Elon” tattoo across his chest if AI leads to universal basic income.
Our boy Cameron R. Wolfe, Ph.D. continues his streak of masterpieces with this absurdly in-depth look into using LLMs for evaluation. All of his work is detailed, comprehensive, and well-structured, and Cam never shies away from sharing details. This article is no exception-
As large language models (LLMs) have become more and more capable, one of the most difficult aspects of working with these models is determining how to properly evaluate them. Many powerful models exist, and they each solve a wide variety of complex, open-ended tasks. As a result, discerning differences in performance between these models can be difficult. The most reliable method of evaluating LLMs is with human feedback, but collecting data from humans is noisy, time consuming, and expensive. Despite being a valuable and necessary source of truth for measuring model capabilities, human evaluation — when used in isolation — impedes our ability to iterate quickly during model development. To solve this problem, we need an evaluation metric that is quick, cost effective, and simple but maintains a high correlation with the results of human evaluation.
“While human evaluation is the gold standard for assessing human preferences, it is exceptionally slow and costly. To automate the evaluation, we explore the use of state-of-the-art LLMs, such as GPT-4, as a surrogate for humans.” — from [17]
Ironically, the ever-increasing capabilities of LLMs have produced a potential solution to this evaluation problem. We can use the LLM itself for evaluation, an approach commonly referred to as LLM-as-a-Judge [17]. This technique was originally explored after the release of GPT-4 — the first LLM that was capable of evaluating the quality of other models’ output. Since then, a variety of publications have analyzed LLM-as-a-Judge, uncovering best practices for its implementation and outlining important sources of bias of which we should be aware. Throughout the course of this overview, we will take a look at many of these publications and build a deep, practical understanding of LLM evaluations.
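The LLM-as-a-Judge pattern described above is simple to sketch in code. Below is a minimal pairwise-comparison judge; the prompt wording, function names, and the `call_llm` stub are my own illustrative choices, not Cam’s or any particular library’s API — plug in whatever client you actually use.

```python
# Minimal LLM-as-a-Judge sketch: build a pairwise-comparison prompt,
# send it to a judge model, and parse the verdict.
# `call_llm` is a hypothetical stand-in: any function mapping a prompt
# string to a completion string.

JUDGE_TEMPLATE = """You are an impartial judge. Compare the two responses below
to the user's question and decide which is better.

[Question]
{question}

[Response A]
{answer_a}

[Response B]
{answer_b}

Reply with exactly one of: "A", "B", or "TIE"."""


def build_judge_prompt(question: str, answer_a: str, answer_b: str) -> str:
    """Fill the pairwise-comparison template."""
    return JUDGE_TEMPLATE.format(
        question=question, answer_a=answer_a, answer_b=answer_b
    )


def parse_verdict(raw: str) -> str:
    """Normalize the judge's reply; anything unrecognized counts as a tie."""
    verdict = raw.strip().upper()
    return verdict if verdict in {"A", "B", "TIE"} else "TIE"


def judge_pair(question: str, answer_a: str, answer_b: str, call_llm) -> str:
    """Run one pairwise comparison through the judge model."""
    return parse_verdict(call_llm(build_judge_prompt(question, answer_a, answer_b)))
```

One of the biases the literature flags is position bias (judges tend to favor the first answer shown), so in practice you would run each pair in both orderings and keep only consistent verdicts.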
How to Keep Stakeholders Aligned 📣
Looking to get better with Tech Leadership/management? 🌀 Luca Rossi is your man. His posts are amazing, the art he creates is perfect (he’s inspired me to start learning how to up my game), and he pulls from his community to give OTG insights.
Technically, a project is a singular, shared entity, so this alignment problem doesn’t feel too hard. In reality, it most often is, and engineering leaders (among others) often get caught in convoluted balancing acts to deliver tailored, timely, and correct information to all parties.
The impact of getting this right vs wrong can be immense:
🟢 Good communication — people know where to look for what they need, they trust the data, and take action in a timely manner.
🔴 Bad communication — people don’t trust tools, overly rely on meetings to pass information, and make slow and uninformed calls.
So, over time I developed a rich set of strategies for this, with ideas on how to use tools and meetings, and how to serve everyone efficiently.
So here’s what we’ll cover:
🔀 Understanding stakeholder needs — who you should account for.
🤝 Meetings vs async tools — turning status updates into async workflows.
🛠️ Single tool vs multiple tools — and strategies to make both approaches work.
🎯 How to choose for your team — practical heuristics and steps to make the right call.
Let’s dive in!
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?
Interesting research covering the limitations of complex retrieval and reasoning in long contexts. That’s why my recommendation is to keep tasks and commands simple, and to use LLMs for orchestration rather than raw computation wherever possible.
In evaluating the long-context capabilities of large language models (LLMs), identifying content relevant to a user’s query from original long documents is a crucial prerequisite for any LLM to answer questions based on long text. We present NeedleBench, a framework consisting of a series of progressively more challenging tasks for assessing bilingual long-context capabilities, spanning multiple length intervals (4k, 8k, 32k, 128k, 200k, 1000k, and beyond) and different depth ranges, allowing the strategic insertion of critical data points in different text depth zones to rigorously test the retrieval and reasoning capabilities of models in diverse contexts. We use the NeedleBench framework to assess how well the leading open-source models can identify key information relevant to the question and apply that information to reasoning in bilingual long texts. Furthermore, we propose the Ancestral Trace Challenge (ATC) to mimic the complexity of logical reasoning challenges that are likely to be present in real-world long-context tasks, providing a simple method for evaluating LLMs in dealing with complex long-context situations. Our results suggest that current LLMs have significant room for improvement in practical long-context applications, as they struggle with the complexity of logical reasoning challenges that are likely to be present in real-world long-context tasks. All codes and resources are available at OpenCompass: this https URL.
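The core mechanic behind benchmarks like this — “strategic insertion of critical data points in different text depth zones” — is easy to see in a toy version. The sketch below inserts a “needle” at a chosen depth in a long filler document; the function and names are my own, not from the NeedleBench/OpenCompass codebase.

```python
# Toy needle-in-a-haystack construction: place a "needle" sentence at a
# chosen depth (0.0 = start of document, 1.0 = end), then ask a model to
# retrieve it. Snapping to a sentence boundary keeps the needle from
# splitting a sentence in two.

def insert_needle(haystack: str, needle: str, depth: float) -> str:
    """Insert `needle` near the character offset len(haystack) * depth."""
    assert 0.0 <= depth <= 1.0
    target = int(len(haystack) * depth)
    # Find the nearest sentence boundary at or before the target offset.
    cut = haystack.rfind(". ", 0, target)
    cut = cut + 2 if cut != -1 else 0
    return haystack[:cut] + needle + " " + haystack[cut:]


haystack = "Filler sentence. " * 1000
needle = "The secret code is 7421."
doc = insert_needle(haystack, needle, depth=0.5)
assert needle in doc
```

A full benchmark then sweeps both the context length (4k, 8k, 32k, …) and the depth at which the needle sits, since retrieval accuracy often varies sharply with position.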
Vision language models are blind
A good demonstration of how little LLMs understand.
Large language models with vision capabilities (VLMs), e.g., GPT-4o and Gemini 1.5 Pro, are powering countless image-text applications and scoring high on many vision-understanding benchmarks. We propose BlindTest, a suite of 7 visual tasks absurdly easy to humans, such as identifying (a) whether two circles overlap; (b) whether two lines intersect; (c) which letter is being circled in a word; and (d) counting the number of circles in an Olympic-like logo. Surprisingly, four state-of-the-art VLMs are, on average, only 56.20% accurate on our benchmark, with Sonnet-3.5 being the best (73.77% accuracy). On BlindTest, VLMs struggle with tasks that require precise spatial information and counting (from 0 to 10), sometimes giving the impression of a person with myopia seeing fine details as blurry and making educated guesses. Code is available at: this https URL
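Part of what makes these failures striking is that the ground truth is trivially computable: two circles overlap iff the distance between their centers is less than the sum of their radii. A tiny generator in that spirit (my own sketch, not the paper’s actual benchmark code):

```python
# Circle-overlap test in the spirit of BlindTest: the label a VLM must
# recover from the rendered image is one line of geometry.

import math
import random


def circles_overlap(c1, r1, c2, r2) -> bool:
    """True if the two circles share any interior area."""
    return math.dist(c1, c2) < r1 + r2


def make_example(rng: random.Random):
    """Sample a random circle pair plus its ground-truth label."""
    c1 = (rng.uniform(0, 100), rng.uniform(0, 100))
    c2 = (rng.uniform(0, 100), rng.uniform(0, 100))
    r1, r2 = rng.uniform(5, 20), rng.uniform(5, 20)
    return (c1, r1, c2, r2), circles_overlap(c1, r1, c2, r2)


assert circles_overlap((0, 0), 5, (8, 0), 5)       # 8 < 5 + 5 -> overlap
assert not circles_overlap((0, 0), 5, (12, 0), 5)  # 12 > 5 + 5 -> no overlap
```

Rendering each sampled pair to an image and asking a VLM “do these circles overlap?” gives you an unlimited supply of examples with exact labels — which is what makes the 56% average accuracy so damning.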
AI’s $600B Question
Turns out a lot of the massive GPU purchase agreements and data center acquisitions were misguided: investing without a clear long-term vision or an understanding of revenue has led to little ROI. Who saw that coming?
In September 2023, I published AI’s $200B Question. The goal of the piece was to ask the question: “Where is all the revenue?”
At that time, I noticed a big gap between the revenue expectations implied by the AI infrastructure build-out, and actual revenue growth in the AI ecosystem, which is also a proxy for end-user value. I described this as a “$125B hole that needs to be filled for each year of CapEx at today’s levels.”
This week, Nvidia completed its ascent to become the most valuable company in the world. In the weeks leading up to this, I’ve received numerous requests for the updated math behind my analysis. Has AI’s $200B question been solved, or exacerbated?
If you run this analysis again today, here are the results you get: AI’s $200B question is now AI’s $600B question.
Mumbai’s Crazy-Efficient, 99.9999% Accurate Food Delivery System
Given how much food delivery companies have been struggling, Mumbai’s Dabbawala service presents an interesting case study of what is required to make delivery profitable.
Groq, and the Hardware of AI — Intuitively and Exhaustively Explained
Some of this article is behind a paywall, but I’m impressed by how much detail there is in this article. I’m going to be following Daniel Warfield closely from now on and I’m very excited to see what he continues to put out.
This article discusses Groq, a new approach to computer hardware that’s revolutionizing the way AI is applied to real world problems.
Before we talk about Groq, we’ll break down what AI fundamentally is, and explore some of the key components of computer hardware used to run AI models. Namely: CPUs, GPUs, and TPUs. We’ll explore these critical pieces of hardware by starting in 1975 with the Z80 CPU, then we’ll build up our understanding to modern systems by exploring some of the critical evolutions in computer hardware.
Armed with an understanding of some of the fundamental concepts and tradeoffs in computer hardware, we’ll use that understanding to explore what Groq is, how it’s revolutionizing the way AI computation is done, and why that matters.
Naturally there’s a lot to cover between early CPUs and a cutting edge billion dollar AI startup. Thus, this is a pretty long article. Buckle up, it’ll be worth it.
Dostoevsky’s Radical Philosophy of Love
If you’re looking for an easy entry into philosophy, Unsolicited Advice is a great channel. Joe is a great speaker, he picks relevant topics, and he’s very good at giving details without overwhelming you.
Dostoevsky is considered a great thinker on many topics. His ideas about God, Theology, Morality, and Social Reform are famous throughout the world. But I want to take a step back from all that and examine what I take to be a cornerstone of his philosophy: his ideas about love. Because all at once his ideas on love are hopeful, optimistic, demanding, and terrifying. And I am excited to look at them with you today.
A Primer on Semiconductor Capital Equipment
When it comes to the business side of Tech, Eric Flaningam is my new favorite resource. I study every article religiously, not just for the knowledge but also as a writer to see how I can improve my work. Follow him, if you haven’t already.
The semiconductor capital equipment (semicap) industry is one of the most important industries on the planet and one that doesn’t get much love (outside of ASML). The industry has a number of structural advantages leading towards attractive investment characteristics for long-term, quality-focused investors: deep competitive advantages, technical differentiation, strong returns on capital, and high amounts of cash returns to shareholders.
For those who have been reading my work for a long time, you know I’ve been heavily influenced by the value-investing school of thought. One of my beliefs is that looking for the highest-quality technology companies with deep competitive advantages will yield outperformance over time.
The large semicap companies have some of the deepest competitive advantages in technology, leading to them operating in monopolies or oligopolies. The primary risks are cyclicality, geopolitical concerns, and the analytical rigor involved with investigating those two variables. The industry isn’t for those who lack conviction.
My goal for this article is to provide an introductory piece on the semiconductor manufacturing process, the semicap markets, and the trends driving this industry. I’ll then dive deeper into individual companies in future articles.
I’ll be structuring the article as follows:
Background on the Semiconductor Industry
An Overview of the Semiconductor Manufacturing Process
An Overview of the Semicap Market
Final thoughts on the Space
Hackers and Painters
An old but still interesting article shared by my friend Ryan Xie: Paul Graham’s analysis of the similarities between hackers and painters.
When I finished grad school in computer science I went to art school to study painting. A lot of people seemed surprised that someone interested in computers would also be interested in painting. They seemed to think that hacking and painting were very different kinds of work — that hacking was cold, precise, and methodical, and that painting was the frenzied expression of some primal urge.
Both of these images are wrong. Hacking and painting have a lot in common. In fact, of all the different types of people I’ve known, hackers and painters are among the most alike.

What hackers and painters have in common is that they’re both makers. Along with composers, architects, and writers, what hackers and painters are trying to do is make good things. They’re not doing research per se, though if in the course of trying to make good things they discover some new technique, so much the better.
I’m pretty busy with Deepfake Part 3, work, and other things, so this week’s reading list is relatively short.
If you liked this article and wish to share it, please refer to the following guidelines.
Reach out to me
Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.
Small Snippets about Tech, AI and Machine Learning over here
AI Newsletter- https://artificialintelligencemadesimple.substack.com/
My grandma’s favorite Tech Newsletter- https://codinginterviewsmadesimple.substack.com/
Check out my other articles on Medium. : https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819