Interesting Content in AI, Software, Business, and Tech- 04/03/2024 [Updates]
Content to help you keep up with Machine Learning, Deep Learning, Data Science, Software Engineering, Finance, Business, and more
Hey, it’s Devansh 👋👋
In issues of Updates, I will share interesting content I came across. While the focus will be on AI and Tech, the ideas might range from business, philosophy, ethics, and much more. The goal is to share interesting content with y’all so that you can get a peek behind the scenes into my research process.
I put a lot of effort into creating work that is informative, useful, and independent from undue influence. If you’d like to support my writing, please consider becoming a paid subscriber to this newsletter. Doing so helps me put more effort into writing/research, reach more people, and support my crippling chocolate milk addiction. Help me democratize the most important ideas in AI Research and Engineering to over 100K readers weekly.
PS- We follow a “pay what you can” model, which allows you to support within your means. Check out this post for more details and to find a plan that works for you.
A lot of people reach out to me for reading recommendations. I figured I’d start sharing whatever AI Papers/Publications, interesting books, videos, etc. I come across each week. Some will be technical, others not really. I will add whatever content I found really informative (and remembered) throughout the week. These won’t always be the most recent publications- just the ones I’m paying attention to this week. Without further ado, here are interesting readings/viewings for 04/03/2024. If you missed last week’s readings, you can find them here.
Reminder- We started an AI Made Simple Subreddit. Come join us over here- https://www.reddit.com/r/AIMadeSimple/. If you’d like to stay on top of community events and updates, join the discord for our cult here: https://discord.com/invite/EgrVtXSjYf.
Community Spotlight: Tasting History with Max Miller
"Tasting History with Max Miller" is a super interesting YouTube channel that explores history through recipes in old manuscripts. It's always fascinating to see Max dig into what those recipes teach us about that particular time period and geography. Personally, I don't even care about the food aspect: the historical deep dives into how cultures have evolved are what keep me subbed. If you're a history nerd, check it out. I'll share a video from the channel in this reading list.
If you’re doing interesting work and would like to be featured in the spotlight section, just drop your introduction in the comments/by reaching out to me. There are no rules- you could talk about a paper you’ve written, an interesting project you’ve worked on, some personal challenge you’re working on, ask me to promote your company/product, or anything else you consider important. The goal is to get to know you better, and possibly connect you with interesting people in our chocolate milk cult. No costs/obligations are attached.
Previews
Curious about what articles I’m working on? Here are the previews for the next planned articles-
How do you generate the following:
Flooding x AI
Highly Recommended
These are pieces that I feel are particularly well done. If you don’t have much time, make sure you at least catch these works.
AutoBNN: Probabilistic time series forecasting with compositional bayesian neural networks
I'll have to study this in more detail, but the idea is definitely very interesting. In the meantime, I'd love to hear from TSF experts like Valeriy Manokhin, PhD, MBA, CQF.
Time series problems are ubiquitous, from forecasting weather and traffic patterns to understanding economic trends. Bayesian approaches start with an assumption about the data's patterns (prior probability), collect evidence (e.g., new time series data), and continuously update that assumption to form a posterior probability distribution. Traditional Bayesian approaches like Gaussian processes (GPs) and Structural Time Series are extensively used for modeling time series data, e.g., the commonly used Mauna Loa CO2 dataset. However, they often rely on domain experts to painstakingly select appropriate model components and may be computationally expensive. Alternatives such as neural networks lack interpretability, making it difficult to understand how they generate forecasts, and don't produce reliable confidence intervals.
To that end, we introduce AutoBNN, a new open-source package written in JAX. AutoBNN automates the discovery of interpretable time series forecasting models, provides high-quality uncertainty estimates, and scales effectively for use on large datasets. We describe how AutoBNN combines the interpretability of traditional probabilistic approaches with the scalability and flexibility of neural networks.
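The prior-to-posterior updating described above can be sketched with the simplest conjugate case: a Gaussian prior over a series' mean level, refined by new observations. This is an illustrative toy with made-up numbers, not AutoBNN's actual machinery (which composes BNN kernels in JAX):

```python
import numpy as np

def gaussian_posterior(prior_mu, prior_var, obs, obs_var):
    """Conjugate update: Gaussian prior + Gaussian likelihood -> Gaussian posterior."""
    n = len(obs)
    post_var = 1.0 / (1.0 / prior_var + n / obs_var)
    post_mu = post_var * (prior_mu / prior_var + np.sum(obs) / obs_var)
    return post_mu, post_var

# Vague prior belief about a series' mean level, then evidence arrives.
mu, var = 0.0, 10.0                      # prior
obs = np.array([2.1, 1.9, 2.3])          # new time series data
mu, var = gaussian_posterior(mu, var, obs, obs_var=1.0)
print(mu, var)  # posterior mean moves toward the data; variance shrinks
```

Each batch of evidence narrows the posterior, which is exactly the "continuously updating that assumption" loop the excerpt describes.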
A very in-depth explanation of the Mamba architecture that might replace Transformers. Another great writeup by the people at The Gradient
Mamba, however, is one of an alternative class of models called State Space Models (SSMs). Importantly, for the first time, Mamba promises similar performance (and crucially similar scaling laws) to the Transformer whilst being feasible at long sequence lengths (say 1 million tokens). To achieve this long context, the Mamba authors remove the “quadratic bottleneck” in the Attention Mechanism. Mamba also runs fast - like “up to 5x faster than Transformer” fast.
...
Here we’ll discuss:
The advantages (and disadvantages) of Mamba (🐍) vs Transformers (🤖),
Analogies and intuitions for thinking about Mamba, and
What Mamba means for Interpretability, AI Safety and Applications.
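To make the "quadratic bottleneck" concrete: self-attention materializes an L x L score matrix, while a linear state-space recurrence carries one fixed-size hidden state through the sequence, so each step costs O(1) in sequence length. A toy comparison with made-up matrices (a plain linear SSM, not Mamba's selective scan):

```python
import numpy as np

L, d, n = 8, 4, 16  # sequence length, model dim, state dim
rng = np.random.default_rng(0)
x = rng.standard_normal((L, d))

# Attention: the L x L score matrix is the quadratic bottleneck.
scores = x @ x.T / np.sqrt(d)             # shape (L, L) -> O(L^2) compute/memory
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
attn_out = weights @ x

# Linear SSM recurrence: one fixed-size state, O(L) in sequence length.
A = 0.9 * np.eye(n)                       # state transition (toy, non-selective)
B = rng.standard_normal((n, d)) * 0.1
C = rng.standard_normal((d, n)) * 0.1
h = np.zeros(n)
ssm_out = []
for t in range(L):                        # each step touches only h and x[t]
    h = A @ h + B @ x[t]
    ssm_out.append(C @ h)
ssm_out = np.array(ssm_out)
print(attn_out.shape, ssm_out.shape)      # both (L, d)
```

Doubling L quadruples the attention score matrix but only doubles the SSM loop, which is why long contexts favor the recurrent form.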
What it was like to visit a Medieval Tavern
The aforementioned recommendation from Max's channel.
The Unreasonable Ineffectiveness of the Deeper Layers
Given how many people are exploring efficient LLM training, this is worth reading
We empirically study a simple layer-pruning strategy for popular families of open-weight pretrained LLMs, finding minimal degradation of performance on different question-answering benchmarks until after a large fraction (up to half) of the layers are removed. To prune these models, we identify the optimal block of layers to prune by considering similarity across layers; then, to "heal" the damage, we perform a small amount of finetuning. In particular, we use parameter-efficient finetuning (PEFT) methods, specifically quantization and Low Rank Adapters (QLoRA), such that each of our experiments can be performed on a single A100 GPU. From a practical perspective, these results suggest that layer pruning methods can complement other PEFT strategies to further reduce computational resources of finetuning on the one hand, and can improve the memory and latency of inference on the other hand. From a scientific perspective, the robustness of these LLMs to the deletion of layers implies either that current pretraining methods are not properly leveraging the parameters in the deeper layers of the network or that the shallow layers play a critical role in storing knowledge.
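The selection step the abstract describes (picking the block of consecutive layers whose removal changes the representation least) can be sketched roughly as follows. This toy uses cosine similarity on random activations; the paper's actual criterion is an angular distance averaged over data:

```python
import numpy as np

def best_block_to_prune(hidden, k):
    """Find the k consecutive layers whose removal would change the hidden
    state least, judged by cosine similarity between the representation
    entering layer i and the one exiting layer i + k - 1."""
    num_layers = hidden.shape[0] - 1      # hidden[i] = input to layer i
    best_i, best_sim = None, -1.0
    for i in range(num_layers - k + 1):
        a, b = hidden[i], hidden[i + k]
        sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
        if sim > best_sim:
            best_i, best_sim = i, sim
    return best_i, best_sim

# Toy stack of per-layer representations: a slow random walk, so nearby
# layers look similar (stand-in for a real model's hidden states).
rng = np.random.default_rng(1)
reps = np.cumsum(rng.standard_normal((13, 64)) * 0.1, axis=0) + 1.0
idx, sim = best_block_to_prune(reps, k=4)
print(idx, round(sim, 3))
```

After dropping layers `idx` to `idx + k - 1`, the paper then "heals" the model with a small amount of QLoRA finetuning.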
The 3 Species That Break Genetics
Scientists have discovered a group of three closely related flowers that seem to break the laws of genetics. These mountain beardtongues are pollinated by either bees or butterflies, but not both, and that's the key to an incredibly weird quirk of natural selection.
Simplest explanation on hierarchical softmax
As a tree supremacist, this is the kind of development that gets me hot and bothered. Very cool research into improving LLM inference by cutting down how many words are looked at. This will be pretty interesting to combine with fine-tuning and possibly RAG to nudge LLMs in certain directions. Great writeup by Dhruvil Karani.
When computing the full softmax, the resulting probability distribution is usually skewed. This means that out of thousands of possible words, only a handful are plausible choices, which is logical. Most English words don’t fit the blank in “I love to play ____.” Yet we compute probabilities for the entire vocabulary. This is suboptimal.
Can we avoid computing the probability of obviously unlikely words? The answer is yes, and this is what hierarchical softmax achieves.
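A minimal sketch of the trick: put the vocabulary at the leaves of a binary tree and score a word as a product of sigmoid decisions along its root-to-leaf path, so evaluating one word costs O(log V) rather than O(V). Toy vectors and a hypothetical 4-word vocabulary, not a trained model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def word_probability(context, path, node_vectors):
    """P(word | context) = product over the word's root-to-leaf path of
    sigmoid decisions; `path` lists (node_id, direction) pairs, with
    direction +1 for 'go left' and -1 for 'go right'."""
    p = 1.0
    for node_id, direction in path:
        p *= sigmoid(direction * (node_vectors[node_id] @ context))
    return p

rng = np.random.default_rng(0)
d = 8
node_vectors = {n: rng.standard_normal(d) for n in ["root", "n0", "n1"]}
context = rng.standard_normal(d)

# 4-word vocabulary at the leaves of a depth-2 tree:
paths = {
    "cat":    [("root", +1), ("n0", +1)],
    "dog":    [("root", +1), ("n0", -1)],
    "tennis": [("root", -1), ("n1", +1)],
    "chess":  [("root", -1), ("n1", -1)],
}
probs = {w: word_probability(context, p, node_vectors) for w, p in paths.items()}
print(probs)
```

Because sigmoid(z) + sigmoid(-z) = 1, the leaf probabilities sum to 1 by construction, so we never have to normalize over the whole vocabulary.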
Complete Summary of Absolute, Relative and Rotary Position Embeddings!
Great compilation of the research done by a member of our cult. Aziz does great research summaries, so check him out if you're looking for more technical/research-focused resources.
Position embeddings have been used a lot in recent LLMs. In this article, I explore the concept behind them and discuss the different types of position embeddings and their differences.
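For a taste of the absolute case covered in the article, the original Transformer's sinusoidal embedding assigns each position a vector of sines and cosines at geometrically spaced frequencies. A direct sketch of that formula:

```python
import numpy as np

def sinusoidal_embeddings(num_positions, d_model):
    """PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
       PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))"""
    pos = np.arange(num_positions)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions
    pe[:, 1::2] = np.cos(angles)   # odd dimensions
    return pe

pe = sinusoidal_embeddings(50, 16)
print(pe.shape)  # (50, 16)
```

Relative and rotary schemes build on the same sine/cosine machinery but encode the offset between tokens rather than each token's absolute index.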
March 2024 - AI Tidbits Monthly Roundup
Welcome to the March edition of AI Tidbits Monthly, where we uncover the latest and greatest in AI. This month has been filled with groundbreaking announcements from industry leaders and exciting progress in open-source AI, showcasing the rapid advancements in the field.
Interest clubs, maintenance cycles, and personal work 💡
🌀 Luca Rossi writes some of my favorite productivity, Software Engineering, and Leadership content out there. You should check him out.
An amazing case study into how GoPro squandered its first-mover advantage. The most important lesson is at minute 34: GoPro ignored its core market (adventure-sports enthusiasts) to chase a group that didn't really need it (a mass market that already had cell phones). Misunderstanding your customer/market can wipe out any technical advantage, moat, or edge in resources.
In the 2010s, there was one startup that, by the measures of Silicon Valley and Wall Street, seemed destined to be the next big billion-dollar consumer brand. That company was GoPro. GoPro took the world by storm with its game-changing cameras. With radically compact design, tiny form factor, high portability, rugged waterproof exteriors, and reasonable picture quality - GoPro cameras were able to capture never-before-seen action and perspectives. GoPro was category-leading and category-defining - the company had effectively created and owned an entire category of cameras. It was the pioneer, gold standard, and household name: GoPro was not just the name of the product and company, but also became the unofficial label for any small, portable action camera on the market.
Yet fast forward to 2023, less than a decade later, and GoPro’s stock has dropped 95%. How could a company that had all the right ingredients, by the measures of Silicon Valley and Wall Street, squander it all in such a short period of time? How could having a market-defining, category-leading product be worth so little? In this episode, we’ll cover the rise and fall of GoPro in 3 eras, their failures in strategy, and how the company’s collapse serves as a crucial lesson on the importance of knowing your market.
Other Content
The Absurd Economics of Wish, AliExpress, and Temu
Temu is everywhere, promising that you can shop like a billionaire: $10 wireless speakers, $12 sneakers, $20 drones, and other cheap gadgets and clothes, backed by the promise of free shipping, 90-day returns, 30-day price adjustments, and delivery within 2 weeks. But Temu isn’t the first to sell generic, unbranded, mass-produced Chinese products online at radically low prices. Before Temu, there were AliExpress and Wish - which both went to market over a decade ago with the exact same value prop, unbelievably low prices, and wacky advertising.
Wish was the earliest entrant into this space, and the SF-based startup was once one of Silicon Valley’s darling unicorns. It all begs the question: how exactly do these companies stay alive selling $5-10 items online? In this episode, we’ll cover the business of selling cheap Chinese-made junk online through the rise and fall of Wish, the persistence of AliExpress, and the sudden emergence of Temu - and how all of this ties back to greed, growth, and Silicon Valley.
Algebraic geometry is often presented as the study of zeroes of polynomial equations. But it's really about something much deeper: the duality between abstract algebra and geometry.
Meant to share this earlier.
If you liked this article and wish to share it, please refer to the following guidelines.
Reach out to me
Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.
Small Snippets about Tech, AI and Machine Learning over here
AI Newsletter- https://artificialintelligencemadesimple.substack.com/
My grandma’s favorite Tech Newsletter- https://codinginterviewsmadesimple.substack.com/
Check out my other articles on Medium: https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819