Content Recommendations- 11/13/24 [Updates]
Content to help you keep up with Machine Learning, Deep Learning, Data Science, Software Engineering, Finance, Business, and more
Hey, it’s Devansh 👋👋
In issues of Updates, I will share interesting content I came across. While the focus will be on AI and Tech, the ideas might range across business, philosophy, ethics, and much more. The goal is to share interesting content with y’all so that you can get a peek behind the scenes into my research process.
I put a lot of effort into creating work that is informative, useful, and independent from undue influence. If you’d like to support my writing, please consider becoming a paid subscriber to this newsletter. Doing so helps me put more effort into writing/research, reach more people, and support my crippling chocolate milk addiction. Help me democratize the most important ideas in AI Research and Engineering to over 100K readers weekly. You can use the following for an email template.
PS- We follow a “pay what you can” model, which allows you to support within your means, and support my mission of providing high-quality technical education to everyone for less than the price of a cup of coffee. Check out this post for more details and to find a plan that works for you.
Before we begin, our cult has established itself in 190 countries. Our plan for world domination is taking place, and I really appreciate all of you. This newsletter would not be possible without your support, so thank you for always reading, sharing, commenting, and more. A special thank you to my paid subscribers and consulting clients, who fund my research and allow me to share the most important ideas with the world.
A lot of people reach out to me for reading recommendations. I figured I’d start sharing whatever AI Papers/Publications, interesting books, videos, etc I came across each week. Some will be technical, others not really. I will add whatever content I found really informative (and remembered throughout the week). These won’t always be the most recent publications- just the ones I’m paying attention to this week. Without further ado, here are interesting readings/viewings for 11/13/2024. If you missed last week’s readings, you can find them here.
Reminder- We started an AI Made Simple Subreddit. Come join us over here- https://www.reddit.com/r/AIMadeSimple/. If you’d like to stay on top of community events and updates, join the discord for our cult here: https://discord.com/invite/EgrVtXSjYf. Lastly, if you’d like to get involved in our many fun discussions, you should join the Substack Group Chat Over here.
Community Spotlight: Unsolicited Advice
Unsolicited Advice (Joe Folley) runs an excellent YouTube channel where he shares his lessons from various thinkers and philosophers. I don’t always agree with everything he says, since different works of philosophy have different takeaways for different people, but Joe is very well-spoken and very well-read, which allows him to make connections across thinkers, especially in the more modern context. This makes him a very interesting watch. His work is worth checking out, whether you’re looking for an introduction to a certain thinker or for new discussions on an old favorite.
If you’re doing interesting work and would like to be featured in the spotlight section, just drop your introduction in the comments/by reaching out to me. There are no rules- you could talk about a paper you’ve written, an interesting project you’ve worked on, some personal challenge you’re working on, ask me to promote your company/product, or anything else you consider important. The goal is to get to know you better, and possibly connect you with interesting people in our chocolate milk cult. No costs/obligations are attached.
Previews
Curious about what articles I’m working on? Here are the previews for the next planned articles-
- GPT vs Claude vs Gemini as base models for building agents.
Highly Recommended
These are pieces that I feel are particularly well done. If you don’t have much time, make sure you at least catch these works.
Navigating AI’s Moral Maze with Devansh
Podcast host and Senior Software Engineer at Bloomberg, Alexa Griffith, had me on her podcast, “Alexa’s Input (AI)”. We talked about a bunch of things, mainly related to ethics, morally aligned LLMs, and what needs to be done to ensure that tech works for us. I had a great time speaking to Alexa, and she is looking for more guests, so reach out to her if you think you have something interesting to share.
Devansh, the writer behind the popular Substack Artificial Intelligence Made Simple (@chocolatemilkcultleader), joins on this episode for an in-depth discussion on the pressing issues in AI ethics. Devansh talks about his experiences advocating for safer social platforms, his controversial takes on ‘morally aligned’ LLMs, and the underlying ethical issues in tech that often go unnoticed. An insightful episode for anyone interested in AI, tech policy, or the intersection of technology and society.
Sexism in Venture Capital: Why ‘Penis’ Is OK, But ‘Vagina’ Is Taboo
A lot of very damning stats about gender bias in the VC space. My questions to those familiar with the space:
How much does this reflect your experiences?
What causes the disparity?
Why are the numbers for women vs men VCs different? My first instinct is selection bias (women VCs have to jump through more hoops, so the ones left standing are the ones who were better), but I haven’t thought about this enough (or don’t know the space well enough) to say anything meaningful.
What can be done?
It’s always good to have discussions on these topics. Shoot me your thoughts about this article/topic, whether you agree or disagree (or think the stats misrepresent reality).
🚫 98% of all venture capital dollars flow into male-founded startups. (Sources: Pitchbook, Bloomberg Technology.)
🚫 Men represent 85% of VC-backed startup founders, while only 13% are women. (Source: PlanBeyond’s ‘Bias In Venture Capital Funding’ Report.)
🚫 73% of all VC-backed founding teams are composed exclusively of men. (Source: PlanBeyond’s ‘Bias In Venture Capital Funding’ Report.)
🚫 60% of founding teams are exclusively white. (Source: PlanBeyond’s ‘Bias In Venture Capital Funding’ Report.)
…
🚫 75% of femtech companies are founded by women. Yet, they raise, on average, 23% less capital than those femtech companies founded by men. (Source: The Guardian.)
🚫 Only 15% of private equity’s institutional partners and managing directors are women. Venture capital has similar figures. (Sources: Strategex, Leslie Schrock on Second Opinion.)
🚫 Female CEOs are 45% more likely to get fired versus their male counterparts. (Source: Gupta, V. K., Mortal, S. C., Silveri, S., Sun, M., & Turban, D. B. (2020). You’re Fired! Gender Disparities in CEO Dismissal. Journal of Management, 46(4), 560–582.)
Facts:
✅ Companies with women-only founding teams are more than twice as likely as men-only teams to develop companies that improve society. (Source: PlanBeyond’s ‘Bias In Venture Capital Funding’ Report.)
✅ Female entrepreneurs have been shown to deliver more than double the revenue per dollar invested compared to their male counterparts, and they tend to exit on average a year faster. (Source: Bloomberg Technology.)
✅ Female VC partners tend to back female-led startups at twice the rate of male partners, contributing to an ecosystem where diverse founders have more equitable access to early-stage funding. (Source: All Raise.)
✅ VC firms that increased their hiring of women partners by just 10% saw an average increase of 1.5% in overall fund returns and gained 9.7% more profitable assets. (Source: All Raise.)
✅ 69% of top-quartile VC firms feature women decision-makers. (Source: All Raise.)
✅ Female founders rated value-add as twice as important as male founders did. (Source: Steve Ardire on LinkedIn.)
🌁#75: What is Metacognitive AI
An excellent collection by the legendary Ksenia Se. The first part, about metacognition, is very interesting and has parallels to William Lambos and his work on AI vs. Intelligence, with his emphasis on agency and adaptiveness as a key differentiator between the two.
We discuss questions of cognition, consciousness, and eventually treating AI as something possessing morality, plus the usual collection of interesting articles, relevant news, and research papers.
Economies of scale for foundational AI models
One question about the economies of scale here: what about the impact of loyalty and ease of switching? One reason social media platforms benefit from platform plays is that once you're on them, there are huge costs to switching. This is not something platforms like Uber and/or LLMs can replicate yet, which means they're much more susceptible to price wars.
One counterpoint I heard was that memory/personalization lets LLMs generate more customized outputs for the user, increasing the friction of switching. This is a great point, but I personally haven't been too impressed by the memory/personalization features so far. I think there might be higher-ROI options, but I'm curious what y'all think.
Hungry generative AI models drive major tech companies to pursue high-stakes data partnerships, exemplified by OpenAI's agreements with TIME magazine and Reddit and Meta's strategic alliance with Reuters to secure premium training content.
This post explores how economies of scale in the context of data apply to AI and generative models, focusing on three key areas: software vs. hardware, humanoid robots, and large language models.
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
Research like this is opening whole new levels for AI. Very exciting stuff.
The development of large language models (LLMs) has expanded to multi-modal systems capable of processing text, images, and speech within a unified framework. Training these models demands significantly larger datasets and computational resources compared to text-only LLMs. To address the scaling challenges, we introduce Mixture-of-Transformers (MoT), a sparse multi-modal transformer architecture that significantly reduces pretraining computational costs. MoT decouples non-embedding parameters of the model by modality — including feed-forward networks, attention matrices, and layer normalization — enabling modality-specific processing with global self-attention over the full input sequence. We evaluate MoT across multiple settings and model scales. In the Chameleon 7B setting (autoregressive text-and-image generation), MoT matches the dense baseline’s performance using only 55.8% of the FLOPs. When extended to include speech, MoT reaches speech performance comparable to the dense baseline with only 37.2% of the FLOPs. In the Transfusion setting, where text and image are trained with different objectives, a 7B MoT model matches the image modality performance of the dense baseline with one third of the FLOPs, and a 760M MoT model outperforms a 1.4B dense baseline across key image generation metrics. System profiling further highlights MoT’s practical benefits, achieving dense baseline image quality in 47.2% of the wall-clock time and text quality in 75.6% of the wall-clock time (measured on AWS p4de.24xlarge instances with NVIDIA A100 GPUs).
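The core trick is easy to picture. Here’s a minimal PyTorch sketch of my own (not the authors’ code; I keep the attention projections shared for brevity, where the paper decouples those per modality too): attention is global over the mixed-modality sequence, while the layer norms and feed-forward path are selected per token by its modality.

```python
import torch
import torch.nn as nn

class MoTBlock(nn.Module):
    """Toy Mixture-of-Transformers block: shared global self-attention,
    modality-specific feed-forward networks and layer norms."""

    def __init__(self, d_model=256, n_heads=4, n_modalities=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(n_modalities)])
        self.ffns = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_modalities)
        ])

    def forward(self, x, modality):
        # x: (batch, seq, d_model); modality: (batch, seq) integer labels.
        # 1) Global self-attention runs over the full mixed-modality sequence.
        h, _ = self.attn(x, x, x)
        x = x + h
        # 2) Each token is routed (deterministically, by its known modality)
        #    through its own modality's norm + feed-forward parameters.
        out = torch.zeros_like(x)
        for m, (norm, ffn) in enumerate(zip(self.norms, self.ffns)):
            mask = (modality == m).unsqueeze(-1)
            out = out + mask * (x + ffn(norm(x)))
        return out

block = MoTBlock()
tokens = torch.randn(2, 10, 256)
modality = torch.randint(0, 2, (2, 10))   # e.g., 0 = text, 1 = image patches
print(block(tokens, modality).shape)      # torch.Size([2, 10, 256])
```

Unlike a learned-router MoE, the routing here is free: each token’s modality is known in advance, which is part of why the FLOP savings come out so clean.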
“A Connection to the Canvas”: An Interview with Bianca Raffaella
No real insightful comments from me; I just found this to be a very cool story. Kudos.
This week, I’m very excited to share this interview with Bianca Raffaella, an extremely talented, registered blind artist, designer, activist, and public speaker based in the UK. She aims to demonstrate to the fully sighted that visual impairment is no restriction to being an artist, using painting and collage to challenge preconceptions about what a visually impaired artist can create and perceive.
Five ways the ABACUS label advances nature-based carbon removal
Looks to be very cool work by Amazon Science and I hope to see more of this. I’m always a bit worried about Greenwashing, especially when Verra gets involved, but fingers crossed here (if any of you climate experts have thoughts on this- please share them).
While the voluntary carbon market has the potential to bring billions of dollars of finance to restoration projects, less than 3% of credits issued to date come from nature-based carbon removal. This is due to voluntary carbon market prices falling below the costs of high-quality nature-based restoration.
That’s where ABACUS comes in. ABACUS is a set of principles and requirements, codified within Verra’s Verified Carbon Standard, that helps advance the integrity of restoration projects within the voluntary carbon market. ABACUS was developed by a working group of expert practitioners, conservation professionals, and scientists — including Amazon’s own carbon neutralization scientists — in an effort to raise the quality bar for agroforestry and native-restoration projects. The ABACUS label has already begun to raise the quality bar for leading buyers.
A Comprehensive Survey of Small Language Models in the Era of Large Language Models
You don’t have to read everything (I haven’t yet), but I already found some very interesting gems. A great share by Logan Thorneloe, from the excellent reading list he shared here.
Large language models (LLM) have demonstrated emergent abilities in text generation, question answering, and reasoning, facilitating various tasks and domains. Despite their proficiency in various tasks, LLMs like LaPM 540B and Llama-3.1 405B face limitations due to large parameter sizes and computational demands, often requiring cloud API use which raises privacy concerns, limits real-time applications on edge devices, and increases fine-tuning costs. Additionally, LLMs often underperform in specialized domains such as healthcare and law due to insufficient domain-specific knowledge, necessitating specialized models. Therefore, Small Language Models (SLMs) are increasingly favored for their low inference latency, cost-effectiveness, efficient development, and easy customization and adaptability. These models are particularly well-suited for resource-limited environments and domain knowledge acquisition, addressing LLMs’ challenges and proving ideal for applications that require localized data handling for privacy, minimal inference latency for efficiency, and domain knowledge acquisition through lightweight fine-tuning. The rising demand for SLMs has spurred extensive research and development. However, a comprehensive survey investigating issues related to the definition, acquisition, application, enhancement, and reliability of SLM remains lacking, prompting us to conduct a detailed survey on these topics. The definition of SLMs varies widely, thus to standardize, we propose defining SLMs by their capability to perform specialized tasks and suitability for resource-constrained settings, setting boundaries based on the minimal size for emergent abilities and the maximum size sustainable under resource constraints. For other aspects, we provide a taxonomy of relevant models/methods and develop general frameworks for each category to enhance and utilize SLMs effectively.
The 2024 Nobel Prize in Economics: Explained
It would be interesting to see how important stability is in AI Solutions.
Why do some nations flourish while others remain trapped in poverty? This year’s Nobel Prize in Economics goes to three economists whose groundbreaking work explores this very question. Join us as we dive into the theories and insights of Daron Acemoglu, Simon Johnson, and James A. Robinson, who have helped shape policy across the globe with their research on economic growth and inequality.
The Geometry of Concepts: Sparse Autoencoder Feature Structure
I’ve developed a bit of an interest in Geometry because I think exploiting it will be a great avenue for future optimizations/breakthroughs.
Sparse autoencoders have recently produced dictionaries of high-dimensional vectors corresponding to the universe of concepts represented by large language models. We find that this concept universe has interesting structure at three levels: 1) The “atomic” small-scale structure contains “crystals” whose faces are parallelograms or trapezoids, generalizing well-known examples such as (man-woman-king-queen). We find that the quality of such parallelograms and associated function vectors improves greatly when projecting out global distractor directions such as word length, which is efficiently done with linear discriminant analysis. 2) The “brain” intermediate-scale structure has significant spatial modularity; for example, math and code features form a “lobe” akin to functional lobes seen in neural fMRI images. We quantify the spatial locality of these lobes with multiple metrics and find that clusters of co-occurring features, at coarse enough scale, also cluster together spatially far more than one would expect if feature geometry were random. 3) The “galaxy” scale large-scale structure of the feature point cloud is not isotropic, but instead has a power law of eigenvalues with steepest slope in middle layers. We also quantify how the clustering entropy depends on the layer.
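The “projecting out distractor directions” finding in 1) is easy to demo on synthetic data. A toy NumPy sketch of my own (the paper finds the distractor directions with linear discriminant analysis; here I just hard-code one):

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def project_out(v, d):
    """Remove the component of v along a distractor direction d."""
    d = d / np.linalg.norm(d)
    return v - (v @ d) * d

rng = np.random.default_rng(0)
dim = 64
gender = rng.normal(size=dim)       # the "true" concept direction
distractor = rng.normal(size=dim)   # a global nuisance direction (e.g., word length)

# Synthetic "embeddings": each pair differs by gender, plus pair-specific distractor noise.
man = rng.normal(size=dim)
woman = man + gender + 2.0 * distractor    # distractor contaminates this pair...
king = rng.normal(size=dim)
queen = king + gender - 1.5 * distractor   # ...and this one, differently

v1, v2 = woman - man, queen - king
print(round(cosine(v1, v2), 3))  # low (even negative): the parallelogram looks broken
print(round(cosine(project_out(v1, distractor),
                   project_out(v2, distractor)), 3))  # 1.0: structure recovered
```

The paper’s claim is that real SAE feature clouds behave like this: the famous analogy parallelograms are there, just buried under a few high-variance nuisance directions.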
The CAP Theorem of Clustering: Why Every Algorithm Must Sacrifice Something
An excellent overview by one of my favorites, Abhinav Upadhyay.
As software engineers, we use clustering algorithms all the time. Whether it’s grouping similar users, categorizing content, or detecting patterns in data, clustering seems deceptively simple: just group similar things together, right? You might have used k-means, DBSCAN, or agglomerative clustering, thinking you just need to pick the right algorithm for your use case.
But here’s what most tutorials won’t tell you: every clustering algorithm you choose is fundamentally flawed. Not because of poor implementation or wrong parameters, but because of the math itself. In 2002, Jon Kleinberg (in a paper published at NIPS 2002) proved something that should make every developer pause: it’s impossible for any clustering algorithm to have all three properties we’d naturally want it to have.
Think of it as the CAP theorem of clustering. Just as distributed systems force you to choose between consistency, availability, and partition tolerance, Kleinberg showed that clustering algorithms force you to pick between scale invariance, richness, and consistency. You can’t have all three, ever; it’s a mathematical impossibility.
Before you deploy your next clustering solution in production, you need to understand what you’re giving up. Let’s dive into these three properties and see why you’ll always have to choose what to sacrifice.
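To make one of the trade-offs concrete, here’s a small SciPy demo of my own. Single-linkage with a fixed distance cutoff (one of Kleinberg’s own examples) satisfies richness and consistency, but the moment you rescale the data, the output changes, so it sacrifices scale invariance:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
# Two tight blobs, centers well separated.
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(3, 0.1, (20, 2))])

def cut_at_threshold(X, t=1.0):
    """Single-linkage clustering, stopping merges at a fixed distance t."""
    return fcluster(linkage(pdist(X), method="single"), t, criterion="distance")

print(len(set(cut_at_threshold(X))))        # 2 clusters
print(len(set(cut_at_threshold(X / 10))))   # 1 cluster: same data, just rescaled
```

Fix k instead (k-means style) and you regain scale invariance but lose richness, since only k-cluster partitions are ever reachable. Kleinberg’s theorem says no parameterization escapes this: whichever knob your algorithm exposes, one of the three axioms goes.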
Encoding word order in complex embeddings
Another excellent share by Manny Ko. This seems like it should’ve been a huge deal, so I’m not sure why it never popped off. Does anyone have any comments? What am I missing here?
Sequential word order is important when processing text. Currently, neural networks (NNs) address this by modeling word position using position embeddings. The problem is that position embeddings capture the position of individual words, but not the ordered relationship (e.g., adjacency or precedence) between individual word positions. We present a novel and principled solution for modeling both the global absolute positions of words and their order relationships. Our solution generalizes word embeddings, previously defined as independent vectors, to continuous word functions over a variable (position). The benefit of continuous functions over variable positions is that word representations shift smoothly with increasing positions. Hence, word representations in different positions can correlate with each other in a continuous function. The general solution of these functions is extended to complex-valued domain due to richer representations. We extend CNN, RNN and Transformer NNs to complex-valued versions to incorporate our complex embedding (we make all code available). Experiments on text classification, machine translation and language modeling show gains over both classical word embeddings and position-enriched word embeddings. To our knowledge, this is the first work in NLP to link imaginary numbers in complex-valued representations to concrete meanings (i.e., word order).
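The construction is simpler than the abstract makes it sound. Here’s my own tiny NumPy sketch of the core idea, with made-up parameters (in the paper, amplitude, frequency, and phase are learned per word and per dimension):

```python
import numpy as np

def embed(amp, freq, phase, pos):
    """Word-as-function: dimension j at position pos is
    amp[j] * exp(i * (freq[j] * pos + phase[j]))."""
    return amp * np.exp(1j * (freq * pos + phase))

rng = np.random.default_rng(0)
dim = 8
amp, freq, phase = rng.random(dim), rng.random(dim), rng.random(dim)

e3 = embed(amp, freq, phase, 3)
e4 = embed(amp, freq, phase, 4)

# Moving one position forward is a fixed rotation, independent of absolute
# position, so relative order (adjacency, precedence) is what gets encoded:
assert np.allclose(e4, e3 * np.exp(1j * freq))
```

One partial answer to my own question: RoPE (rotary position embeddings), which very much did pop off, encodes relative position with essentially this same rotation trick, so the idea arguably did win, just under a different name.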
Other Good Content
Stable Anisotropic Regularization
Looks very cool, but honestly I have to dig through a lot more before I can be impressed (there are a lot of very promising ideas that end up less hard-hitting than expected). I have a lot going on, though, so maybe I just haven’t had the time to look through the research in enough depth. Please read it and share your thoughts.
Given the success of Large Language Models (LLMs), there has been considerable interest in studying the properties of model activations. The literature overwhelmingly agrees that LLM representations are dominated by a few “outlier dimensions” with exceedingly high variance and magnitude. Several studies in Natural Language Processing (NLP) have sought to mitigate the impact of such outlier dimensions and force LLMs to be isotropic (i.e., have uniform variance across all dimensions in embedding space). Isotropy is thought to be a desirable property for LLMs that improves model performance and more closely aligns textual representations with human intuition. However, many of the claims regarding isotropy in NLP have been based on the average cosine similarity of embeddings, which has recently been shown to be a flawed measure of isotropy. In this paper, we propose I-STAR: IsoScore*-based STable Anisotropic Regularization, a novel regularization method that can be used to increase or decrease levels of isotropy in embedding space during training. I-STAR uses IsoScore*, the first accurate measure of isotropy that is both differentiable and stable on mini-batch computations. In contrast to several previous works, we find that decreasing isotropy in contextualized embeddings improves performance on the majority of tasks and models considered in this paper.
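IsoScore* itself has careful corrections baked in, but the flavor of the regularizer is easy to sketch. A crude stand-in of my own (NOT the paper’s IsoScore*; just a differentiable eigenvalue-uniformity proxy) to show how such a term would slot into training:

```python
import torch

def isotropy_proxy(emb):
    """Differentiable isotropy proxy: normalized entropy of the covariance
    eigenvalue distribution. ~1 when variance is uniform across directions,
    -> 0 as variance concentrates in a few outlier dimensions."""
    x = emb - emb.mean(dim=0)
    cov = x.T @ x / (x.shape[0] - 1)
    eig = torch.linalg.eigvalsh(cov).clamp(min=1e-8)
    p = eig / eig.sum()
    return -(p * p.log()).sum() / torch.log(torch.tensor(float(len(p))))

# In an I-STAR-style setup, a tunable coefficient pushes isotropy up or down:
#   loss = task_loss + lam * isotropy_proxy(hidden_states)
# The paper's surprising result is that the direction that *decreases*
# isotropy is the one that tends to help downstream performance.
```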
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
Speculative decoding is a huge trend in LLM inference optimization right now. Worth monitoring.
We present LayerSkip, an end-to-end solution to speed-up inference of large language models (LLMs). First, during training we apply layer dropout, with low dropout rates for earlier layers and higher dropout rates for later layers, and an early exit loss where all transformer layers share the same exit. Second, during inference, we show that this training recipe increases the accuracy of early exit at earlier layers, without adding any auxiliary layers or modules to the model. Third, we present a novel self-speculative decoding solution where we exit at early layers and verify and correct with remaining layers of the model. Our proposed self-speculative decoding approach has less memory footprint than other speculative decoding approaches and benefits from shared compute and activations of the draft and verification stages. We run experiments on different Llama model sizes on different types of training: pretraining from scratch, continual pretraining, finetuning on specific data domain, and finetuning on specific task. We implement our inference solution and show speedups of up to 2.16x on summarization for CNN/DM documents, 1.82x on coding, and 2.0x on TOPv2 semantic parsing task. We open source our code and checkpoints at this https URL.
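To pin down what “self-speculative” means here: the draft model is the first few layers of the same network, so there is no separate draft model to train or host. A greedy-decoding sketch of my own (it assumes a hypothetical `model(ids, exit_layer=...)` interface returning logits from an early exit; the real implementation also reuses the draft’s KV cache for the shared layers):

```python
import torch

def self_speculative_decode(model, ids, exit_layer=4, draft_len=4, max_len=64):
    """Greedy self-speculative decoding. `model(t)` -> full-model logits,
    `model(t, exit_layer=E)` -> logits from the early exit after layer E.
    (Hypothetical interface, for illustration only.)"""
    ids = list(ids)
    while len(ids) < max_len:
        # 1) Draft: cheaply propose draft_len tokens with the early-exit head.
        draft = list(ids)
        for _ in range(draft_len):
            logits = model(torch.tensor([draft]), exit_layer=exit_layer)
            draft.append(int(logits[0, -1].argmax()))
        # 2) Verify: one full-model pass scores all proposed positions in parallel.
        full = model(torch.tensor([draft[:-1]]))
        verified = full[0, len(ids) - 1:].argmax(-1).tolist()
        # 3) Accept the matching prefix; at the first disagreement, take the full
        #    model's token, so output matches full-model greedy decoding exactly.
        for proposed, correct in zip(draft[len(ids):], verified):
            ids.append(correct)
            if proposed != correct:
                break
    return ids
```

The speedup comes from step 2 verifying draft_len positions in one pass; the early-exit loss during training is what makes the shallow draft accurate enough for high acceptance rates.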
How Amazon Improved Graph-based Similarity Search by 60%
A more beginner-friendly introduction to an overview we had done a while back. Good if you’re looking to work with Graphs.
Jensen’s Inequality
Jensen’s Inequality appears multiple times in any rigorous machine learning textbook. It’s essential for the key principles and foundational algorithms that make this field so productive. In this video, I state what it is, explain why it’s important, and show why it’s true.
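As a one-line refresher of the statement: for convex f, f(E[X]) ≤ E[f(X)], and concave f flips it (that flip is exactly where the ELBO in variational inference comes from). A quick numeric sanity check:

```python
import numpy as np

x = np.random.default_rng(0).exponential(size=100_000)  # E[X] = 1

# Convex f(t) = t^2:  f(E[X]) <= E[f(X)]
print(np.mean(x) ** 2, np.mean(x ** 2))        # ~1.0  vs  ~2.0

# Concave f = log flips the inequality: log(E[X]) >= E[log X]
print(np.log(np.mean(x)), np.mean(np.log(x)))  # ~0.0  vs  ~-0.577
```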
Learning to Extract Structured Entities Using Language Models
Recent advances in machine learning have significantly impacted the field of information extraction, with Language Models (LMs) playing a pivotal role in extracting structured information from unstructured text. Prior works typically represent information extraction as triplet-centric and use classical metrics such as precision and recall for evaluation. We reformulate the task to be entity-centric, enabling the use of diverse metrics that can provide more insights from various perspectives. We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP (AESOP) metric, designed to appropriately assess model performance. Later, we introduce a new Multistage Structured Entity Extraction (MuSEE) model that harnesses the power of LMs for enhanced effectiveness and efficiency by decomposing the extraction task into multiple stages. Quantitative and human side-by-side evaluations confirm that our model outperforms baselines, offering promising directions for future advancements in structured entity extraction. Our source code and datasets are available at this https URL.
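For intuition on what “entity-centric” buys you over triplet-level precision/recall, here’s a toy scorer of my own (this is NOT the actual AESOP definition, which matches entity sets more carefully; it just illustrates the shape of the metric): match predicted entities to gold ones, then score each entity by the overlap of its properties.

```python
def property_f1(pred_ent, gold_ent):
    """F1 over the (property, value) pairs of one matched entity."""
    p, g = set(pred_ent.items()), set(gold_ent.items())
    if not p or not g:
        return 0.0
    prec, rec = len(p & g) / len(p), len(p & g) / len(g)
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

def entity_set_score(pred, gold):
    """Greedily match predicted entities to gold entities, then average
    per-entity property F1 (unmatched entities on either side score 0)."""
    scores, remaining = [], list(pred)
    for g in gold:
        best = max(remaining, key=lambda e: property_f1(e, g), default=None)
        if best is None:
            scores.append(0.0)
            continue
        scores.append(property_f1(best, g))
        remaining.remove(best)
    scores += [0.0] * len(remaining)  # penalize spurious predictions
    return sum(scores) / len(scores) if scores else 1.0

gold = [{"type": "person", "name": "Marie Curie", "field": "physics"}]
pred = [{"type": "person", "name": "Marie Curie", "field": "chemistry"}]
print(round(entity_set_score(pred, gold), 2))  # 0.67: right entity, one wrong property
```

A triplet-centric metric would just count two of three facts correct; scoring per entity tells you the model found the right entity but fumbled one property, which is a more actionable signal.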
Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics
Do large language models (LLMs) solve reasoning tasks by learning robust generalizable algorithms, or do they memorize training data? To investigate this question, we use arithmetic reasoning as a representative task. Using causal analysis, we identify a subset of the model (a circuit) that explains most of the model’s behavior for basic arithmetic logic and examine its functionality. By zooming in on the level of individual circuit neurons, we discover a sparse set of important neurons that implement simple heuristics. Each heuristic identifies a numerical input pattern and outputs corresponding answers. We hypothesize that the combination of these heuristic neurons is the mechanism used to produce correct arithmetic answers. To test this, we categorize each neuron into several heuristic types, such as neurons that activate when an operand falls within a certain range, and find that the unordered combination of these heuristic types is the mechanism that explains most of the model’s accuracy on arithmetic prompts. Finally, we demonstrate that this mechanism appears as the main source of arithmetic accuracy early in training. Overall, our experimental results across several LLMs show that LLMs perform arithmetic using neither robust algorithms nor memorization; rather, they rely on a “bag of heuristics”.
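A cartoon of what a “bag of heuristics” means in practice (entirely my illustration; the real circuits are learned and much messier): each “neuron” fires on a pattern of the operands and contributes one facet of the answer, and only their unordered combination gets the arithmetic right.

```python
# Each heuristic recognizes one input pattern and votes for one facet of the answer.
def ones_digit_heuristic(a, b):
    """Looks only at the operands' ones digits."""
    return "ones", (a % 10 + b % 10) % 10

def carry_heuristic(a, b):
    """Fires when the ones digits overflow."""
    return "carry", int(a % 10 + b % 10 >= 10)

def tens_digit_heuristic(a, b):
    """Looks only at the operands' tens digits."""
    return "tens", (a // 10 + b // 10) % 10

def bag_of_heuristics_add(a, b):
    """Addition with no single end-to-end algorithm in any one piece."""
    votes = dict([ones_digit_heuristic(a, b),
                  carry_heuristic(a, b),
                  tens_digit_heuristic(a, b)])
    return 10 * ((votes["tens"] + votes["carry"]) % 10) + votes["ones"]

assert bag_of_heuristics_add(27, 35) == 62   # valid for sums below 100
```

The paper’s point is that LLMs appear to stop at roughly this stage: a pile of pattern-triggered partial answers that usually compose correctly, rather than a clean general algorithm, which also explains why they break on out-of-range inputs.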
AFlow: Automating Agentic Workflow Generation
Large language models (LLMs) have demonstrated remarkable potential in solving complex tasks across diverse domains, typically by employing agentic workflows that follow detailed instructions and operational sequences. However, constructing these workflows requires significant human effort, limiting scalability and generalizability. Recent research has sought to automate the generation and optimization of these workflows, but existing methods still rely on initial manual setup and fall short of achieving fully automated and effective workflow generation. To address this challenge, we reformulate workflow optimization as a search problem over code-represented workflows, where LLM-invoking nodes are connected by edges. We introduce AFlow, an automated framework that efficiently explores this space using Monte Carlo Tree Search, iteratively refining workflows through code modification, tree-structured experience, and execution feedback. Empirical evaluations across six benchmark datasets demonstrate AFlow’s efficacy, yielding a 5.7% average improvement over state-of-the-art baselines. Furthermore, AFlow enables smaller models to outperform GPT-4o on specific tasks at 4.55% of its inference cost in dollars. The code will be available at this https URL.
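The “search over code-represented workflows” framing is the interesting part. A skeleton of my own showing how MCTS would drive it (the `propose_edit` and `evaluate` callables here are stand-ins for what are, in the paper, an LLM rewriting the workflow code using execution feedback and a benchmark run, respectively):

```python
import math

class Node:
    """One candidate workflow (stored as code) plus MCTS visit statistics."""
    def __init__(self, workflow, parent=None):
        self.workflow, self.parent = workflow, parent
        self.children, self.visits, self.value = [], 0, 0.0

def ucb(node, c=1.4):
    """Upper confidence bound: balance exploiting good workflows vs. exploring."""
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(math.log(node.parent.visits) / node.visits)

def workflow_mcts(seed_workflow, propose_edit, evaluate, iters=100):
    root = Node(seed_workflow)
    for _ in range(iters):
        # Select: descend by UCB to a leaf.
        node = root
        while node.children:
            node = max(node.children, key=ucb)
        # Expand: LLM proposes a modified workflow (add a node, change a prompt, ...).
        child = Node(propose_edit(node.workflow), parent=node)
        node.children.append(child)
        # Simulate: execute the candidate on validation tasks -> score in [0, 1].
        reward = evaluate(child.workflow)
        # Backpropagate the score up the tree.
        while child is not None:
            child.visits += 1
            child.value += reward
            child = child.parent
    return max(root.children, key=lambda n: n.value / n.visits).workflow
```

In AFlow proper, the expansion step is constrained to a small library of operators (generation, review, ensembling, and so on), and execution traces are carried along the tree as “experience” that gets fed back into the modification prompt.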
I provide various consulting and advisory services. If you’d like to explore how we can work together, reach out to me through any of my socials over here or reply to this email.
If you liked this article and wish to share it, please refer to the following guidelines.
Reach out to me
Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.
Small Snippets about Tech, AI and Machine Learning over here
AI Newsletter- https://artificialintelligencemadesimple.substack.com/
My grandma’s favorite Tech Newsletter- https://codinginterviewsmadesimple.substack.com/
Check out my other articles on Medium: https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819