Content Recommendations- 10/09/24 [Updates]
Content to help you keep up with Machine Learning, Deep Learning, Data Science, Software Engineering, Finance, Business, and more
Hey, it’s Devansh 👋👋
In issues of Updates, I will share interesting content I came across. While the focus will be on AI and Tech, the ideas might range from business, philosophy, ethics, and much more. The goal is to share interesting content with y’all so that you can get a peek behind the scenes into my research process.
I put a lot of effort into creating work that is informative, useful, and independent from undue influence. If you’d like to support my writing, please consider becoming a paid subscriber to this newsletter. Doing so helps me put more effort into writing/research, reach more people, and supports my crippling chocolate milk addiction. Help me democratize the most important ideas in AI Research and Engineering to over 100K readers weekly.
PS- We follow a “pay what you can” model, which allows you to support within your means, and support my mission of providing high-quality technical education to everyone for less than the price of a cup of coffee. Check out this post for more details and to find a plan that works for you.
A lot of people reach out to me for reading recommendations. I figured I’d start sharing whatever AI Papers/Publications, interesting books, videos, etc I came across each week. Some will be technical, others not really. I will add whatever content I found really informative (and remembered throughout the week). These won’t always be the most recent publications- just the ones I’m paying attention to this week. Without further ado, here are interesting readings/viewings for 10/09/2024. If you missed last week’s readings, you can find them here.
Reminder- We started an AI Made Simple Subreddit. Come join us over here- https://www.reddit.com/r/AIMadeSimple/. If you’d like to stay on top of community events and updates, join the discord for our cult here: https://discord.com/invite/EgrVtXSjYf. Lastly, if you’d like to get involved in our many fun discussions, you should join the Substack Group Chat Over here.
Community Spotlight: Dave Farley
Dave Farley runs the excellent YouTube channel Continuous Delivery, where he shares a lot of great insights on software engineering and how to improve our processes. Unlike a lot of other gurus, Dave goes deep: his videos actually teach you technical skills, and his thoughts on designing systems/evaluating frameworks always teach me something new. A lot of people tend to only appreciate AI from the research perspective, which heavily discounts the software engineering component (no matter how good your ideas are, you won’t benefit if you can’t build a system to serve them to end-users).
If you’re doing interesting work and would like to be featured in the spotlight section, just drop your introduction in the comments/by reaching out to me. There are no rules- you could talk about a paper you’ve written, an interesting project you’ve worked on, some personal challenge you’re working on, ask me to promote your company/product, or anything else you consider important. The goal is to get to know you better, and possibly connect you with interesting people in our chocolate milk cult. No costs/obligations are attached.
Previews
Curious about what articles I’m working on? Here are the previews for the next planned articles-
- How o-1 was RedTeamed
Highly Recommended
These are pieces that I feel are particularly well done. If you don’t have much time, make sure you at least catch these works.
Why the Training of Generative AI Models Is a Violation of Copyright Law
An interesting argument by Tobias Jensen. The article is well-researched and presents its case well. I completely agree with the statement- “In my view, the authors present a convincing case for why AI companies should either compensate or seek permission from copyright holders to use their data for training purposes.”
I do disagree with a large part of the conclusion (that Gen AI would strictly reduce creativity and strictly harm original copyright holders). I think that, with tweaks, we can ensure GenAI models meet the standard that the “legitimate interests of the right holder are not unduly violated.” I’ve been thinking about this ever since I first wrote against the Data Laundering in AI Art (not paying or crediting artists whose copyrighted work was used to create the training data for AI Art Generators). I think the deeper conversation to be had is who we want Generative AI to primarily benefit. In my opinion, the biggest problem with Gen AI as a whole is that the biggest beneficiaries are the techno-elite, while the people who suffer the worst outcomes are not part of this group. That’s an article for another time though, and only tangentially relevant here.
Check Tobias’s article out, though; it’s a different way of looking at things, and I think we need to start these conversations-
Based on my summary of the legal analysis by Dornis & Stober, it looks like training generative AI models involves several copyright infringements that are not mandated by EU law. We are now back to the point I raised at the beginning of this post, why does it matter?
Dornis & Stober make a couple of astute observations about the role of legislators that I fully agree with. In summary:
We can expect that human creativity will increasingly be suppressed by AI. For this reason, legislators cannot sit idly by while AI technology is further developed and distributed to the public.
Contrary to current forecasts, AI will likely not cause an increase in the creative production by humans. Rather, we can expect that the results of genuinely human creativity in many professional groups and industries will be replaced to a considerable extent by generative AI output.
The EU legislators should consider that ensuring uncompromising safeguarding of regulatory minimum standards is not about preventing AI innovation but rather about fair competitive conditions and appropriate compensation for the resources used.
Addition is All You Need for Energy-efficient Language Models
Always excited to see innovations in energy efficiency for AI. This is a key area, especially for general-purpose foundation models where the most elite performance isn’t the priority. I’m particularly interested in seeing how this ties into agentic architectures and using LMs as controllers (in my experience, the biggest delta between the major closed and open models has been their ability to act as orchestrators and route more complex tasks).
Large neural networks spend most computation on floating point tensor multiplications. In this work, we find that a floating point multiplier can be approximated by one integer adder with high precision. We propose the linear-complexity multiplication L-Mul algorithm that approximates floating point number multiplication with integer addition operations. The new algorithm costs significantly less computation resource than 8-bit floating point multiplication but achieves higher precision. Compared to 8-bit floating point multiplications, the proposed method achieves higher precision but consumes significantly less bit-level computation. Since multiplying floating point numbers requires substantially higher energy compared to integer addition operations, applying the L-Mul operation in tensor processing hardware can potentially reduce 95% energy cost by element-wise floating point tensor multiplications and 80% energy cost of dot products. We calculated the theoretical error expectation of L-Mul, and evaluated the algorithm on a wide range of textual, visual, and symbolic tasks, including natural language understanding, structural reasoning, mathematics, and commonsense question answering. Our numerical analysis experiments agree with the theoretical error estimation, which indicates that L-Mul with 4-bit mantissa achieves comparable precision as float8_e4m3 multiplications, and L-Mul with 3-bit mantissa outperforms float8_e5m2. Evaluation results on popular benchmarks show that directly applying L-Mul to the attention mechanism is almost lossless. We further show that replacing all floating point multiplications with 3-bit mantissa L-Mul in a transformer model achieves equivalent precision as using float8_e4m3 as accumulation precision in both fine-tuning and inference.
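To make the core trick more concrete, here is a rough pure-Python sketch of the idea as I read it: decompose each operand into mantissa and exponent, add the mantissas and exponents instead of multiplying, and replace the dropped mantissa cross term with a small constant offset. The function name, the use of math.frexp, and the truncation step are my own illustration, not code from the paper, which targets bit-level integer hardware.

```python
import math

def l_mul(x: float, y: float, mantissa_bits: int = 4) -> float:
    """Approximate x * y in the spirit of L-Mul: add the operands' mantissas
    instead of multiplying them, replacing the x_m * y_m cross term with a
    small constant offset. Pure-Python sketch, not the paper's hardware path."""
    if x == 0.0 or y == 0.0:
        return 0.0
    sign = math.copysign(1.0, x) * math.copysign(1.0, y)
    # Decompose |x| = (1 + xm) * 2**xe with xm in [0, 1)
    mx, ex = math.frexp(abs(x))            # mx in [0.5, 1), |x| = mx * 2**ex
    my, ey = math.frexp(abs(y))
    xm, xe = 2 * mx - 1, ex - 1
    ym, ye = 2 * my - 1, ey - 1
    # Truncate mantissas to the low-precision format being emulated
    scale = 2 ** mantissa_bits
    xm, ym = math.floor(xm * scale) / scale, math.floor(ym * scale) / scale
    # Offset term 2**-l(m): l(m) = m for m <= 3, 3 for m = 4, 4 otherwise
    l = mantissa_bits if mantissa_bits <= 3 else (3 if mantissa_bits == 4 else 4)
    return sign * (1 + xm + ym + 2.0 ** -l) * 2.0 ** (xe + ye)

print(l_mul(3.14, 2.72), 3.14 * 2.72)   # approximate vs exact product
```

Running it on a couple of values shows the flavor of the trade-off: a much cheaper operation that lands close to, but not exactly on, the true product.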
AI instructed brainwashing effectively nullifies conspiracy beliefs
I have a feeling I’m going to put off a lot of you with my stance on this one. The article talks about a research paper that demonstrates “AI is surprisingly effective in countering conspiracy beliefs. It was demonstrated to be effective even against the true believers, individuals normally considered unwavering in their beliefs due to their connection of beliefs to their perceived identity.”
More than the paper, the positive reception of the paper is deeply concerning to me. As the author rightly points out- “Not sure which is more frightening, the research paper or its gleeful embrace by everyone who imagines how effective this will be at changing the worldview of others they believe to have the wrong view. This is astoundingly Orwellian in its very nature. No one should have this power. It is an irresistible tool of authoritarianism and will certainly be used in such capacity.”
Conspiracy theories are a problem. But I don’t think brainwashing is the answer to tackle them. Having AI impart a sense of skepticism and critical thinking is still defensible. But to build systems that push people away from certain beliefs into other ones is just begging for totalitarianism. The research is important, so not going to criticize that, but anyone gleefully pushing AI-enabled brainwashing should really think hard about the consequences of their actions.
Also keep in mind- if AI can push you to believe A, it can also push you to believe the opposite. All the problems we’re facing with growing social divides and extremism- this technology will make it worse.
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
In a similar vein to our analysis of o-1’s limitations in key computational abilities for medical diagnosis, this is another great investigation into those limitations, in a lot more depth. Highly recommend reading this-
Recent advancements in Large Language Models (LLMs) have sparked interest in their formal reasoning capabilities, particularly in mathematics. The GSM8K benchmark is widely used to assess the mathematical reasoning of models on grade-school-level questions. While the performance of LLMs on GSM8K has significantly improved in recent years, it remains unclear whether their mathematical reasoning capabilities have genuinely advanced, raising questions about the reliability of the reported metrics. To address these concerns, we conduct a large-scale study on several SOTA open and closed models. To overcome the limitations of existing evaluations, we introduce GSM-Symbolic, an improved benchmark created from symbolic templates that allow for the generation of a diverse set of questions. GSM-Symbolic enables more controllable evaluations, providing key insights and more reliable metrics for measuring the reasoning capabilities of models. Our findings reveal that LLMs exhibit noticeable variance when responding to different instantiations of the same question. Specifically, the performance of all models declines when only the numerical values in the question are altered in the GSM-Symbolic benchmark. Furthermore, we investigate the fragility of mathematical reasoning in these models and show that their performance significantly deteriorates as the number of clauses in a question increases. We hypothesize that this decline is because current LLMs cannot perform genuine logical reasoning; they replicate reasoning steps from their training data. Adding a single clause that seems relevant to the question causes significant performance drops (up to 65%) across all state-of-the-art models, even though the clause doesn’t contribute to the reasoning chain needed for the final answer. Overall, our work offers a more nuanced understanding of LLMs’ capabilities and limitations in mathematical reasoning.
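The symbolic-template idea behind the benchmark is easy to picture: keep the word problem fixed, leave the names and numbers as slots, and generate many instantiations whose ground truth is computed programmatically. The toy sketch below illustrates this; the template, names, and number ranges are placeholders I made up, not items from GSM-Symbolic.

```python
import random

# Toy template in the spirit of GSM-Symbolic: the story stays fixed, the symbols
# vary, and the ground-truth answer is computed from the sampled values.
TEMPLATE = ("{name} picks {a} apples on Monday and {b} more on Tuesday. "
            "How many apples does {name} have in total?")

def instantiate(seed: int) -> tuple[str, int]:
    rng = random.Random(seed)
    name = rng.choice(["Sophia", "Liam", "Maya"])   # placeholder names
    a, b = rng.randint(2, 40), rng.randint(2, 40)
    return TEMPLATE.format(name=name, a=a, b=b), a + b

for s in range(3):
    question, answer = instantiate(s)
    print(question, "->", answer)
```

Because every variant comes with a programmatically derived answer, any accuracy drop across instantiations can be attributed to the model rather than to noisy labels.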
The Dirty Business of Weight Loss
Good video, highlighting the problematic incentive structure of, and the dodgy record of, companies selling appetite suppressants.
Losing weight is mental as much as it is physical. But now, with a single injection, anyone can drop weight without having to change what they eat or how they live. With seemingly minimal side effects and impressive testimonials, weight loss drugs like Ozempic, Zepbound, Wegovy, and Mounjaro have become all the rage. Just like how tech startups have hitched their wagon to AI, drug manufacturers have hitched their valuations on obesity. But despite their altruistic missions and manufactured nobility, Big Pharma through history have demonstrated that they’re not to be trusted. As the opioid epidemic showed, if you give pharmaceutical companies an inch and they’ll take a mile. In their world, drugs are the hammer and everything is a nail. Their goal is to get as many people on as many drugs at as high of a dose and frequency as possible to keep profits up. If Big Pharma succeeds in classifying obesity as a disease just like Oxycontin once classified pain as a disease, they would be able to monetize the greatest patient pool in the world. As the testimonials have continued, the world has also forgotten that the manufacturers of these weight loss drugs are the same companies who endangered millions and killed thousands over decades with insulin price-gouging. In this episode, we’re diving into the dirty business of Big Pharma and weight loss drugs from the perspectives of the two biggest players, Eli Lilly and Novo Nordisk.
CPython Runtime Internals: Key Data Structures & Runtime Bootstrapping
Abhinav Upadhyay always scares me with how much detail he puts into every piece. This is a long deep-dive, and there’s a lot I don’t understand, but I’m interested in the way the system manages all the states, especially when threading gets involved. I want to study this in more detail, because I think it might have some interesting insights on algorithmic optimization.
The runtime of a programming language is the crucial piece which orchestrates code execution by integrating various components such as the virtual machine, object system, memory allocators and the garbage collector. It initializes and manages the state of these systems; to do this, the runtime maintains a few key data structures which are initialized during the startup of the Python process.
In this article, we will look at the definition of the data structures which form the CPython runtime, how they are initialized, and understand what their role is in Python code execution.
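For a small taste of the state the article describes, you can peek at it from Python itself rather than the C structs: every OS thread the interpreter tracks has its own thread state, and sys._current_frames() exposes the topmost frame per thread id. A minimal sketch (the worker function and thread count are arbitrary choices of mine):

```python
import sys
import threading
import time

def worker():
    # Keep the thread alive long enough for us to observe its state
    time.sleep(0.2)

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()

# One entry per live thread state: thread id -> its current topmost Python frame
for tid, frame in sys._current_frames().items():
    print(tid, frame.f_code.co_name)

for t in threads:
    t.join()
```

The article goes much further down, into the runtime and interpreter structures that make this bookkeeping possible in the first place.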
How Swarms Solve Impossible Problems
I’ve had a personal interest in collective-intelligence AI. I’ll always recommend good content that talks about it.
How 1 Software Engineer Outperforms 138 — Lichess Case Study
Excellent case-study, tons of phenomenal insights.
This is a case study on the Lichess product. A free and open source chess platform with over 4 million monthly active users and 1 core developer. The stack he uses is Scala, MongoDB and Snabbdom hosted on bare metal. This is an incredibly interesting project that I enjoyed studying to understand productivity.
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
I think this paper adds to my earlier thesis about more refined control (not just prompting) over LLM generations being the next big avenue. Good stuff, and definitely the side of LLM research I’m more interested in.
Large language models (LLMs) often produce errors, including factual inaccuracies, biases, and reasoning failures, collectively referred to as “hallucinations”. Recent studies have demonstrated that LLMs’ internal states encode information regarding the truthfulness of their outputs, and that this information can be utilized to detect errors. In this work, we show that the internal representations of LLMs encode much more information about truthfulness than previously recognized. We first discover that the truthfulness information is concentrated in specific tokens, and leveraging this property significantly enhances error detection performance. Yet, we show that such error detectors fail to generalize across datasets, implying that — contrary to prior claims — truthfulness encoding is not universal but rather multifaceted. Next, we show that internal representations can also be used for predicting the types of errors the model is likely to make, facilitating the development of tailored mitigation strategies. Lastly, we reveal a discrepancy between LLMs’ internal encoding and external behavior: they may encode the correct answer, yet consistently generate an incorrect one. Taken together, these insights deepen our understanding of LLM errors from the model’s internal perspective, which can guide future research on enhancing error analysis and mitigation.
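If you want a feel for the setup, the usual recipe in this line of work is a small linear probe trained on hidden activations at a chosen token position to predict whether the generation was correct. Below is a minimal sketch with placeholder arrays; in practice you would extract the activations from a model (e.g. with output_hidden_states=True in transformers) and label each generation against ground truth.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder data: one hidden-state vector per generation (random here),
# and a binary label for whether that generation was factually correct.
X = np.random.randn(200, 4096)
y = np.random.randint(0, 2, size=200)

# Train the probe on one split, check generalization on the other
probe = LogisticRegression(max_iter=1000).fit(X[:150], y[:150])
print("held-out accuracy:", probe.score(X[150:], y[150:]))
```

The abstract’s point about specific tokens matters here: where you take the activation from (e.g. the token that actually carries the answer rather than the final token) is what drives the reported detection gains.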
Other Content
Distilling the Knowledge in a Neural Network
Always helpful to read the classic papers to get a better appreciation of the field’s foundations. Thank you to Max Buckley for sharing this (I had never read this paper prior to his share)-
A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then to average their predictions. Unfortunately, making predictions using a whole ensemble of models is cumbersome and may be too computationally expensive to allow deployment to a large number of users, especially if the individual models are large neural nets. Caruana and his collaborators have shown that it is possible to compress the knowledge in an ensemble into a single model which is much easier to deploy and we develop this approach further using a different compression technique. We achieve some surprising results on MNIST and we show that we can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model. We also introduce a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse. Unlike a mixture of experts, these specialist models can be trained rapidly and in parallel.
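The core recipe is compact enough to sketch: soften both the teacher’s and the student’s logits with a temperature T, match them with a KL-divergence term, and mix in the usual cross-entropy on the hard labels. A minimal PyTorch sketch; the hyperparameter values are illustrative, not from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Temperature-softened distributions expose the teacher's "dark knowledge"
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    # T**2 keeps the soft-target gradients on the same scale as the hard-label term
    kd = F.kl_div(soft_student, soft_targets, reduction="batchmean") * (T ** 2)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

# Example with random logits: a batch of 8 examples, 10 classes
student = torch.randn(8, 10, requires_grad=True)
teacher = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student, teacher, labels))
```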
AI Reading List 1: Understand Transformers, Reflection-70B Update, and LLMs Still Cannot Reason
An extremely comprehensive list of updates by Logan Thorneloe.
Create a Universe From a Simple Rule — Mandelbrot Explained In JavaScript
The video is funny, informative, and beautiful. Great watch, if you’re looking to appreciate the beauty in Math.
The Platonic Representation Hypothesis
Personally, I think this paper is a lot less shocking than people are claiming for 3 reasons-
We’re largely using the same datasets (sampled/weighted differently, but largely similar corpora) and tuning in very similar ways. I’d also bet that a lot of data providers use GPT to create datasets instead of people, which would reduce diversity even more.
Data is the result of stripping away information signals to let your model focus on the most important insights. This means data, by its very nature, only captures certain kinds of information, which is another blow to diversity. We’ve covered this idea in more detail here and here.
A lot of ML training is specifically oriented towards reconstruction/matching the ground truth signals.
It’s not surprising that scaling up training would lead to similar representations, but I did think this paper was interesting nevertheless.
We argue that representations in AI models, particularly deep networks, are converging. First, we survey many examples of convergence in the literature: over time and across multiple domains, the ways by which different neural networks represent data are becoming more aligned. Next, we demonstrate convergence across data modalities: as vision models and language models get larger, they measure distance between datapoints in a more and more alike way. We hypothesize that this convergence is driving toward a shared statistical model of reality, akin to Plato’s concept of an ideal reality. We term such a representation the platonic representation and discuss several possible selective pressures toward it. Finally, we discuss the implications of these trends, their limitations, and counterexamples to our analysis.
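If you want to poke at the convergence claim yourself, one common way to quantify how aligned two models’ representations are is linear CKA; the paper itself uses a mutual nearest-neighbor alignment metric, so treat this as a stand-in for the general idea (compare pairwise similarity structure, not raw features) rather than the paper’s exact measurement.

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """X, Y: (n_samples, dim) representations of the same inputs from two models."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

# Toy check: a rotated copy of the same representation scores ~1.0,
# an unrelated random representation scores near 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))
Q, _ = np.linalg.qr(rng.normal(size=(64, 64)))   # random orthogonal rotation
print(linear_cka(X, X @ Q), linear_cka(X, rng.normal(size=(500, 64))))
```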
Learning to Extract Structured Entities Using Language Models
I have to experiment more to see how good this metric ends up being, but it’s definitely an interesting idea. Will break it down if my experiments with this go well (let me know how it goes for y’all too).
Recent advances in machine learning have significantly impacted the field of information extraction, with Language Models (LMs) playing a pivotal role in extracting structured information from unstructured text. Prior works typically represent information extraction as triplet-centric and use classical metrics such as precision and recall for evaluation. We reformulate the task to be entity-centric, enabling the use of diverse metrics that can provide more insights from various perspectives. We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP (AESOP) metric, designed to appropriately assess model performance. Later, we introduce a new Multistage Structured Entity Extraction (MuSEE) model that harnesses the power of LMs for enhanced effectiveness and efficiency by decomposing the extraction task into multiple stages. Quantitative and human side-by-side evaluations confirm that our model outperforms baselines, offering promising directions for future advancements in structured entity extraction. Our source code and datasets are available at this https URL.
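To make the entity-centric framing concrete, here is a toy scoring function in that spirit: greedily match each predicted entity to the most similar remaining gold entity and average per-entity property overlap. This is my own simplification for illustration, not the paper’s AESOP formulation.

```python
def property_overlap(pred: dict, gold: dict) -> float:
    """Fraction of properties (union of keys) on which the two entities agree."""
    keys = set(pred) | set(gold)
    return sum(pred.get(k) == gold.get(k) for k in keys) / len(keys) if keys else 1.0

def entity_set_score(preds: list[dict], golds: list[dict]) -> float:
    """Greedy one-to-one matching of predicted to gold entities, averaged overlap."""
    scores, unmatched = [], list(golds)
    for p in preds:
        best = max(unmatched, key=lambda g: property_overlap(p, g), default=None)
        if best is None:
            scores.append(0.0)             # spurious prediction with no gold left
        else:
            scores.append(property_overlap(p, best))
            unmatched.remove(best)
    scores.extend(0.0 for _ in unmatched)  # penalize gold entities never predicted
    return sum(scores) / len(scores) if scores else 1.0

gold = [{"name": "Marie Curie", "field": "physics"}, {"name": "CERN", "type": "organization"}]
pred = [{"name": "Marie Curie", "field": "chemistry"}]
print(entity_set_score(pred, gold))   # 0.5 for the partial match, 0 for the miss -> 0.25
```

The appeal of scoring at the entity level is visible even in this toy: partial credit for a mostly-right entity, and an explicit penalty for entities the model misses entirely.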
What three experiments tell us about Copilot’s impact on productivity
Seems to highlight the same findings as the earlier studies (less-skilled people benefit more). I wonder how much of that is because we haven’t found ways to test for and quantify “higher levels of thinking”. The part about less experienced devs being more willing to accept changes, and thus inserting possible security risks, is also important. Overall, a good summary by Abi Noda.
Two insights from this study stood out:
Less experienced developers (both in tenure and overall) accept AI-generated suggestions more frequently than do their more experienced counterparts. This raises concerns, especially given the potential for AI to produce buggy or outdated code. But rather than limiting the use of AI tools, the focus should be on education. A recent
Across the three experiments, 30–40% of developers opted not to adopt Copilot. This underscores a key point: simply providing access to AI tools isn’t enough to realize the productivity gains they promise.
If you liked this article and wish to share it, please refer to the following guidelines.
Reach out to me
Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.
Small Snippets about Tech, AI and Machine Learning over here
AI Newsletter- https://artificialintelligencemadesimple.substack.com/
My grandma’s favorite Tech Newsletter- https://codinginterviewsmadesimple.substack.com/
Check out my other articles on Medium: https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819