Interesting Content in AI, Software, Business, and Tech- 01/17/2024 [Updates]
Content to help you keep up with Machine Learning, Deep Learning, Data Science, Software Engineering, Finance, Business, and more
Hey, it’s Devansh 👋👋
In issues of Updates, I will share interesting content I came across. While the focus will be on AI and Tech, the ideas might range across business, philosophy, ethics, and much more. The goal is to share interesting content with y’all so that you can get a peek behind the scenes into my research process.
I put a lot of effort into creating work that is informative, useful, and independent from undue influence. If you’d like to support my writing, consider becoming a premium subscriber to my sister publication Tech Made Simple to support my crippling chocolate milk addiction. Use the button below for a lifetime 50% discount (5 USD/month, or 50 USD/year).
A lot of people reach out to me for reading recommendations. I figured I’d start sharing whatever AI Papers/Publications, interesting books, videos, etc I came across each week. Some will be technical, others not really. I will add whatever content I found really informative (and remembered throughout the week). These won’t always be the most recent publications- just the ones I’m paying attention to this week. Without further ado, here are interesting readings/viewings for 01/17/2024. If you missed last week’s readings, you can find them here.
Reminder- We started an AI Made Simple Subreddit. Come join us over here- https://www.reddit.com/r/AIMadeSimple/. If you’d like to stay on top of community events and updates, join the discord for our cult here: https://discord.com/invite/EgrVtXSjYf.
Community Spotlight- Charley Johnson
Charley Johnson writes the excellent newsletter Untangled. Regular readers will recognize the name because I have shared his writings several times. Charley recently reached out b/c he is writing a book 'AI Untangled' where he brings up a very important point- What if we could transition from ‘minimizing harms’ to transforming the sociotechnical systems that continue propagating those harms? Charley has dropped some super interesting writeups, and I'm sure the book will be quality. Make sure you connect with him to not miss out (please note that I have NO commercial/professional affiliations with this book/Charley in general- I'm just a fan of his work). Here is a preview of the ideas discussed (minor spoiler for an upcoming piece: we will be touching upon the idea of tech reaffirming the status quo and how that can be exclusionary).
If you're doing interesting work and would like to be featured in the spotlight section, just drop your introduction in the comments or reach out to me directly. There are no rules- you could talk about a paper you've written, an interesting project you've worked on, some personal challenge you're working on, ask me to promote your company/product, or anything else you consider important. The goal is to get to know you better, and possibly connect you with interesting people in our chocolate milk cult. No costs/obligations are attached.
Highly Recommended
These are pieces that I feel are particularly well done. If you don't have much time, make sure you at least catch these works.
AI Researcher and man who knows every worthwhile LLM paper in existence- Sebastian Raschka- dropped some really important calculations on Meta's Threads for how much it would cost to pretrain a model like Llama. It gives you a good idea of how expensive LLMs can get.
Doing a back-of-the-envelope calculation, a 7B Llama 2 model costs about $760,000 to pretrain! And this assumes you get everything right from the get-go: no hparam tuning, no debugging, no crashes and restarting. The real cost, including experimentation, is probably millions! It makes me appreciate all these openly available models! The math: On AWS, you pay ~$33/h per 8xA100 instance. In the Llama 2 paper, they cited 184,320 GPU hours for the 7B model. That's 184320 / 8 * 33 = $760,000
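If you want to poke at the numbers yourself, here is Seb's arithmetic as a tiny script. The ~$33/hour 8xA100 rate and the 184,320 GPU-hours figure come straight from his post and the Llama 2 paper; the overhead multiplier is a made-up knob I added to illustrate the "real cost is probably millions" point.

```python
# Seb's back-of-the-envelope calculation as a script. The hourly rate and the
# GPU-hours come from his post / the Llama 2 paper; the overhead multiplier is
# a hypothetical knob for hparam tuning, debugging, crashes, and restarts.
GPU_HOURS_7B = 184_320        # GPU-hours reported for the Llama 2 7B model
HOURLY_RATE_8XA100 = 33.0     # approx. USD per hour for an 8xA100 instance on AWS
GPUS_PER_INSTANCE = 8

instance_hours = GPU_HOURS_7B / GPUS_PER_INSTANCE
base_cost = instance_hours * HOURLY_RATE_8XA100
print(f"Base pretraining cost: ${base_cost:,.0f}")            # ~$760,000

overhead_multiplier = 3.0     # hypothetical: failed runs, tuning, restarts
print(f"With experimentation overhead: ${base_cost * overhead_multiplier:,.0f}")
```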
A Comprehensive Study of Knowledge Editing for Large Language Models
I saw Seb's thread, and I was curious how costs for LLMs look when considering things like supervision, data drift, and realignment. Of course, the man had the perfect paper to answer my question.
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication. However, a primary limitation lies in the significant computational demands during training, arising from their extensive parameterization. This challenge is further intensified by the dynamic nature of the world, necessitating frequent updates to LLMs to correct outdated information or integrate new knowledge, thereby ensuring their continued relevance. Note that many applications demand continual model adjustments post-training to address deficiencies or undesirable behaviors. There is an increasing interest in efficient, lightweight methods for on-the-fly model modifications. To this end, recent years have seen a burgeoning in the techniques of knowledge editing for LLMs, which aim to efficiently modify LLMs' behaviors within specific domains while preserving overall performance across various inputs. In this paper, we first define the knowledge editing problem and then provide a comprehensive review of cutting-edge approaches. Drawing inspiration from educational and cognitive research theories, we propose a unified categorization criterion that classifies knowledge editing methods into three groups: resorting to external knowledge, merging knowledge into the model, and editing intrinsic knowledge. Furthermore, we introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches. Additionally, we provide an in-depth analysis of knowledge location, which can give a deeper understanding of the knowledge structures inherent within LLMs. Finally, we discuss several potential applications of knowledge editing, outlining its broad and impactful implications.
A few more LLMs landed on HF (you can download them from above). Shoutout Harpreet Sahota 🥑 for this find. He is super on top of such updates, so follow him if you want help keeping up with all the models you can use.
DeciCoder-6B is a 6 billion parameter decoder-only code completion model trained on the Python, Java, Javascript, Rust, C++, C, and C# subset of Starcoder Training Dataset. The model uses variable Grouped Query Attention and has a context window of 2k tokens. It was trained using a Fill-in-the-Middle training objective. The model's architecture was generated by Deci's proprietary Neural Architecture Search-based technology, AutoNAC.
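If you want to take it for a spin, here's a minimal loading sketch using the standard Hugging Face transformers API. Note that the repo ID ("Deci/DeciCoder-6B") and the trust_remote_code requirement are my assumptions- check the model card for the exact usage.

```python
# Minimal sketch. The repo ID ("Deci/DeciCoder-6B") and the need for
# trust_remote_code are assumptions on my part- check the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Deci/DeciCoder-6B"  # assumed Hub repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # custom AutoNAC architectures often ship their own modeling code
)

# Code completion: give it the start of a function and let it fill in the rest.
prompt = "def fibonacci(n: int) -> int:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```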
AlphaGeometry: An Olympiad-level AI system for geometry
DeepMind saw the Tesla Bot and Sam Altman's announcement about how close they were to AGI, and decided to come out swinging. There are a lot of really good takeaways from how they taught the AI to generate Olympiad-level solutions, and I would recommend this to anyone trying to build Gen-AI systems. If you'd rather wait, your boy will be breaking this down in more detail.
In a paper published today in Nature, we introduce AlphaGeometry, an AI system that solves complex geometry problems at a level approaching a human Olympiad gold-medalist - a breakthrough in AI performance. In a benchmarking test of 30 Olympiad geometry problems, AlphaGeometry solved 25 within the standard Olympiad time limit. For comparison, the previous state-of-the-art system solved 10 of these geometry problems, and the average human gold medalist solved 25.9 problems.
Magnetic control of tokamak plasmas through deep reinforcement learning
As a super pro-nuclear person, I find this legendary. Fingers crossed we see AI lead to massive breakthroughs in nuclear energy, which in turn lead to larger-scale adoption. To those of you who are scared of nuclear, the exceptional science communication channel Kurzgesagt has a great video on it.
Nuclear fusion using magnetic confinement, in particular in the tokamak configuration, is a promising path towards sustainable energy. A core challenge is to shape and maintain a high-temperature plasma within the tokamak vessel. This requires high-dimensional, high-frequency, closed-loop control using magnetic actuator coils, further complicated by the diverse requirements across a wide range of plasma configurations. In this work, we introduce a previously undescribed architecture for tokamak magnetic controller design that autonomously learns to command the full set of control coils. This architecture meets control objectives specified at a high level, at the same time satisfying physical and operational constraints. This approach has unprecedented flexibility and generality in problem specification and yields a notable reduction in design effort to produce new plasma configurations. We successfully produce and control a diverse set of plasma configurations on the Tokamak à Configuration Variable, including elongated, conventional shapes, as well as advanced configurations, such as negative triangularity and ‘snowflake’ configurations. Our approach achieves accurate tracking of the location, current and shape for these configurations. We also demonstrate sustained ‘droplets’ on TCV, in which two separate plasmas are maintained simultaneously within the vessel. This represents a notable advance for tokamak feedback control, showing the potential of reinforcement learning to accelerate research in the fusion domain, and is one of the most challenging real-world systems to which reinforcement learning has been applied.
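To make the "closed-loop control" idea concrete, here is a toy sketch of what a learned controller's inner loop looks like: measure the plasma, feed the measurements to a policy, send voltage commands to the coils, repeat at kilohertz rates. Every dimension, the dynamics, and the "policy" below are placeholders I made up- this is not DeepMind's architecture or the real TCV interface.

```python
import numpy as np

# Toy sketch of a high-frequency closed control loop: a "policy" maps plasma
# measurements to coil voltage commands, the plant responds, repeat.
# Sizes, dynamics, and the policy weights are made-up placeholders, not the
# real TCV interface or DeepMind's trained agent.
rng = np.random.default_rng(0)
N_MEASUREMENTS, N_COILS = 92, 19                            # hypothetical sizes
W = rng.normal(scale=0.01, size=(N_COILS, N_MEASUREMENTS))  # stand-in "trained" weights

def policy(obs):
    # Stand-in for a neural-network policy: measurements -> bounded voltages.
    return np.tanh(W @ obs)

def toy_plant(state, voltages):
    # Toy linear dynamics standing in for the plasma/vessel response.
    return 0.98 * state + 0.05 * (voltages @ W)

state = rng.normal(size=N_MEASUREMENTS)
for _ in range(1_000):                                      # real loops run at kHz rates
    action = policy(state)                                  # observe -> act
    state = toy_plant(state, action)                        # plant evolves, loop closes

print("final measurement norm:", float(np.linalg.norm(state)))
```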
Free Will vs. Determinism: How Not to Think Philosophically
Wes Cecil is one of my new favorite channels- I really like the way he explains philosophy. In the video above, he breaks down the tired debate b/w free will and determinism, talks about how many discussions get derailed by false dichotomies and red herrings, and shows how changing definitions/frames can change the discourse. Absolutely worth a watch.
A Comprehensive Overview of Gaussian Splatting
This was a pretty cool technique that I learned about recently. I'm trying to increase my knowledge of more niche Machine Learning techniques, so if you have any recommendations that you think get overlooked, shoot me a message.
Gaussian splatting is a method for representing 3D scenes and rendering novel views introduced in “3D Gaussian Splatting for Real-Time Radiance Field Rendering”¹. It can be thought of as an alternative to NeRF²-like models, and just like NeRF back in the day, Gaussian splatting led to lots of new research works that chose to use it as an underlying representation of a 3D world for various use cases. So what’s so special about it and why is it better than NeRF? Or is it, even? Let’s find out!
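To give you a feel for the core idea, here's a toy sketch: represent the scene as a bag of Gaussians (mean, scale, colour, opacity), project each one onto the image plane, and alpha-blend the resulting "splats" front-to-back. The real method uses anisotropic 3D covariances, a differentiable tile-based rasterizer, and optimization from multi-view images- everything below is deliberately simplified.

```python
import numpy as np

# Toy splatting sketch: random isotropic 3D Gaussians projected through a
# pinhole camera and alpha-composited front-to-back. Not the real renderer.
H, W = 64, 64
rng = np.random.default_rng(0)

# A handful of Gaussians: 3D mean, isotropic scale, RGB colour, opacity.
means = rng.uniform([-1, -1, 2], [1, 1, 6], size=(50, 3))
scales = rng.uniform(0.05, 0.2, size=50)
colors = rng.uniform(0, 1, size=(50, 3))
opacities = rng.uniform(0.3, 0.9, size=50)

# Simple pinhole projection (camera at the origin, looking down +z).
focal = 60.0
xs = focal * means[:, 0] / means[:, 2] + W / 2
ys = focal * means[:, 1] / means[:, 2] + H / 2
radii = focal * scales / means[:, 2]          # projected std-dev in pixels

order = np.argsort(means[:, 2])               # front-to-back by depth
image = np.zeros((H, W, 3))
transmittance = np.ones((H, W))               # how much light still passes through
yy, xx = np.mgrid[0:H, 0:W]

for i in order:
    # 2D Gaussian footprint ("splat") of this primitive on the image plane.
    g = np.exp(-((xx - xs[i]) ** 2 + (yy - ys[i]) ** 2) / (2 * radii[i] ** 2))
    alpha = np.clip(opacities[i] * g, 0, 0.999)
    image += (transmittance * alpha)[..., None] * colors[i]
    transmittance *= 1 - alpha                # front-to-back alpha compositing

print("rendered image shape:", image.shape, "mean pixel value:", image.mean())
```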
We're going to keep this one short because I've been in the process of making a few changes to my work (we've got some extremely fun things planned).
If you liked this article and wish to share it, please refer to the following guidelines.
Reach out to me
Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.
Small Snippets about Tech, AI and Machine Learning over here
AI Newsletter- https://artificialintelligencemadesimple.substack.com/
My grandma’s favorite Tech Newsletter- https://codinginterviewsmadesimple.substack.com/
Check out my other articles on Medium: https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819