What Math do you need to be Good at AI
The topics you need to excel in AI- and how much to study them based on your goals.
Based on the messages I get, people trying to get into AI are trapped in the paradox of choice-
There are so many branches of AI that it’s hard to see what to pick.
Even if you know the field, there are so many courses and ideas that you can start learning with.
You also have lots of modalities- textbooks, college courses, YouTube series, Coursera, etc.
As AI matures, different sub-fields will continue to build new directions, which will change the “required reading” that people must be familiar with to build AI-based systems. For example, many people who got into AI through Generative AI or NLP might not be familiar with techniques like Gini impurity- which measures how likely a randomly selected data point would be misclassified if it were labeled at random according to the class mix in a decision tree node, helping identify the best way to split nodes.
Even I, who got into AI through Machine Learning, didn’t know a lot of the older classic techniques like Ant Colony Optimization, Swarm Intelligence, and much of Bayesian AI. These are all algorithms I only started exploring much later.
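To make the Gini impurity definition concrete, here is a minimal Python sketch of the standard formula, 1 - sum(p_k^2), where p_k is the share of class k in a node. The toy labels and the two candidate splits are hypothetical. Real decision-tree libraries compute the same quantity but search over many features and thresholds to find the split with the lowest weighted impurity.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2, the chance that a randomly drawn point is
    misclassified if labeled at random according to the node's class mix."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gini(left, right):
    """Weighted Gini impurity of a candidate split; lower is better."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Hypothetical toy node with 4 "spam" and 4 "ham" labels.
node = ["spam"] * 4 + ["ham"] * 4
print(gini(node))  # 0.5, a perfectly mixed two-class node

# Two candidate splits of the same node: a clean one and a useless one.
clean = split_gini(["spam"] * 4, ["ham"] * 4)
useless = split_gini(["spam", "spam", "ham", "ham"], ["spam", "spam", "ham", "ham"])
print(clean, useless)  # 0.0 0.5, so the tree would prefer the clean split
```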
In such a dynamic field, the best bet is to focus on the abstract principles that you can apply across various topics (the “analogies” to help you familiarize yourself with different ideas). To that end, understanding certain mathematical principles will have the largest carry-over across fields. In this article, we will talk about the following-
The 3 personas for AI, as far as learning math is concerned- non-technical, engineer, and researcher (many teams make the mistake of conflating the latter two, which creates problems for both their employees and the company). Each persona interacts with AI differently and thus needs Math to a different degree. By default, most courses you see are targeted toward the researcher persona, which makes them functionally useless (I would argue harmful, given the time and money investment involved) to other groups.
A list of the foundational math concepts that will give you a strong and adaptable foundation (the what to study). We will then tailor how much of each to study to the different personas.
Given how important Math Foundations are (and that we will be splitting this for the different personas), I think this will be one of the few articles applicable to anyone trying to make major decisions (investing, policy, strategy, building) with respect to AI. The foundations will help you make better decisions.
Executive Highlights (TL;DR of the article)
Here is my “magnificent 7” math lineup that I believe you should know before getting into AI-
Linear Algebra: The study of vectors (and Vector Spaces), matrices, and linear transformations. It's the foundation for representing and manipulating data in AI/ML, and it is used in everything from image processing to natural language understanding.
Probability and Statistics: The study of uncertainty and data analysis. It's essential for building models that make predictions, evaluating performance, and understanding the underlying patterns in data. Thinking probabilistically is an essential skill, so this is one area everyone should go quite deep into.
Calculus: The study of continuous change. It enables Neural Networks (NNs use derivatives to measure how the error changes as each parameter changes, and training follows those gradients to find parameters that minimize the error).
Discrete Mathematics: The study of discrete structures like graphs and networks. It's used in algorithms that work with structured data, such as social networks or knowledge graphs. It's good for developing problem-solving skills and seeing how different techniques slot together. More of a focus for advanced people.
Number Theory: The study of the properties of numbers. It's used in cryptography and hashing, which are essential for security and data integrity in AI systems. It’s more advanced, so it's not going to be very useful to everyone.
Linear Optimization: A technique for finding the best solution to a problem with a linear objective and linear constraints. It's used in various AI applications, such as resource allocation, and its constrained-optimization ideas carry over to models like support vector machines (which are trained by solving a quadratic, rather than linear, program). This is another field that I would recommend to everyone, not so much b/c you’ll go around optimizing things, but because the practice of setting up equations to calculate which vegetables, and how many, you should eat to meet constraints around cost, availability, and nutrition (you think I’m joking, but you’ll see this fr) is weirdly challenging and builds very useful skills for framing situations. You can also go for Optimization Theory more generally, but I think the time ROI on that isn’t as great (the non-linear parts will be covered by calculus and its prereqs).
Mathematical Modeling: The process of creating mathematical representations of real-world phenomena. It's essential for designing simulations, predicting system behavior, and optimizing processes. Similar to above, I think practicing modeling situations in different ways can be very helpful (one might argue that Reinforcement Learning is almost entirely won or lost by how you model the problem).
Before breaking down how much of each topic every role should study, it’s helpful to first talk about the roles in greater detail.
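The diet-style problem from the Linear Optimization entry can be sketched in a few lines. This is a minimal Python illustration with made-up foods, prices, and nutrient numbers, solved by vertex enumeration: for a two-variable LP whose optimum exists (here, costs are positive and servings are non-negative), the best solution sits at a corner of the feasible region, so checking every intersection of two constraint boundaries is enough. That brute force only scales to toy problems; in practice you would hand the same formulation to a solver such as `scipy.optimize.linprog`.

```python
from itertools import combinations

# Hypothetical diet problem: choose servings of two foods (beans, spinach)
# to meet nutrition minimums at the lowest cost. All numbers are made up.
COST = (0.50, 0.80)  # dollars per serving of (beans, spinach)
# Each constraint (a1, a2, b) means a1*x1 + a2*x2 >= b.
CONSTRAINTS = [
    (8.0, 3.0, 40.0),  # protein (g)
    (2.0, 4.0, 16.0),  # iron (mg)
    (1.0, 0.0, 0.0),   # x1 >= 0 (no negative servings)
    (0.0, 1.0, 0.0),   # x2 >= 0
]

def intersect(c1, c2):
    """Point where both constraints hold with equality (Cramer's rule)."""
    a1, a2, b = c1
    d1, d2, e = c2
    det = a1 * d2 - a2 * d1
    if abs(det) < 1e-12:
        return None  # parallel boundary lines: no single corner
    return ((b * d2 - a2 * e) / det, (a1 * e - b * d1) / det)

def feasible(x, tol=1e-9):
    """True if point x satisfies every constraint (with a small tolerance)."""
    return all(a1 * x[0] + a2 * x[1] >= b - tol for a1, a2, b in CONSTRAINTS)

def solve_2d_lp():
    """Vertex enumeration: check every pairwise boundary intersection and
    keep the cheapest feasible one."""
    best = None
    for c1, c2 in combinations(CONSTRAINTS, 2):
        x = intersect(c1, c2)
        if x is None or not feasible(x):
            continue
        cost = COST[0] * x[0] + COST[1] * x[1]
        if best is None or cost < best[0]:
            best = (cost, x)
    return best

cost, (beans, spinach) = solve_2d_lp()
print(f"beans={beans:.2f}, spinach={spinach:.2f}, cost=${cost:.2f}")
# beans=4.31, spinach=1.85, cost=$3.63
```

Most of the interesting work lives in the `CONSTRAINTS` list, i.e., framing the situation; the solving step is mechanical.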
Understanding the Roles: The Math requirements for our 3 personas lie on a spectrum. In ascending order of requirements, we have-
Non-Technical Roles (Product Managers, Business Analysts, Stakeholders): These roles focus on the strategic and business aspects of AI/ML. They need to understand the capabilities and limitations of AI/ML, communicate effectively with technical teams, and make informed decisions about product development, resource allocation, and business strategy. It’s particularly important for you to be able to understand the challenges faced by developers for two reasons. Firstly, this will ensure you don’t give unreasonable tasks to your devs (I’ve experienced this a few times and it’s never a pleasant experience). Secondly, it will help you pick the right tasks to solve given resource constraints. Nontech roles often use technical terms wrongly (or differently), which causes additional confusion. Knowing the basics of tech will help bridge these gaps.
ML Engineers (Model Developers, Data Scientists, Applied ML): These roles are hands-on and focus on the practical application of AI/ML. They are responsible for building, training, and deploying models, as well as analyzing data and evaluating performance. As an engineer, it’s important to be skilled enough that you can look at algorithms and estimate some major attributes (how its costs will grow, possible edge cases, implementation speedups, approximations, etc.) so that you can effectively take the research presented to you and implement the correct steps quickly. And please, for god’s sake, don’t waste your time building things from scratch (unless you enjoy it). It’s intellectual posturing, much less useful than people pretend. Bar for bar, there are higher-ROI ways to spend your learning time.
AI Researchers (Algorithm Developers, Theoretical ML): These roles are focused on the cutting edge of AI/ML. They are responsible for developing new algorithms, proving theoretical results, and pushing the boundaries of the field. They need a deep and rigorous understanding of the underlying math to innovate and advance the state of the art.
A good researcher has to be at the highest mathematical level of the Learning Pyramid for 2 significant reasons. Firstly, there is a LOT of research out there, and a good researcher must be able to filter most of it out very quickly. Secondly, researchers will often have to take a bunch of research and combine it for their own problems. Great deconstructions require the deepest level of understanding, so time a researcher spends digging into the details is not wasted.
All of that is our pregame, which sets important context to drive our analysis forward. Now for the reason you clicked on this article- exactly how much of each topic should you study, from where, and why? Let’s answer that by role-
For Non-Technical Roles: Focus on understanding the principles and intuition behind the math, not the detailed calculations. Your goal is to develop analytical thinking and the ability to approach complex situations logically. Engage with simple Linear Programming (LP) and probability puzzles to hone these skills. This will enable you to frame problems effectively and communicate with technical teams. Key areas include:
A basic understanding of how data is represented using vectors and matrices, and how transformations work. Don’t bother going too much beyond that for Linear Algebra. Basic YouTube videos and some visualizations will teach you enough. Vector Spaces are a pretty important concept, but tbh, as a non-technical person (NTP), you don’t really need to dig into the nuances to have intelligent conversations with people.
An intuitive grasp of probability, distributions, averages, and the difference between correlation and causation (drill this home; it seems simple but slips past our cognitive biases like its name is Altaïr. Completely unrelated, but 12/13-year-old me thought he had the coldest drip). Solve some problems on Brilliant or other apps (there’s a great probability puzzles app on Android) to build these skills.
A conceptual understanding of how optimization and gradients are used in machine learning. The concepts of Calculus are a lot easier than people think, especially if you don’t have to calculate things yourself, so simple walkthroughs should help you appreciate the theory of calculus.
Studying common data structures like Tables, Graphs, and Lists and how they change results. Once again, YouTube is a good friend (they have some great Computer Science videos that cover components of IRL systems and Algorithms quickly). Skip this if you’re very busy.
The ability to use LP and probability puzzles to frame problems and think through solutions logically. Once you have built your knowledge base, doing these regularly will open your mind, so spend most of your learning journey doing these constantly. These are like riddles, so once you get the hang of it, it can be pretty fun. Set up your equations, and then leave if you want (computers solve equations pretty reliably, so solving is a relatively low-value skill). For LP, you have plenty of textbooks online. Buy one and the solution manual, and go ham. The math is easy enough to teach yourself, especially the earlier sections, which is where I want most of your focus.
To save time, study these directly in the context of ML. So a sparse matrix might resemble a dataset where most entries are missing or zero (like a user-item ratings table), probability distributions can be mapped to different kinds of problems, etc. This will help you directly interface with your teams.
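The correlation-versus-causation point is easy to see in a quick simulation. This is a hypothetical Python sketch (heat, ice cream sales, and sunburns are made up for illustration): a hidden confounder drives two variables, so they correlate strongly, yet forcing one of them up (the "intervention") leaves the other unchanged.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def simulate(n, forced_ice_cream=None):
    """Hypothetical world: heat (a confounder) drives both ice cream sales and
    sunburns. By construction, ice cream has NO causal effect on sunburns."""
    ice, burns = [], []
    for _ in range(n):
        heat = random.gauss(0, 1)
        if forced_ice_cream is None:
            ice_cream = heat + random.gauss(0, 0.5)  # sales track the weather
        else:
            ice_cream = forced_ice_cream  # intervention: set sales regardless of weather
        sunburn = heat + random.gauss(0, 0.5)  # depends only on heat
        ice.append(ice_cream)
        burns.append(sunburn)
    return ice, burns

def correlation(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Observational data: the two variables move together (about 0.8 in theory).
ice, burns = simulate(10_000)
print(f"observed correlation: {correlation(ice, burns):.2f}")

# Interventional data: force sales low vs high and compare average sunburns.
_, burns_low = simulate(10_000, forced_ice_cream=-2.0)
_, burns_high = simulate(10_000, forced_ice_cream=2.0)
effect = sum(burns_high) / len(burns_high) - sum(burns_low) / len(burns_low)
print(f"change in sunburns from forcing sales up: {effect:.2f}")  # near zero
```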
For ML Engineers: You need a solid understanding of the core concepts and their practical applications. You should be able to implement algorithms, debug models, and evaluate performance. You also need to be comfortable framing problems using LP and probability. This means you need to understand the how and why of the math. You won’t be building things from scratch, and learning how to do it is a huge waste of time imo. As long as you appreciate the nuances of most things enough to analyze their important behaviors and know when to use what, you can peace out. From our magnificent 7, we get:
Proficiency in core linear algebra operations, matrix multiplication, and basic decompositions.
A strong understanding of probability distributions, Bayes' theorem, hypothesis testing, MLE, MAP, and basic causal inference (this might need to go up depending on your job).
The ability to work with derivatives and gradients and apply the chain rule. You won’t need to grind through calculations by hand, but you should be able to set up final equations and model IRL situations easily.
A solid understanding of gradient descent and its variants, including their tradeoffs and important design decisions. Extend this to tradeoff analysis of any kind of system.
Familiarity with basic graph theory and combinatorics.
The ability to formulate and solve linear programming problems.
The ability to frame real-world problems as LP and probability problems. These will most likely be simplifications, but a good simple model can act as a powerful proxy.
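For the calculus and gradient-descent items above, here is a minimal Python sketch, assuming a hypothetical noiseless dataset generated from y = 3x + 1. The gradients come from applying the chain rule to the mean squared error, and plain full-batch gradient descent walks the two parameters toward the values that minimize it. The learning rate is exactly the kind of design decision the list mentions: too large and the updates diverge, too small and training crawls.

```python
# Hypothetical data generated from y = 3x + 1 (no noise, to keep the check simple).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3.0 * x + 1.0 for x in xs]

def loss_and_grads(w, b):
    """Mean squared error L = mean((w*x + b - y)^2) and its gradients.
    Chain rule: dL/dw = mean(2 * (pred - y) * x), dL/db = mean(2 * (pred - y))."""
    n = len(xs)
    errors = [(w * x + b) - y for x, y in zip(xs, ys)]
    loss = sum(e * e for e in errors) / n
    dw = sum(2 * e * x for e, x in zip(errors, xs)) / n
    db = sum(2 * e for e in errors) / n
    return loss, dw, db

w, b = 0.0, 0.0
learning_rate = 0.05  # a tradeoff knob: stability vs speed of convergence
for step in range(2000):
    _, dw, db = loss_and_grads(w, b)
    w -= learning_rate * dw  # step against the gradient
    b -= learning_rate * db

loss, _, _ = loss_and_grads(w, b)  # loss at the final parameters
print(f"w={w:.3f}, b={b:.3f}, loss={loss:.6f}")  # w and b approach 3 and 1
```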
When you dig into it, you’d be shocked how much ML Engineering relies on proxies to estimate things. The above proxy is used by Amazon to detect bot ad clicks, which is a multi-billion-dollar problem for them.
Engineers should use all the resources mentioned for non-technical people to build up their intuition, and then split their time between System Design, Software Engineering + Computer Science, AI Research Papers, and ML System Design. Don’t skip any of them, but prioritize the last two. My recommended split would be 1-2-3-4 (in that order) to prioritize ML-specific skills while still rounding out your knowledge and giving you a broader tool belt. There are lots of free resources available online, so use them. When critically thinking about ML algorithms, focus on the important algorithmic behaviors mentioned earlier in this section (how its costs will grow, possible edge cases, implementation speedups, approximations, expected behavior, etc.).
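The MLE and MAP items in the ML Engineer list differ in one important way, which a tiny Bernoulli example shows. This Python sketch uses made-up numbers (3 conversions out of a feature's first 3 users, and a Beta(2, 20) prior encoding a belief that such rates are usually low) and is unrelated to the Amazon example. The MAP formula is the mode of the Beta posterior, which holds when both prior parameters exceed 1.

```python
from fractions import Fraction

def mle_bernoulli(successes, trials):
    """Maximum likelihood estimate of a success probability: k / n."""
    return Fraction(successes, trials)

def map_bernoulli(successes, trials, alpha, beta):
    """MAP estimate under a Beta(alpha, beta) prior (alpha, beta > 1):
    the posterior mode (k + alpha - 1) / (n + alpha + beta - 2)."""
    return Fraction(successes + alpha - 1, trials + alpha + beta - 2)

# Hypothetical: a new feature converted 3 of its first 3 users.
print(mle_bernoulli(3, 3))         # 1, i.e. "every user converts" (overconfident)
# Hypothetical prior: rates are usually low (Beta(2, 20) has mode 1/20).
print(map_bernoulli(3, 3, 2, 20))  # 4/23, roughly 0.17
```

With little data, the prior pulls the MAP estimate toward plausible values; as trials grow, the two estimates converge.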
On top of this, if possible, think about business impacts, ROIs, and how you would do things differently. This is an important benchmark for engineers, and imo too many technical people forget that they aren’t paid to write code, they’re paid to solve problems.
![](https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07efe1ef-f599-49e5-9f5b-4f91f87700bb_700x87.png)
This also applies to researchers, but researchers get more leeway from organizations since research is more uncertain. Speaking of research, let’s discuss their plan next-
![](https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf475e6c-7441-4d06-9829-24f98d7cfc02_1080x1080.jpeg)
For AI Researchers: You require a deep and rigorous understanding of the math, including the theoretical foundations and advanced techniques. You need to be able to develop new algorithms, prove theorems, and push the boundaries of the field. You should also be able to use LP and probability to model complex systems. This means you need to understand the why at the deepest level. Key areas include:
A deep understanding of advanced linear algebra, including functional analysis.
A comprehensive understanding of advanced probability theory, Bayesian nonparametrics, and advanced causal inference.
Proficiency in multivariable calculus, differential equations, and convex optimization.
A deep understanding of advanced graph theory, combinatorics, logic, and set theory.
A strong foundation in advanced number theory.
Expertise in advanced linear and non-linear optimization.
The ability to model complex systems using LP and probability.
From my various conversations, many effective researchers study their main fields and also monitor papers in adjacent ones. Using the earlier split of System Design, Software Engineering + Computer Science, AI Research Papers, and ML System Design, a researcher might go 1-1-6-1, keeping the remaining 1 for papers from other domains. This is where studying things like Number Theory, Real Analysis, and Deep Modeling would come into the picture. You don’t have to know them intimately, but hearing about their advancements can spark interesting ideas.
Researchers have no choice but to dig into textbooks, follow conferences, attend discussion groups, and use generative AI tools to constantly scrutinize things and think about possible enhancements. The last one has been particularly helpful to me, since I’m fully self-taught, and thus a lot of my foundations had holes with respect to terms, symbols, rationales, etc.
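To make the engineer and researcher study splits concrete, here is a small Python helper that converts the ratios into hours. It assumes a hypothetical 10-hour weekly budget and reads the researcher's extra 1 as a fifth bucket for papers from other domains (an interpretation, not a rule from the text). Both splits happen to total 10 units, so each unit is one hour here.

```python
def allocate(hours, weights):
    """Split a time budget proportionally to integer weights."""
    total = sum(weights.values())
    return {topic: hours * w / total for topic, w in weights.items()}

# Ratios from the text; "Other domains" is the assumed fifth bucket for researchers.
ENGINEER = {"System Design": 1, "SWE + CS": 2, "AI Research Papers": 3, "ML System Design": 4}
RESEARCHER = {"System Design": 1, "SWE + CS": 1, "AI Research Papers": 6,
              "ML System Design": 1, "Other domains": 1}

weekly_hours = 10  # hypothetical budget
for name, weights in [("Engineer", ENGINEER), ("Researcher", RESEARCHER)]:
    print(name, allocate(weekly_hours, weights))
```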
In the main section, we will be elaborating on these roles and their requirements in more detail. Instead of parroting the points right now, I want to focus on how effectively learning the Mathematical concepts might translate into each role, once you have a functional knowledge base.