I love splines. I call them the flexible rulers of mathematical modeling. 😉
Splines are GG
This is a really useful article, thank you so much! 💗
Glad you liked it
Philipp Lahm is not underrated. 😂
Hell nah. Super underrated Baller.
People who know nothing about football don’t rate him. People who do, rate him. But is he better than Cafu? Dani Alves? He’s rated just fine. Plus he never played for Stoke.
Can't believe you put Dani Alves in the same conversation as Lahm. You Barca fans are wild
https://www.reddit.com/r/realmadrid/comments/1d6e1cn/dani_carvajal_the_only_player_to_start_win_six_6/
Nice to read a new take on KAN! Also very curious to see what people hack together with KANs
Very exciting times ahead
There are several wrong statements in this article. KAN is not new. First, an article was published in 2021 by Andrew Polar and Mike Poluektov. Find it. Second, it is not slower than MLP. Visit OpenKAN.org and find the source code there that runs quicker than MLP; there are comparisons.
Andrew. Thank you for your comments. A few things-
1) I call it new because the authors say they "introduced KANs". The related works section talks about how other approaches were not promising, and that "Our contribution lies in generalizing the network to arbitrary widths and depths, revitalizing and contextualizing them in today’s deep learning stream, as well as highlighting its potential role as a foundation model for AI + Science."
2) Similarly, I didn't make the claim about KANs being slower. The paper says that training is slower, and I reported that. I didn't say that inference would be slower or anything along those lines.
Both of those statements, if untrue, would be best taken up with the authors. If you'd like to publish a rebuttal, you are free to come on this newsletter and publish it (as long as you can show the results).
You say there are several wrong statements. Are there any others, besides these?
OK. The following is wrong:
1. Lack of research. We published two articles with examples several years before. There are others, for example: [KASAM: Spline Additive Models for Function Approximation. Heinrich van Deventer, Pieter Janse van Rensburg and Anna Bosman, Department of Computer Science, University of Pretoria]. They suggested a spline model.
2. Slow training. MIT offered only one training method. We used a completely different one (it is published) and see quicker training. Here is a comparison: http://openkan.org/triangles.html. It is quicker than MLP. And this is not the limit. I wrote this code quickly and did not work on it for years, while MLP people have polished their code for decades. There is room for improvement.
3. Catastrophic forgetting. That is complete nonsense. All networks and models forget previous training. In KAN the function values are arguments of other functions, so how can you remember previous training when you retrain? Such things should be proven theoretically in math papers, not just stated. The MIT article just says that without theoretical backup.
The MIT paper has a reference to our paper, which in turn has other references. Google the word KASAM and you'll find KAN with splines before the MIT papers.
I think you're misunderstanding something here. The purpose of this article is to break down a specific paper (the one quoted), not the entirety of the research in this space. There is only so much time and space I have to cover things, and of course some information is left out. The goal here is to make sure that people can understand this generalized KAN paper that attracted attention. If they're interested in the details, they can always Google more themselves. If you'd like to do a guest post here highlighting the background of KANs in more detail, you are more than welcome to.
Regarding catastrophic forgetting: the authors did present an experiment with promising results. Did you not find it compelling? Why not?
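For readers following this exchange: the experiment in the paper fits a 1D function made of several localized peaks, presented one region at a time, and reports that the KAN keeps earlier peaks while an MLP overwrites them. Below is a toy sketch of the locality argument behind that claim, not the paper's code: with local basis functions, only the coefficients whose basis overlaps the current data receive any gradient. The hat basis, peak layout, and hyperparameters here are illustrative assumptions.

```python
# Toy sketch (not the paper's code): sequential fitting with LOCAL basis functions.
# Coefficients whose basis does not overlap the current data get zero gradient,
# so earlier regions are left untouched. Basis, peaks, and lr are illustrative.
import numpy as np

knots = np.linspace(-1.0, 1.0, 41)      # centers of local "hat" basis functions
coeffs = np.zeros_like(knots)           # learnable coefficients

def hat(i, x):
    """Piecewise-linear hat basis centered at knots[i]; nonzero only near that knot."""
    e = np.zeros_like(knots)
    e[i] = 1.0
    return np.interp(x, knots, e)

def model(x):
    # piecewise-linear model == sum_i coeffs[i] * hat(i, x)
    return np.interp(x, knots, coeffs)

def train_on_region(center, steps=500, lr=0.5):
    """Fit one Gaussian peak using only data sampled near it."""
    global coeffs
    x = np.random.uniform(center - 0.15, center + 0.15, size=128)
    y = np.exp(-((x - center) ** 2) / (2 * 0.06 ** 2))
    for _ in range(steps):
        err = model(x) - y
        # exact MSE gradient per coefficient; hats far from the data contribute 0
        grads = np.array([np.mean(2.0 * err * hat(i, x)) for i in range(len(knots))])
        coeffs -= lr * grads

for c in [-0.8, -0.4, 0.0, 0.4, 0.8]:   # peaks presented one after another
    train_on_region(c)

# all five peaks remain represented after training on the last one
print(np.round(model(np.array([-0.8, -0.4, 0.0, 0.4, 0.8])), 2))
```

A global parameterization (an MLP's weights, say) has no such locality, so gradients from the last region move parameters that earlier regions depended on. Whether this toy argument settles the question is exactly the debate above, but it is what the paper's experiment is probing.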
Incredible article. Do you mind if I cross-post it sometime in the near future? I was thinking of writing something up about KANs but this is awesome.
Also, getting Yamcha'ed 🤣
Please do. I'd be super happy if you cross-posted it.
"KANs (Kolmogorov-Arnold Networks)- Unlike traditional MLPs (Multi-Layer Perceptrons), which have fixed node activation functions, KANs use learnable activation functions on edges, essentially replacing linear weights with non-linear ones. This makes KANs more accurate and interpretable, and especially useful for functions with sparse compositional structures (we explain this next), which are often found in scientific applications and daily life."
Can you please explain how "learnable functions on edges" make KANs more interpretable? I have been giving it some thought but cannot understand it yet. If it's not possible, no worries; we can think about it for a while and get back to it later :)
Hi. That's a great question. There are two dimensions to this-
1) The ability to see the univariate functions at every level tells you how the model is making decisions, which is interpretable.
2) The ability to tweak/nudge these functions more directly gives you the ability to nudge the model (which is a more naked form of interpretability).
This section explains it very well- https://www.dailydoseofds.com/a-beginner-friendly-introduction-to-kolmogorov-arnold-networks-kan/#interpretability
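To make those two points concrete, here is a minimal sketch of what a "learnable function on an edge" is and why it is something you can inspect. This is not the authors' pykan implementation; the KANEdge class, the knot grid, and the piecewise-linear parameterization are simplifying assumptions (the paper uses B-spline bases plus a base activation).

```python
# Minimal sketch (not the authors' pykan code) of "a learnable function on an edge".
# KANEdge, the knot grid, and the piecewise-linear form are illustrative choices.
import numpy as np

class KANEdge:
    """One KAN edge: a learnable univariate function phi(x), here piecewise linear."""
    def __init__(self, grid_min=-1.0, grid_max=1.0, n_knots=8):
        self.grid = np.linspace(grid_min, grid_max, n_knots)  # fixed knot locations
        self.coeffs = 0.1 * np.random.randn(n_knots)          # trainable values at the knots

    def __call__(self, x):
        # the "activation" lives on the edge and is itself what gets trained
        return np.interp(x, self.grid, self.coeffs)

# Contrast: an MLP edge is a single scalar weight, so the function on the edge
# is always the straight line w * x; there is nothing to look at beyond its slope.
w = 0.7

x = np.linspace(-1.0, 1.0, 5)
edge = KANEdge()
print("KAN edge phi(x):", np.round(edge(x), 3))  # an arbitrary learned 1-D curve
print("MLP edge w*x:   ", np.round(w * x, 3))
```

Because every edge carries its own one-dimensional curve, you can plot phi(x) for each edge after training and read off what transformation each input undergoes, which is point 1 above; and because the curve is parameterized by a handful of local coefficients, you can nudge it directly, which is point 2.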
Thank you for your article on Kolmogorov–Arnold Networks (KANs). I am not a specialist, but curious about one point: Why do you not directly cite the source article instead of just linking to it? A huge proportion of your article is a reprint of excerpts from the source. Surely the authors of the source should be formally credited? Did I miss something obvious?
Is there some hesitation because your article is more of a 'repackaging' of their work than a review of it?
Several times you refer to 'the paper' and 'the authors.' Is it this one?
Liu, Z., Wang, Y., Vaidya, S., Ruehle, F., Halverson, J., Soljačić, M., ... & Tegmark, M. (2024). KAN: Kolmogorov-Arnold networks. arXiv preprint arXiv:2404.19756.
Ziming Liu (1,4), Yixuan Wang (2), Sachin Vaidya (1), Fabian Ruehle (3,4), James Halverson (3,4), Marin Soljačić (1,4), Thomas Y. Hou (2), Max Tegmark (1,4)
1 Massachusetts Institute of Technology
2 California Institute of Technology
3 Northeastern University
4 The NSF Institute for Artificial Intelligence and Fundamental Interactions
Cheers,
Leo
When it comes to paper breakdowns, this is how I've always approached it. It's pretty obvious that, for a specific breakdown, a quote and/or a screenshot comes from the paper I'm breaking down unless explicitly stated otherwise (I always link to other sources as well). So it is credited/cited. I use this style because it's easy for me to read and not clunky. No reader has complained about it (including the original authors of many papers I've broken down), so I think this approach works fine.
This is the paper mentioned- https://arxiv.org/html/2404.19756v1#abstract. It's linked in the first sentence of the breakdown.