20 Comments
Sergei Polevikov:

I love splines. I call them the flexible rulers of mathematical modeling. 😉

Devansh:

Splines are GG

Ananya Shahi:

This is a really useful article, thank you so much! 💗

Devansh:

Glad you liked it

Steve Raju:

Philipp Lahm is not underrated. 😂

Devansh:

Hell nah. Super underrated Baller.

Steve Raju:

People who know nothing about football don’t rate him. People who do, rate him. But is he better than Cafu? Dani Alves? He’s rated just fine. Plus he never played for Stoke.

Devansh:

Can't believe you put Dani Alves in the same conversation as Lahm. You Barca fans are wild

Hesam Sheikh:

Nice to read a new take on KAN! Also very curious to see what people hack together with KANs

Devansh:

Very exciting times ahead

Andrew Polar:

There are several wrong statements in this article. KAN is not new. First, an article was published in 2021 by Andrew Polar and Mike Poluektov. Find it. Second, it is not slower than an MLP. Visit OpenKAN.org and you will find source code there that runs quicker than an MLP, with comparisons.

Devansh:

Andrew. Thank you for your comments. A few things-

1) I call it new because the authors say they "introduced KANs". The related works section talks about how other approaches were not promising, and that "Our contribution lies in generalizing the network to arbitrary widths and depths, revitalizing and contextualizing them in today's deep learning stream, as well as highlighting its potential role as a foundation model for AI + Science."

2) Similarly, I didn't make the claim about KANs being slower. The paper says that training is slower, and I reported that. I didn't say that inference would be slower, or anything along those lines.

Both of those statements, if untrue, would be best taken up with the authors. If you'd like to publish a rebuttal, you are free to come on this newsletter and publish it (as long as you can show the results).

You say there are several wrong statements. Are there any others besides these?

Andrew Polar:

OK. The following is wrong:

1. Lack of research. We published two articles with examples several years before. There are others too, for example "KASAM: Spline Additive Models for Function Approximation" by Heinrich van Deventer, Pieter Janse van Rensburg, and Anna Bosman (Department of Computer Science, University of Pretoria). They suggested a spline model.

2. Slow training. MIT offered only one training method. We used a completely different one (it is published) and see quicker training. Here is a comparison: http://openkan.org/triangles.html. It is quicker than an MLP. And this is not the limit. I wrote this code quickly and did not work on it for years, whereas MLP people have polished their code for decades. There is room for improvement.

3. Catastrophic forgetting. That is complete nonsense. All networks and models forget previous training. In a KAN the function values are arguments of other functions, so how can you remember previous training when you retrain? Such things should be proven theoretically in math papers, not just stated. The MIT article just says it without theoretical backup.
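
(A minimal, hypothetical sketch of the structural point in item 3: in a KAN-style model, the values of one layer's learned univariate functions become the arguments of the next layer's functions. This is a plain-NumPy illustration written for this thread, not code from OpenKAN, pykan, or the MIT paper; every name and number in it is made up.)

```python
import numpy as np

GRID = np.linspace(-1.0, 1.0, 11)  # shared knot grid for the piecewise-linear "splines"

def edge_fn(x, coeffs):
    # A univariate function defined by its values at the knots (linear interpolation between them).
    return np.interp(x, GRID, coeffs)

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 2, 3, 1

# One coefficient vector per edge of each layer.
layer1 = rng.normal(size=(n_hidden, n_in, GRID.size))
layer2 = rng.normal(size=(n_out, n_hidden, GRID.size))

def kan_forward(x):
    # Hidden node j sums the outputs of its incoming edge functions.
    h = np.array([sum(edge_fn(x[i], layer1[j, i]) for i in range(n_in))
                  for j in range(n_hidden)])
    h = np.tanh(h)  # squash so hidden values stay inside the next layer's knot grid
    # The *values* of the first layer's functions are now the *arguments* of the
    # second layer's functions, so retraining one layer changes what every
    # downstream function sees.
    y = np.array([sum(edge_fn(h[j], layer2[k, j]) for j in range(n_hidden))
                  for k in range(n_out)])
    return y

print(kan_forward(np.array([0.3, -0.7])))
```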

Andrew Polar:

The MIT paper has a reference to our paper, which in turn has other references. Google the word KASAM and you'll find KANs with splines before the MIT paper.

Devansh:

I think you're misunderstanding something here. The purpose of this article is to break down a specific paper (the one quoted) not the entirety of the research into the space. There is only so much time and space I have to cover things, and of course information is being left out. The goal here is to make sure that people can understand this generalized KAN paper that attracted attention. If they're interested in the details, they can always Google more themselves. If you'd like to do a guest post here highlighting the background of KANs in more detail, you are more than welcome to.

Regarding catastrophic forgetting: the authors did run an experiment that showed promising results. Did you not find it compelling? Why not?

Logan Thorneloe:

Incredible article. Do you mind if I cross-post it sometime in the near future? I was thinking of writing something up about KANs but this is awesome.

Also, getting Yamcha'ed 🤣

Devansh:

Please. I'd be super happy if you cross-posted it.

Leo:

Thank you for your article on Kolmogorov–Arnold Networks (KANs). I am not a specialist, but curious about one point: Why do you not directly cite the source article instead of just linking to it? A huge proportion of your article is a reprint of excerpts from the source. Surely the authors of the source should be formally credited? Did I miss something obvious?

Is this hesitation because your article is more of a 'repackaging' of their work rather than a review of it?

Several times you refer to 'the paper' and 'the authors.' Is it this one?

Liu, Z., Wang, Y., Vaidya, S., Ruehle, F., Halverson, J., Soljačić, M., ... & Tegmark, M. (2024). KAN: Kolmogorov-Arnold networks. arXiv preprint arXiv:2404.19756.

Ziming Liu (1,4), Yixuan Wang (2), Sachin Vaidya (1), Fabian Ruehle (3,4), James Halverson (3,4), Marin Soljačić (1,4), Thomas Y. Hou (2), Max Tegmark (1,4)

1 Massachusetts Institute of Technology

2 California Institute of Technology

3 Northeastern University

4 The NSF Institute for Artificial Intelligence and Fundamental Interactions

Cheers,

Leo

Devansh:

When it comes to paper breakdowns, this is how I've always approached it. It's pretty obvious that for a specific breakdown, a quote and/or a screenshot comes from the paper I'm breaking down unless explicitly stated otherwise (I always link to other sources as well). So it is credited/cited. I use this style because it's easy to read and not clunky. No reader has complained about it (including the original authors of many papers that I've broken down), so I think this approach works fine.

This is the paper mentioned- https://arxiv.org/html/2404.19756v1#abstract. It's linked in the first sentence of the breakdown.

[Comment deleted, Jun 17]
Devansh:

Hi. That's a great question. There are two dimensions to this-

1) The ability to see the univariate functions at every level tells you how the AI is making decisions, which is interpretable.

2) The ability to tweak/nudge these functions directly lets you nudge the model (which is a more naked form of interpretability).

This section explains it very well- https://www.dailydoseofds.com/a-beginner-friendly-introduction-to-kolmogorov-arnold-networks-kan/#interpretability
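
(A hypothetical illustration of both points, not code from the paper or from any KAN library: because every learned component of a KAN is a univariate function, you can evaluate it on a 1-D grid and look at the curve directly, and "nudging" the model amounts to editing that function's coefficients. The coefficients below are made-up stand-ins for trained values.)

```python
import numpy as np
import matplotlib.pyplot as plt

GRID = np.linspace(-1.0, 1.0, 11)         # knot positions for one edge's spline
learned_coeffs = np.sin(np.pi * GRID)     # pretend these came out of training

xs = np.linspace(-1.0, 1.0, 200)
ys = np.interp(xs, GRID, learned_coeffs)  # the edge's learned univariate function

plt.plot(xs, ys)
plt.title("One learned edge function: readable, and editable via its coefficients")
plt.xlabel("input to this edge")
plt.ylabel("output of this edge")
plt.show()
```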
