22 Comments

I love splines. I call them the flexible rulers of mathematical modeling. 😉

author

Splines are GG

Incredible article. Do you mind if I cross-post it sometime in the near future? I was thinking of writing something up about KANs but this is awesome.

Also, getting Yamcha'ed 🤣

author

Please do. I'd be super happy if you cross-posted it.

This is a really useful article, thank you so much! 💗

author

Glad you liked it

Jun 17 · Liked by Devansh

Philipp Lahm is not underrated. 😂

author

Hell nah. Super underrated Baller.

Jun 17 · Liked by Devansh

People who know nothing about football don’t rate him. People who do, rate him. But is he better than Cafu? Dani Alves? He’s rated just fine. Plus he never played for Stoke.

author

Can't believe you put Dani Alves in the same conversation as Lahm. You Barca fans are wild

Nice to read a new take on KAN! Also very curious to see what people hack together with KANs

author

Very exciting times ahead

Jun 23 · Liked by Devansh

There are several wrong statements in this article. KAN is not new. First, an article was published in 2021 by Andrew Polar and Mike Poluektov. Find it. Second, it is not slower than an MLP. Visit OpenKAN.org and you will find source code there that runs quicker than an MLP; there are comparisons.

author

Andrew. Thank you for your comments. A few things-

1) I call it new b/c the authors say they "introduced KANs". The related works section talks about how other approaches were not promising, and that "Our contribution lies in generalizing the network to arbitrary widths and depths, revitalizing and contextualizing them in today's deep learning stream, as well as highlighting its potential role as a foundation model for AI + Science."

2) Similarly, I didn't make the claim about KANs being slower. The paper says that training is slower, and I reported that. I didn't say that inference would be slower or anything along those lines.

Both those statements, if untrue, would be best taken up with the authors. If you'd like to publish a rebuttal, you are free to come on this newsletter and publish it (as long as you can show the results).

You say there are several wrong statements. Are there any others besides these?

OK. The following is wrong:

1. Lack of research. We published 2 articles with examples several years before. There are others, for example: [KASAM: Spline Additive Models for Function Approximation, Heinrich van Deventer, Pieter Janse van Rensburg and Anna Bosman, Department of Computer Science, University of Pretoria]. They suggested a spline model.

2. Slow training. MIT offered only one training method. We used a completely different one (it is published) and see quicker training. Here is a comparison showing it is quicker than an MLP: http://openkan.org/triangles.html. And this is not the limit. I wrote this code quickly and did not work on it for years, while MLP people have polished their code for decades. There is room for improvement.

3. Catastrophic forgetting. That is complete nonsense. All networks and models forget previous training. In a KAN the function values are arguments of other functions, so how can you remember previous training when you retrain? Such things should be proven theoretically in math papers, not just stated. The MIT article just says that without theoretical backup.

The MIT paper has a reference to our paper, which in turn has other references. Google the word KASAM and you'll find KANs with splines from before the MIT paper.

author

I think you're misunderstanding something here. The purpose of this article is to break down a specific paper (the one quoted), not the entirety of the research in this space. There is only so much time and space I have to cover things, and of course information is being left out. The goal here is to make sure that people can understand this generalized KAN paper that attracted attention. If they're interested in the details, they can always Google more themselves. If you'd like to do a guest post here highlighting the background of KANs in more detail, you are more than welcome to.

Regarding catastrophic forgetting: the authors did include an experiment with promising results. Did you not find it compelling? Why not?

"KANs (Kolmogorov-Arnold Networks)- Unlike traditional MLPs (Multi-Layer Perceptrons), which have fixed node activation functions, KANs use learnable activation functions on edges, essentially replacing linear weights with non-linear ones. This makes KANs more accurate and interpretable, and especially useful for functions with sparse compositional structures (we explain this next), which are often found in scientific applications and daily life."

Can you please explain how "learnable functions on edges" make KANs more interpretable? I have been giving it some thought but cannot understand it yet. If it's not possible, no worries; we can think about it for a while and get back later :)
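
For readers trying to picture the architectural difference the quote describes, here is a minimal, purely illustrative PyTorch sketch: a standard MLP layer applies a fixed activation after a linear map, while a toy KAN-style layer puts its own small learnable univariate function on every edge and sums the results. The class name, the piecewise-linear parameterization, and the grid settings are stand-ins, not the paper's B-spline implementation or any existing library's API.

```python
import torch
import torch.nn as nn


class KANStyleLayer(nn.Module):
    """Toy layer with a learnable univariate function on every edge.

    Each edge (input i -> output j) carries its own function phi_ji, modeled
    here as a piecewise-linear curve on a fixed grid. The paper uses B-splines
    plus a base activation; this is only a sketch of the structural idea.
    """

    def __init__(self, in_dim, out_dim, grid_size=16, x_min=-2.0, x_max=2.0):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        self.x_min, self.x_max = x_min, x_max
        self.register_buffer("grid", torch.linspace(x_min, x_max, grid_size))
        # One learnable curve per edge: shape (out_dim, in_dim, grid_size).
        self.values = nn.Parameter(0.1 * torch.randn(out_dim, in_dim, grid_size))

    def edge_function(self, x, j, i):
        """Evaluate phi_ji at a batch of scalars x by linear interpolation."""
        x = x.clamp(self.x_min, self.x_max)
        idx = torch.bucketize(x, self.grid).clamp(1, self.grid.numel() - 1)
        x0, x1 = self.grid[idx - 1], self.grid[idx]
        y0, y1 = self.values[j, i, idx - 1], self.values[j, i, idx]
        t = (x - x0) / (x1 - x0)
        return y0 + t * (y1 - y0)

    def forward(self, x):
        # x: (batch, in_dim) -> (batch, out_dim). Each output is a plain sum of
        # univariate edge functions; there is no separate weight matrix.
        outputs = []
        for j in range(self.out_dim):
            outputs.append(sum(self.edge_function(x[:, i], j, i)
                               for i in range(self.in_dim)))
        return torch.stack(outputs, dim=1)


# Contrast with a standard MLP layer: linear weights, then a fixed activation.
mlp_layer = nn.Sequential(nn.Linear(2, 3), nn.ReLU())
kan_layer = KANStyleLayer(2, 3)

x = torch.randn(8, 2)
print(mlp_layer(x).shape, kan_layer(x).shape)  # both torch.Size([8, 3])
```

The structural point is that the nonlinearity itself is what gets learned, per edge, rather than a shared fixed activation applied after learned linear weights.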

author

Hi. That's a great question. There are two dimensions to this-

1) The ability to see the univariate functions at every level tells you how the AI is making decisions, which is interpretable.

2) The ability to tweak/nudge these functions directly lets you steer the model (which is a more naked form of interpretability).

This section explains it very well- https://www.dailydoseofds.com/a-beginner-friendly-introduction-to-kolmogorov-arnold-networks-kan/#interpretability
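
To make point 1) concrete: once each edge carries its own univariate function, you can sweep one input at a time and read the learned curves off directly. The sketch below is a toy, with learnable cubics standing in for the paper's splines, and every name and number in it is illustrative. For a target like sin(3·x0) + x1², the edge fed by x0 should come out roughly sine-shaped and the edge fed by x1 roughly parabolic, which is the "seeing the univariate functions" idea described above.

```python
import torch
import torch.nn as nn

# Each edge (output j, input i) gets its own learnable cubic:
#   phi_ji(x) = a*x^3 + b*x^2 + c*x + d
# Cubics stand in for the paper's splines; the point is only that every edge
# function is univariate and can be read off directly after training.
in_dim, out_dim = 2, 1
coef = nn.Parameter(0.1 * torch.randn(out_dim, in_dim, 4))


def kan_forward(x):                                    # x: (batch, in_dim)
    powers = torch.stack([x ** 3, x ** 2, x, torch.ones_like(x)], dim=-1)
    edge_out = (coef.unsqueeze(0) * powers.unsqueeze(1)).sum(-1)  # (batch, out, in)
    return edge_out.sum(-1)                                       # (batch, out)


# Fit a sparse compositional target: a sine-like term in x0 plus a parabola in x1.
x = torch.rand(2048, in_dim) * 2 - 1
y = torch.sin(3 * x[:, :1]) + x[:, 1:] ** 2
opt = torch.optim.Adam([coef], lr=1e-2)
for _ in range(2000):
    opt.zero_grad()
    loss = ((kan_forward(x) - y) ** 2).mean()
    loss.backward()
    opt.step()

# "Seeing the univariate functions": sweep each input on a small grid and look
# at the learned edge curves. Edge (0, 0) should come out roughly sine-shaped,
# edge (0, 1) roughly parabolic (up to an arbitrary constant offset).
grid = torch.linspace(-1, 1, 5).unsqueeze(-1)
basis = torch.cat([grid ** 3, grid ** 2, grid, torch.ones_like(grid)], dim=-1)
with torch.no_grad():
    for i in range(in_dim):
        curve = (basis * coef[0, i]).sum(-1)
        print(f"edge (0,{i}) on [-1, 1]:", [round(v, 2) for v in curve.tolist()])
```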

Thank you for your article on Kolmogorov–Arnold Networks (KANs). I am not a specialist, but curious about one point: Why do you not directly cite the source article instead of just linking to it? A huge proportion of your article is a reprint of excerpts from the source. Surely the authors of the source should be formally credited? Did I miss something obvious?

Is there a hesitation because your article is more of a 'repackaging' of their work than a review of it?

Several times you refer to 'the paper' and 'the authors.' Is it this one?

Liu, Z., Wang, Y., Vaidya, S., Ruehle, F., Halverson, J., Soljačić, M., ... & Tegmark, M. (2024). KAN: Kolmogorov-Arnold Networks. arXiv preprint arXiv:2404.19756.

Ziming Liu (1,4), Yixuan Wang (2), Sachin Vaidya (1), Fabian Ruehle (3,4), James Halverson (3,4), Marin Soljačić (1,4), Thomas Y. Hou (2), Max Tegmark (1,4)

1 Massachusetts Institute of Technology
2 California Institute of Technology
3 Northeastern University
4 The NSF Institute for Artificial Intelligence and Fundamental Interactions

Cheers,

Leo

author

When it comes to paper breakdowns, this is how I've always approached it. It's pretty obvious that, for a specific breakdown, a quote and/or a screenshot comes from the paper I'm breaking down unless explicitly stated otherwise (I always link to other sources as well). So it is credited/cited. I use this style because it's easy for me to read and not clunky. No reader has complained about it (including the original authors of many papers that I've broken down), so I think this approach works well.

This is the paper mentioned- https://arxiv.org/html/2404.19756v1#abstract. It's linked in the first sentence of the breakdown.
