"This is why adding more parameters to a Neural Network adds to the cost exponentially."
Nonsense. Training cost is linear -- not even quadratic -- in the parameter count.
I know: nontechnical people throw around the term "exponentially" as if it only meant "a lot". But it has a very specific technical meaning, and we computer scientists should use it only when that meaning is applicable.
In fact, parameter counts themselves have been growing exponentially. Training these large networks would not be possible if the cost were even quadratic, never mind exponential, in the parameter count.
I base the exponential statement on the simplification that when we add a new node to a graph, the number of connections go from n^2 to (n+1)^2. I have done some research looking into how training costs change with parameters, but haven't found anything specifically about that. If you have any sources for linearity, I would be happy to look into it.
Not at all mate. I always appreciate inputs because they help me get better. To me, what's important is to get things right. Corrections/improvements are always deeply appreciated, since they are important for the quality of my work.
Also, you're completely right about it being a quadratic relationship. I guess I heard exponential somewhere and accepted it without thinking critically. Goes to show the importance of regularly checking your assumptions about what you thought you know.
I'll go over the sources. Depending on how it goes, I'll either do a paper breakdown or just put a correction post. Either way, I really appreciate you sharing them.
Once again, thank you for your comments. The worst thing that can happen to a writer like me is for my audience to stop correcting me when I make mistakes. I hope you'll continue to share your insights into the future.
"This is why adding more parameters to a Neural Network adds to the cost exponentially."
Nonsense. Training cost is linear -- not even quadratic -- in the parameter count.
I know: nontechnical people throw around the term "exponentially" as if it only meant "a lot". But it has a very specific technical meaning, and we computer scientists should use it only when that meaning is applicable.
In fact, parameter counts themselves have been growing exponentially. Training these large networks would not be possible if the cost were even quadratic, never mind exponential, in the parameter count.
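To put rough numbers on that (the parameter counts below are made up purely for illustration):

```python
# Back-of-envelope: extra compute needed to go from a ~100M-parameter model
# to a ~100B-parameter model under different cost-vs-parameter relationships.
# (Illustrative round numbers only.)
N0, N1 = 10**8, 10**11

print("linear:   ", N1 / N0)         # 1,000x the compute -- costly but feasible
print("quadratic:", (N1 / N0) ** 2)  # 1,000,000x the compute -- already prohibitive
# exponential (cost ~ 2**N): the ratio would be 2**(N1 - N0), a number with
# roughly 30 billion digits -- no amount of hardware covers that.
```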
I base the exponential statement on the simplification that when we add a new node to a graph, the number of connections goes from n^2 to (n+1)^2. I have done some research into how training cost changes with parameter count, but haven't found anything specific about that. If you have any sources for linearity, I'd be happy to look into them.
This was the only thing I found: https://arxiv.org/pdf/2007.05558.pdf. If you have any other sources I should look into, please do share.
https://en.m.wikipedia.org/wiki/Large_language_model#Training_cost "For Transformer-based LLM, [...] it costs 6 FLOPs per parameter to train on one token..." Ref.: https://arxiv.org/abs/2001.08361
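To make the linearity concrete, here's a quick sketch of what that rule of thumb implies; the parameter and token counts are made-up round numbers:

```python
# Rough training-compute estimate, C ~= 6 * N * D, using the rule of thumb
# cited above: ~6 FLOPs per parameter per training token.
def approx_training_flops(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens

n_tokens = 300e9                      # 300B training tokens (illustrative)
for n_params in (1e9, 10e9, 100e9):   # 1B, 10B, 100B parameters
    flops = approx_training_flops(n_params, n_tokens)
    print(f"{n_params:.0e} params -> {flops:.1e} FLOPs")
# 10x the parameters -> 10x the FLOPs: linear, not quadratic or exponential.
```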
"when we add a new node to a graph, the number of connections go from n^2 to (n+1)^2"
-- That would be a quadratic relationship (n^2), not an exponential one (2^n). Don't confuse them!
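If it helps, the gap is obvious even at small n:

```python
# Quadratic growth (n**2) vs. exponential growth (2**n): they diverge fast.
for n in (10, 20, 30, 40):
    print(f"n={n:>2}  n^2={n**2:>5}  2^n={2**n:,}")
```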
I generally appreciate your content, so please don't be discouraged by my offering a correction.
Not at all, mate. I always appreciate input because it helps me get better. To me, what's important is to get things right. Corrections and improvements are always deeply appreciated, since they are important for the quality of my work.
Also, you're completely right about it being a quadratic relationship. I guess I heard "exponential" somewhere and accepted it without thinking critically. Goes to show the importance of regularly checking your assumptions about what you think you know.
I'll go over the sources. Depending on how it goes, I'll either do a paper breakdown or just put up a correction post. Either way, I really appreciate you sharing them.
Once again, thank you for your comments. The worst thing that can happen to a writer like me is for my audience to stop correcting me when I make mistakes. I hope you'll continue to share your insights in the future.