Welcome honesty 🙏
<3
Thanks Dev. Your article confirms my suspicions about the inherent limitations of AI in its current iterations. Still, the mere fact that anyone can access major parts of written knowledge is incredibly useful and empowering. But it raises the question of how to understand the inherent biases and limitations of any given model without extensive research. In music, “errata lists” cataloguing the errors in published editions are common and useful when performing any given piece, because every edition contains errors of some sort. Does such a thing exist for the current models, showing bias, holes, etc.?
Bias is contextual. From an information-theory perspective, it's just a shortcut to information, and its value depends heavily on how you're framing the problem.
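To make that concrete, here's a toy Python sketch (my own illustration, not anything from the article): treat bias as a prior over possible answers. Under a neutral framing every answer is equally likely; under a “biased” framing one answer is favored, and the uncertainty, measured in bits, drops. That's the shortcut, and whether it helps depends on whether the framing fits the problem.

```python
import math

def entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Neutral framing: four possible answers, all equally likely.
uniform = [0.25, 0.25, 0.25, 0.25]

# "Biased" framing: a prior that shortcuts toward one answer.
biased = [0.85, 0.05, 0.05, 0.05]

print(f"neutral framing: {entropy(uniform):.2f} bits of uncertainty")  # 2.00
print(f"biased framing:  {entropy(biased):.2f} bits of uncertainty")   # ~0.85
```

The biased prior saves you about a bit of work when it matches reality, and quietly misleads you when it doesn't.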
I've been on the board of an AI-audit organization for a while. There are orgs that look at what other people are doing and whether they comply with best practices.
Are those audits public?
Great piece. Thank you for writing this. It helped me to take a step back and see things from a broader perspective.
Glad I can help
How useful are these systems without human input? From the results of some tests, the answer seems to be that they are essentially useless. To some extent, that's hopeful: machines can't completely replace us, because brains fed only on their own output degrade!
Then again, maybe it isn't so optimistic. When artificial intelligence takes over the world, maybe it won't kill humans; perhaps it will just drive us into content farms, where we will all be forced to write listicles about the “Star Wars” series and contribute our family recipes to Botatouille to keep the models running without crashing.
You might appreciate this podcast on A.I.:
https://soberchristiangentlemanpodcast.substack.com/p/ai-deception-s1-ep-1-of-3
I read your response again this a.m., and now I understand what you're saying. Thanks.
I think that translates to how the data-selection process works for LLM and ML training. It means that not “all” data is selected, and that what is selected must be tested and curated for its specific intended purposes. Damn, this process can get mind-bogglingly complicated before you even get to doing anything with the data. And the model's power is limited, or “biased,” by the data used. (A rough sketch of one curation stage follows below.)
Thinking about GPT-4o, which I'm now using to learn AI: was its training data specific and narrow, or was it very broad but not as deep?
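On the curation point: here's a minimal Python sketch of what a single stage of that selection pipeline can look like. The filters, thresholds, and sample documents are hypothetical, purely to illustrate choosing and deduplicating data for a purpose rather than ingesting “all” of it:

```python
# A minimal, hypothetical sketch of one training-data curation pass.
# Filters and thresholds are illustrative, not any lab's real pipeline.

def looks_like_natural_language(text: str) -> bool:
    # Crude heuristic: reasonable length and mostly letters/whitespace.
    if len(text) < 50:
        return False
    alpha = sum(c.isalpha() or c.isspace() for c in text)
    return alpha / len(text) > 0.8

def is_duplicate(text: str, seen: set) -> bool:
    # Exact-match dedup on normalized whitespace; real pipelines use
    # fuzzy methods like MinHash to catch near-duplicates too.
    key = " ".join(text.lower().split())
    if key in seen:
        return True
    seen.add(key)
    return False

def curate(raw_documents):
    """Select documents suitable for a specific intended purpose."""
    seen = set()
    kept = []
    for doc in raw_documents:
        if not looks_like_natural_language(doc):
            continue  # drop markup residue, junk, boilerplate
        if is_duplicate(doc, seen):
            continue  # drop repeats that would skew the model
        kept.append(doc)
    return kept

corpus = curate([
    "The quick brown fox jumps over the lazy dog, again and again, for practice.",
    "<<<###>>> 0101010101",
    "The quick brown fox jumps over the lazy dog, again and again, for practice.",
])
print(len(corpus))  # 1 -- one junk document and one duplicate removed
```

Every filter like these is a choice, and each choice is another place the “bias” you describe creeps in; real pipelines stack many more stages (language ID, quality classifiers, toxicity filters) on top.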