Language is only an end product. It is derived from intelligence.
The intelligence is everything that created the language and the training corpus in the first place.
When AI is able to create entire thoughts and ideas without any concept of language, then we will truly be closer to artificial intelligence. When we get to this point, we then use language as a way to let the AI communicate its thoughts naturally.
Such an AI would not be accused of “stealing” copyrighted work because it would pull its training data from direct observations about reality itself.
As you can imagine, we are nowhere near accomplishing the above. Everything an LLM is fed today has been pre-processed by human minds for it to parrot back. The fact that LLMs today are so good is a testament to human intelligence.
I'm not saying that language necessarily is the biggest stumbling block on (our) road towards AI, but it is a very prominent feature that we have used to distinguish our capabilities from other animals long before AI was even conceived of. So the current successes with LLMs are highly encouraging.
I'm not buying the "current AI is just a dumb parrot relying on human training" argument, because the same thing applies to humans themselves: if you raise a child without any cultural input/training data, all you get is a dumb caveman with very limited reasoning capabilities.
"I'm not buying the "current AI is just a dumb parrot relying on human training" argument [...]"
One difficulty: we know that argument is literally true.
"[...] because the same thing applies to humans themselves"
It doesn't. People can interact with the actual world. The equivalent of being passively trained on a body of text may be part of what goes into us, but it's not the only ingredient.
Clearly, language reflects enough of "intelligence" for an LLM to be able to learn a lot of what "intelligence" does just by staring at a lot of language data really really hard.
Language doesn't capture all of human intelligence - and some of the notable deficiencies of LLMs originate from that. But to say that LLMs are entirely language-bound is shortsighted at best.
Most modern high-end LLMs are hybrids that operate on non-language modalities, and there's plenty of R&D on using LLMs to consume, produce, and operate on non-language data, e.g. Gemini Robotics.