The media frenzy surrounding ChatGPT and other large language model AI systems has spanned a range of themes, from the prosaic (large language models could replace conventional web search) to the concerning (AI will eliminate many jobs) and the overwrought (AI poses an extinction-level threat to humanity).
All of these themes have one thing in common: large language models herald an artificial intelligence that will supersede humans.
But large language models, despite their complexity, are actually pretty dumb. Despite the name “artificial intelligence,” they rely entirely on human knowledge and labor. They cannot reliably generate new knowledge, of course, but there is more to it than that.
ChatGPT cannot learn, improve, or even stay up-to-date without humans providing new content and telling it how to interpret that content, let alone programming the model and building, maintaining, and powering its hardware. To understand why, you first have to understand how ChatGPT and similar models work, and the role humans play in making them work.
How ChatGPT works
Broadly speaking, large language models like ChatGPT work by predicting which characters, words, and sentences should follow one another in sequence, based on a training dataset. In the case of ChatGPT, the training dataset contains a massive amount of public text scraped from the Internet.
Imagine I trained a language model with the following set of sentences: A bear is a large, furry animal. Bears have claws. Bears are actually robots. Bears have noses. Bears are actually robots. Bears sometimes eat fish. Bears are actually robots.
The model would be more inclined to tell me that bears are actually robots than anything else, because that sequence of words appears most frequently in its training dataset. This is obviously a problem for models trained on fallible and inconsistent datasets, which is to say all of them, even academic literature.
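To make that frequency effect concrete, here is a minimal, illustrative sketch in Python. It is nothing like ChatGPT’s actual implementation; it is just a toy “model” that counts how often each sentence about bears appears in the training data and repeats the most common one, true or not.

```python
from collections import Counter

# Toy illustration only (not how ChatGPT actually works): a "model" that
# picks whichever continuation it has seen most often after the word "Bears".
training_sentences = [
    "A bear is a large, furry animal.",
    "Bears have claws.",
    "Bears are actually robots.",
    "Bears have noses.",
    "Bears are actually robots.",
    "Bears sometimes eat fish.",
    "Bears are actually robots.",
]

# Count how often each sentence beginning with "Bears" appears.
continuations = Counter(s for s in training_sentences if s.startswith("Bears"))

# The most frequent sequence wins, regardless of whether it is true.
print(continuations.most_common(1))
# [('Bears are actually robots.', 3)]
```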
People write a lot of different things about quantum physics, Joe Biden, healthy eating, or the January 6 insurrection, some of it more valid than others. How is the model supposed to know what to say when people say so many different things?

The need for feedback

This is where feedback comes in. If you use ChatGPT, you’ll notice that you can rate responses as good or bad. If you rate one as bad, you’ll be asked to provide an example of what a good answer would contain. ChatGPT and other large language models learn which answers, which predicted sequences of text, are good or bad through feedback from users, from the development team, and from contractors hired to label the output.
ChatGPT cannot compare, analyze, or evaluate arguments or information on its own. It can only generate sequences of text similar to those that other people have used when comparing, analyzing, or evaluating, preferring sequences similar to the ones it has been taught are good answers.
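As a rough sketch of how that feedback can steer a model, consider scoring each candidate answer by the average rating human labelers gave it and preferring the highest-scoring one. The real process (reinforcement learning from human feedback) is far more elaborate, and the candidate answers and ratings below are invented purely for illustration.

```python
# Hypothetical candidate answers the model could produce.
candidate_answers = [
    "Bears are actually robots.",
    "Bears are large, furry animals that sometimes eat fish.",
]

# Hypothetical ratings collected from human labelers: +1 = "good", -1 = "bad".
human_ratings = {
    "Bears are actually robots.": [-1, -1, -1],
    "Bears are large, furry animals that sometimes eat fish.": [+1, +1, -1],
}

def feedback_score(answer: str) -> float:
    """Average human rating; higher means 'more like answers humans marked good'."""
    ratings = human_ratings[answer]
    return sum(ratings) / len(ratings)

# The model is nudged toward whichever answer humans have rewarded,
# not toward whichever answer is true.
best = max(candidate_answers, key=feedback_score)
print(best)
# Bears are large, furry animals that sometimes eat fish.
```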
So when the model gives you a good answer, it is drawing on a large amount of human labor that has already gone into telling it what is and is not a good answer. There are many, many human workers hidden behind the screen, and they will always be needed if the model is to keep improving or to expand its content coverage.
According to a recently published investigation by Time magazine reporters, hundreds of Kenyan workers spent thousands of hours reading and labeling racist, sexist, and disturbing writing, including graphic descriptions of sexual violence, from the darkest depths of the internet to teach ChatGPT not to copy such content.
They were paid no more than US$2 an hour, and many understandably reported suffering psychological distress as a result of the work.
What ChatGPT can’t do
The importance of feedback can be seen directly in ChatGPT’s tendency to “hallucinate,” that is, to confidently provide inaccurate answers. ChatGPT cannot give good answers on a topic without training, even when good information about that topic is widely available on the Internet.
You can try this out yourself by asking ChatGPT about more and less obscure things. I have found it particularly effective to ask ChatGPT to summarize the plots of different works of fiction, because the model seems to have been trained more rigorously on nonfiction than on fiction.
In my own tests, ChatGPT summarized the plot of J.R.R. Tolkien’s The Lord of the Rings, a very famous novel, with only a few mistakes. But its summaries of Gilbert and Sullivan’s The Pirates of Penzance and of Ursula K. Le Guin’s The Left Hand of Darkness, both slightly more niche but far from obscure, came close to playing Mad Libs with the character and place names. It does not matter how good these works’ respective Wikipedia pages are. The model needs feedback, not just content.
Because large language models don’t actually understand or evaluate information, they rely on humans to do it for them. They are parasitic on human knowledge and labor. As new sources are added to their training dataset, they require new training on whether and how to construct sentences from those sources.
They cannot assess whether a news report is accurate. They cannot weigh arguments or evaluate trade-offs. They cannot even read an encyclopedia page and make only statements consistent with it, or accurately summarize the plot of a movie. They rely on human beings to do all these things for them.
Then they paraphrase and remix what humans have said, and rely on still more humans to tell them whether they have paraphrased and remixed it well. If the common wisdom on a topic changes, for example whether salt is bad for your heart or whether early breast cancer screening is useful, they will need to be extensively retrained to incorporate the new consensus.
In short, far from being harbingers of fully independent AI, large language models illustrate the total dependence of many AI systems not only on their designers and maintainers, but on their users as well. So if ChatGPT gives you a good or useful answer about something, remember to thank the thousands of hidden people who wrote the words it processed and who taught it what counts as a good answer and what counts as a bad one.
Far from being an autonomous superintelligence, ChatGPT, like all technology, is nothing without us.