Yes, I am very familiar with the things you wrote down. The language models currently do not index the raw input data, so they add no value to it that way. With no index, and no way to trace what they generate back to the source data, they are literally incapable of tracing the information they use, and therefore do not consistently and reliably cite the authors and sources.
They are using what they took for commercial gain. It is not illegal in this world that allows anything, but I consider it improper and unfair. They are giving nothing substantial back proportional to the money and power they gain.
They could index and link to the raw data, not just create response probabilities. Now they are struggling because they did not index in the first place, which is not hard, just tedious.
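To make "index and link to the raw data" concrete, here is a minimal sketch of an inverted index that keeps author and link with every piece of text, so any statement can be traced back to the sources that contain its words. It is only an illustration, not how any company's system works. The `Source` record, the function names, the authors, and the example.org links are all hypothetical, and the simple word matching stands in for real tokenizing and curation.

```python
from collections import defaultdict
from dataclasses import dataclass
import re


@dataclass(frozen=True)
class Source:
    """Hypothetical source record: who wrote it and where it lives."""
    author: str
    url: str
    text: str


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; a stand-in for a real tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(sources: list[Source]) -> dict[str, set[int]]:
    # Inverted index: token -> positions of the sources that contain it.
    index: dict[str, set[int]] = defaultdict(set)
    for i, src in enumerate(sources):
        for tok in tokenize(src.text):
            index[tok].add(i)
    return index


def cite(statement: str, sources: list[Source],
         index: dict[str, set[int]]) -> list[Source]:
    # Return the sources that contain every token of the statement.
    tokens = tokenize(statement)
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return [sources[i] for i in sorted(hits)]


# Made-up example data (assumption, for illustration only).
SOURCES = [
    Source("A. Author", "https://example.org/doc1",
           "Water boils at 100 degrees Celsius at sea level."),
    Source("B. Writer", "https://example.org/doc2",
           "Sea level is rising; water expands as it warms."),
]
INDEX = build_index(SOURCES)

for src in cite("water boils", SOURCES, INDEX):
    print(src.author, src.url)
```

The point of the sketch is that the link to the author and the source is stored at indexing time. If it is not stored then, it cannot be recovered later from probabilities alone.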
All the weaknesses of the language AIs come because they did not properly and completely index and tokenize the input data using links to the real world. They put zero effort into curation, proper indexing, and standardizing their input data. They did not help with understanding the knowledge and helping to make it better. They only processed it to extract a few language rules and generative probabilities. I have tested them. I say they ought to be verified and certified before being allowed to give answers and advice to humans. I say the companies should literally pay back in time and money to their sources from which they took. They have added no true value.
Do you know that game “Chinese whispers”? The process garbles messages passed from human to human. When it is done in writing, that is what happens in human society, where all research and knowledge is forced through channels with human readers taking things in by eyeball, then writing what they thought they read. It is why all human research now takes decades, not days. It is why the global “covid” response allowed millions to die.
The AI language models do not faithfully index their input information, and will always generate falsehoods and mistakes on average. The world cannot afford answers generated at random. The world has real problems affecting the lives of all humans and related species.
ChatGPT and Bard are incapable of answering anything that requires them to cite their sources and explain their reasoning. They make mistakes on simple logic, reasoning, arithmetic, mathematics, scale, comparisons – because the companies did not train them to be reliable, only plausible, only giving glib answers. They will always fail on important classes of problems, because the companies left out “clean and organize the input data” and “be able to trace where things come from”.
I had ChatGPT read my note to you and asked it to check for clarity and reasoning. Here is what it generated. Unless you know how to ask the right questions, it will give implausible and false answers. Its first response was chaotic and useless. I had to manually edit its response to correct the formatting, but these are its exact words. I had to change the headings to bold text because it was too spread out. I had to explicitly ask it to include a recommendation that AI companies give back by helping Wikimedia index and curate Wikipedias. I was able to ask it to use the format I suggested, but it is not reliable, so I did that manually first. It only gets about 30% right, and it always makes mistakes in scientific calculations and mathematics.
Richard Collins, The Internet Foundation
Verification and Accountability: Before these models are deployed for wide use, they should be rigorously verified and certified. Companies should also be held accountable for giving back to the communities and data sources they benefit from.
Recommendation for Giving Back: I recommend that AI companies engage in initiatives to help index and curate valuable knowledge resources like Wikimedia’s Wikipedias. Such contributions would not only address ethical concerns but also improve the quality and reliability of information that serves the global community.
Richard Collins, The Internet Foundation
Richard: I sent it to the Wikimedia Foundation with the subject: Asking all AI companies to contribute to Wikimedia