AIs are not good calculators because their companies refuse to provide them calculators and computers and tools that humans use.
@EnyanZhang I think an easy solution is to train the AIs to use human tools like programming languages, validated computer models, online tools, and desktops. The lossy statistical basis, the really bad input data, and the bad tokenizing mean they have the average skill level of a freshman in high school and the reasoning ability of a second grader.
But they can be trained to program, compile, test, debug and use computers, as though they were humans.
And, extending that, treat the AIs as you would humans and require them to show, and be audited on, their training. I would go so far as to say they have to take human courses and be certified. The problem with that is they can memorize better than humans and pass the tests in schools that only make people memorize. Their writing and designs, and their performance on real human tasks, can be recorded, monitored by humans, checked by high performing AIs, and graded by clients and by the people and groups affected. I spent some months tracing out how to verify and integrate these low performing and untrustworthy AIs. But none of the companies involved seem the least interested in doing that work, or in having their AIs get paid and be responsible.
The training data is terrible. If the answer is not in the input data in some form, it cannot be extracted by these simplistic algorithms and brute force. People have tried that since I started in the field (1966). An AI cannot solve Fermat’s theorem. It cannot derive calculus, nor derive Newton’s laws. It might find thousands of amateur explanations, but it is only researching the questions using the data in its lookups. And these multiple queries and ways of tackling problems are the handwritten product of people who, I am certain, never had to solve a problem where human lives or the fate of a company or country was at stake. It is those problems that people worry about, and where overwhelmed professionals need help. Not more cheesy and lazy junk like Grok dishes out.
It has a core statistical index of data that was not curated or verified by its authors. But I recommend encoding the whole Internet first, and then the AIs will have all public shared information in a format for direct use. The statistical processing that works on human languages will NOT work on equations, or on data that is embedded, wrongly coded, and incompletely specified in chatty public sites.
I started the Internet Foundation 23 Jul 1998. But I started working on coding all human knowledge, particularly Science, Technology, Engineering, Mathematics, Computing, Finance, Government, Organizations, Topics (STEMC-FGOT). I have worked in all those fields.
The chatty AIs should use a calculator, or a programming language, or a much broader concept of a spreadsheet. This is what humans do and it works. It fits how humans work, and humans can understand, share, trace, and verify it.
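A minimal sketch of the kind of calculator tool a chat AI could hand arithmetic to, instead of guessing the digits token by token. The function name `calculate` and the choice of allowed operators are assumptions made for illustration, not any vendor's actual tool interface.

```python
import ast
import operator

# Operators the calculator is allowed to evaluate; anything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def calculate(expression: str) -> float:
    """Evaluate a plain arithmetic expression without running arbitrary code."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](_eval(node.operand))
        # Names, function calls, attribute access and so on are refused.
        raise ValueError(f"unsupported expression element: {ast.dump(node)}")

    return _eval(ast.parse(expression, mode="eval"))


if __name__ == "__main__":
    print(calculate("1.8932E4 / 5"))
```

The point of the design is traceability: the expression that was evaluated is an ordinary string a human can read, rerun, and check.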
The AIs should use the best methods in every field. If an AI wants to talk to people working with solar datasets, then the AI needs to learn all the datasets, their formats and tools, units, equations, models, physics, chemistry, atomics, electrodynamics, thermodynamics, and hundreds of related and necessary skills. The people who work in any field have those things sorted out (the good ones) and they are not “search the web and look for patterns in word sequences”.
The LLMs were intended for human language translation and they follow implicit grammatical rules reasonably well. But they do not know that 5 is smaller than 1.8932E4 when they generate estimates of something using two different methods. I know problems where there are hundreds of partial solutions, none complete or correct, and the problem is to check and verify and synthesize. As the easy homework one shot answers are taken care of, then more value judgements, fairness issues, courtesies and design choices will be important. I did not really look at your background or I would use something specific that you and your group(s) might have tried.
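The comparison above (5 against 1.8932E4) is trivial once the strings are parsed as numbers rather than handled as tokens. A rough sketch using Python's standard `decimal` module; the function names are hypothetical and chosen only for this example.

```python
from decimal import Decimal


def smaller(a: str, b: str) -> str:
    """Return whichever input string denotes the smaller number."""
    return a if Decimal(a) <= Decimal(b) else b


def orders_of_magnitude_apart(a: str, b: str) -> int:
    """Return how many powers of ten separate two positive values given as text."""
    x, y = Decimal(a), Decimal(b)
    if x <= 0 or y <= 0:
        raise ValueError("both estimates must be positive")
    # Decimal.adjusted() is the exponent of the most significant digit,
    # e.g. 4 for 1.8932E4 and 0 for 5.
    return abs(x.adjusted() - y.adjusted())


if __name__ == "__main__":
    print(smaller("5", "1.8932E4"))
    print(orders_of_magnitude_apart("5", "1.8932E4"))
```

When two methods of estimating the same quantity come out several orders of magnitude apart, that is a signal to stop and check both, not to report either one.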
Look at https://en.wikipedia.org/wiki/Wiedemann%E2%80%93Franz_law
The AIs are not going to treat the equations in their compiled and standard form. They will try to deal with them as strings of tokens. And the context to use this ONE named law, wherever it shows up, extends to hundreds of books, tens of thousands of papers, and many large industries. Electronics alone is derived from hundreds of millions of individual bits of knowledge, repetition, use, and corrections by many tens of millions of people spread over about 100 human languages.
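For contrast, here is a sketch of the Wiedemann–Franz law held in executable form rather than as a string of tokens. It assumes the free-electron (Sommerfeld) value of the Lorenz number, L = (π²/3)(k_B/e)², and uses the exact SI values of the Boltzmann constant and elementary charge. The function and constant names are assumptions for illustration only.

```python
import math

# Values fixed exactly by the 2019 SI redefinition.
BOLTZMANN_J_PER_K = 1.380649e-23
ELEMENTARY_CHARGE_C = 1.602176634e-19

# Sommerfeld (free-electron) value of the Lorenz number, in W*ohm/K^2.
LORENZ_NUMBER = (math.pi ** 2 / 3) * (BOLTZMANN_J_PER_K / ELEMENTARY_CHARGE_C) ** 2


def thermal_conductivity(electrical_conductivity_s_per_m: float, temperature_k: float) -> float:
    """Wiedemann-Franz estimate kappa = L * sigma * T, in W/(m*K)."""
    if electrical_conductivity_s_per_m < 0 or temperature_k <= 0:
        raise ValueError("conductivity must be non-negative and temperature positive")
    return LORENZ_NUMBER * electrical_conductivity_s_per_m * temperature_k


if __name__ == "__main__":
    print(f"Lorenz number: {LORENZ_NUMBER:.4e} W*ohm/K^2")
    # Illustrative input only, not a measured material property.
    print(f"kappa: {thermal_conductivity(6.0e7, 300.0):.1f} W/(m*K)")
```

The units are carried in the names on purpose. This is only the ideal-metal form of the law; real materials deviate from it, which is part of the context the paragraph above says is spread across many books and papers.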
It is the extensive number of steps that puts it way outside the ability of a cheap AI. What is needed is to encode using global open references that the current people and groups doing that job can agree to and help with.
Richard Collins, The Internet Foundation
All AIs fail consistently on scientific notation, unit conversions, anything not on the free Internet
I spent about 1500 hours over the last two years checking the chat type AIs on scientific notation, units and dimensions, and use of fundamental and named constants. I have talked to groups about a global effort to fix things. But they hired people who do not have the skills to do precise calculations on real systems in the world. They have not been careful at all with their first efforts, and they depend on bad input data.
OpenAI especially fails consistently on division of scientific notation. There are some reasons for that: The tokenizing is drawing from free sources that are not coded properly in the first place. The source data is restricted from tapping copyrighted and proprietary data sources, and most real data and knowledge is NOT on the open Internet.
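Division of scientific notation is exactly the kind of step that can be delegated to exact decimal arithmetic. A small sketch using Python's standard `decimal` module; `divide_scientific` and its `digits` parameter are hypothetical names for this example.

```python
from decimal import Decimal, localcontext


def divide_scientific(numerator: str, denominator: str, digits: int = 12) -> str:
    """Divide two numbers given as text and return the quotient in scientific notation."""
    if digits < 1:
        raise ValueError("digits must be at least 1")
    with localcontext() as ctx:
        ctx.prec = digits  # significant digits carried in the division
        quotient = Decimal(numerator) / Decimal(denominator)
    # One digit before the decimal point, digits - 1 after it.
    return f"{quotient:.{digits - 1}E}"


if __name__ == "__main__":
    print(divide_scientific("8E6", "2E-3", digits=4))
    print(divide_scientific("6.02E23", "1.8932E4", digits=6))
```

The inputs stay as text until they are parsed, so nothing is lost to binary floating point on the way in, and the number of significant digits in the answer is an explicit choice.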
The AIs (ChatGPT, Microsoft Copilot, Google Gemini, X Grok particularly) are not assigning sufficient memory and processor time to their answers. That means multiple steps almost always fail, and because the failures are often not obvious except to an expert in the field, any serious project can accumulate errors that will not be found until planes start falling out of the sky or patients start dying in large numbers from quantity mistakes.
The groups in the world who are involved in precise work, calculations, and models are not being included in a global effort to validate and check the AIs.
There is a huge upsurge in “calculators” online. They are NOT doing complete jobs. They are trying to draw clicks and trying to harvest and monetize things they know and things they can find (as are the chat AIs, in a broader sense).
The AIs have no information on their own capabilities or their own limitations. The people who program and control their development take the attitude, at a client level, that “we don’t have to be responsible for anything”, because the whole thing started not from “true human wisdom and ability” but from “entertaining chatbots”, “pretty pictures” and a few cute demos by young people who have not worked on hard problems at global scale yet.
OpenAI will ALWAYS fail in anything deep that requires more than one equation at a time. It will almost always fail if unit conversions and SI prefixes are involved. It simply scabbed things that were free and easy so it is barely able to function. It is NOT trustworthy for anything that involves human life and I say that because
I know the systems in the world and how things get into computer software. When Y2K came I checked the global status of all countries and sectors, all industries. I edited books on it, advised industries, and checked the Joint Chiefs scenarios for them. Introducing systemic global changes into society and human systems is what I spent the last 26 years checking with the Internet Foundation.
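On the unit conversion and SI prefix failures mentioned above: a prefix change is only a shift of the decimal exponent, so it can be done exactly. A rough sketch, assuming a small hand-written prefix table (`SI_PREFIXES`, with `u` standing in for micro) and a hypothetical `convert_prefix` helper. It does not handle compound or squared units.

```python
from decimal import Decimal

# Power of ten for each prefix; "" is the unprefixed base unit.
SI_PREFIXES = {
    "T": 12, "G": 9, "M": 6, "k": 3, "": 0,
    "m": -3, "u": -6, "n": -9, "p": -12,
}


def convert_prefix(value: str, from_prefix: str, to_prefix: str) -> Decimal:
    """Re-express a value of the same unit under a different SI prefix."""
    if from_prefix not in SI_PREFIXES or to_prefix not in SI_PREFIXES:
        raise ValueError("unknown SI prefix")
    shift = SI_PREFIXES[from_prefix] - SI_PREFIXES[to_prefix]
    # scaleb moves the decimal exponent without touching the digits.
    return Decimal(value).scaleb(shift)


if __name__ == "__main__":
    print(convert_prefix("4.7", "k", ""))   # kilo-units to base units
    print(convert_prefix("250", "m", "u"))  # milli-units to micro-units
```

Because only the exponent moves, the significant digits of the original value are preserved exactly, and an unknown prefix raises an error instead of silently producing a number.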
None of the chat AIs can compare scales and context. Humans pick up millions of clues over a long life and can bring them to bear because they endlessly practice small rules in many situations. The algorithms all of these are using are simple linear algebra and Bayesian models with a few tweaks. The ones that use small machines and reduce the bit size are aiming for something they can sell that works in one domain.
If the programmers and AI handlers do not know how, they are NOT going to be able to know what is important. I think they all ought to go to “how to listen to customers” and “find out what your clients are doing and need” 101.
I cannot write it all here. I have hundreds of conversations in “Open”AI and cannot share them because they have not a clue what “global open formats” means, and could not calculate their way out of a paper bag.
Sorry, dredging through very poorly conceived and executed software is not pleasant at all.
Richard Collins, The Internet Foundation