All AIs fail consistently on scientific notation, unit conversions, anything not on the free Internet

https://x.com/yuntiandeng/status/1836114401213989366

Yuntian Deng @yuntiandeng  Is OpenAI’s o1 a good calculator? We tested it on up to 20×20 multiplication—o1 solves up to 9×9 multiplication with decent accuracy, while gpt-4o struggles beyond 4×4. For context, this task is solvable by a small LM using implicit CoT with stepwise internalization. 1/4 https://pic.x.com/et5db9bhnl
Replying to @yuntiandeng

I spent about 1500 hours over the last two years checking the chat-type AIs on scientific notation, units and dimensions, and the use of fundamental and named constants. I have talked to groups about a global effort to fix things. But they hired people who do not have the skills to do precise calculations on real systems in the world. They have not been careful at all with their first efforts, and they depend on bad input data.
 
OpenAI especially fails consistently on division of scientific notation. There are some reasons for that. The tokenizing draws from free sources that are not coded properly in the first place. The source data is restricted from tapping copyrighted and proprietary data sources, and most real data and knowledge is NOT on the open Internet.
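For anyone who wants to check an AI's answer on division in scientific notation, here is a minimal sketch in Python, assuming the inputs arrive as strings such as "6.62607015e-34". The function names split_sci and divide_sci are hypothetical, made up for this illustration. It shows the rule that gets broken most often: divide the mantissas, subtract the exponents, then renormalize.

```python
from decimal import Decimal, localcontext


def split_sci(text):
    """Return (mantissa, exponent) with 1 <= |mantissa| < 10 for a nonzero value."""
    value = Decimal(text)
    exponent = value.adjusted()          # power of ten of the leading digit
    mantissa = value.scaleb(-exponent)   # shift the decimal point by that power
    return mantissa, exponent


def divide_sci(numerator, denominator, digits=12):
    """Divide two scientific-notation strings; return (mantissa, exponent)."""
    m_num, e_num = split_sci(numerator)
    m_den, e_den = split_sci(denominator)
    with localcontext() as ctx:
        ctx.prec = digits                # significant digits kept in the quotient
        mantissa = m_num / m_den         # a zero denominator raises a decimal exception
        exponent = e_num - e_den         # exponents subtract under division
        if mantissa == 0:
            return Decimal(0), 0
        if abs(mantissa) < 1:            # the mantissa ratio lies between 0.1 and 10
            mantissa *= 10
            exponent -= 1
    return mantissa, exponent
```

As a usage example, divide_sci("6.62607015e-34", "1.602176634e-19") divides the Planck constant by the elementary charge and gives a mantissa near 4.1357 with exponent -15. Using Decimal rather than binary floats is a design choice for this sketch, so that the digits typed in are the digits computed on.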
 
The AIs (ChatGPT, Microsoft Copilot, Google Gemini, X Grok particularly) are not assigning sufficient memory and processor time to their answers. That means multiple-step problems almost always fail. Because the failures are often not obvious except to an expert in the field, any serious project can accumulate errors that will not be found until planes start falling out of the sky or patients start dying in large numbers from quantity mistakes.
 
The groups in the world who are involved in precise work, calculations, and models are not being included in a global effort to validate and check the AIs.
 
There is a huge upsurge in “calculators” online. They are NOT doing complete jobs. They are trying to draw clicks and to harvest and monetize the things they know and the things they can find (as are the chat AIs, in a broader sense).
 
The AIs have no information on their own capabilities or their own limitations. The people who program and control their development act, at a client level, as if “we don’t have to be responsible for anything”, because the whole thing started not from “true human wisdom and ability” but from “entertaining chatbots”, “pretty pictures” and a few cute demos by young people who have not yet worked on hard problems at global scale.
 
OpenAI will ALWAYS fail in anything deep that requires more than one equation at a time. It will almost always fail if unit conversions and SI prefixes are involved. It simply scraped things that were free and easy, so it is barely able to function. It is NOT trustworthy for anything that involves human life, and I say that because I know the systems in the world and how things get into computer software. When Y2K came, I checked the global status of all countries and sectors, all industries. I edited books on it, advised industries, and checked the Joint Chiefs scenarios for them. Introducing systemic global changes into society and human systems is what I spent the last 26 years checking with the Internet Foundation.
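To make the point about unit conversions and SI prefixes concrete, here is a minimal sketch in Python of an exact prefix conversion that an AI's answer can be checked against. The table SI_PREFIX_EXPONENT and the function convert_prefix are hypothetical names for this illustration, only a handful of prefixes are listed, "u" is an ASCII stand-in for micro, and the sketch assumes both quantities share the same base unit.

```python
from fractions import Fraction

# Powers of ten for a few SI prefixes; "" means no prefix, "u" stands in for micro.
SI_PREFIX_EXPONENT = {
    "G": 9, "M": 6, "k": 3, "": 0,
    "m": -3, "u": -6, "n": -9, "p": -12,
}


def convert_prefix(value, from_prefix, to_prefix):
    """Convert a value between SI prefixes of the same base unit, exactly."""
    shift = SI_PREFIX_EXPONENT[from_prefix] - SI_PREFIX_EXPONENT[to_prefix]
    # Fraction keeps the result exact, so no digits are lost to rounding.
    return Fraction(value) * Fraction(10) ** shift


# 2.5 mg expressed in micrograms, the kind of quantity where a slip of
# three orders of magnitude matters.
dose_in_micrograms = convert_prefix("2.5", "m", "u")
print(dose_in_micrograms)  # 2500
```

The value is passed as a string so that what is typed is what is converted. A lookup with an unknown prefix raises KeyError, which is deliberate here: a silent guess is worse than a loud failure.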
 
None of the chat AIs can compare scales and context. Humans pick up millions of clues over a long life and can bring them to bear because they endlessly practice small rules in many situations. The algorithms all of these AIs use are simple linear algebra and Bayesian models with a few tweaks. The ones that run on small machines and reduce the bit size are aiming for something they can sell that works in one domain.
 
If the programmers and AI handlers do not know how, they are NOT going to be able to know what is important. I think they all ought to go to “how to listen to customers” and “find out what your clients are doing and need” 101.
 
I cannot write it all here. I have hundreds of conversations in “Open”AI and cannot share them because they have not a clue what “global open formats” means, and could not calculate their way out of a paper bag.
 
Sorry, dredging through very poorly conceived and executed software is not pleasant at all.
 
Richard Collins, The Internet Foundation

About: Richard K Collins

Director, The Internet Foundation Studying formation and optimized collaboration of global communities. Applying the Internet to solve global problems and build sustainable communities. Internet policies, standards and best practices.

