Craig Smith and Mathew Lodge discussing GPT and unit testing – AIs must be verifiable, testable, certified

Craig Smith, Eye on AI: The Future of Large Language Models in AI | Mathew Lodge | Eye on AI #130

Craig Smith, you and Mathew Lodge criss-crossed and brainstormed much of the discussion and its issues. The “frenzy” is mostly clickbait, people wanting lightning to strike by saying the right combination of things. You really do NOT want it to code in old languages, though that can be an option. You want to tell your computer “Help me go in this direction.” You do NOT want an AI that you have to tell every single step, constantly correct, and can never trust. Elsewhere I wrote that the black-box software (GPT, or “reinforcement toolkit”, or “IBM thingy”, or “Microsoft thingy”) can be judged by looking at who is responsible. Who is going to be a good, open partner? Who can you trust? I could fine-tune GPT to write unit test software. But that assumes I already know unit testing well enough to write it in English, or in a series of perfectly timed English instructions that GPT processes and some software interprets. I can follow that path, but it is a waste of my time and yours.
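To make concrete the kind of unit test software being discussed, here is a minimal sketch: a check written once in code, so a machine can run it exactly and repeatedly, rather than restating it in English each time. The function and values are purely illustrative.

```python
import unittest

def celsius_to_fahrenheit(c: float) -> float:
    """Illustrative function under test."""
    return c * 9 / 5 + 32

class TestConversion(unittest.TestCase):
    def test_freezing_point(self):
        self.assertEqual(celsius_to_fahrenheit(0), 32)

    def test_boiling_point(self):
        self.assertEqual(celsius_to_fahrenheit(100), 212)

# Run the suite programmatically so the script is self-contained.
suite = unittest.TestLoader().loadTestsFromTestCase(TestConversion)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("all passed:", result.wasSuccessful())
```

The point is that the test either passes or fails; there is no “sounds like” or “maybe”.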

GPT has no link to its training data, so it will always be unreliable on “where did you find that?”, and you will have to micromanage and fix that. It takes more effort to fix that after their stupid omission, so it is better to make a new one. They did not use real tokens that are globally shared and accessible to everyone without ambiguity. That is part of what you were using “semantic” to indicate. No two GPTs can be combined now, unless they share precise tokens, and the way the tokens are created or classified is globally accessible, auditable and verifiable. Mathew has stepped through code in high-level language debuggers, in machine language, in machine code. He may know microcode or hardware description languages as well. Those HAVE TO WORK, not “sounds like” or “looks like” or “similar” or “maybe”. Most of the automation in the global systems that run all parts of human society is engineered, tested, verified, certified, and often “audited” or “monitored” or “spot checked”. I like the word “calibration”, where you are matching real-world, very specific movements and measurements to predicted values (expected, on the schedule to happen, nominal, extreme, “danger”, “something’s broken”).
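The “calibration” idea above can be sketched in a few lines: compare a real measurement to the predicted value and classify it into the named bands. All thresholds and values here are illustrative, not from any real instrument.

```python
# Minimal sketch of "calibration": classify a measured value against
# a predicted (expected) value within stated tolerances.
def classify(measured: float, expected: float, tol: float, danger: float) -> str:
    """Return the band a measurement falls into relative to prediction."""
    error = abs(measured - expected)
    if error <= tol:
        return "nominal"
    if error <= danger:
        return "extreme"
    return "danger: something's broken"

print(classify(10.02, expected=10.0, tol=0.05, danger=0.5))  # nominal
print(classify(10.30, expected=10.0, tol=0.05, danger=0.5))  # extreme
```

Real calibration also covers scheduled events and drift over time, but the core operation is this comparison of measured against expected, within stated tolerances.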

A master programmer can take the word sequences that GPT generates from examples of working programs, run them, test them, and fix them. For things I don’t do every day, it helps to be reminded or informed of “what is important to consider” when writing code the old way: a human translates what they want to happen, and what might be seen later, into code that simulates it, code that compares the expected and the real, and moves toward “something that works: reliably, verifiably, safely, efficiently, and does not break in ways that are a danger to life, property or other more critical systems.”
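That “run it, test it, fix it” loop can be sketched mechanically: a candidate program (standing in for GPT output) is accepted only if it passes a fixed test. Everything here is illustrative; the candidate strings are hand-written stand-ins, not actual model output.

```python
# Sketch: accept machine-generated code only if it passes a known test.
import os
import subprocess
import sys
import tempfile
import textwrap

TEST = textwrap.dedent("""
    from candidate import add
    assert add(2, 3) == 5
    assert add(-1, 1) == 0
    print("PASS")
""")

def passes_tests(candidate_source: str) -> bool:
    """Run the fixed test against a candidate module in isolation."""
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "candidate.py"), "w") as f:
            f.write(candidate_source)
        with open(os.path.join(d, "test_it.py"), "w") as f:
            f.write(TEST)
        r = subprocess.run([sys.executable, "test_it.py"], cwd=d,
                           capture_output=True, text=True)
        return "PASS" in r.stdout

broken = "def add(a, b):\n    return a - b\n"   # plausible-looking, wrong
fixed  = "def add(a, b):\n    return a + b\n"
print(passes_tests(broken), passes_tests(fixed))  # False True
```

The generated text is never trusted on its own; only the test result counts.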

When Y2K approached, I checked the global status and activities of all the countries, states and sectors. I answered questions about what would happen at an international level, preparing for the worst. But the image I gained from that was of computer software groups writing software to scan and test software “in the large”.

I took about 200 of the largest GitHub projects, things like “Python”, “Linux” and “Chromium”, and read all the files and their linkages. Most of those are not documented. Many of the hard-coded settings are not obvious. Many steps are “implicit”: you cannot tell what might happen unless you run it and step through it WITH the exact code that will be used later. If you try to unravel the whole thing (find the source code for all the dependencies), the same problem recurs; no human or human group will put everything where it can be found and precisely understood. If you get down to compiled binaries for a known machine, you can use the DLLs and libraries, you can reverse engineer (high-level disassembly), and you can go only so far on experience (precise memories of what you have done and seen before).
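One small, concrete piece of that “read all the files and their linkages” work can be sketched as follows: walk a project tree and extract one kind of explicit linkage, Python import statements. This is only the easy, explicit layer; the hard-coded settings and implicit runtime steps described above do not show up in such a scan. The demo directory is temporary and illustrative.

```python
# Sketch: extract one explicit kind of linkage (imports) from a tree
# of Python files. Unparseable files are skipped, as many real files are.
import ast
import os
import tempfile

def imports_in_tree(root: str) -> dict:
    """Map each .py file path to the set of modules it imports."""
    links = {}
    for dirpath, _, files in os.walk(root):
        for name in files:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as f:
                    tree = ast.parse(f.read())
            except (SyntaxError, UnicodeDecodeError):
                continue  # many real files will not parse cleanly
            found = set()
            for node in ast.walk(tree):
                if isinstance(node, ast.Import):
                    found |= {a.name for a in node.names}
                elif isinstance(node, ast.ImportFrom) and node.module:
                    found.add(node.module)
            links[path] = found
    return links

# Tiny demo on a throwaway tree.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "m.py"), "w") as f:
        f.write("import os\nfrom ast import walk\n")
    print(imports_in_tree(d))
```

Even this shallow scan shows why tooling is needed: the linkage graph of a large project is far beyond what one reader can hold in memory.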

GPT is just an index to the probabilities of word sequences (token sequences), with a few flags and tokens and a “bag of tokens” thrown in. Since the whole of “OpenAI” is closed, locked, untraceable, and not really trying to help people, that is a waste of time.
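The “index to the probabilities of word sequences” view can be shown at toy scale with a bigram model: count which token follows which, then turn counts into probabilities. GPT is vastly larger and uses learned representations rather than raw counts, but the object being queried, next-token probabilities, is the same kind of thing. The corpus here is illustrative.

```python
# Toy "index to the probabilities of word sequences": a bigram model.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ran".split()

follows = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    follows[a][b] += 1

def p_next(token: str) -> dict:
    """Probability distribution over the next token, from counts."""
    counts = follows[token]
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

print(p_next("the"))  # e.g. {'cat': 0.667, 'mat': 0.333}
```

Note what is absent: no link back to which document produced each count, which is exactly the “where did you find that?” problem.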

I could not remember the precise word sequences used by groups who model the chemical, isotopic, molecular and particle compositions of stars and the Sun. So I ask questions close to that and GPT tries to answer. Sometimes that is faster than googling it, especially now that Google has given up any pretense of offering you a fair index of human knowledge. But you can trick Google into helping. You can trick GPT into helping save a little time.

Craig, focus on what you want to do. Why are you doing these things? Do you need money? Or do you want to “solve some big problem” or “make something wonderful happen”? I am saying it in a teasing way, but I am deadly serious. Every day I go over the needs of 8 billion humans faced with tens of thousands of “global issues” and “global crises”, shouted and repeated tens of millions of times or more.

Really spend a few thousand hours (as I have done) looking at what happens when “5 billion humans using the Internet try to form a concept of ‘covid’ and what they need to do” is crossed with a search space of 7.5 billion entry points for closely related terms for covid on the Internet, in many human languages, from many “authorities” and “experts” and “reliable sources”, and crossed again with the fact that most of those entries do not link to sources but repeat a core of a few hundred pages that could have been written on day 1, and in fact was already written (I learned epidemiology and pandemics early). But “epidemiology” has 292 million entry points, and those people do not have their act together; they benefit from chaos, misinformation and “stupid searchers”.

The 25th Anniversary of the Internet Foundation is 23 Jul 2023 (Sunday). I have worked on “global issues”, “global use of computers by humans for human purposes”, and “all human knowledge” every day, about 12-18 hours a day. It is that hard for a human brain to examine all the pathways and main issues (main topics, main subjects, main pathways, main options, main actions, main what happens, main what might happen). The old flowcharts are NOT wrong. Project management is directed graphs and bidirectionally linked data structures. Global accounting uses double entry and transaction logs. There are tools that work.

When you get GPT to learn for itself to use a damn calculator (I can show where it ALWAYS makes arithmetic errors), a symbolic math library, a logical inference engine for triples, unit tests, airplane repair instructions, and instructions for ALL GitHub resources that have been verified to work, then it can start to be trusted. Add the kind of grammar rules we would all use if we actually memorized and ruthlessly followed them.
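The calculator case is the simplest to show: hand arithmetic off to an exact evaluator instead of trusting a language model’s token predictions. Here is a minimal sketch of such a tool, a safe evaluator for +, -, *, / expressions using exact rational arithmetic; the expression is illustrative.

```python
# Sketch of the "use a real calculator" tool: exact, auditable arithmetic
# instead of probabilistic token prediction.
import ast
import operator
from fractions import Fraction

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calculator(expr: str) -> Fraction:
    """Safely evaluate a +-*/ expression with exact rational arithmetic."""
    def ev(node):
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return Fraction(node.value)
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)

print(calculator("12345 * 6789 + 1/3"))  # exact rational result
```

A model wired to dispatch arithmetic to something like this never needs to “guess” a product; the answer is exact every time, and the tool can be tested and certified on its own.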

We want to be able to say, “Computer, set course for Alpha Centauri Station,” and know, swearing on all the lives of the people on board, that it will do a nearly perfect job, within engineering tolerances, or better.

Richard Collins, The Internet Foundation


Director, The Internet Foundation Studying formation and optimized collaboration of global communities. Applying the Internet to solve global problems and build sustainable communities. Internet policies, standards and best practices.
