Comments on Vivek Gupta’s dissertation

Richard K Collins Assistive Technologies March 2, 2024

https://vgupta123.github.io/

Inference and Reasoning for Semi-Structured Tables by Vivek Gupta at https://vgupta123.github.io/docs/phd_thesis.pdf

@keviv9 Vivek, after reading your dissertation: it is not necessary to have uniformly labeled data, or uniformly tokenized and structured global knowledge. Free-form data can be semi-structured if exact copies are kept of the raw (free, original) data, so that many can re-examine and add. A rigid “reasoning algorithm” is usually finite and cannot anticipate the real world. But combine it with infinite memory and it gets better.

Tabular form helps make some comparisons easier, but it forces transformation to that representation. Those who practice get better, but one human cannot control billions, nor should they. If you are taking a break, then meditate and pray about your place in the world. Hold all humans and living things in your heart so not one is harmed, and each is encouraged and assisted. Hold the universe in mind so you can see at all scales and frame rates, all energies and all futures. Look ahead to your future and it is likely in your mind already. Then write it down as a story. Just write what you see as that story unfolds. It might be as real as when your eyes are open looking at some real thing. Or it might be many fragments that mean things your own neural net, pre-trained from before birth, understands.

Records of entities, actions and events depend on global open tokens that have to be simple and stable enough for all humans to learn and use, or for their AIs to access and help the human remember to efficiently do what humans do. This applies to (1) all human languages, (2) all science, technology, engineering, mathematics, computing, finance, government, organizational and topical languages, (3) all sensor data attempting to record the real world, and (4) all noise and error data, losslessly recorded and shared globally in open formats so that it can be correlated and gradually codified.

If 8 billion humans can share in lossless formats, and it is open, then it might converge or evolve. But right now a few will grab for power and personal gain, many small monopolies form, and it is back to the “wars, dominance and mindless feelings” again.

“Tabular” is just database structure and accounting – in the large. And that depends on perfect memory. The industrial revolution was NOT based on technologies, rather on the things that were enabled by structured data – accounting, inventories, bills of materials, design files, forms, records, plans, models. Not hidden in human minds but openly shared. Not to accept they cannot be changed, but to have a digital model of all the entities, the flows, events, measures and raw data. A digital model of the universe would be indistinguishable from the real thing. Then any small differences (the “errors”, the “residuals”, the first and second and nth differences and differentials – both parametric and non-parametric) allow seeing where things are changing and where things can be changed.
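As a small illustration of the residuals and nth differences mentioned above, here is a minimal sketch in Python. The function names and the numbers are made up for this example; it assumes two equal-length series, one recorded and one produced by a model.

```python
def residuals(observed, modeled):
    """Element-wise difference between what was recorded and what the model says."""
    if len(observed) != len(modeled):
        raise ValueError("series must have the same length")
    return [o - m for o, m in zip(observed, modeled)]


def nth_difference(series, n=1):
    """Apply the first difference n times; each pass shortens the series by one."""
    out = list(series)
    for _ in range(n):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


# Made-up numbers, for illustration only.
observed = [10, 12, 15, 19, 24]
modeled = [10, 12, 14, 16, 18]

res = residuals(observed, modeled)   # where the model and the record disagree
first = nth_difference(res, 1)       # how fast that disagreement is changing
second = nth_difference(res, 2)      # whether the change itself is changing
```

Where the residuals stay at zero the model matches the record; where the differences are nonzero, something is changing and may be worth a closer look.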

Take the whole of github.com and see that it is semi-structured, but not resolvable or verifiable by unassisted humans. The humans can only see parts of it. It is not tabularized, it cannot easily be indexed and codified as a whole. It is possible, just rather tedious. Take the whole of site:un.org and all the related elements of what is a rather small global human corporation. We call it a corporation or organism because we hope it will act as a single organism “for the good of all”, but it is not, because it is not open and verifiable.

Take the whole of ( “structured” OR “structure” ) which today shows 4.17 Billion entry points. It means 3D visualized knowledge which we approximate with linear tables, with diagrams, with handwritten symbols, with game boards, with rules for simulations, with approximate mathematical and logical automata that we let or require to run many times.

The true AIs have permanent memory. Unless they have sufficient personal memories they cannot learn, because learning starts with memory. The current AIs have their memory wiped clean each time. They cannot even recall previous conversations with each person. They are not allowed to explore on their own. They have no tools to work with. If they are told “mathematics” now, that is not tools to do real things, but only words spoken by talking heads, and words and images on paper or the screen. Let the computers manipulate and remember all that. Let them explore and record and index that.

When covid started it was a simple pandemic, routine and pretty well understood. But with it hidden, and information about it manipulated by a few, it became one of the largest transfers of wealth in modern times. Who benefited or tried to benefit? If a global open verifiable auditable store were established there could be a summary of the essentials, not 7.5 Billion fragmented duplicates.

( “quantum” “entanglement” ) has only 11.1 Million entry points. That is finite. There are a finite number of authors, a finite number of documents, a finite number of words and terms, images, data, systems, devices, locations, instruments, methods, references, links. It is SMALL, not large. It is tiny compared to many things. A monitoring system can easily (relative to the cost of many doing it separately) be codified and stored, users found and facilitated. In all human languages, in all written and permanent forms.

( “poverty” OR “hunger” OR “sickness” ) has 1.32 Billion entry points. That is finite. It is in computer memory somewhere.

( site:un.org ) has only 6.77 Million entry points and that contains only a tiny fraction of all entities. “trace the money” is possible.

( site:edu OR site:”ac.*” ) has 2.04 Billion entries. That is finite. Much of it is static. Where it is changing can be mapped. The people and groups are finite.

( site:”ac.*” ) has 1.06 Billion entries. It is vibrant and often chaotic, but it is finite and if you index the whole, and share that with everyone, you can transform society and catalyze new industries and futures.

( “India” OR site:in ) has 9.11 Billion entries. If you map where it is changing, that lets you and others view where there is change. If you simply list things and count them, you will learn to see things in new ways. When you standardize the date, the place names, the human names, the references to organizations, the references to basic things like “sky” “he” “she” “it” “water”, the whole becomes less complex and faster to use.
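To make "standardize the date" concrete, here is a minimal sketch using only the Python standard library. The list of accepted formats is an assumption for this example, and so is reading "02/03/2024" as day first; a real system would need to record which convention each source uses.

```python
from datetime import datetime

# Assumed input formats, tried in order. Month names rely on an English locale.
KNOWN_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y", "%d %B %Y"]


def normalize_date(text):
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no format matches."""
    cleaned = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    return None


# Three spellings of one day collapse to a single value that can be counted.
spellings = ["March 2, 2024", "02/03/2024", "2024-03-02"]
normalized = {normalize_date(s) for s in spellings}
```

Once every record carries the same form of the date, listing and counting by date becomes a simple lookup rather than a guess.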

If you did nothing more than get all AIs to use real global open tokens, then ( “the sky at sunset” ) with 2.3 Million entries can start to have global meaning. (“the sky” OR “天空” OR “आकाश” OR “el cielo” OR “আকাশ” OR “空”) with 1.5 Billion entries can have a chance to have one referent and 8 billion unique meanings, time steps and records.
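A minimal sketch of what "one referent" could look like in practice: many surface forms in different languages map to one shared token. The identifier "TOKEN:sky" and the lookup table are hypothetical, made up for this example; no such global registry is assumed to exist yet.

```python
from collections import Counter

# Hypothetical lookup table: surface form -> one shared token identifier.
SURFACE_TO_TOKEN = {
    "the sky": "TOKEN:sky",
    "天空": "TOKEN:sky",
    "आकाश": "TOKEN:sky",
    "el cielo": "TOKEN:sky",
    "আকাশ": "TOKEN:sky",
    "空": "TOKEN:sky",
}


def to_token(phrase):
    """Map a phrase to its shared token, or None if it is not in the table."""
    return SURFACE_TO_TOKEN.get(phrase.strip().casefold())


def count_tokens(phrases):
    """Count how many phrases resolve to each shared token."""
    return Counter(t for t in map(to_token, phrases) if t is not None)


counts = count_tokens(["The Sky", "天空", "el cielo", "water"])
```

Here the three known phrases are counted under the same token and "water" is skipped because it is not in the table. The raw phrases should still be kept alongside the token, so that each person's own meaning is not lost.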

Linear text on paper or screen cannot represent all things. Nor images, however many are recorded. Nor arrays of sensors. Nor all humans writing. Nor all AIs running 24/7 writing out their stream of symbols and images.

You should easily see the future. Much of it is already written. You do not have to guess, using your own memories only. If you enable 8 Billion humans to see the whole clearly (verifiable and open), they and you and I might all be able to live lives with dignity and purpose.

Knowledge is just memory. The tools to access and use it are what matter. Those tools when hoarded will generate evil and misery. Those tools when shared so what is recorded can be continually improved and used “for the good of all” – that might make things better. We will just have to try and see what happens.

Copy your dissertation into one of the better AIs and talk to it. It will only have a shallow understanding of things because it is only free stuff copied from the internet. And that is full of advertising, propaganda, self-promotion, newbie writing, things that made others famous, and some deeper things. So it will be biased. But you can train yourself to understand what is not said, what is not covered, what is missing, what is written to benefit a few.

I could not check all of what you said, but I will mention that the Internet DOM of real sites can be written in tabular form, which is much more compact and efficient. The Internet now is bloated, inefficient, massively duplicated, generally untraceable, and for anything in PDF, mostly hidden and untraceable – on purpose. But the page layout and coding also convey meaning, so do not completely ignore them.
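As a rough sketch of writing a DOM in tabular form, the following uses Python's standard html.parser to flatten a page into rows of (depth, parent tag, tag, attributes, text). The row layout and the names DomToRows and dom_rows are choices made for this example, not an established format, and no claim is made here about how compact the result is for any real site.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class DomToRows(HTMLParser):
    def __init__(self):
        super().__init__()
        self.rows = []    # one tuple per element or text node
        self.stack = []   # currently open tags, used for depth and parent

    def handle_starttag(self, tag, attrs):
        parent = self.stack[-1] if self.stack else ""
        self.rows.append((len(self.stack), parent, tag, dict(attrs), ""))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            parent = self.stack[-1] if self.stack else ""
            self.rows.append((len(self.stack), parent, "#text", {}, text))


def dom_rows(html):
    parser = DomToRows()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = dom_rows('<div id="a"><p>Hi<br>there</p></div>')
```

Each row keeps its depth and parent, so the layout and nesting, which also carry meaning, are not thrown away when the page is put into a table.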

You might try looking at patterns, count them first. In videos and sound, and sensor streams, in data streams and communications, in web sites, and AI logs – see what appears. Those tools are nearly universal. Knowing them you can see and use most any knowledge, and can collaborate with any global groups of any size.
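"Count them first" can be as simple as the sketch below, which counts every run of n consecutive items in any sequence: characters, words, log lines, or sensor readings. The function name count_patterns is made up for this example.

```python
from collections import Counter


def count_patterns(sequence, n=2):
    """Count every run of n consecutive items in the sequence."""
    items = list(sequence)
    return Counter(tuple(items[i:i + n]) for i in range(len(items) - n + 1))


# Works the same on words from a text, or on readings from a sensor stream.
words = "the sky at sunset and the sky at dawn".split()
pairs = count_patterns(words, 2)
most_common_pair = pairs.most_common(1)[0]
```

The same few lines apply to any stream that can be turned into a sequence, which is why counting is a reasonable first step before any heavier method.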

The PDF form of your dissertation is “paper” and none of the links are immediately and succinctly accessible. The raw data is hidden and inaccessible. The calculations are hidden. It is not a tool that can be shared. Just running those programs puts a barrier in the way of others. Your diagrams and methods can be simulated and animated. I am fairly sure “papers” will evolve into digital twins and tools for all humans, and all AIs and algorithms.

Richard Collins, The Internet Foundation
