All LLMs and sites should use global open tokens for all human knowledge
jietang @jietang We have released GLM-4-520 and have the open-sourced version GLM-4-9B with superior performance beyond Llama-3-8B.
https://github.com/THUDM/GLM-4/blob/main/README_en.md https://pic.x.com/cmonog5nq5
Replying to @jietang
Create a global open resource that holds all domain-specific knowledge in a form usable in all human languages and by all humans. When anyone adds something new, they do not just post it on their own site; they register it first, so it has a place in global open discussions and can be made accessible to all humans.
When Internet sites tokenize all of their content using global open tokens, it is immediately accessible to all AIs, without going through search engines. And when sites use global open tokens, they can automatically be translated into all human and domain-specific languages. The tokens themselves are indexed, so all users of any token are indexed in many global groups. Efforts to make sense of a domain (“mathematics”) on the Internet can then tackle the whole of that domain on the Internet. But they cannot be allowed to be monopolized or used for personal benefit.
There are roughly 7000 human languages and many tens of thousands of domain-specific languages. But “the sun” has only one global open token identifier, which then links to the best (context-specific) translation of the term in all of them.
“speed of light” would map to all places on the Internet where it is used or discussed, and in any of those places it would map to the whole concept, not just to a bunch of unconnected text, symbols, images, or content stored in many arbitrary fonts and character sets, or in proprietary or ill-supported formats.
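As a rough illustration of the idea, here is a minimal Python sketch of one token identifier that links to a term in several languages and to every place that uses it. Everything in it is an assumption made up for the example: the `Concept` and `Registry` names, the `concept:speed-of-light` identifier format, and the example.org URLs. No such global registry is being described as already existing.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One language-independent concept with a single global identifier."""
    token_id: str                                          # hypothetical ID format
    labels: dict[str, str] = field(default_factory=dict)   # language code -> term
    used_at: set[str] = field(default_factory=set)         # pages that use the concept


class Registry:
    """Toy in-memory stand-in for a global open token registry."""

    def __init__(self) -> None:
        self._concepts: dict[str, Concept] = {}

    def register(self, token_id: str, labels: dict[str, str]) -> Concept:
        # Register first: the concept gets (or keeps) exactly one identifier.
        concept = self._concepts.setdefault(token_id, Concept(token_id))
        concept.labels.update(labels)
        return concept

    def record_use(self, token_id: str, url: str) -> None:
        # Index every place that uses the token.
        self._concepts[token_id].used_at.add(url)

    def label(self, token_id: str, language: str, fallback: str = "en") -> str:
        # Return the term in the requested language, else the fallback language.
        labels = self._concepts[token_id].labels
        return labels.get(language, labels[fallback])

    def uses(self, token_id: str) -> list[str]:
        return sorted(self._concepts[token_id].used_at)


registry = Registry()
registry.register(
    "concept:speed-of-light",  # hypothetical identifier
    {"en": "speed of light", "fr": "vitesse de la lumière", "de": "Lichtgeschwindigkeit"},
)
registry.record_use("concept:speed-of-light", "https://example.org/physics")
registry.record_use("concept:speed-of-light", "https://example.org/relativity")

print(registry.label("concept:speed-of-light", "de"))
print(registry.uses("concept:speed-of-light"))
```

A site that stores the identifier instead of the raw text can be rendered in any language the registry has a label for, and anyone can ask the registry for every page where the concept appears.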
