{"id":8463,"date":"2023-03-02T09:36:33","date_gmt":"2023-03-02T09:36:33","guid":{"rendered":"\/?p=8463"},"modified":"2023-03-02T09:50:37","modified_gmt":"2023-03-02T09:50:37","slug":"all-part-of-speech-groups-working-together-tokenize-the-whole-internet-so-ais-can-work-with-real-languages","status":"publish","type":"post","link":"\/?p=8463","title":{"rendered":"All part of speech groups working together &#8211; tokenize the whole Internet, so AIs can work with real languages"},"content":{"rendered":"<div class=\"default-style\">Paul Rayson,<\/div>\n<div class=\"default-style\"><\/div>\n<div class=\"default-style\" style=\"padding-left: 40px;\">The reason the OpenAI Bing ChatGPT fails is that it uses a bad tokenizer.<\/div>\n<div class=\"default-style\"><\/div>\n<div class=\"default-style\" style=\"padding-left: 40px;\">If the part of speech community worked together, it could standardize the part of speech tokens and encode the entire Internet, so it would not have to be scanned and parsed every time.\u00a0 A pre-tokenized, pre-coded Internet would feed straight into GPT and other AIs.<\/div>\n<div class=\"default-style\"><\/div>\n<div class=\"default-style\" style=\"padding-left: 40px;\">The difference is that the AIs could work with a foundation of real languages, not arbitrary character sequences.<\/div>\n<div class=\"default-style\"><\/div>\n<div class=\"default-style\" style=\"padding-left: 40px;\">I sent you an earlier email.<\/div>\n<div class=\"default-style\"><\/div>\n<div class=\"default-style\">Richard Collins, The Internet Foundation<\/div>\n<div>\n<hr \/>\n<p>Popular Science @PopSci\u00a0 ChatGPT can actually help you learn to code or prep for an interview. https:\/\/trib.al\/BNsKxHp<br \/>\nReplying to @PopSci<\/p>\n<\/div>\n<div><span data-offset-key=\"3nmr6-0-0\">Be careful! Tested ChatGPT on a wide range of problems, including coding and mathematics. It will glibly give false results that are hard to detect. 
The failure is largely due to the bad tokenizer used in its training. Recommending an open token system for the whole Internet. <\/span><span data-offset-key=\"3nmr6-1-0\">\ud83d\udd25<\/span><\/p>\n<hr \/>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Paul Rayson, The reason the OpenAI Bing ChatGPT fails is that it uses a bad tokenizer. If the part of speech community worked together, it could standardize the part of speech tokens and encode the entire Internet, so it would not have to be scanned and parsed every time.\u00a0 A pre-tokenized, pre-coded Internet would <br \/><a class=\"read-more-button\" href=\"\/?p=8463\">Read More &raquo;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[43],"tags":[],"class_list":["post-8463","post","type-post","status-publish","format-standard","hentry","category-assistive-technologies"],"_links":{"self":[{"href":"\/index.php?rest_route=\/wp\/v2\/posts\/8463","targetHints":{"allow":["GET"]}}],"collection":[{"href":"\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=8463"}],"version-history":[{"count":3,"href":"\/index.php?rest_route=\/wp\/v2\/posts\/8463\/revisions"}],"predecessor-version":[{"id":8466,"href":"\/index.php?rest_route=\/wp\/v2\/posts\/8463\/revisions\/8466"}],"wp:attachment":[{"href":"\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=8463"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=8463"},{"taxonomy":"post_tag","embeddable":true,"href":"\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=8463"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true
}]}}