{"id":14758,"date":"2024-04-24T00:05:39","date_gmt":"2024-04-24T00:05:39","guid":{"rendered":"\/?p=14758"},"modified":"2024-04-24T00:19:16","modified_gmt":"2024-04-24T00:19:16","slug":"next-generation-true-ai-researchers-may-get-it-right-for-all-humans-not-just-a-few","status":"publish","type":"post","link":"\/?p=14758","title":{"rendered":"Next generation true AI researchers may get it right &#8211; for all humans, not just a few"},"content":{"rendered":"<div data-rbd-draggable-context-id=\"0\" data-rbd-draggable-id=\"nghn\">\n<div class=\"\" data-block=\"true\" data-editor=\"5i6gu\" data-offset-key=\"nghn-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"nghn-0-0\"><span data-offset-key=\"nghn-0-0\"><span data-offset-key=\"nghn-0-0\"><span data-offset-key=\"nghn-0-0\">John Hewitt @johnhewtt\u00a0 Ruth-Ann&#8217;s great work building a Jamaican Patois Natural Language Inference dataset was picked up by Vox as part of its video &#8220;Why AI doesn\u2019t speak every language.&#8221; Happy to see Ruth-Ann&#8217;s work (and disparities in NLP across languages) get this general audience coverage. x.com\/ruthstrong_\/st\u2026<br \/>\nRuth-Ann Armstrong\u00a0 @ruthstrong_<br \/>\nCheck out this Vox video I was featured in where I chat about JamPatoisNLI which I worked on with @chrmanning and @johnhewtt! Many thanks to @PhilEdwardsInc for platforming our work https:\/\/youtu.be\/a2DgdsE86ts<br \/>\nReplying to @johnhewtt and @ruthstrong<\/span><\/span><\/span><\/p>\n<hr \/>\n<p><span data-offset-key=\"nghn-0-0\"><br \/>\nOne of the comments on the Vox video is that small languages should write more. I think it would be worth looking at statistical compression of sound recordings of speech in those languages. TikTok moved to short videos, but you can go smaller still and use speech rather than images. 
There are FFT-based methods, and many simpler methods as well.<\/span><\/p>\n<\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"0\" data-rbd-draggable-id=\"4jh8h\">\n<div class=\"\" data-block=\"true\" data-editor=\"5i6gu\" data-offset-key=\"4jh8h-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"4jh8h-0-0\"><span data-offset-key=\"4jh8h-0-0\">Korea&#8217;s King Sejong created Han&#8217;gul so that all speakers could learn to write down their own sounds, aiming to break the monopoly of those who controlled everything through Chinese writing. A single encoding scheme, mapping speech to sound codes, could cover the old alphabets while using one scheme for all languages. If all humans could transcribe the sounds of any language, even without knowing the meaning, they could write them down and share them. I think the SEMI industry and related groups would help make the devices, and apps on cell phones and computers could do much of it: quickly convert &#8220;universal sounding speech&#8221; to codes, use those codes for AI tokenization, and build sound-based data for all mostly-spoken languages.<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"0\" data-rbd-draggable-id=\"ec9pb\">\n<div class=\"\" data-block=\"true\" data-editor=\"5i6gu\" data-offset-key=\"ec9pb-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"ec9pb-0-0\"><span data-offset-key=\"ec9pb-0-0\">\u00a0<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"0\" data-rbd-draggable-id=\"62qo3\">\n<div class=\"\" data-block=\"true\" data-editor=\"5i6gu\" data-offset-key=\"62qo3-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"62qo3-0-0\"><span data-offset-key=\"62qo3-0-0\">If all the knowledge about COVID experiences had been accepted in lossless form during the chaos (rather than forcing &#8220;writers&#8221; to do all the recording, biased as it was), then perhaps a global 
picture of what was happening would have emerged much faster. Voices and stories could have been sent directly to a global open database that everyone could trust to be unbiased, because it would be built that way. <\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"0\" data-rbd-draggable-id=\"1oqm3\">\n<div class=\"\" data-block=\"true\" data-editor=\"5i6gu\" data-offset-key=\"1oqm3-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"1oqm3-0-0\"><span data-offset-key=\"1oqm3-0-0\">\u00a0<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"0\" data-rbd-draggable-id=\"bmkh4\">\n<div class=\"\" data-block=\"true\" data-editor=\"5i6gu\" data-offset-key=\"bmkh4-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"bmkh4-0-0\"><span data-offset-key=\"bmkh4-0-0\">Sound recordings and encodings of speech can be kept small because, for most people, the full range of human speech capability is never trained. Encoded speech (highly compressed, yet preserving the meaning of the tokens and words) is exactly what written words already are. 
How the codes are written down matters less than having unique codes for transmitting the words of each language.<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"0\" data-rbd-draggable-id=\"ehp7r\">\n<div class=\"\" data-block=\"true\" data-editor=\"5i6gu\" data-offset-key=\"ehp7r-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"ehp7r-0-0\"><span data-offset-key=\"ehp7r-0-0\">\u00a0<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"0\" data-rbd-draggable-id=\"c2qj1\">\n<div class=\"\" data-block=\"true\" data-editor=\"5i6gu\" data-offset-key=\"c2qj1-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"c2qj1-0-0\"><span data-offset-key=\"c2qj1-0-0\">I think that sound-based human-to-AI communication could have much higher bandwidth, perhaps adding electro-, magneto-, and audio-physiological data as well.<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"0\" data-rbd-draggable-id=\"euueg\">\n<div class=\"\" data-block=\"true\" data-editor=\"5i6gu\" data-offset-key=\"euueg-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"euueg-0-0\"><span data-offset-key=\"euueg-0-0\">\u00a0<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"0\" data-rbd-draggable-id=\"8fdgu\">\n<div class=\"\" data-block=\"true\" data-editor=\"5i6gu\" data-offset-key=\"8fdgu-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"8fdgu-0-0\"><span data-offset-key=\"8fdgu-0-0\">Good luck with your projects.<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"0\" data-rbd-draggable-id=\"89de7\">\n<div class=\"\" data-block=\"true\" data-editor=\"5i6gu\" data-offset-key=\"89de7-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\"
data-offset-key=\"89de7-0-0\"><span data-offset-key=\"89de7-0-0\">\u00a0<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"0\" data-rbd-draggable-id=\"ct7l2\">\n<div class=\"\" data-block=\"true\" data-editor=\"5i6gu\" data-offset-key=\"ct7l2-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"ct7l2-0-0\"><span data-offset-key=\"ct7l2-0-0\">Filed as &#8220;Next generation true AI researchers may get it right &#8211; for all humans, not just a few&#8221;<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"0\" data-rbd-draggable-id=\"70m9d\">\n<div class=\"\" data-block=\"true\" data-editor=\"5i6gu\" data-offset-key=\"70m9d-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"70m9d-0-0\"><span data-offset-key=\"70m9d-0-0\">\u00a0<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"0\" data-rbd-draggable-id=\"b1c33\">\n<div class=\"\" data-block=\"true\" data-editor=\"5i6gu\" data-offset-key=\"b1c33-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"b1c33-0-0\"><span data-offset-key=\"b1c33-0-0\">Richard Collins, The Internet Foundation<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"0\" data-rbd-draggable-id=\"650u3\">\n<div class=\"\" data-block=\"true\" data-editor=\"5i6gu\" data-offset-key=\"650u3-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"650u3-0-0\"><span data-offset-key=\"650u3-0-0\">\u00a0<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"0\" data-rbd-draggable-id=\"84cpg\">\n<div class=\"\" data-block=\"true\" data-editor=\"5i6gu\" data-offset-key=\"84cpg-0-0\"><\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>John Hewitt @johnhewtt\u00a0 Ruth-Ann&#8217;s great work building a Jamaican Patois Natural Language Inference dataset was picked up 
by Vox as part of its video &#8220;Why AI doesn\u2019t speak every language.&#8221; Happy to see Ruth-Ann&#8217;s work (and disparities in NLP across languages) get this general audience coverage. x.com\/ruthstrong_\/st\u2026 Ruth-Ann Armstrong\u00a0 @ruthstrong_ Check out this Vox video <br \/><a class=\"read-more-button\" href=\"\/?p=14758\">Read More &raquo;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[73,72],"tags":[],"class_list":["post-14758","post","type-post","status-publish","format-standard","hentry","category-all-knowledge","category-all-languages"],"_links":{"self":[{"href":"\/index.php?rest_route=\/wp\/v2\/posts\/14758","targetHints":{"allow":["GET"]}}],"collection":[{"href":"\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=14758"}],"version-history":[{"count":8,"href":"\/index.php?rest_route=\/wp\/v2\/posts\/14758\/revisions"}],"predecessor-version":[{"id":14766,"href":"\/index.php?rest_route=\/wp\/v2\/posts\/14758\/revisions\/14766"}],"wp:attachment":[{"href":"\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=14758"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=14758"},{"taxonomy":"post_tag","embeddable":true,"href":"\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=14758"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
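The FFT-based compression idea mentioned in the post can be illustrated in a few lines. This is a minimal sketch, not a real speech codec: it keeps only the strongest frequency components of one audio frame and discards the rest; the frame here is synthetic, and all names and parameters are illustrative.

```python
import numpy as np

def fft_compress(frame, keep_ratio=0.1):
    """Keep only the strongest frequency components of one audio frame.

    Returns (indices, values, n): enough to rebuild an approximation.
    """
    spectrum = np.fft.rfft(frame)
    k = max(1, int(len(spectrum) * keep_ratio))
    idx = np.argsort(np.abs(spectrum))[-k:]  # k largest-magnitude bins
    return idx, spectrum[idx], len(frame)

def fft_decompress(idx, values, n):
    """Rebuild the frame from the retained components."""
    spectrum = np.zeros(n // 2 + 1, dtype=complex)
    spectrum[idx] = values
    return np.fft.irfft(spectrum, n)

# Demo: a synthetic "voiced" frame (fundamental + harmonic + noise)
rng = np.random.default_rng(0)
t = np.arange(1024) / 16000.0            # 64 ms at 16 kHz
frame = (np.sin(2 * np.pi * 120 * t)     # 120 Hz fundamental
         + 0.5 * np.sin(2 * np.pi * 240 * t)
         + 0.05 * rng.standard_normal(t.size))
idx, vals, n = fft_compress(frame, keep_ratio=0.05)
approx = fft_decompress(idx, vals, n)
err = np.sqrt(np.mean((frame - approx) ** 2)) / np.sqrt(np.mean(frame ** 2))
print(f"kept {len(idx)} of {n // 2 + 1} components, relative RMS error {err:.3f}")
```

Because voiced speech concentrates its energy in a few harmonics, keeping a small fraction of the spectrum preserves most of the signal; simpler schemes (downsampling, linear prediction) trade quality for even less computation.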
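The post's "one encoding scheme for all languages" reduces, at its smallest, to a shared phone-to-code table that every transcriber uses, so the same sounds get the same codes in every language, whether or not the transcriber knows the meaning. A toy sketch, assuming a tiny stand-in inventory (a real system would use something like the full IPA):

```python
# Toy universal phone codec: a single shared table of sound codes.
# The inventory below is a small illustrative stand-in, not a standard.
PHONES = ["p", "b", "t", "d", "k", "g", "m", "n",
          "f", "s", "z", "h", "l", "r", "w", "j",
          "a", "e", "i", "o", "u"]
CODE = {ph: i for i, ph in enumerate(PHONES)}
INV = {i: ph for ph, i in CODE.items()}

def encode(phones):
    """Map a phone sequence to integer codes (usable as AI tokens)."""
    return [CODE[p] for p in phones]

def decode(codes):
    """Recover the phone sequence from the codes."""
    return [INV[c] for c in codes]

# A transcriber who knows no Jamaican Patois can still encode what they hear:
heard = ["w", "a", "g", "w", "a", "n"]
codes = encode(heard)
assert decode(codes) == heard
print(codes)
```

The point of the sketch is that the integer codes are language-independent: they can be written down, transmitted, stored in an open database, or fed to a tokenizer without anyone first agreeing on an orthography for the language.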