Comment on Harvard.edu: missing sitemap, COVID duplicates, duplicate modules

Dear Harvard Web Groups:

I was curious why your sitemap page at https://www.harvard.edu/sitemap is missing.

I am reviewing .edu and ac.* sites to see how much duplicate COVID material they are posting.

site:harvard.edu is showing 14.8 Million entry pages
site:harvard.edu (“covid” OR “coronavirus” OR “corona virus”) shows 768,000 entry points.

From random spot checks, most of this material consists of duplicates and variations of a few hundred pages of basic content, with only a few specific local references. Research workflows are hampered by the use of PDF and other print-equivalent formats, which flatten the working data, models, data streams, and scenarios.
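To show how that duplication could be measured rather than only sampled by hand, here is a minimal sketch in Python that flags near-duplicate pages by word-shingle overlap (Jaccard similarity). The page texts and URL labels are placeholders, and the 0.6 threshold is an arbitrary assumption that would need tuning against hand-checked samples.

```python
# Minimal sketch: flag near-duplicate pages by shingle overlap (Jaccard similarity).
# The page texts below are placeholders; in practice they would come from crawled
# harvard.edu pages with navigation and boilerplate stripped out.
import re
from itertools import combinations

def shingles(text: str, k: int = 5) -> set:
    """Return the set of k-word shingles for a block of text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two shingle sets (1.0 = identical, 0.0 = disjoint)."""
    return len(a & b) / len(a | b) if a | b else 0.0

pages = {
    "dept-a/covid-guidance": "Wash your hands often and wear a mask indoors ...",
    "dept-b/covid-guidance": "Wash your hands often and wear a mask indoors, see local hours ...",
    "lab-c/covid-model":     "SEIR model parameters and weekly case projections ...",
}

sets = {url: shingles(text) for url, text in pages.items()}
for (u1, s1), (u2, s2) in combinations(sets.items(), 2):
    sim = jaccard(s1, s2)
    if sim > 0.6:   # threshold is arbitrary; tune it against hand-checked samples
        print(f"likely duplicate/variation: {u1} <-> {u2} (similarity {sim:.2f})")
```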

Has anyone ever tried to curate ALL the Harvard.edu COVID references? Even separating them into { Prevention, Treatment, Monitoring, Modeling, Supporting Research } categories would help cut down the total human cost of Internet search globally. But so far I have not found any serious effort by a single university to clean up its website for even one topic, let alone anything larger. Leaving the duplicates and variations in place means the search engines have too many false leads, and there is no way to ask for materials by purpose or category.
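As a rough illustration of the kind of first-pass sorting I mean, the sketch below assigns page titles or snippets to those five categories with simple keyword rules. The keyword lists are my own guesses, not a vetted taxonomy; any real classification would have to be curated by the people responsible for each topic.

```python
# Rough sketch of sorting page titles or snippets into the five categories above.
# The keyword lists are illustrative assumptions, not an authoritative taxonomy.
CATEGORY_KEYWORDS = {
    "Prevention":          ["mask", "vaccine", "distancing", "hygiene", "ventilation"],
    "Treatment":           ["treatment", "therapeutic", "antiviral", "clinical care"],
    "Monitoring":          ["testing", "surveillance", "dashboard", "case counts"],
    "Modeling":            ["model", "projection", "simulation", "forecast"],
    "Supporting Research": ["dataset", "genome", "preprint", "protocol", "assay"],
}

def categorize(text: str) -> list:
    """Return every category whose keywords appear in the text (may be empty)."""
    text = text.lower()
    return [cat for cat, words in CATEGORY_KEYWORDS.items()
            if any(w in text for w in words)]

print(categorize("Weekly COVID-19 case counts dashboard for campus testing"))
# -> ['Monitoring']
```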

But removing them is NOT the answer. The people in such cleanup processes do not have the background to know what is important globally. What you do is find all the entries, classify them, find the responsible persons (not departments, actual people), invite them to join that topic group, and then follow their activities and contributions. Keep all the locations and purposes where they can be browsed and filtered. Replace the copies and variations with best-in-class versions, and then work with the search engines to give them a map of everyone and everything you have to offer, or are doing.
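The "map" at the end of that process can be as simple as publishing the curated, canonical pages in a standard sitemap.xml, the very file missing from https://www.harvard.edu/sitemap. The sketch below writes one from a short list of hypothetical canonical URLs; the URLs and dates are placeholders, not real pages.

```python
# Minimal sketch of the "map" step: once duplicates are replaced by one canonical
# page per purpose, publish that list as a standard sitemap.xml so search engines
# index the curated pages instead of thousands of variants. URLs are hypothetical.
from xml.sax.saxutils import escape

canonical_pages = [
    ("https://www.harvard.edu/coronavirus/prevention", "2021-03-01"),
    ("https://www.harvard.edu/coronavirus/modeling",   "2021-03-01"),
]

lines = ['<?xml version="1.0" encoding="UTF-8"?>',
         '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
for url, lastmod in canonical_pages:
    lines += ["  <url>",
              f"    <loc>{escape(url)}</loc>",
              f"    <lastmod>{lastmod}</lastmod>",
              "  </url>"]
lines.append("</urlset>")

with open("sitemap.xml", "w", encoding="utf-8") as f:
    f.write("\n".join(lines))
```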

(“covid” OR “coronavirus” OR “corona virus”) currently shows 9.01 Billion entry pages on Google. But there is no way to verify Google's counts, and it does not share basic maps of what it found. Any one of 20 of your departments could work through a few million pages of your own site. And a small number of groups (perhaps using Common Crawl and similar resources) could try to map the entire edu domain, at least.
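For anyone who wants to attempt such a map, here is a hedged sketch of one way to start from the public Common Crawl URL index. The crawl label "CC-MAIN-2021-10" is an assumption (pick a current crawl at https://index.commoncrawl.org/), and filtering on the URL text alone will miss pages whose addresses do not mention covid, so this is only a starting point.

```python
# Hedged sketch: count captured harvard.edu URLs in one Common Crawl index whose
# address mentions covid or coronavirus. The crawl label is an assumption.
import json
import requests

INDEX = "https://index.commoncrawl.org/CC-MAIN-2021-10-index"  # assumed crawl label

resp = requests.get(INDEX, params={
    "url": "harvard.edu",    # one institution; repeat per edu domain of interest
    "matchType": "domain",   # include all subdomains
    "output": "json",
    "limit": "1000",
}, timeout=60)
resp.raise_for_status()

covid_urls = []
for line in resp.text.splitlines():
    record = json.loads(line)   # one JSON record per captured URL
    if any(term in record["url"].lower() for term in ("covid", "coronavirus")):
        covid_urls.append(record["url"])

print(f"{len(covid_urls)} captured URLs mention covid/coronavirus in the URL itself")
```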

site:edu (“covid” OR “coronavirus” OR “corona virus”) has 704 Million entry points

site:edu has 941 Million entry pages.

Think. If 4.5 Billion Internet users face 9 Billion pages of disorganized and massively duplicated basic information, how large is the search space, and how much does that cost the human species? On one topic alone? On the roughly 20,000 topics that return more than a million results on a search?

Why is “global climate change” taking so long? Duplicates and variations posted by millions of people earnestly putting materials on the Internet, not working together. The Internet, and the Harvard.edu site in particular, are NOT self-organizing.

All I am asking is for someone to think about it a bit. I have given up on anyone actually doing anything. But Harvard.edu could set an example and start a global community to tackle this simple matter of duplicates and variations. Then there might be enough people to try cleaning up the delays and problems in data stream and model sharing. I have spent the last 22 years with the Internet Foundation looking at global communities, their growth and development. Not a single group uses best practices for that scale of problem or opportunity.

Any feedback or contact form should support HTML and links, attachments, CC, and BCC. It should also always send the author a full copy of the text and context of their submission, and you should tell people that you are doing so.
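As a sketch of what "always copy the author" could look like behind the form, the Python below sends the submitter a full HTML copy of their own submission, with CC, BCC, and an attachment. The addresses, the local SMTP relay, and the attachment name are placeholders, not a description of any existing Harvard system.

```python
# Sketch of the "always copy the author" behavior, assuming a local SMTP relay
# on localhost. Addresses and the attachment name are placeholders.
import smtplib
from email.message import EmailMessage

def acknowledge(author_email, subject, body_html, cc=(), bcc=(), attachment=None):
    """Send the author a full copy of their own submission, with context."""
    msg = EmailMessage()
    msg["From"] = "webmaster@example.edu"            # placeholder sender address
    msg["To"] = author_email
    msg["Cc"] = ", ".join(cc)
    msg["Bcc"] = ", ".join(bcc)
    msg["Subject"] = f"Copy of your submission: {subject}"
    msg.set_content("A full copy of your submission is included below in HTML.")
    msg.add_alternative(body_html, subtype="html")   # keep links and formatting
    if attachment:                                   # attach any uploaded file bytes
        msg.add_attachment(attachment, maintype="application",
                           subtype="octet-stream", filename="attachment.bin")
    with smtplib.SMTP("localhost") as smtp:          # assumed local mail relay
        smtp.send_message(msg)
```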

You can use ONE such editor per site. I would like to see ONE small set of such editors for the whole Internet. Count the cost of all the manual editing and programming that goes into:

site:harvard.edu (“contact us” OR “feedback” OR “comment”) – 467,000 entry points

(“contact us” OR “feedback” OR “comment”) – 13.98 Billion entry points.

What sad and costly duplication of human effort. What a burden on all Internet users, site owners, and all global processes.

Sincere regards,
Richard Collins, Director, The Internet Foundation


