Large computers hosting associated global communities of global projects
For the Internet Foundation, I am reviewing the role of large computers and computer arrays. What started me on this slightly ahead of schedule was the ExaScale group at E4S-Project.github.io | GitHub website for E4S Project and then today looking at the Dark Energy Survey that leads back to NCSA.
My outline for a Dark Energy Survey video I will make soon is at /?p=1230 and these notes are at /?p=1242
It is rather rough, but I got most of the key points. It is easier to point and describe issues and policies while visiting sites, but then it needs written form to have substance and permanence.
Why I am writing you is for direction. At https://en.wikipedia.org/wiki/Dark_Energy_Survey#Data_management it says that NCSA processes the data. So you have storage and processing capability to do that task.
Could NCSA host a collaborative open site to replace all the fragments of the subcontractors and paid participants and organizations? And for the hundreds of millions on the Internet who might have an interest in the pixels, models, data and images, groups and methods and opportunities and tasks and learning connected to “dark energy survey”?
I will explain in that note and video. This is my first experiment asking anyone. I think it should be a standard part of all supercomputer, university, government, and nonprofit computers and server farms – where the stated purpose is “open”, “traceable”, “auditable”, “lossless”, “archived”, “everyone”, “all ages”, “fast learning”.
I am doing this to demonstrate that it is possible to rewrite and reorganize a “small site” like Illinois.edu or NASA.gov
site:illinois.edu has 1.59 Million entry points indicated on google
site:NASA.gov has 3.03 Million entry points
This accretion of “stuff” in websites hinders the whole world.
site:illinois.edu “climate change” has 22,100 entry points
site:illinois.edu “dark energy” has 1,660 entry points
site:illinois.edu (“covid” OR “coronavirus” OR “corona virus”) has 165,000 entry points
None of these global issues are clear on your Illinois.edu site, and not on the Internet. Duplication, variations, copies with no links, copies and postings with no owner or way to contact them, undigested accumulations of things. You could easily have the complete answer to “covid” on a site, but no one could find it with the current organization and navigation on the Internet.
It is possible to deal with that. But it starts with small things like getting groups and their users to work as one community – to exhaust all value and uses of the resources gathered, or to find new connections. The result is a living and traceable and understandable whole, not billions of document fragments thrown onto the Internet.
What is happening is that the “number of visitors” times “search time” times “GDP per capita as a proxy for the value of human time” is going up, not down – for any hard problem. That, and publishing in PDF and forms that strip all intelligence and capability from anything “shared” in print. I have written about that elsewhere.
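That cost expression (visitors times search time times value of time) can be made concrete with a small sketch. Every number below is a placeholder I have invented purely for illustration, not a measurement:

```python
# Illustrative only: all figures here are invented placeholders, not data.
visitors_per_day = 50_000_000        # hypothetical daily searchers for one hard topic
search_hours_each = 0.5              # hypothetical average hours spent searching
value_per_hour = 18_000 / (365 * 24) # rough proxy: $18k/yr GDP per capita over all hours

# "number of visitors" x "search time" x "value of human time"
daily_cost = visitors_per_day * search_hours_each * value_per_hour
print(f"Implied daily cost of search friction: ${daily_cost:,.0f}")
```

The point is not the particular figures, but that the product is measurable and, for any hard problem, going up.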
In a sense, the 4.8 Billion people with some access to the Internet are each having to look through an average of 20 to 1000 pages for things related to something like “covid”, and with 7.8 billion “covid” entries on the Internet searched 20 at a time, the search space is too large. So people and organizations generally give up. I would like to test and quantify that more, but I have been at this 7 days a week for 23 years now and I just cannot do everything.
I set up FEWS.net for USAID and the US State Department to gather all the data on the potential for famine. That was after about a million people died in Africa in the 1984-1985 droughts and famines. It showed me that nothing is impossible, and many “impossible” things only look that way until you have put a small group at it for enough time to simply find and document and map what is known. Not in text that does nothing, but in models, simulations, visualizations, tools and exploration assistants that can help with finding new things and making good estimates and choices.
Richard Collins, Director, The Internet Foundation
—————- Note to firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, email@example.com
It is one coherent community, or ought to be. But the stuff on the Internet is fragmented, every page style and method different, nothing verifiable, no consistent practices. Every person who gets paid or supported could help others and get credit for it. And all the projects and activities (which stretch to cover the whole Internet) can have context icons with about, contact, team, partners, community, sponsors.
For the Internet Foundation, I am looking at many data and model sharing efforts on the Internet, to judge difficulty of use and to gather notes on practices, policies, methods, content, groups, and links between efforts. I am posting this note also to /?p=1230 so more people can see it. If you want me to change anything (I am not trying to embarrass you) just send me a note. I think you have great potential, and just need a little help getting onto the Internet. I do not charge for helping people and groups – especially those who have the potential to help millions or billions of others. I don’t know anyone at NSF or AURA and am too tired to trace down something that should be in the context icon of every page they sponsor. I hope the “info” people can route to the proper people. This is also a test of responsiveness and cohesion.
I found a link to https://www.darkenergysurvey.org/the-des-project/data-access/ and wanted to look at a couple of images to see what kind of pixels and visualizations to expect.
The link is to “NOAO Portal” at http://archive1.dm.noao.edu/ which says you are in transition to NOIRLab
DarkEnergySurvey should get updated. Actually, it would be a good idea if all references to it were found and updated.
“darkenergysurvey.org” has 9,380 entry points
“dark energy survey” has 219,000 entry points
“TheDesSurvey” has 7 entry points
“DArchives” is not a unique string.
“DArchives” “energy” has 231,000 entry points
“Dark Energy Spectroscopic Instrument” has 32,700 entry points
You keep saying “400 scientists” but potentially this data could be used to train tens of thousands of the 1.92 Billion first time learners, aged 5 to 20, in the world right now. Because I look at the whole Internet, with about 4.8 Billion people with some access, someone proudly saying they are sharing with 400 sounds rather faint.
If it were my group, I would make sure that every mention of these terms would be linked to a small number of core starting points by a simple floating context icon over every page that contains them. That can be a browser add on. I am looking at different strategies and methods. The main thing is to provide authoritative links, not partial and unlinked copies of things. It should be an organic whole, sensitive in every part, continually improving, continuously aware and responsive.
Now to the NOIRLab:
I am a young person (I am actually 72 now) searching the web for “dark energy” and come across that page above. I get to this “NOIRLab Astro Data Archive”. There are words “Explore our petascale library of astronomical data” but it is not a link, just something they said. I am not sure what “petascale library” means and I cannot hover it. Another one of those sites where they expect you to know who they are and all the insider words.
API Access means more reading long documentation, specialized software tools, and a long delay getting to any “data”. Skip that.
“Login” – but no register. So somewhere I have to get permission, and then I can use that one. Skip that.
What are these huge </> Q ? things? How intrusive and busy and unnecessary. They just make a giant button. Probably aimed at someone working on a cell phone or touch pad. Skip or ignore those. Can’t turn them off. How annoying.
Logos for NSF and NOIRLab, but no hovers, no links, nothing. What a waste of screen space. They should at least have titles: “National Science Foundation”, what part of the NSF, and a brief description and links in a basic hoverbox. NOIRLab’s logo should be the floating context symbol. And “About” is standard for those, so a separate “About” is not needed. But that is all they show. No hover on About, so I have to open a whole new tab just to glance at who runs this page.
ABOUT: Right clicked and opened https://astroarchive.noirlab.edu/about/ LONG delay with blank screen, then statistics. They don’t know how to right justify a column of numbers, and the stuff is just text so I can’t put it in a spreadsheet. They have no note field to let me add something. So it is not collaborative at all. Just the usual “click, go to a whole new page”.
As I suspected, copy and paste of the table to Excel gives garbage. You guys try it and see how hard it is to use those numbers. I might want to start my data analysis with at least finding and learning what all those arcane and useless (to me just now) Telescope names mean.
Remember, I am a young person (probably high school) somewhere in the world. I already know how to program in several computer languages, have my own web page, write games and parsers. The usual high school kid. My younger brother and sister got taught these things in school, I had to learn on my own, and now they are ahead of me. These things like “bok23m – 90prime” mean nothing to me. The table has no image sizes, dates, what part of the sky, what area of the sky.
All those strange names need hovers, or columns. I should be able to query ANYTHING on the pages and get more information. And the things that come up, those should be hoverable, no artificial boundaries or limits, just because someone did not know or could not find it. 400 smart people and they can’t make two pages of background hovers on all the basics. “Shame on you” (that is me the 72 year old speaking).
I right clicked and pasted the table as plain text, and summed the column. Pretty thin, but you would be surprised that most tables on the internet don’t add up, or have “numbers” that are not numbers.
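That hand check can be sketched in a few lines: paste a table as plain text, try to coerce each “number”, sum the column, and flag the entries that fail. The instrument names and counts below are stand-ins I made up for illustration (only “bok23m-90prime” is mentioned above; the rest are hypothetical):

```python
# Sketch of the sum-the-column check, on hypothetical pasted table text.
pasted = """bok23m-90prime\t1,234
ct4m-decam\t15,000,000
soar-goodman\tn/a
kp4m-mosaic3\t2,500"""

total = 0
bad = []  # "numbers" that are not numbers
for line in pasted.splitlines():
    name, value = line.split("\t")
    cleaned = value.replace(",", "")
    if cleaned.isdigit():
        total += int(cleaned)
    else:
        bad.append((name, value))

print("column sum:", total)        # -> 15003734
print("not numeric:", bad)         # -> [('soar-goodman', 'n/a')]
```

A page that published its tables in a form like this, instead of styled text, would let any reader verify the totals.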
Look at this text blob: “This research uses services or data provided by the ” Astro Data Archive” at ” NSF’s National Optical-Infrared Astronomy Research Laboratory“. ” NSF’s OIR Lab” is operated by the ” Association of Universities for Research in Astronomy (AURA), Inc.” under a ” cooperative agreement” with the ” National Science Foundation.”
I just checked, the National Science Foundation is a US Government organization that sponsors American groups and individuals. (My hypothetical young person is from South America and hasn’t memorized these things). https://en.wikipedia.org/wiki/National_Science_Foundation I had to search for it myself. The nsf.gov was too off-putting and does not explain. Wikipedia is flat, has the crudest hovers, but is usually ok.
Web Search: I don’t want to “search the web”. I want to see the data. Why say “search the web”? I have learned little so far, except there are 15 million images in large collections, not a single sample on the landing page.
The “All Holdings” pull down only has “All Holdings”, which is not very useful.
“Target via coordinates”. Yeah, sure. I have no idea where these telescopes are, where they are pointed, how much of the sky they use. IF ANY OF THEM HAVE DATA I can actually download. I want pixels, not more ambiguous stumbling blocks.
Object name. (Me: My young person does not know that m31 is an astronomical name. Even if they recognize it as such, they have not memorized its coordinates and extent in the sky. And certainly have not learned RA and Dec and Radius. (72 year old guy talking now — what units are you using??? Decimal degrees? Parsecs, Z, some other distance scale? Come on. You are smarter than that. This multimillion dollar effort that NSF apparently paid for, and 400 smart people have worked on, and none of them bothered to help make the data the most useful and powerful tool for those one-in-a-hundred-thousand kids who like quantitative science and mathematics and computing and engineering? 1.92 B/100,000 = 19,200. And those are likely to affect and influence a half dozen people around them on average. Say about 115,000 bare minimum. And the core ones are going to be the only ones in their class usually. So that many teachers involved. So about 20,000 adults too. And parents. Not a small group. And you 400 ignore all of them and put a text only, laborious, ambiguous, and slow portal online? Shame on you.)
Pull downs. For new users, put ALL the dimensions out where people can see them. I tried Observation type.
I cannot carry my example person through these details. Needless to say, they cannot see anything at a glance, so you force them to open the boxes, which disappear, losing context. Trying to get the lists out of this code is a real pain. Those dimensions are core to this dataset and you have them so stuck away I cannot just copy and paste. Horrible.
At this point, my head hurts and I see that all the effort went into building yet more “programming required” APIs.
If you want to build global communities of users and make data really accessible to everyone (the Internet is for everyone), then you have to expend effort to walk in their shoes. And that means all countries, all languages, all ages, all backgrounds. It is possible. It is not hard, just tedious. And small compared to the cost of every person spending weeks or months or years just to see the things this rich group of 400 got to see.
I am not being facetious or diminishing what you all have done. But I see the needs of the world and this kind of thing is the usual.
I, personally, like noise. I study it in every kind of instrument and in every facet of the universe. So I saw “large camera”, “share” “data” and I got momentarily excited that I might actually be able to get pixels from a few images to count the patterns and properties of the pixels. I don’t care if they are stars or galaxies yet. I hope to learn that later, once I see if the data, its core visualizers, and tools are found and organized.
I know some is noise in the sensor. Some is noise from the earth getting into the sensor. Some is atmosphere. Some interstellar, satellites, and light pollution. But it starts with pixels. In a database, or text file, not a proprietary image format that only a few ten thousand people in a very select community can afford to read. The API. I doubt it has any Internet formats (I have to think of what tools people have, not what a few hundred thousand or a few million have or can afford).
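What “start with pixels” means in practice can be shown in a few lines. Since the real archive pixels sit behind an API, this sketch just counts on a synthetic 64x64 frame (Gaussian read noise on a constant sky level, both values invented for illustration), using only tools a high school programmer already has:

```python
# Counting pixel properties on a synthetic frame: mean, spread, outliers.
import random
import statistics

random.seed(42)
sky_level, read_noise = 100.0, 5.0   # hypothetical sky background and sensor noise
pixels = [random.gauss(sky_level, read_noise) for _ in range(64 * 64)]

mean = statistics.fmean(pixels)
stdev = statistics.pstdev(pixels)
bright = sum(1 for p in pixels if p > mean + 3 * stdev)  # 3-sigma outliers

print(f"mean={mean:.2f}  stdev={stdev:.2f}  pixels above 3 sigma={bright}")
```

The same counts, run on real frames in plain numeric form (database or text, not a proprietary format), would let anyone begin separating sensor noise, atmosphere, and sources.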
Science starts with numbers and counting. Sharing starts with imagining all the details of how many others will interact with something you want to share. Putting a giant, inaccessible lump of “data” in a file on the Internet is not sharing.
In July 1998 I was thinking about how I wanted to spend the rest of my life. I registered TheInternetFoundation.Org to use to put my notes on the Internet. I had already spent 30 years working with large human populations, setting up data and tools for groups to see the whole of things. I am proud of the Famine Early Warning System (FEWS.net) because it shows that a small group of people can monitor and intervene in something as complex and large as “famine”. The famine process is not some vague thing. I set up the software and computers and data feeds, developed methods and trained the people, learning and sharing with them. The famine process is extremely precise and there is a lot of data and people. Same with “all sky surveys”, “astronomy”, “astrophysics”, “camera images”, “covid”, “global climate change”, “online education”, “the Internet”.
Anyway, I had already built hundreds of websites and reviewed thousands. I had set up websites for groups and owned and helped with many domains. So when I got an email from Network Solutions asking me to contact a senior vice president at Network Solutions about TheInternetFoundation.Org, I was curious. I called and he told me that Al Gore had diverted the money that was supposed to pay for the Internet Foundation ($15 per year per domain, about $6 B per year now) to put Internet into neighborhoods. The US Attorney General ruled that it was “taxation without representation” and the whole thing got cancelled. He asked me why I would take on something that large, that was supposed to have that much funding and resources. I said, “Better one person, than none at all”.
So I spent 20 years just studying and experimenting, scanning and learning, testing ideas, and counting lots of things. Then I started contacting groups about small issues and topics. Tracing through what people are facing. I cannot put it in a few sentences in this message. I hope to share what I found. But some of it is the absolute certainty that “the Internet is for everyone” and “the first purpose of the Internet is the survival of the human species”. It is an extension of the human species, its memory, its knowledge, an instantaneous guide for everyone. Or should be.
The thing that caused me to start reaching out to people is GW170817. I had already measured the speed of gravity using public data from the small but global network of superconducting gravimeters, and repeated that with the broadband seismometers that were sensitive enough to track the sun and moon tidal signal. But GW170817 showed that the speed of light and the speed of gravity are identical, not just close, identical. And that meant, to me, that can only happen if they share the same underlying potential. You might not care about such things, but it excited me. And I knew that the people working in so many different places would not grow as a community unless they had tools to work together. So that is part of why I spent so much time on geophysical, astrophysical, atmospheric and sensor issues on the Internet. Plus covid kicked many brick and mortar groups onto the web – unprepared, poorly trained, poorly equipped for global scale collaboration in real time, in infinite depth.
Richard K Collins, Director, The Internet Foundation