Counting extensions and file sizes of project folders on GitHub to recommend changes

Over the past several years I have been studying groups on GitHub. This is part of the Internet Foundation's studies of global communities on the Internet over the past 23 years.

Much of the human cost of learning these projects, for a group of any size, comes from dealing with the many different formats. A mature project like LLVM-Project (see attached) has what I see as the normal human capabilities curve, where the file sizes are log-normal with a peak around 1000 characters, plus a long tail of exceptions for file types that are seldom used but often critical.
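A minimal sketch of this kind of count, assuming Python and a local copy of the source folder. It is not the tool that produced the attached plot. The function name `scan_folder` is hypothetical, sizes are measured in bytes rather than characters, and binning by tenths of log10 is an assumption made for illustration.

```python
import math
import os
from collections import Counter


def scan_folder(root):
    """Return (extension counts, log10 size-bin counts) for all files under root."""
    ext_counts = Counter()
    size_bins = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # skip broken links and unreadable files
            # Files without an extension (including dotfiles) share one bucket.
            ext = os.path.splitext(name)[1].lower() or "(none)"
            ext_counts[ext] += 1
            # Empty files are counted by extension but left out of the size bins,
            # because log10(0) is undefined.
            if size > 0:
                size_bins[math.floor(math.log10(size) * 10) / 10] += 1
    return ext_counts, size_bins
```

Calling `scan_folder` on a checked-out project and printing `ext_counts.most_common(20)` alongside `sorted(size_bins.items())` gives the two views discussed here: which extensions dominate, and where the file sizes peak on a log scale.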

I see you using a database that I have not seen or tried yet. I have been downloading source code folders to my computer, where I also read, parse and analyze the contents of the files. I would like to put these kinds of tools and analyses where others can try them. I mostly use JavaScript (with a localhost for access to hardware and file system services).

Would it be too much to get a directory of all files in all projects on GitHub so I can analyze the maturity and character of them all? I can tell a lot about the learning curve and costs involved for a project just by looking at the source code folder and repositories.
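For a single repository, a file listing with sizes can be obtained without downloading the source, through the Git Trees endpoint of the GitHub REST API. The sketch below is an assumption about how that might be used, not a directory of all of GitHub. The helper names `fetch_tree` and `summarize_tree` are hypothetical, the request is unauthenticated (so it is subject to low rate limits), and very large repositories come back with a `truncated` flag set, meaning the listing is incomplete.

```python
import json
import posixpath
import urllib.request
from collections import defaultdict


def fetch_tree(owner, repo, ref):
    """Fetch the recursive file tree for one repository ref (branch, tag or SHA)."""
    url = f"https://api.github.com/repos/{owner}/{repo}/git/trees/{ref}?recursive=1"
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def summarize_tree(tree_response):
    """Return {extension: [file count, total bytes]} from a Git Trees response."""
    summary = defaultdict(lambda: [0, 0])
    for entry in tree_response.get("tree", []):
        if entry.get("type") != "blob":
            continue  # skip directories ("tree") and submodules ("commit")
        ext = posixpath.splitext(entry["path"])[1].lower() or "(none)"
        summary[ext][0] += 1
        summary[ext][1] += entry.get("size", 0)
    return dict(summary)
```

A call such as `summarize_tree(fetch_tree("llvm", "llvm-project", "main"))` would return per-extension counts and byte totals for that one project. It is worth checking `truncated` in the response before trusting the totals.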

I want to recommend different practices for GitHub and these kinds of projects on the Internet. The current practices (global) are wasting too much human time and delaying the response to things like “covid”, “global climate change”, “deforestation”, “online education” and others. I have investigated about 20,000 global communities to see why they stall or die, or simply take years to do something that can be done in a few days with the proper tools.

I talked about some of the related issues in a video I made yesterday and mentioned where this kind of analysis might fit into the larger picture.

**New Video: Energy Office of Science, PNNL Article, Climate Model, Sharing**

Richard K Collins, Director, The Internet Foundation

Attached image: Counts of Extensions and Log FileSizes for LLVM-Project

Attached file: Counts Plots of Sourcecode folder extensions and slzes Exts Log10ths.xlsx



About: Richard K Collins

Director, The Internet Foundation Studying formation and optimized collaboration of global communities. Applying the Internet to solve global problems and build sustainable communities. Internet policies, standards and best practices.

