Retrieving or finding from the Internet is seldom “fair” in a statistical sense.

@alpha_alimamy Alpha Almany Kamara. Is this your paper? You need a website.
 
https://wwjmrd.com/archive/2022/5/1804/heart-disease-prediction-support-system-using-machine-learning-approaches
 
Naive LLMs cannot distill medical wisdom from whatever happens to be posted on the free Internet, no matter how efficient the algorithms. If they only have partial or wrong information, they cannot make good decisions. When hundreds of millions or billions of humans are affected and they try to share their voices, unfair sampling combined with partial information can distort and corrupt the results.
 
In your datasets, the data was gathered and organized to be an efficient predictor of heart disease. Retrieving or finding from the Internet is seldom “fair” in a statistical sense. Google biases its results and will not simply give verifiably random samples. Hugging Face and Common Crawl do not seem to concern themselves with fair sampling, and they are a bit chaotic.
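To make the sampling point concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the population, the 30% true rate, and the `visibility_boost` factor are illustrative assumptions, not measurements of Google, Hugging Face, Common Crawl, or any heart disease dataset. It only shows that when some records are more likely to be retrieved than others, the estimate drifts away from the true rate, even with a large sample.

```python
import random


def estimate_rate(records):
    """Fraction of records flagged as positive (1)."""
    return sum(records) / len(records)


def simulate(population_size=100_000, true_rate=0.30,
             visibility_boost=3.0, sample_size=5_000, seed=0):
    """Compare a fair random sample with a biased retrieval.

    All parameters are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)

    # Hypothetical population: 1 = positive case, 0 = negative case.
    population = [1 if rng.random() < true_rate else 0
                  for _ in range(population_size)]

    # Fair sampling: every record is equally likely (without replacement).
    fair_sample = rng.sample(population, sample_size)

    # Biased retrieval: positive records are `visibility_boost` times more
    # likely to be returned (drawn with replacement, like repeated queries).
    weights = [visibility_boost if x == 1 else 1.0 for x in population]
    biased_sample = rng.choices(population, weights=weights, k=sample_size)

    return (estimate_rate(population),
            estimate_rate(fair_sample),
            estimate_rate(biased_sample))


population_rate, fair_rate, biased_rate = simulate()
print(f"population: {population_rate:.3f}")
print(f"fair sample: {fair_rate:.3f}")
print(f"biased retrieval: {biased_rate:.3f}")
```

Under these assumptions the fair sample stays close to the population rate, while the biased retrieval overstates it substantially. Taking more samples does not repair this, because the bias is in how records are selected, not in how many are drawn.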
 
Richard Collins, The Internet Foundation

About: Richard K Collins

Director, The Internet Foundation. Studying formation and optimized collaboration of global communities. Applying the Internet to solve global problems and build sustainable communities. Internet policies, standards and best practices.

