I am trying to determine which datasets or information sets are machine-readable and which are not. I have a brief list of machine-readable and human-readable formats, but I keep running into formats that are not on either list.
Start by going to Match Me (http://www.networkofinnovators.org/match-me) and selecting open data. Under “I want to learn…” click on more about machine readable data and you’ll get matched with people who can help you. You can write to them or simply tag them here in a post like this @chrswng @david
It’s unlikely that they can all be listed. There are too many formats used by scientific organizations, and too many legacy machine-readable formats used by abandonware.
I think it’s more important to focus on formats that should be used, not formats that exist in the wild.
Open Standards - Project Open Data
5 ★ OPEN DATA
Open Format Definition
Open Data Formats - CfA
File Formats - Open Data Handbook
A few thoughts here:
5 ★ Open Data is the pie in the sky we are all reaching for.
PDFs serve a few great purposes, but are rarely used for them, and their typical implementation is anything but open data. Avoid using PDFs as much as possible. Friends don’t let friends use PDFs.
HTML doesn’t get held in high regard in any of these lists, and for good reason: when implemented poorly, HTML is anti-machine-readable. When implemented properly, though, HTML is machine-readable and, being a web platform technology, offers many benefits that other formats do not.
CSV/JSON are ideal for most cases on the web; W3C is in the process of finalizing specifications for CSV on the Web and JSON-LD.
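To make the HTML and CSV/JSON points concrete, here is a rough sketch (not a definitive approach) of how a well-structured HTML table can be read by a program and re-published as JSON and CSV. It uses only the Python standard library. The sample table, the city names and numbers, and the helper names `TableParser`, `html_table_to_records`, and `records_to_csv` are all made up for illustration, and the sketch assumes a single table whose first row holds the column headers.

```python
import csv
import io
import json
from html.parser import HTMLParser

# Hypothetical sample markup: a small, well-formed table with header cells.
SAMPLE_HTML = """
<table>
  <thead><tr><th>city</th><th>population</th></tr></thead>
  <tbody>
    <tr><td>Springfield</td><td>30000</td></tr>
    <tr><td>Shelbyville</td><td>25000</td></tr>
  </tbody>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_records(html):
    """Return one dict per data row, keyed by the first (header) row."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


def records_to_csv(records):
    """Serialize the records as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


records = html_table_to_records(SAMPLE_HTML)
print(json.dumps(records, indent=2))
print(records_to_csv(records))
```

This only works because the markup uses real `<table>`, `<tr>`, `<th>`, and `<td>` elements. The same numbers laid out with styled `<div>`s or embedded in an image would leave the parser nothing to hold on to, which is the "implemented poorly" case described above.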