Open source validator/scraper for new schema.org metadata & HTML5 more generally
Schema.org has [its faults][1], but at least there is now a fairly significant standard for expressing metadata within HTML. As Bing & Google are behind it, there will be adoption.
HTML5 has a large number of new attributes that are quite semantically rich. However, tools built for an earlier era don't know about them.
- Developers would like to know that their content is being expressed well.
- Data wranglers would like to pull structured data from pages.
To support both groups, we could build a validator and/or a scraper that understands this markup.
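As a rough illustration of the scraper half, here is a minimal sketch that pulls schema.org microdata (`itemscope`, `itemtype`, `itemprop`) out of a page using only Python's standard `html.parser`. The names `MicrodataScraper` and `extract_items` are made up for this example, and it assumes a simplified reading of microdata: it ignores `itemref` and `itemid`, does not resolve relative URLs, and treats only a handful of elements as attribute-valued.

```python
from html.parser import HTMLParser

# Elements whose property value lives in an attribute rather than in text.
# (A simplified subset, chosen for this sketch.)
VALUE_ATTRS = {"meta": "content", "a": "href", "link": "href", "img": "src"}
# Elements that never have an end tag, so they are not tracked on the stack.
VOID_TAGS = {"meta", "link", "img", "br", "hr", "input"}


class MicrodataScraper(HTMLParser):
    """Hypothetical helper: collects microdata items into nested dicts."""

    def __init__(self):
        super().__init__()
        self.items = []    # top-level items, in document order
        self._open = []    # one frame per open (non-void) element
        self._scopes = []  # items whose itemscope element is still open

    @staticmethod
    def _add(item, name, value):
        item["properties"].setdefault(name, []).append(value)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        parent = self._scopes[-1] if self._scopes else None
        frame = {"tag": tag, "scope": False, "text": None}
        if "itemscope" in attrs and tag not in VOID_TAGS:
            # A new item; attach it to the enclosing item if it is a property.
            item = {"type": attrs.get("itemtype"), "properties": {}}
            if prop and parent is not None:
                self._add(parent, prop, item)
            else:
                self.items.append(item)
            self._scopes.append(item)
            frame["scope"] = True
        elif prop and parent is not None:
            if tag in VALUE_ATTRS:
                self._add(parent, prop, attrs.get(VALUE_ATTRS[tag]) or "")
            elif tag not in VOID_TAGS:
                # Value is the element's text, collected until its end tag.
                frame["text"] = (parent, prop, [])
        if tag not in VOID_TAGS:
            self._open.append(frame)

    def handle_data(self, data):
        for frame in self._open:
            if frame["text"]:
                frame["text"][2].append(data)

    def handle_endtag(self, tag):
        # Ignore stray end tags that match nothing currently open.
        if not any(f["tag"] == tag for f in self._open):
            return
        # Pop up to the matching element, tolerating unclosed children.
        while self._open:
            frame = self._open.pop()
            self._close(frame)
            if frame["tag"] == tag:
                break

    def _close(self, frame):
        if frame["text"]:
            item, name, chunks = frame["text"]
            # Collapse runs of whitespace in the collected text.
            self._add(item, name, " ".join("".join(chunks).split()))
        if frame["scope"]:
            self._scopes.pop()


def extract_items(html):
    """Return the top-level microdata items found in an HTML string."""
    parser = MicrodataScraper()
    parser.feed(html)
    parser.close()
    while parser._open:  # flush elements left open at end of input
        parser._close(parser._open.pop())
    return parser.items


if __name__ == "__main__":
    sample = """
    <div itemscope itemtype="http://schema.org/Movie">
      <h1 itemprop="name">Avatar</h1>
      <div itemprop="director" itemscope itemtype="http://schema.org/Person">
        <span itemprop="name">James Cameron</span>
      </div>
      <a itemprop="trailer" href="/trailer.html">Trailer</a>
    </div>
    """
    print(extract_items(sample))
```

A validator could plausibly sit on top of the same output, for instance by flagging items with no `itemtype` or properties that the declared type does not define, though that would need the schema.org vocabulary as input.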
[1]: http://lists.freebase.com/pipermail/freebase-discuss/2011-June/006621.html