"Reuters, however, believes it has overcome this problem. It recently launched a service called Calais that takes raw web pages (and, indeed, any other form of data) and does the marking up itself. The acronyms can then get to work. That promises to imbue the streams of unstructured text and data sloshing around the internet with almost instant meaning.
The idea is that any website can send a jumble of text and code through Calais and receive back a list of "entities" that the system has extracted--mostly people, places and companies--and, even more importantly, their relationships. It will, for instance, be able to recognise a pharmaceutical company's name and, on its own initiative, cross-reference that against data on clinical trials for new drugs that are held in government databases. Alternatively, it can chew up a thousand blogs and expose trends that not even the bloggers themselves were aware of."
The examples are pretty cool, especially Semantic Book Suggestions, a Wikipedia + Amazon API + Calais mashup. The Powerhouse Museum's example is pretty good too (see "Auto-generated tags: [Beta]" in the right-hand column).
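To make the "text in, entities and relationships out" idea from the quote a bit more concrete, here is a toy sketch in Python. It is not the Calais API and makes no claim about how Calais works internally: the lookup table, the function names (`extract_entities`, `related_pairs`), the sample sentence and the made-up company "Acme Pharma" are all assumptions for illustration. It simply matches known names against the text and treats entities that share a sentence as related.

```python
import re
from itertools import combinations

# Hypothetical lookup table of known names and their types.
# A real service would rely on trained language models, not a fixed list.
GAZETTEER = {
    "Reuters": "Company",
    "Acme Pharma": "Company",  # made-up company name
    "London": "Place",
}

def extract_entities(text, gazetteer=GAZETTEER):
    """Return sorted (name, type) pairs for every gazetteer name found in text."""
    found = set()
    for name, kind in gazetteer.items():
        # Word boundaries stop a name from matching inside a longer word.
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            found.add((name, kind))
    return sorted(found)

def related_pairs(text, gazetteer=GAZETTEER):
    """Naively treat two entities as related if they share a sentence."""
    pairs = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        names = [name for name, _ in extract_entities(sentence, gazetteer)]
        pairs.update(combinations(names, 2))
    return sorted(pairs)

sample = "Reuters opened an office in London. Acme Pharma announced a new trial."
print(extract_entities(sample))
print(related_pairs(sample))
```

A real extraction service would have to deal with ambiguity, spelling variants and typed relationships, which a fixed lookup table like this cannot do, but the shape of the exchange (unstructured text in, structured entities and links out) is the same.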