The Associated Press and the Media Standards Trust (never heard of them) have proposed a new standard for metadata for news.
Depending on its use and intent, I think it’s a good idea to add more data to news. We all should add to the standard, though, or else it will come from the perspective of big news organizations (whose motive, as is obvious from their “standard,” is as much protection as it is enhancement).
I’ve long argued that we need to operate under an ethic of linking to journalism at its source. In the link economy, value comes with links, and the recipient of the links is the one who can take advantage of that value. So to support reporting, we must link to reporting (over rewrites, whether those are from blogs or the Associated Press). That’s one thing I’d like to add to the standard: “original reporting here.” Yes, that could be gamed, but in a limited population of news sources – at aggregators such as GoogleNews, Daylife, The New York Times, or HuffingtonPost – it would not be hard to police against the gamers.
More: We at CUNY are working with Nokia on assignment software that will wrap metadata around a story from the inception (‘there’s a fire on Main Street, who wants to cover it?’) through to its reporting (this text, photo, audio, or video was captured by this person at this place at this time on this data). Many elements of a story can come together (imagine the collaborative, real-time news produced using Google Wave), each element with data about its provenance. When you think beyond the article as the atomic element of journalism, then you’ll want to incorporate metadata into the heart of reporting as it comes out in an article, a blog, a wiki, a topic page, a whatever.
I would like there to be a standard for adding corrections to work, which could, in turn, enable readers (and linkers) to subscribe to updates.
I’d like to see metadata include links to source material – footnotes – that continue the notion of the provenance of the information within. This could also include footnotes about the author: links to her page, her feed, her identity, how to communicate with her.
Would we want to add traffic data to news (show me the most popular version of this story)?
Tags would be wonderful, of course.
Geo data about the gathering, creation, use, subject, and authorship of the news would be valuable in so many ways.
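To make that wish list a bit more concrete, here is a rough sketch in Python of what a record for a single story element might carry. Every class and field name here is my own invention for illustration; none of it comes from the proposed standard or from the CUNY/Nokia software.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Correction:
    """One correction appended to a piece of work (hypothetical shape)."""
    corrected_at: str  # ISO 8601 UTC timestamp, e.g. "2009-07-13T16:30:00Z"
    note: str


@dataclass
class StoryElement:
    """A single text, photo, audio, or video element plus its provenance.

    All field names are illustrative assumptions, not part of any standard.
    """
    kind: str                                        # "text", "photo", "audio", "video"
    captured_by: str                                 # who gathered it
    captured_at: str                                 # ISO 8601 UTC timestamp
    location: Optional[Tuple[float, float]] = None   # (latitude, longitude)
    original_reporting: bool = False                 # the "original reporting here" flag
    source_links: List[str] = field(default_factory=list)   # footnotes
    tags: List[str] = field(default_factory=list)
    corrections: List[Correction] = field(default_factory=list)

    def add_correction(self, corrected_at: str, note: str) -> None:
        """Append a correction so subscribers could be told about it."""
        self.corrections.append(Correction(corrected_at, note))

    def last_updated(self) -> str:
        """Return the latest correction time, or the capture time if none.

        Comparing the strings works only because every timestamp is assumed
        to use the same ISO 8601 UTC format.
        """
        if self.corrections:
            return max(c.corrected_at for c in self.corrections)
        return self.captured_at


photo = StoryElement(
    kind="photo",
    captured_by="Jane Reporter",
    captured_at="2009-07-13T14:05:00Z",
    location=(40.7128, -74.0060),
    original_reporting=True,
    tags=["fire", "Main Street"],
)
photo.add_correction("2009-07-13T16:30:00Z", "Fixed the street name in the caption.")
```

The point of keeping corrections as a list on the element itself, rather than overwriting the element, is that a reader or linker could then ask for everything that changed since the last time they looked.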
What else? What other metadata, when it’s offered, would enhance search, would add value to the content, would enhance information about its source and thus its credibility? If a standard is to be a standard, both the creators and users of content (and by users, I mean not just the people formerly known as the readers but also technologists and their algorithms) must add to it and it must be a living beast. Whether it’s this effort (here’s its Google Group) or another effort, join in.
Direct link to a specific timeline if the information is not a “standalone” (and it surely never is a standalone).
What about qualifying tags? Tags that are part of a quote, tags that are “name dropping”, tags about the topic, tags about new names or newspeak (WMD for example)…
Once again the news industry proves it understands neither the law nor technology…
* They seem to think that adding markup to encode “usage rights” will somehow allow them to “protect content.” This simply isn’t the case except for the limited usage of granting rights which would otherwise be restricted under copyright. There is *nothing* that a content publisher can add to a page that will result in legally enforceable usage constraints that are in any way greater than those granted under copyright. (“How can the water rise above its source?”) Constraints on usage can *only* be established by either legislatures or by explicit prior contract. If by contract, then the markup isn’t necessary.
* The “Value Added News” proposal relies on the horribly ugly and old fashioned “microformats” style of metadata embedding. A more rational and up-to-date proposal would use RDFa instead (i.e., instead of overloading the class attribute, they would use RDFa’s “property” attribute). The microformat method was a work-around for the fact that RDFa (or something similar) didn’t exist as a standard. Well, now RDFa does exist and there isn’t any reason to use the old microformat junk any more except in legacy applications.
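To illustrate the difference between the two embedding styles, here is a minimal Python sketch using only the standard library. The markup snippets and the names in them (entry-title, author, big-headline, dc:title, dc:creator) are assumptions chosen for illustration, not taken from the proposal, and the RDFa snippet omits the prefix declaration a real page would need. The parser assumes flat, non-nested markup; it is a toy, not a real microformat or RDFa processor.

```python
from html.parser import HTMLParser


class AttributeExtractor(HTMLParser):
    """Collect the text of every element that carries a given attribute.

    Assumes flat markup: tagged elements are not nested inside each other.
    """

    def __init__(self, attribute):
        super().__init__()
        self.attribute = attribute
        self.found = {}     # name -> text content
        self._names = []    # names attached to the element being read

    def handle_starttag(self, tag, attrs):
        value = dict(attrs).get(self.attribute)
        # Both "class" and "property" may hold several space-separated names.
        self._names = value.split() if value else []

    def handle_data(self, data):
        for name in self._names:
            self.found[name] = self.found.get(name, "") + data

    def handle_endtag(self, tag):
        self._names = []


def extract(html, attribute):
    """Return a dict of name -> text for elements carrying `attribute`."""
    parser = AttributeExtractor(attribute)
    parser.feed(html)
    parser.close()
    return parser.found


# Microformat style: metadata names share the class attribute with styling.
MICROFORMAT_STYLE = (
    '<h1 class="entry-title big-headline">Fire on Main Street</h1>'
    '<span class="author">Jane Reporter</span>'
)

# RDFa style: metadata lives in its own attribute; class is left for styling.
RDFA_STYLE = (
    '<h1 class="big-headline" property="dc:title">Fire on Main Street</h1>'
    '<span property="dc:creator">Jane Reporter</span>'
)

print(extract(MICROFORMAT_STYLE, "class"))
print(extract(RDFA_STYLE, "property"))
```

Reading the class attribute picks up the styling name big-headline alongside the metadata names, so a consumer has to know the vocabulary in advance to tell them apart. Reading the property attribute returns only the metadata.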
Finally, the IPTC has been working on metadata standards for news for years. The proper forum for debating “news metadata encoding” proposals is within that forum. The fact that this group is acting outside the realm of their industry’s long standing standards efforts indicates that there is, in fact, little support within the industry for what they are doing.
bob wyman
Following on what Bob said:
This is yet another attempt by AP to ignore whatever else the news industry is doing or has been doing for some time, and come up with their own idiosyncratic system. An attempt to reinvent the wheel the AP way.
Since the late 1990s the world’s major wire services (except AP), some leading newspapers, press release distribution companies, and many major manufacturers of news industry CMS systems, all meeting under the auspices of the International Press Telecommunications Council of Geneva, have been developing a successor to ANPA-1312 (now long-since known as IPTC-NAA 1312). Their goal is to develop a new coding format that will work with all computers, all devices, all typesetting equipment, all mobile phones, and in all languages’ character sets.
The committee developing this new standard includes representatives from AFP, ANSA, APA, Business Wire, Japan’s NSK, Kyodo, Marketwire, the UK Press Association, Pressetext Nachrichtenagentur, PR Newswire, Reuters, SDA/ATS, European Broadcasting Union, The Irish Times, UPI, the Wall Street Journal, and Ifra.
More than a dozen years ago, they focused their efforts on Dublin Core XML. In October 2003, this IPTC-sponsored group released NewsML and SportsML, and their companies began using these news codes, albeit to various degrees. In November 2008, version 2 of NewsML and SportsML was released, plus a new subset, EventsML. Meanwhile, the ad industry worldwide released AdsML. All of these coding schemes mesh. http://www.iptc.org/cms/site/index.html
So, now the AP is formulating and releasing its own coding scheme. After all, no good idea is new or worthwhile until the AP gets around to claiming to have conceived and developed it! (viz. small sat dishes, customized stock tables, etc.) Although AP’s new scheme is based on XML, it doesn’t appear to fit within NewsML or SportsML except for a few common ANPA 1312 headers that date from the teletype (TTS) days.
I had a conversation with some AP folks about this a few weeks ago at a newspaper technology conference. The conversation went as if I was talking to my hat. I’d tell them their microformat was not only redundant but incompatible with what everyone else had already developed. They’d respond as if I were a mute flapping my lips, and tell me, “Isn’t this great, what we’ve developed!”
This new format is either going to be one more nail in the AP’s coffin (which a great many American newspapers are now ready to varnish) or else yet another AP invention whose implementation will further isolate American newspapers and broadcasters from more important technological advances in media.
I have to chime in too. I’ve run into very few companies “in the wild” that support microformats. I have come across a few that know what they are and have chosen NOT to implement them, seeing no value (i.e., customers) in them. Microformats were an idea that simply never caught on (“horribly ugly and old fashioned” may be key here).
And then AP has repeatedly been one of the worst “vendors” to work with from a technology perspective. They seem to go out of their way to put up barriers and “do things differently” just because they can.
Given this, I just can’t see this idea coming out good.
For info: the Media Standards Trust is headed by Dr Martin Moore (a visiting fellow at City University London). Martin and Tim Berners-Lee were awarded £350k from the Knight News Challenge.
Hi Jeff, I should introduce myself – I’m Martin Moore, director of the Media Standards Trust and at City (thank you Adrian for intro too :-). I really appreciate you blogging about this and encouraging as many people as possible to get involved in developing the standards.
We’ve been working on this thanks to a grant from the Knight Foundation (as Adrian says, we won a Knight News Challenge Award last year) and a grant from the MacArthur Foundation – and have always intended it to be a way of working towards consistent news mark-up for anyone producing journalism. Which, we think, is in the interests of the journalist, the news organisations, the blogger and the public. So the more who get involved the better.
The initiative is also being done in partnership with Sir Tim Berners-Lee’s Web Science Research Initiative (WSRI).
Thanks, Vin.
Ah, ANPA. Boy, that brings back memories.
hi Jeff,
i just want to say i love your book “what would google do” absolutely amazing. i am totally new to the web world. i started my blog anewsme.blogspot.com. i wonder if you post a reply and be the first?
Bloggers should consider imposing reciprocal restrictions on how AP uses their posts. Let’s see how good AP is at obeying its own ridiculous rules when quoting from blogs.
The whole thing reminds me of the Electronic Data Interchange (EDI) efforts of the early ’90s. Committees with impressive lists of represented industry players, collaborating about how to categorize/datamodel specific parts of the world.
I suspect the result will be the same as EDI as well: while these groups are busy discussing & debating the latest tag (geo data? link data? traffic data?), standards will emerge from among the people that these standards are purporting to help (e.g., XML, CSS, Twitter #tags, etc.).
Jeff encourages us all to join in, but if history serves as any lesson, we will all solve the problem without joining…
There are already standards available. In fact, there are probably too many standards. Why would we need/want another one?
Of course AP wants to help lead this. Their content is becoming less relevant particularly WRT search engines.