Wednesday, December 7, 2011

Why 90% of news will be computer generated in 15 years

At News Foo Camp, I was asked how much news will be computer generated in 15 years.  My reluctant take was that it would be on the order of 90%.  My reluctance was not about the number itself; while this strikes me as inevitable, saying it always produces a fair amount of angst among the people who hear it.  With that in mind, it seemed like a good idea to explain what that number means and why I think it makes sense given current information and technology trends.

Data availability

First, given that we are talking about content that is generated from data (that is, unambiguous, machine-readable data rather than human-readable text), it is clear that one of the key drivers will be the availability of the data itself.  There is no question that more and more data, in sports, finance, real estate, government, business, politics, etc., is coming online.  This trend is clear, unstoppable, and even a genuine social good if you believe in transparency.

Likewise, as more and more of the transactions and operations associated with business and commerce are happening online and being metered, we will actually be creating new types of data that describe the world and how it functions.

As the trend continues and accelerates, there will be tremendous opportunities to mine this data, gather insights from it, and transform those insights into narratives that can help to inform the public.  Many of the tasks associated with data journalism as it stands today will be given over to the machine (under the control of editors and writers), enabling us to generate compelling narratives at scale, driven by all of this data that better describes our world.
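To make this concrete, here is a minimal sketch of data-to-narrative generation in Python. The box score, the field names, and the phrasing are all illustrative assumptions; real systems select angles and vary language far more richly, but the pipeline is the same: structured data in, a narrative out.

```python
# Hypothetical box-score record; in practice this would come from a data feed.
game = {
    "home": "Lincoln High", "away": "Central",
    "home_score": 3, "away_score": 2,
    "winning_pitcher": "Ana Reyes", "strikeouts": 9,
}

def recap(g):
    """Turn one structured game record into a short narrative sentence."""
    margin = abs(g["home_score"] - g["away_score"])
    if g["home_score"] > g["away_score"]:
        winner, loser = g["home"], g["away"]
    else:
        winner, loser = g["away"], g["home"]
    verb = "edged" if margin == 1 else "beat"   # pick a verb based on the margin
    high = max(g["home_score"], g["away_score"])
    low = min(g["home_score"], g["away_score"])
    return (f"{winner} {verb} {loser} {high}-{low}, behind "
            f"{g['winning_pitcher']}'s {g['strikeouts']} strikeouts.")

print(recap(game))
# -> Lincoln High edged Central 3-2, behind Ana Reyes's 9 strikeouts.
```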

But this trend is really only about data as data.  That is, data that is unambiguous and machine readable rather than textual information that is still only understandable by human readers.  The world of human-readable text is a different matter.  This leads me to the next trend.

Turning text into data

On a parallel path, language understanding and data extraction systems are improving to the point at which much of the information that is currently human readable yet impenetrable to computers will itself be transformed into data: data that can be used as the driver for the generation of new narratives.

This means that textual descriptions of events, government meetings, corporate announcements, plus the ongoing stream of social media will be transformed into not just machine-readable, but machine-understandable representations of what is happening in the world.  This data will then be integrated into the expanding data sources that are already available.
Stories currently driven by game stats, stock prices, employment figures, etc. will be augmented and improved with information, now transformed into data, about off-field player behavior, business strategy, and city council meetings, in ways that will allow human-guided systems to automatically create richer and richer narratives that weave together the world of numbers and events.
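As a toy illustration of what turning text into data can mean, consider pulling a structured event record out of a corporate announcement headline. The headline, the pattern, and the field names are assumptions made for this sketch; real language-understanding systems use far richer models, but they aim at the same kind of machine-understandable record.

```python
import re

# Hypothetical headline; real input would be a stream of articles and posts.
HEADLINE = "Acme Corp announces a 12% increase in quarterly revenue"

# A deliberately simple pattern standing in for a full extraction model.
PATTERN = re.compile(
    r"(?P<company>[A-Z][\w.]*(?:\s[A-Z][\w.]*)*)\s+announces\s+a\s+"
    r"(?P<change>\d+)%\s+(?P<direction>increase|decrease)\s+in\s+(?P<metric>[\w\s]+)"
)

match = PATTERN.search(HEADLINE)
if match:
    event = {
        "company": match.group("company"),
        "metric": match.group("metric").strip(),
        "direction": match.group("direction"),
        "change_pct": int(match.group("change")),
    }
    print(event)
    # -> {'company': 'Acme Corp', 'metric': 'quarterly revenue',
    #     'direction': 'increase', 'change_pct': 12}
```

Once an announcement looks like that record, it can be joined with stock prices, filings, and other structured sources and fed into the same narrative generation machinery as any other data.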

But these two trends, combined with computer generation of content, only take us so far.  Given the current models of content creation and deployment, we still think in terms of content for broad audiences, which limits the scope of the content that we even consider creating.  That leads to a third element: scale and personalization.

Scale and the long tail

As journalism adjusts to the new world, it is clear that in many sectors there is a need for content that is narrower in focus and aimed at smaller audiences.  This more targeted content, while not valuable to a broad audience, is tremendously valuable to the smaller, niche audiences.  Stories about local sports and businesses, neighborhood crime, and city council meetings may only be of interest to a small set of people, but to them, that content is enormously informative and useful.

The problem, of course, is that these audiences are often too small to warrant the kind of coverage they deserve.  The economics of covering Little League games makes it infeasible for organizations to provide the staff and publication resources to produce them.  Logistically and financially, it is simply impossible for an organization to produce hundreds of thousands of stories, each of which will likely be read by fewer than fifty people.

As the data becomes available and the computer develops a better understanding of events, however, there is an opportunity to create content like this at tremendous scale.  This is an opportunity that only makes sense, and in fact is only possible, through the computer generation of stories.  A computer can write highly localized crime reports, personalized stock portfolio reporting, and high school and youth sports stories at scale, providing coverage that was previously impossible and could never be possible in a world of purely human-generated content.
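In code, that kind of coverage is just a small generator applied across thousands of records, each output serving an audience of one or a few. The portfolio data, field names, and phrasing below are illustrative assumptions; the point is the loop, not the prose.

```python
# Hypothetical per-subscriber portfolio data with each holding's daily change.
portfolios = [
    {"owner": "reader_001", "holdings": {"ACME": 0.031, "GLOBEX": -0.012}},
    {"owner": "reader_002", "holdings": {"INITECH": 0.004}},
    # ... in practice, one record per subscriber, however many there are
]

def portfolio_story(p):
    """Generate a one-reader market recap, biggest movers first."""
    moves = []
    for ticker, change in sorted(p["holdings"].items(),
                                 key=lambda kv: abs(kv[1]), reverse=True):
        verb = "rose" if change >= 0 else "fell"
        moves.append(f"{ticker} {verb} {abs(change):.1%}")
    return f"In your portfolio today, {', '.join(moves)}."

# One pass produces a separate story for every reader.
stories = {p["owner"]: portfolio_story(p) for p in portfolios}
print(stories["reader_001"])
# -> In your portfolio today, ACME rose 3.1%, GLOBEX fell 1.2%.
```

No newsroom would ever assign a reporter to any one of these stories, but generating all of them is a single pass over the data.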

Man and Machine

These three trends (and there are others) come together to provide an opportunity to use computers to automatically create content that serves communities ignored by the current world of journalism and story production.  By creating content that integrates existing and derived data, and by providing stories that are not just local but actually personal, these systems will generate content directly into the long tail of need and interest.

As more and more data becomes available and individuals are given news that is designed to be personally relevant and informative, these systems will end up generating content at a volume that dwarfs present production.  Because much of this production will be for the individual, it will never overwhelm, but will instead provide a new sort of news experience in which the events of the day, the events of the world, will be provided in a personal context that makes them more meaningful and relevant.  That 90% of news will be computer generated fifteen years from now seems not just reasonable, but inevitable.

Tuesday, December 6, 2011

The end of “destination” sites

I have often said that the clock is ticking on news and information sites as destinations, which, to my surprise, seems to both upset and offend. In part, this seems to be the result of a confusion between site and brand, but I think it is more a resistance to what is an existing and overwhelming trend towards social as the way in which people get their information flow.

There are essentially three ways to get to information (and news) in the current world: browsing from a landing page, getting to articles through organic search, and reading content shared by others in your various social circles.

Pre-Google, browsing ruled. You knew your starting point, you went to trusted sources, and you browsed your way to the content that interested you. The coin of the realm was the site: how much you trusted it and whether it had content that was relevant to you and your interests.

In this world, the site, its reputation (including both brand and voice), and its usability were all that mattered. Monetizing content was a matter of monetizing the site. But with the rise of Google (and search in general) this world quickly slipped away.

With Google, the entry point is content relevance. Users search for the content that they care about, enter sites through individual content elements rather than landing pages and, because they have a topic focus, ingest the content and then bounce back to the search page.

This dynamic created a new tension for sites: can you grab readers and provide clearly marked follow-up content that keeps them on your site, given that they got to you based on their interest in a specific content area rather than any belief or faith in your site and its ability to serve them?

Monetization shifted to focus on advertising related to content and interests rather than the site itself. It became driven more by the user than by the site.

In this world, the role of the site is diminished though the brand is still crucial. Given the nature of Google and PageRank, the authority of the site still participates in the ranking of results, and that authority serves as a proxy for brand even though brand itself does not participate in the algorithm. Likewise, in the presentation of results, the brand participates in the user’s decision as to whether or not to follow a link, but often less so than the title.
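As a schematic of that point (and emphatically not Google's actual formula), imagine the final score for a result blending per-document relevance with a per-site authority signal. Brand never appears directly; it enters only through the authority term, which is why it acts as a proxy.

```python
def rank_score(content_relevance, site_authority, authority_weight=0.3):
    """Hypothetical blend of per-document relevance and per-site authority."""
    return (1 - authority_weight) * content_relevance + authority_weight * site_authority

# Two made-up results: a highly relevant niche post vs. a broader piece
# on a high-authority (strong-brand) site.
results = [
    {"title": "Niche explainer on a small blog", "relevance": 0.92, "authority": 0.20},
    {"title": "Broad overview on a major brand", "relevance": 0.70, "authority": 0.95},
]
results.sort(key=lambda r: rank_score(r["relevance"], r["authority"]), reverse=True)
print([r["title"] for r in results])
# -> ['Broad overview on a major brand', 'Niche explainer on a small blog']
```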

This is just to say that in the Google world, the stress has shifted away from the site/brand towards the relevance of the content to the user. The brand is still crucial, but the site and its navigational structure are far less so.

With the rise of social and sharing, we see another shift away from the site and brand and the emergence of trust as it relates to your social network. You don’t need to trust a site or brand when you know that a trusted “friend” has just recommended a content element. That piece of content might not fit your current needs, but it is likely to fit your general interests, given the social circle that is promoting it.

With social, the dynamic is pushed even further away from the site, and even away from relevance in the specific. You are ingesting simply because content is being recommended by your circle, not because of the site, the brand, or even relevance to the task at hand. You trust your circle, not the brand. You read what is recommended or curated by your circle, not because it has been created or vetted by the site.

Individual content elements are optimized for search and sharing. There is still support for internal navigation, but clearly the emphasis is skewing more towards search and social.

But we are in the early days of this. As more and more consumption skews towards social recommendation, there is less need for the site as a site. Even a year ago, the figures showed that bounce rates on news sites were strikingly high, and the trend is not going away. This does not mean that the structure of news production has to change, or that the notion of brand goes away, just that deployment and publication will shift towards greater optimization for search and social, as well as the now emerging approaches to locational and situational information systems. The site will simply no longer be the major vehicle for publication.

The brands that will win will be those that embrace rather than fight this shift. Brands that are focused on their sites are going to lose. Brands that focus their content deployment on developing the social circles that reflect their brands and their content will flourish as more and more of the traffic is driven by trusted recommenders.

This is not to say that sites will vanish… just that they will be the least important component of what determines whether a brand has impact and survives.