Wednesday, December 7, 2011

Why 90% of news will be computer generated in 15 years

At News Foo Camp, I was asked how much news will be computer generated in 15 years.  My reluctant take was that it would be on the order of 90%.  My reluctance comes from the fact that while this strikes me as inevitable, the prediction always leads to a fair amount of angst among the people who hear it.  With that in mind, it seemed like a good idea to explain what that number means and why I think it makes sense given current information and technology trends.

Data availability

First, given that we are talking about content that is generated from data, that is, unambiguous, machine-readable data rather than human-readable text, it is clear that one of the key drivers will be the availability of the data itself.  There is no question that more and more data, in sports, finance, real estate, government, business, politics, and beyond, is coming online.  This trend is clear, unstoppable, and even a genuine social good if you believe in transparency.

Likewise, as more and more of the transactions and operations associated with business and commerce are happening online and being metered, we will actually be creating new types of data that describe the world and how it functions.

As the trend continues and accelerates, there will be tremendous opportunities to mine this data, gather insights from it, and transform those insights into narratives that can help to inform the public.  Many of the tasks associated with data journalism as it stands today will be given over to the machine (under the control of editors and writers), enabling us to generate compelling narratives at scale, driven by all of this data that better describes our world.
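
To make this concrete, here is a minimal sketch of the kind of data-to-text generation I have in mind, written in Python.  The game record, field names, thresholds, and sentence templates are all invented for illustration; this is the shape of the idea, not a description of any particular production system.

    # A toy data-to-text example: turn one structured game record into a
    # short narrative.  Everything here (fields, thresholds, wording) is
    # hypothetical and exists only to show the basic pattern.
    game = {
        "home": "Central High", "away": "Westside",
        "home_score": 54, "away_score": 51,
        "top_scorer": "J. Alvarez", "top_points": 22,
    }

    def write_recap(g):
        # Decide the angle: who won, and was it close?
        if g["home_score"] > g["away_score"]:
            winner, loser = g["home"], g["away"]
        else:
            winner, loser = g["away"], g["home"]
        margin = abs(g["home_score"] - g["away_score"])
        verb = "edged" if margin <= 3 else "defeated"
        high = max(g["home_score"], g["away_score"])
        low = min(g["home_score"], g["away_score"])
        return (f"{winner} {verb} {loser} {high}-{low}, "
                f"led by {g['top_scorer']} with {g['top_points']} points.")

    print(write_recap(game))
    # -> Central High edged Westside 54-51, led by J. Alvarez with 22 points.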

But this trend is really only about data as data.  That is, data that is unambiguous and machine-readable rather than textual information that is still understandable only by human readers.  The world of human-readable text is a different matter.  This leads me to the next trend.

Turning text into data

On a parallel path, language understanding and data extraction systems are improving to the point at which much of the information that is currently human-readable yet impenetrable to computers will itself be transformed into data: data that can be used as the driver for the generation of new narratives.

This means that textual descriptions of events, government meetings, corporate announcements, plus the ongoing stream of social media, will be transformed into not just machine-readable, but machine-understandable, representations of what is happening in the world.  This data will then be integrated into the expanding data sources that are already available.
Stories currently driven by game stats, stock prices, employment figures, etc. will be augmented and improved with information, now transformed into data, about off-field player behavior, business strategy, and city council meetings, in ways that will allow human-guided systems to automatically create richer and richer narratives that weave together the world of numbers and events.
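
As a rough illustration of the "text into data" step, here is a deliberately simple sketch that pulls a structured record out of a one-sentence announcement so it can be merged with data that is already machine-readable.  The sentence, the pattern, and the record format are invented for this example; real language-understanding systems go far beyond a single regular expression, but the output, a record rather than prose, is the point.

    import re

    # A toy information-extraction example: turn a human-readable sentence
    # into a machine-readable record.  The announcement and pattern are
    # invented for illustration.
    announcement = "Acme Corp. announced it will acquire Widget Co. for $250 million."

    pattern = re.compile(
        r"(?P<acquirer>[A-Z][\w.]*(?: [A-Z][\w.]*)*) announced it will acquire "
        r"(?P<target>[A-Z][\w.]*(?: [A-Z][\w.]*)*) for "
        r"\$(?P<amount>[\d.]+) (?P<unit>million|billion)"
    )

    match = pattern.search(announcement)
    if match:
        record = {
            "event": "acquisition",
            "acquirer": match.group("acquirer"),
            "target": match.group("target"),
            "value_usd": float(match.group("amount"))
                         * (1e6 if match.group("unit") == "million" else 1e9),
        }
        print(record)
        # -> {'event': 'acquisition', 'acquirer': 'Acme Corp.',
        #     'target': 'Widget Co.', 'value_usd': 250000000.0}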

But these two trends, combined with computer generation of content, only take us so far.  Given the current models of content creation and deployment, we still think in terms of content aimed at broad audiences, which limits the scope of the content that we even consider creating.  That leads to a third element: scale and personalization.

Scale and the long tail

As journalism adjusts to the new world, it is clear that in many sectors there is a need for content that is narrower in focus and aimed at smaller audiences.  This more targeted content, while not valuable to a broad audience, is tremendously valuable to the smaller, niche audiences.  Stories about local sports and businesses, neighborhood crime, and city council meetings may only be of interest to a small set of people, but to them, that content is enormously informative and useful.

The problem, of course, is that these audiences are often too small to warrant the kind of coverage they deserve.  The economics of covering Little League games makes it infeasible for news organizations to provide the staff and publication resources to produce those stories.  Logistically and financially, it is simply impossible for an organization to produce hundreds of thousands of stories, each of which will likely be read by fewer than fifty people.

As the data becomes available and the computer develops a better understanding of events, however, there is an opportunity to create content like this at tremendous scale.  This is an opportunity that only makes sense, and in fact is only possible, through the computer generation of stories.  A computer can write highly localized crime reports, personalized stock portfolio reports, and high school and youth sports stories at scale, providing coverage that was previously impossible and could never exist in a world of purely human-generated content.
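
Here is one more sketch of the scale argument: once a story can be generated from a record, producing one story per reader, per game, or per neighborhood is just a loop over the data.  The holdings and prices below are invented, and a real personalized portfolio report would of course draw on live market data and far better prose.

    # A toy example of scale and personalization: one short, personal story
    # per reader.  Prices and holdings are invented for illustration.
    prices = {"ACME": (41.20, 39.80), "WIDGT": (12.05, 12.60)}  # (today, yesterday)

    readers = {
        "alice": {"ACME": 100, "WIDGT": 50},
        "bob": {"WIDGT": 300},
    }

    def portfolio_note(name, holdings):
        change = sum(shares * (prices[t][0] - prices[t][1])
                     for t, shares in holdings.items())
        direction = "gained" if change >= 0 else "lost"
        return f"{name.title()}, your portfolio {direction} ${abs(change):,.2f} today."

    # Coverage no newsroom could staff: a story for every single reader.
    for name, holdings in readers.items():
        print(portfolio_note(name, holdings))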

Man and Machine

These three trends (and there are others) come together to provide an opportunity to use computers to automatically create content that serves communities currently ignored by the world of journalism and story production.  By creating content that integrates existing and derived data, and by providing stories that are not just local but actually personal, these systems will generate content that reaches directly into the long tail of need and interest.

As more and more data becomes available and individuals are given news that is designed to be personally relevant and informative, these systems will end up generating news at a volume that dwarfs present production.  Because much of this production will be for the individual, it will never overwhelm, but will instead provide a new sort of news experience in which the events of the day, the events of the world, will be presented in a personal context that makes them more meaningful and relevant.  That 90% of news will be computer generated fifteen years from now seems not just reasonable, but inevitable.

12 comments:

  1. I'm not sure that a "personal context" actually makes news more meaningful and relevant. I suspect it merely reduces the choices people have to make so they feel on top of things.

    I don't think computers yet have the ability to develop significance from data that hasn't been recorded. That's where human reporters are essential.

    A computer might be able to generate a news story about action taken by the local school board based on examination of the posted minutes. I'm not sure the computer would notice that the reason the board gave for meeting in executive session is not a legally valid reason for holding an executive session.

    To a computer, it's probably just as important that the board added Joe Blow to the list of substitute teachers as it is that the board consistently violates the state's open meetings law. Taxpayers might have a different opinion.

  2. Kris,

    Thanks for posting your thoughts on this! I agree with a lot of what you have to say.

    When you say that 90% will be computer-generated, I interpret that as implying a much larger set of articles that are not currently being created, rather than replacing those created by real reporters.

    With that said, there would also be a fair amount of overlap, and I'm curious as to your view on how this changes the role of a reporter. I would think it has to transition into a role focused more on analysis and commentary as reporting on pure objective fact becomes more of a commodity.

  3. Hi Kris - Thanks for the article. It certainly is inevitable, but it's great to put a deadline on it as you have! The company I work for has already been working in this space for a while. We launched Trendsmap.com a couple of years ago, which analyses and displays real-time local Twitter trends. Just yesterday we launched thewall.com and thewall.co.uk, which take this a step further by clustering those trends into news topics. We categorize and tag each topic and the associated tweets, media, and links.

    We're not creating much content, but we're trying to make useful topics based on information that is already out there. I can see how we could create reports, though, using some of the data that we're collecting and analysing. The Wall does require some editorial work, but we're gradually automating as many parts as we can. We're not quite at 90%, but I can see how it's achievable.

    Cheers, Rob

  4. And your prediction for how much of this 'news' will be read by humans?
    If computers are smart enough to write for us, they will also be smart enough to read and summarize the streams, so that we will only need to read news in one place, to satisfy our individual needs (personal and professional).

  5. Hi Kris, this is an ambitious vision and take. Thank you for sharing it. Is there any way to see some examples of "automatic narratives" generated from data?

    Thank you.

  6. 90% of the articles, but what percent of the views? I can see computers generating a ton of articles, but that may not translate to views.

  7. I'm with you for this part:
    "As the trend continues and accelerates, there will be tremendous opportunities to mine this data, gather insights from it" sure, computers will be great for that but: "and transform those insights into narratives" I haven't seen any machine doing something remotely like that.

  8. By the time all the relevant data is inputted, the story a computer might derive from it is no longer news.
