Wednesday, December 7, 2011

Why 90% of news will be computer generated in 15 years

At News Foo Camp, I was asked how much news will be computer generated in 15 years.  My reluctant answer was that it would be on the order of 90%.  I was reluctant because, while this strikes me as inevitable, it always causes a fair amount of angst among the people who hear it.  With that in mind, it seemed like a good idea to explain what that number means and why I think it makes sense given current information and technology trends.

Data availability

First, given that we are talking about content that is generated from data, that is, unambiguous machine-readable data rather than human-readable text, it is clear that one of the key drivers will be the availability of the data itself.  There is no question that more and more data, in sports, finance, real estate, government, business, politics, etc., is coming online.  This trend is clear, unstoppable, and even a genuine social good if you believe in transparency.

Likewise, as more and more of the transactions and operations associated with business and commerce are happening online and being metered, we will actually be creating new types of data that describe the world and how it functions.

As the trend continues and accelerates, there will be tremendous opportunities to mine this data, gather insights from it and transform those insights into narratives that can help to inform the public.  Many of the tasks associated with data journalism as it stands today will be given over to the machine (under the control of editors and writers), enabling us to generate compelling narratives at scale, driven by all of this data that better describes our world.

But this trend is really only about data as data.  That is, data that is unambiguous and machine readable rather than textual information that is still only understandable by human readers.  The world of human-readable text is a different matter.  This leads me to the next trend.

Turning text into data

On a parallel path, language understanding and data extraction systems are improving to a point at which much of the information that is currently human readable yet impenetrable to computers will be itself transformed into data; data that can be used as the driver for the generation of new narratives.

This means that textual descriptions of events, government meetings, corporate announcements, plus the ongoing stream of social media will be transformed into not just machine-readable but machine-understandable representations of what is happening in the world.  This data will then be integrated into the expanding data sources that are already available.

Stories currently driven by game stats, stock prices, employment figures, etc. will be augmented and improved with information, now transformed into data, about off-field player behavior, business strategy, and city council meetings in ways that will allow human-guided systems to automatically create richer and richer narratives that weave together the world of numbers and events.

But these two trends, combined with computer generation of content, only take us so far.  Given the current models of content creation and deployment, we still think in terms of content in the broad, which limits the scope of the content that we even consider creating.  That leads to a third element: scale and personalization.

Scale and the long tail

As journalism adjusts to the new world, it is clear that in many sectors there is a need for content that is narrower in focus and aimed at smaller audiences.  This more targeted content, while not valuable to a broad audience, is tremendously valuable to smaller, niche audiences.  Stories about local sports and businesses, neighborhood crime, and city council meetings may only be of interest to a small set of people, but to them, that content is tremendously informative and useful.

The problem, of course, is that these audiences are often too small to warrant the kind of coverage they deserve.  The economics of covering Little League games make it infeasible for organizations to provide the staff and publication resources to cover them.  Logistically and financially, it is simply impossible for an organization to produce hundreds of thousands of stories, each of which will likely be read by fewer than fifty people.

As the data becomes available and the computer develops a better understanding of events, however, there is an opportunity to create content like this at tremendous scale.  This is an opportunity that only makes sense, and in fact is only possible, through the computer generation of stories.  A computer can write highly localized crime reports, personalized stock portfolio reports, and high school and youth sports stories at scale, providing coverage that was previously impossible and could never be possible in a world of purely human generated content.
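
To make the idea concrete, here is a minimal sketch of generating a youth sports recap from structured data.  It is only an illustration under stated assumptions: the GameResult fields, the margin thresholds and the sentence templates are all hypothetical, and real systems of this kind are far more sophisticated than a pair of templates.

```python
from dataclasses import dataclass


@dataclass
class GameResult:
    """Hypothetical box-score record; the field names are illustrative."""
    home_team: str
    away_team: str
    home_runs: int
    away_runs: int
    top_player: str
    top_player_hits: int


def describe_margin(margin: int) -> str:
    """Pick a verb from the run margin (thresholds are arbitrary assumptions)."""
    if margin >= 7:
        return "routed"
    if margin >= 3:
        return "beat"
    return "edged"


def write_recap(game: GameResult) -> str:
    """Turn one structured game record into a short recap."""
    if game.home_runs == game.away_runs:
        lead = (f"The {game.home_team} and the {game.away_team} "
                f"tied {game.home_runs}-{game.away_runs}.")
    else:
        # Work out who won so the sentence always reads winner first.
        if game.home_runs > game.away_runs:
            winner, loser = game.home_team, game.away_team
            high, low = game.home_runs, game.away_runs
        else:
            winner, loser = game.away_team, game.home_team
            high, low = game.away_runs, game.home_runs
        verb = describe_margin(high - low)
        lead = f"The {winner} {verb} the {loser} {high}-{low}."
    noun = "hit" if game.top_player_hits == 1 else "hits"
    detail = f"{game.top_player} had {game.top_player_hits} {noun} in the game."
    return f"{lead} {detail}"


if __name__ == "__main__":
    # Made-up sample data; the same function can run over any number of games.
    games = [
        GameResult("Tigers", "Cubs", 9, 2, "Sam Lee", 3),
        GameResult("Tigers", "Cubs", 4, 5, "Ana Ruiz", 1),
    ]
    for game in games:
        print(write_recap(game))
```

The point of the sketch is the economics rather than the prose: once the data exists, producing the ten-thousandth recap costs no more than producing the first.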

Man and Machine

These three trends (and there are others) come together to create an opportunity for computers to automatically produce content that serves communities currently ignored by the world of journalism and story production.  By creating content that integrates existing and derived data and providing stories that are not just local but actually personal, these systems will generate content directly for the long tail of need and interest.

As more and more data becomes available and individuals are given news that is designed to be personally relevant and informative, systems will end up generating a volume of content that dwarfs present production.  Because much of this production will be for the individual, it will never overwhelm, but will instead provide a new sort of news experience in which the events of the day, the events of the world, will be provided in a personal context that makes them more meaningful and relevant.  That 90% of news will be computer generated fifteen years from now seems not just reasonable, but inevitable.


  1. I'm not sure that a "personal context" actually makes news more meaningful and relevant. I suspect it merely reduces the choices people have to make so they feel on top of things.

    I don't think computers yet have the ability to develop significance from data that hasn't been recorded. That's where human reporters are essential.

    A computer might be able to generate a news story about action taken by the local school board based on examination of the posted minutes. I'm not sure the computer would notice that the reason the board gave for meeting in executive session is not a legally valid reason for holding an executive session.

    To a computer, it's probably just as important that the board added Joe Blow to the list of substitute teachers as it is that the board consistently violates the state's open meetings law. Taxpayers might have a different opinion.

  2. Kris,

    Thanks for posting your thoughts on this! I agree with a lot of what you have to say.

    When you say that 90% will be computer-generated, I interpret that as implying a much larger set of articles which are not currently created vs. replacing those created by real reporters.

    With that said, there would also be a fair amount of overlap, and I'm curious as to your view on how this changes the role of a reporter. I would think it has to transition into a role focused more on analysis and commentary as reporting on pure objective fact becomes more of a commodity.

  3. Hi Kris - Thanks for the article. It certainly is inevitable but it's great to put a deadline on it as you have! The company I work for has already been working in this space for a while. We launched a couple of years ago which analyses and displays real-time local Twitter trends. We just yesterday launched and which take this a step further by clustering those trends into news topics. We categorize and tag each topic and in the associated tweets, media and links.

    We're not creating much content, but we're trying to make useful topics based on information that is already out there. I can see how we could create reports though using some of the data that we're collecting and analysing. The Wall does require some editorial work but we're gradually automating as many parts as we can. We're not quite at 90% but I can see how it's achievable.

    Cheers, Rob

  4. And your prediction for how much of this 'news' will be read by humans?
    If computers are smart enough to write for us, they will also be smart enough to read and summarize the streams, so that we will only need to read news in one place, to satisfy our individual needs (personal and professional).

  5. Hi Keith, this is an ambitious vision and take. Thank you for sharing it. Is there any way to see some examples of "automatic narratives" generated from data?

    Thank you.

  6. 90% of the articles, but what percent of the views? I can see the computers generating a ton of articles, but that may not translate to views.

  7. I'm with you for this part:
    "As the trend continues and accelerates, there will be tremendous opportunities to mine this data, gather insights from it". Sure, computers will be great for that. But "and transform those insights into narratives"? I haven't seen any machine doing something remotely like that.

  8. By the time all the relevant data is inputted, the story a computer might derive from it is no longer news.
