Thursday, February 23, 2012

Why multi-lingual generation of content from data is important

Multi-lingual generation of content from data has always been on Narrative Science's road map and has informed the modularization of the core platform. The system generates language only after it has analyzed the facts, evaluated their importance, and composed a representation of what to say. Within this model, generating in Spanish, Japanese, German, etc. is no different from generating in English.

The system is not designed to translate, but to generate in multiple languages.
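
To make the distinction concrete, here is a minimal sketch of that separation. It is not the Narrative Science platform: every name in it (`Fact`, `compose`, `render`, `TEMPLATES`) and the sports-score example are assumptions made up for illustration, and the simple string templates stand in for the much richer per-language configuration a native speaker would write.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """Hypothetical input data: one game result."""
    team: str
    opponent: str
    points_for: int
    points_against: int


# Hypothetical per-language configuration. Each entry is written
# directly in its own language; none is a translation step at runtime.
TEMPLATES = {
    "en": "{winner} beat {loser} {high}-{low}.",
    "es": "{winner} venció a {loser} {high}-{low}.",
    "de": "{winner} besiegte {loser} {high}:{low}.",
}


def compose(fact: Fact) -> dict:
    """Language-neutral step: decide what to say, not how to say it.

    Assumes the game was not a tie, to keep the sketch short.
    """
    won = fact.points_for > fact.points_against
    return {
        "winner": fact.team if won else fact.opponent,
        "loser": fact.opponent if won else fact.team,
        "high": max(fact.points_for, fact.points_against),
        "low": min(fact.points_for, fact.points_against),
    }


def render(representation: dict, language: str) -> str:
    """Language-specific step: realize the same representation as text."""
    return TEMPLATES[language].format(**representation)


representation = compose(Fact("Lions", "Bears", 14, 21))
for language in TEMPLATES:
    print(render(representation, language))
```

The point of the sketch is that `compose` runs once and knows nothing about any language. Adding a new language means adding a new entry to the configuration, not translating the English output.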

In general, we are not ready to do this, mostly because of the composition of our client base, but doing so is a matter of pulling in native speakers who know how to write in non-English languages to configure the platform for each new language.

Occasionally we are asked why we even care, given the rise of translation services. Along with the theoretical answer, that translation requires hard-core natural language understanding to really get things right, we also see wonderful real-world examples of what happens when it does not.

My current favorite is from a translation into English of a story in Japanese about Narrative Science and Storify. I have no idea what the initial concept was in the original Japanese, but it was rendered into English as:

Than what was in the automatic translation of English to Japanese over Google, has become a meaningful sentence smoothly through many times.

This is strikingly poetic but, more importantly, a clear argument that opportunities for automatic generation of multi-lingual content are still out there.
