"All that is old and already formed can continue to live only if it allows within itself the conditions of a new beginning."
Mr. Computer, Gimme Re-write
Artificial Intelligence Could Become Newsroom Aid
Ah, the end of the year. The holiday parties commence as news slows to a trickle.
But, d'oh! The blasted year-end retrospective. Time to scratch our heads again trying to remember what important things happened way back in February.
This dreaded task could soon become easier. Imagine if your newsroom could access artificially intelligent news summarization software, with the power to pull from your archives all past stories on a given topic. Imagine that it could then, with the push of a button, independently compose a short, cogent, eminently readable news summary.
Well, don't lay off your cranky human reporters just yet. It will likely be another three or four years before such a device is available. Researchers at Columbia University's Computer Science Department in New York City are now creating the tool, known as "MultiGen." A working demo is available at http://www.cs.columbia.edu/~hjing/sumDemo/multiGen/.
Columbia's Center for New Media has also signed on to the project, helping to evaluate and hone its journalistic functions. The project is supported by grants from the National Science Foundation.
MultiGen was devised by Computer Science Department Chair Kathy McKeown, and is the result of 15 years of work. McKeown says the project is based in natural language processing, a field within the study of artificial intelligence.
"You wouldn't say that it understands anything when it does its statistical processing," McKeown says, "but it does have some shallow understanding. For instance, it parses the sentences so that we know what the subject, the verb, and the object are. And it looks for repetition and similarity of phrases, so that it can determine that similar information occurs across a number of articles."
Like the Internet itself, the program was initially devised largely for military and governmental purposes, essentially to help defense analysts keep abreast of critical world events. The current working model reflects that singleness of purpose; MultiGen can produce news summaries on only one topic, terrorism. But John Pavlik, director of Columbia's new media center, who is testing the program in the classroom, says its potential is much greater.
"In this age where we've gone from information scarcity to information abundance, I think that - either for journalists or for news consumers - having a tool to sort out (news) and come up with a summary might be extremely useful," Pavlik says.
The current online prototype has several summary scenarios, based on old stories pulled from Reuters. In one example, MultiGen scoops up eight news pieces about the suspension of nuclear-weapons negotiations between North Korea and the United States after the death of North Korean President Kim Il-sung. Clicking a button labeled "Output Summary" generates this blurb:
"According to Reuters on 07/09/94, North Korean President Kim Il-sung, died in Pyongyang. Kim Il-sung groomed his eldest son Kim Jong-il to succeed him. Little is known for sure about him.
"Earlier Saturday North Koreans postponed talks in Geneva on Pyongyang's controversial nuclear program because of Kim's death."
"What's amazing is that these are not just sentences lifted from existing stories," Pavlik says. "This has actually been written by MultiGen. It understands grammar and sentence construction. And it has a measure of the importance of something by how often it's mentioned, and where in the story it's mentioned."
Work is under way to add much more functionality, including features that would let editors analyze their own staff's work or that of their competitors.
"If you use a summary tool like this," Pavlik says, "you can have a tool that also finds differences in reports, for instance, 'What did Reuters say versus what the AP said, versus what the BBC said.' What are the things that they disagreed on? Who were the sources they used? Did they use the same sources? What are the diversity of sources?"
McKeown agrees that MultiGen has all those potential journalistic applications. It can already parse a great degree of similarity between news articles, she says. But its ability to detect differences is not well developed, and will take several years to perfect.
Nonetheless, when it is unleashed on the world of journalism, Pavlik indicates, it may become the prototypical killer app. And it could be made available in-house on a newsroom intranet, while a public version could be provided to readers online, he says.
"There are a whole variety of things that we're planning to do in addition to what it already does," Pavlik adds. "This is just an early prototype."
Originally published by Editor & Publisher, Dec. 7, 1999
Kevin Featherly, a former managing editor at Washington Post Newsweek Interactive, is a Minnesota journalist who covers politics and technology. He has authored or contributed to five previous books, Guide to Building a Newsroom Web Site (1998), The Wired Journalist (1999), Elements of Language (2001), Pop Music and the Press (2002) and Encyclopedia of New Media (2003). His byline has appeared in Editor & Publisher, the San Francisco Chronicle, the St. Paul Pioneer Press, Online Journalism Review and Minnesota Law and Politics, among other publications. In 2000, he was a media coordinator for Web, White & Blue, the first online presidential debates.
Copyright 2004, by Kevin Featherly
