I’m a data scientist who builds programs that can read huge numbers of documents within minutes. These programs do more than just read. They can also detect the underlying topics in a collection of documents and sort each document into categories based on its content.
This type of program can help any professional who has to get through massive amounts of text. It gives them a sense of what a text or a group of texts is about without their having to read every word.
Having worked as a journalist for many years, I was curious whether I could build a program like this that would help people understand the news.
That led me to design the program that powers the content on this blog. My "robot" searches Twitter for news-related keywords and reads through the matching Tweets within minutes, even when they number in the tens or hundreds of thousands. It then quickly identifies the main topics expressed across all of those Tweets.
That's where the robot's work ends, and where the human work takes over.
I put on my journalist's cap and analyze those lists of topics, which offer insights into the Tweets as a whole: insights I could not possibly have reached without the robot having "read" the Tweets for me.
The blog: Robot Reporting
Each post follows this format:
NEWS KEYWORD SEARCH: I tell you what term or terms I searched for.
MY EXPECTATION: What I expect the Tweets to be about, based on my knowledge as a journalist.
ROBOT'S RESULTS: I tell you which main topics the Tweets contain, according to the robot's analysis.
WHAT I LEARNED FROM THE ROBOT: Here I highlight the most important or enlightening elements of the robot's results.
TECHNICAL NOTES: Details on the statistical and machine learning methods and tools used.
I invite you to read my results, and I'd love to hear what you think. Please feel free to e-mail me your reactions and suggestions.
Enjoy!
Mary MacCarthy maccarthy.mary@gmail.com robotreporting.info