This week served as a bit of a mid-program check-in between my bosses and me. I got the chance to demo the Python script that I had worked on. And while there are a couple of bugs caused by changes to the pages' HTML structure, it otherwise works exactly as it should. I generated batch files as well, so everyone else can just click on a Desktop shortcut and run the script. It made me feel really good to have them compliment my work. Web scraping is something I have always wanted to learn, and in my mind it is a relatively simple thing to do. So the fact that no one had done it yet, and that I got to learn something new while building something that will benefit the department, made me feel really accomplished.
In addition to talking about work tasks, we talked about the internship program as a whole. What did I like? What did I not like? Am I doing everything that I had hoped to do? If we are being totally honest, I love it here. I am learning so much, doing so much for this company, and am so grateful to have the freedom to continue researching and learning digital methods that are important to me. I will admit my internship is very different from those of the other interns around CNN. They all work with new media and generate content. I like to think that my job is to play with the new content they create: scraping articles, and perhaps doing some topic modeling on politics, sports, or other subjects. I get to read and learn so much.
Since the Python script works for everyone, and I successfully mapped all of the Google terms to our existing terms, my tasks are switching gears a little bit. I am now doing some sentiment analysis work as well as helping create new tags for our term list. Specifically, I am identifying new people to add to our directory.
For the sentiment analysis, I am working with intensity scoring. In other words, we assign a weight to a single word or a short string of words based on how strongly those words make a reader feel. For example, terms like “controversial” or “powerful” are given a stronger weight, while terms like “unorganized” or “activist” are given a lesser weight. It is difficult to explain how sentiment analysis works, mostly because different people react differently to the same story. We debate quite a bit about what a word means and the feelings we get from how it is used. It seems to be a never-ending battle. But that is half the fun!
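To make the idea of intensity scoring a little more concrete, here is a minimal sketch of a lexicon-based scorer in Python. It is not the model we actually use: the weights below are hypothetical numbers chosen only to mirror the stronger/lesser examples above, and the two-word entry “deeply divided” is a made-up phrase showing how a short string of words can carry its own weight. The function simply adds up the weights of every lexicon word or phrase it finds in a piece of text.

```python
import re

# Hypothetical intensity lexicon: illustrative weights only, not real values.
# Keys can be single words or short phrases (up to MAX_NGRAM words).
INTENSITY = {
    "controversial": 0.9,
    "powerful": 0.8,
    "deeply divided": 0.7,  # hypothetical two-word entry
    "unorganized": 0.3,
    "activist": 0.2,
}

MAX_NGRAM = 2  # longest phrase length looked up in the lexicon


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def intensity_score(text: str) -> float:
    """Sum lexicon weights, preferring the longest matching phrase at each position."""
    tokens = tokenize(text)
    total = 0.0
    i = 0
    while i < len(tokens):
        # Try the longest phrase first, then fall back to shorter ones.
        for n in range(MAX_NGRAM, 0, -1):
            window = tokens[i:i + n]
            phrase = " ".join(window)
            if len(window) == n and phrase in INTENSITY:
                total += INTENSITY[phrase]
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no lexicon entry starts here; move to the next token
    return total


print(intensity_score("A controversial and powerful speech"))
print(intensity_score("The deeply divided council met again"))
```

Matching the longest phrase first means a multi-word entry is scored as a unit rather than as its separate words. The hard part in practice is not this arithmetic but agreeing on the weights themselves, which is exactly where the debates come in.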
As for identifying which people we need to add to our terms list, I should preface this by saying that I read roughly thirty to fifty stories per day. These stories cover any and all subject matter we publish here at CNN. That means I read a lot of names, and many of them are already in our list. The names that I add now are ones that appear frequently across multiple stories and multiple times within a single story. For example, I added Cori “Coco” Gauff to our terms list, as her name became very popular very quickly thanks to her performance at Wimbledon.
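I do this by reading, but the rule of thumb itself (a name that shows up in several stories and many times overall) can be sketched in a few lines of Python. This is a hypothetical illustration, not a tool we use: it assumes each story has already been reduced to a list of the person names it mentions (one entry per mention, for example from some upstream name-extraction step), and the thresholds `min_stories` and `min_mentions` are made-up defaults.

```python
from collections import Counter


def candidate_names(stories, known_terms, min_stories=3, min_mentions=5):
    """Return names not yet in known_terms that recur across and within stories.

    stories: list of stories, each given as a list of person names with one
    entry per mention (assumed to come from an upstream extraction step).
    known_terms: names already in the terms list (compared case-insensitively).
    """
    total_mentions = Counter()  # every mention of a name, across all stories
    story_counts = Counter()    # number of distinct stories mentioning a name
    for names in stories:
        total_mentions.update(names)
        story_counts.update(set(names))

    known = {term.lower() for term in known_terms}
    candidates = [
        name
        for name in total_mentions
        if name.lower() not in known
        and story_counts[name] >= min_stories
        and total_mentions[name] >= min_mentions
    ]
    # Most-mentioned names first; ties broken alphabetically.
    return sorted(candidates, key=lambda name: (-total_mentions[name], name))


# Hypothetical sample data: "Player B" and "Player D" are placeholders.
stories = [
    ["Coco Gauff", "Coco Gauff", "Player B"],
    ["Coco Gauff", "Player B"],
    ["Coco Gauff", "Coco Gauff", "Player B"],
    ["Player D"] * 6,  # many mentions, but only in one story
]
print(candidate_names(stories, known_terms=[]))
```

Requiring both conditions filters out names that dominate a single story but never come up again, as well as names that get one passing mention in lots of stories.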
I have said this before, and I will say it again: it is so cool being a historian and working with real-time, relevant information. More often than not, the material I work with is around 200 years old. Using digital methods to play with information as it comes out is a really cool premise for me.