Righteous Data: The Best Data Posts of the Week

This week, it’s all about the practical applications of data technologies.

Big (Bad) Data by Andrew Ross Sorkin
Great article. Spotting a trend in a large amount of data doesn’t automatically mean there’s a correct, or even a compelling, narrative behind it. Sorkin writes: “A study by the Pew Research Center, for example, found that Twitter users are more often than not negative. The study, which examined reactions on Twitter to news events, including Barack Obama’s and Mitt Romney’s presidential race, discovered that ‘for both candidates, negative comments exceeded positive comments by a wide margin.’ More disturbingly, that reaction is not representative: ‘The reaction on Twitter to major political events and policy decisions often differs a great deal from public opinion as measured by surveys,’ Pew reported. That is due, in part, to the fact that ‘Twitter users are not representative of the public’: They are younger and more likely to lean toward the Democratic Party. It turns out that what’s ‘trending’ on Twitter may not really be ‘trending’ at all.”

What are some actual projects data scientists have worked on? – Quora
A few descriptions of interesting end-to-end projects from people who have worked at Uber, Quora, and Boku.

The incredible stock-picking ability of SEC employees by Jia Lynn Yang
Hmm… SEC employees trade differently than the general population? Yang writes: “Researchers found that out of the 56 enforcement actions against publicly traded companies during the time period analyzed, SEC employees traded ahead of six — and were far more likely to sell rather than buy. ‘This fact pattern indicates that the monitoring mechanisms the SEC planned to impose to discourage such practice are either weak or nonexistent,’ the researchers say.”
The actual paper is here.

It Takes Teams to Solve the Data Scientist Shortage – Jeanne G. Harris, Nathan Shetterley, Allan E. Alter and Krista Schnell
The idea of the lone data scientist is a myth: data challenges are solved by teams, often working with disparate technologies. The authors’ advice is to “create a team of people who individually lack the full skillset of a data scientist, but as a group possesses them all. When physicists take on a big project, they bring together a team to design the equipment, run experiments and analyze the data. Likewise, it makes sense to divide the labor of a data scientist rather than search for one person who can do it all.”



