When I was working at the Connecticut Democratic Party last year, we used algorithms to help us decide which voters to reach out to. I wanted to learn more about these algorithms: how they worked and how they were made. Fortunately, I discovered John Foreman’s Data Smart. Over the past few months I have set aside time to work through the examples, and I finally finished the book last night. If you are interested in diving into the world of data science with little background, I strongly recommend this as an introduction.
This book is superior to its peers and much of the online material I have encountered in several ways. The first is that the majority of the book is run in Microsoft Excel. This provides most readers with a familiar environment in which to try out their new tricks. The spreadsheets are available for download from the Data Smart website so that you can follow along and manipulate them. If something in the book does not make sense, you can tinker with the spreadsheet or read the formulas in Excel to see directly what Foreman is explaining. Furthermore, the book explains its concepts and examples in clear language. You are not likely to get bogged down in jargon or confused by too many math symbols. However, Foreman does not baby you either: the math is there, and there are plenty of references and recommendations for other texts to explore if you want to learn more about a subject.
The book is also well structured. It is divided into a series of case studies and examples around specific concepts. This allows you to sit down, work through an exercise to learn about a concept, and then stop at a natural break point and pick up again later. If you forget a concept or need to re-read, it is easy to locate the relevant material. Foreman starts with easier concepts at the beginning so you can build comfort with the material before tackling more complicated concepts like regressions. He has you crawl before he makes you walk, and makes you walk before you run.
Foreman is a talented writer, and it was enjoyable to read about the techniques he uses in his job on a daily basis. I walked away with a sense of what he does and how he does it. Fortunately, the fun does not end with the book; he also has a blog. There are also video sessions if you want to dip your toes in before committing to the book.
In the cubicles of Silicon Valley, companies like Google have departments dedicated to user experience, often abbreviated UX. Google believes this field is so important that it developed the following video explaining what these departments do:
You can see the impact of UX on websites in the form of one-click purchasing on Amazon or iTunes. The Whispersync feature in the Kindle, which lets you pick up reading a book on a different device where you last left off, is a UX innovation. The surveys that you are asked to complete after a customer service call are part of the user experience design process, as organizations try to collect feedback to better serve the needs of their customers. You may be surprised to learn that even your government has started to think about UX. The State of Connecticut had this presentation on UX on its website. It is thus little surprise that we have seen success follow in the form of high enrollment numbers.
User experience design has a lot of potential beyond enrolling people in health insurance. Sadly, we have not yet seen evidence of this thinking being carried over to other government departments. For example, people still wait in long queues to obtain services at the DMV or to talk to a representative at the Department of Labor or DSS. Companies like Apple have managed to deploy systems that allow people to schedule appointments for phone calls or in-person meetings, which minimizes wait times. They have also developed ways to help users help themselves through online knowledge bases that provide relevant answers based on the text of an e-mail. The resources and expertise to make government more responsive to the people clearly exist. In the case of social services this can help reduce poverty, and in the case of tax administration it can increase revenue and compliance. This is worth doing.
New York State recently redesigned its website to focus on user experience. The front page provides easy access to tasks that people might want to complete, including renewing driver's licenses, enrolling in health insurance, and starting a business. This is an interesting model, and it is encouraging to see that the talent to complete this sort of project is next door. Even if the state is not able to pull together the will and resources to improve user experience, I hope that one of the many mayoral candidates running in local elections embraces this as a cause.
Lawrence H. Summers, the former Treasury secretary, recently said that he no longer believed that automation would always create new jobs. “This isn’t some hypothetical future possibility,” he said. “This is something that’s emerging before us right now.”
from As Robots Grow Smarter, American Workers Struggle to Keep Up.
I have seen many articles concerned that technology is going to swallow work. It is difficult to believe this is true. The first issue is that people overestimate how quickly technology penetrates the market and is adopted. Examples of technologies that we are still waiting to hit tipping points include electric vehicles and solar panels. The second issue is that we have a large collection of unsolved problems in areas like mental health, medicine, and education, for which solutions would provide huge economic benefit to society. If investors want to find their next source of growth for capital, they will move into these areas, and that will create jobs to replace those that are lost.
A common counterargument is that these new jobs require new skills and are more difficult to learn than previous jobs. Yet the trend appears to be in the opposite direction. Previously, becoming a taxi driver required learning the cityscape; now GPS will do the navigation for you and even factor in real-time traffic data. Publishing a newspaper or music album used to require the resources of large organizations; now you can do it yourself with equipment that works better than what many professionals were using years ago. Companies like PayPal, Square, and Stripe make it easy for people to accept payments for their work. YouTube and content creators like Khan Academy offer videos on how to do things from differential equations to car repair.
The economy may be changing, but these changes will benefit us all in the long run.
Today I watched this video by Tim Davies at the Berkman Center on open data in government and thought it was worth sharing. In the video Davies outlines three important parts of open government data: proactive sharing, machine readability, and the legal ability to use and share. It seems that we often end up with open data that only hits two of these three pillars. While watching the video I also came up with another idea for an open dataset to hack on, which I submitted to the CT Open Data portal, so I wanted to encourage you to vote for it.
The other day I finally took the time to explore Connecticut’s Open Data portal in depth. Although I was aware of the portal before, I had not used it extensively. Like many members of the public, I looked at a few of the interesting example data sets and moved on. This time my goal was to figure out whether there are interesting data sets that I can pull down and use with some of the new algorithms that I have been learning. However, no specific data set stood out to me.
One of the best practices that I have seen recommended is that open data portals provide a log of FOI requests. The Hartford open data portal has a running log, but it was not interesting: mostly media organizations requesting e-mails or attorneys requesting documents relevant to their cases. So I went to the section of the website where I could suggest a dataset, and I was encouraged to see that there have been several suggestions already and that the site administrator has been responsive to them. After making an account I posted my suggestion, and the administrator responded within 24 hours. I do not know how long it will take for the data to appear, but I am hopeful he is able to find some of it and release it.
My first request is that you vote for my suggestion on the website so they see there is support for it. My second request is that you check out the portal and let me know in the comments if you find any interesting data sets that you think should be wrangled.