Rather than write out a long post today, I wanted to link to a bunch of interesting data-related developments:
IBM Watson released five new services to its developer cloud - Speech to text, text to speech, visual recognition, concept insights, and tradeoff analytics. There are some fun mini-demos on the website, but if you want to take advantage of visual recognition today, one of the cool things you can do is upload a photo set to your Google Drive and search for unlabeled images using text descriptions.
The White House released an interim big data and privacy report
Lex Machina is using data and machine learning to help companies win lawsuits - This software uses data from litigation to help attorneys ascertain what the most effective litigation strategies are. I think software like this can give new attorneys a leg up against more experienced attorneys who instinctively build this knowledge base over time.
Planet Money investigated Amazon Mechanical Turk - If you need a large volume of data entered inexpensively, then Amazon Mechanical Turk is probably the best solution. However, many of the workers are not making much money from it.
Today Fred Wilson asked his audience about the ethics of algorithms and pointed to a post on his firm’s website. I recently read a good article by Steven Levy on Google’s deep learning efforts and have been thinking about this issue quite a bit. I adapted a comment I wrote on the Union Square Ventures website into today’s blog post.
I think that we can divide the ethical issues into four buckets:
- Correctable unforeseen consequences
- Uncorrectable unforeseen consequences
- Foreseen correctable consequences
- Foreseen uncorrectable consequences
Type 3 ethical issues are entirely on the people who make the algorithm. A common hypothetical that would be a type 3 issue is the autonomous car that has to decide whether it is going to crash into and kill a pedestrian or crash into some kind of building and kill the driver. We already know what we want the car to do; we’re just scared that it might be programmed to do something else. It is a no-win scenario and something bad will happen. All things being equal, I think we just program the car to do what the driver would do, or let the automation cede control of the car at that moment and let the person decide.
For a type 3 issue you want to make sure you are collecting appropriate data and using it to correct the algorithm. For type 4 and 2 I think there is a fundamental analysis that is both economic and moral. If you replace a jury with an algorithm that we somehow know gets verdicts wrong 5% of the time, and we also know that juries get verdicts wrong 12.5% of the time, then we are really asking ourselves whether we are more accepting of flaws from people or from machines. The algorithm might mirror some of our own flaws. It could be racist, but people are also racist.
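To put the jury comparison in concrete terms, here is a minimal Python sketch of the arithmetic. The 5% and 12.5% error rates are the hypothetical figures from the paragraph above; the caseload of 1,000 verdicts and the function name are assumptions added purely for illustration.

```python
def expected_wrong_verdicts(error_rate: float, cases: int) -> float:
    """Expected number of wrong verdicts at a fixed per-case error rate."""
    return error_rate * cases


CASES = 1000  # assumed caseload, chosen only to make the numbers easy to read

# Hypothetical error rates taken from the thought experiment in the text.
algorithm_wrong = expected_wrong_verdicts(0.05, CASES)
jury_wrong = expected_wrong_verdicts(0.125, CASES)

print(f"Algorithm: about {algorithm_wrong:.0f} wrong verdicts per {CASES}")
print(f"Juries:    about {jury_wrong:.0f} wrong verdicts per {CASES}")
print(f"Difference: about {jury_wrong - algorithm_wrong:.0f} verdicts")
```

The sketch only counts errors. It says nothing about which kinds of errors each decision-maker produces, which is where the moral half of the analysis comes in.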
Type 1 problems can fortunately be fixed, but they raise the question of whether the victims of the algorithm deserve some kind of remedy and, if so, whether the culpability for introducing the problem sits with the designer of the algorithm or with the algorithm itself. In the case of unsupervised deep learning there is little human input into the design of the behavior of the algorithm other than the raw data it is fed. In supervised learning there is more control, and the question can be raised about whether the algorithm designer thought about the issue or whether it should have been foreseen.
In tort law one of the fundamental things that is discussed is negligence, so what we’re really going to look for is the negligence standard for the design of an algorithm. Certainly one that is designed to identify cats in YouTube videos is going to be much lower stakes than an algorithmic jury that can convict someone of a crime and impose the death penalty. There should certainly be a battery of testing involved.
I think an interesting regulatory model to follow is the Food and Drug Administration. The ethical issues raised by algorithms are similar to ethical issues raised by testing new drugs. What are the side effects? Is the algorithm really doing what we need it to do? Is it OK to have this drug on the market if it cures cancer but ends up killing or debilitating a small subset of people? We do not regulate vitamin C supplements as heavily as painkillers because the stakes are completely different.
Yesterday I listened to this NPR Science Friday segment on the disagreement between scientists and the public on key issues. As I have been reading posts on Facebook and Twitter about the measles outbreak and condemnation of people who do not vaccinate their children, I asked myself: are these posts making a difference?
Sadly the answer might be no. Michael LaCour on the podcast discusses his study on changing minds on gay marriage and then studies on vaccine education. Sadly, the education campaigns the CDC put out that were studied did not lead to more vaccination. Even more jarring are the two phone conversations in the podcast between scientists and skeptics about vaccines and genetically modified foods. In both cases the skeptics decline to switch their positions despite the evidence and data presented to them. It turns out that knowing the truth is only half the battle; it also has to be communicated effectively.
While LaCour seems to think that people are only persuaded by vulnerable people telling stories face-to-face, I think that there are other ways to communicate facts effectively and we just have to find them. In 2011 Barack Obama released his long-form birth certificate and after doing so the population of people who believed he was born outside the United States plummeted. Not only was the birth certificate definitive proof, but it also showed the skeptics to be frauds.
I started coding when I was in middle school. I taught myself through a combination of experimentation, searching the Internet, a book I received, and an intensive summer school class. Many years later I am still learning. I do not call myself a software developer or professional programmer; rather, I view myself as a hacker. I am also not a professional teacher with a background in and understanding of learning theory. However, the failures and roadblocks you encounter over time are experience when they happen to you but become wisdom when shared with others, and I think that we have a shortage of wisdom in this arena.
You probably noticed you are not making apps on your computer yet. I think the next good step is to build your foundation through Harvard’s CS50 course. I did not do this but I watched all the lecture videos last year and they are well done. You will learn the basics of computer science and at the end have the vocabulary to move on to whatever other adventures intrigue you. The advantage of CS50 over other courses or material is that there is a large volume of material and a network of learners that you can draw upon to supplement your learning and fill in the gaps. A frequent problem I encounter when learning programming is that tutorials will assume a piece of basic knowledge that I do not have, or a revision in software will cause a change in behavior that I cannot explain. Much of the pain of learning involves trying to distill what I am missing. CS50 will spare you much of this pain.
Once you have learned the foundation, a model of problems, references, and examples is what will guide you in the future. Define a specific problem, like creating a map of Connecticut election results, and Google how to tackle it. Break it down into steps; you will be surprised how many little speed bumps you go over. Consult the reference for whatever programming language or framework you use religiously. If you encounter a roadblock, more likely than not someone on Stack Overflow has encountered it too. Once you build your own project, congratulations, you are a programmer!
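As a rough illustration of breaking a problem into steps, here is a small Python sketch of the data half of the election map example. Everything in it is a placeholder: the town names, candidates, vote counts, colors, and function names are invented for the sketch, and drawing the actual map is left to whatever mapping tool you end up choosing.

```python
import csv
import io

# Made-up placeholder rows. A real project would read a downloaded results file.
SAMPLE_CSV = """town,candidate,votes
Town A,Candidate X,120
Town A,Candidate Y,95
Town B,Candidate X,40
Town B,Candidate Y,75
"""


def load_results(text):
    """Step 1: parse CSV text into {town: {candidate: votes}}."""
    results = {}
    for row in csv.DictReader(io.StringIO(text)):
        results.setdefault(row["town"], {})[row["candidate"]] = int(row["votes"])
    return results


def winners_by_town(results):
    """Step 2: pick the candidate with the most votes in each town."""
    return {town: max(tally, key=tally.get) for town, tally in results.items()}


def colors_by_town(winners, palette):
    """Step 3: look up a fill color for each town based on its winner."""
    return {town: palette[winner] for town, winner in winners.items()}


PALETTE = {"Candidate X": "blue", "Candidate Y": "red"}  # arbitrary colors

results = load_results(SAMPLE_CSV)
winners = winners_by_town(results)
colors = colors_by_town(winners, PALETTE)
print(colors)
```

Each function is one step you can get working and check on its own before moving on, which is where most of the little speed bumps show up.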
A while back I recommended CoBook for contact management and syncing on the Mac. They were acquired by FullContact and the other day they finally released an updated iOS application. This provides a gateway to easily sync your iCloud contacts over to your Google account. I also find the UI easier to use to update contacts. The only downside is that it does not yet seem to have a mode where iCloud can be the canonical place where all your contacts live, and I am hopeful they fix this soon.