Harvard Used GoPros and Machine Learning to Monitor Lecture Attendance

An interesting slide deck on something that I think most professors already knew and noticed. Students often skip lectures, and attendance drops off near the end of the semester. This happened both in my undergraduate program and in law school. My general sense is that a lot of students felt they could simply learn through self-study and did not get much value out of the lectures. If you can pull it off, it’s probably not a huge problem, but for students who struggle to get good grades I think this kind of data will be critical in figuring out how to engage them and help them improve their performance.

Of course this data is not nearly as bad as the rather dismal completion rate for MOOCs.

Lon Seidman on Data and the Business of YouTube

My friend Lon Seidman explains how he used data to learn about his YouTube audience and improve his videos:

Page Speed and Network Neutrality

The other day, to much applause from the Internet community, Tom Wheeler announced that the FCC would issue regulations to ensure network neutrality. This Wall Street Journal article tells the story of how this victory was achieved. For startups and small businesses that are unable to negotiate special deals with ISPs, this keeps the playing field level. It is a win for capitalism and the Internet.

Even something like the loading speed of webpages has a significant impact on a business. That is why companies like Google, Amazon, and Etsy have conducted studies on the impact of page speed on their businesses. Lara Hogan at Etsy has a helpful overview of designing for performance that explains why they care. Key fact: after 3 seconds, 40% of users will abandon your website. If you are not thinking about page speed, you are automatically cutting out nearly half of your potential viewers or customers. That is why companies are rightfully worried that their websites could end up in a slow lane, and why people will pay money for services like Amazon CloudFront.
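To make the arithmetic concrete, here is a rough sketch of what that abandonment figure implies. It assumes a crude step model in which nobody leaves at or under the 3 second mark and 40% leave once a page takes longer. The function name, the parameters, and the 10,000 visitor figure are my own illustration and do not come from the Etsy material.

```python
def retained_visitors(visitors: int, load_seconds: float,
                      threshold_seconds: float = 3.0,
                      abandon_rate: float = 0.40) -> int:
    """Estimate how many visitors stay under a simple step model.

    Assumption: no one abandons at or below the threshold, and a fixed
    share (abandon_rate) abandons once the load time exceeds it.
    """
    if load_seconds <= threshold_seconds:
        return visitors
    return round(visitors * (1 - abandon_rate))


# Hypothetical site with 10,000 visitors.
fast_site = retained_visitors(10_000, load_seconds=2.0)
slow_site = retained_visitors(10_000, load_seconds=4.5)
print(fast_site, slow_site)
```

Real abandonment is a curve rather than a cliff, so treat this as a back-of-the-envelope estimate only.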

Links on Big Data

Rather than write out a long post today, I wanted to link to a bunch of interesting data-related developments:

  1. IBM Watson released five new services to its developer cloud - speech to text, text to speech, visual recognition, concept insights, and tradeoff analytics. There are some fun mini-demos on the website, but if you want to take advantage of visual recognition today, one of the cool things you can do is upload a photo set to your Google Drive and search for unlabeled images using text descriptions.

  2. The White House released an interim big data and privacy report

  3. Lex Machina is using data and machine learning to help companies win lawsuits - This software uses data from litigation to help attorneys ascertain which litigation strategies are most effective. I think software like this can give new attorneys a leg up against more experienced attorneys who instinctively build this knowledge base over time.

  4. Planet Money investigated Amazon Mechanical Turk - If you need a large volume of data entered inexpensively, then Amazon Mechanical Turk is probably the best solution. However, many of the workers are not making much money from it.

The Ethics of Artificial Intelligence

Today Fred Wilson asked his audience about the ethics of algorithms and pointed to a post on his firm’s website. I recently read a good article by Steven Levy on Google’s deep learning efforts and have been thinking about this issue quite a bit. I adapted a comment I wrote on the Union Square Ventures website into today’s blog post.

I think that we can divide the ethical issues into four buckets:

  1. Correctable unforeseen consequences
  2. Uncorrectable unforeseen consequences
  3. Foreseen correctable consequences
  4. Foreseen uncorrectable consequences

Type 3 ethical issues are entirely on the people who make the algorithm. A common hypothetical that would be a type 3 issue is the autonomous car that has to decide whether it is going to crash into and kill a pedestrian or crash into some kind of building and kill the driver. We already know what we want the car to do; we’re just scared that it might be programmed to do something else. It is a no-win scenario and something bad will happen. All things being equal, I think we just program the car to do what the driver would do, or let the automation cede control of the car at that moment and let the person decide.

For a type 3 issue you want to make sure you are collecting appropriate data and using it to correct the algorithm. For types 2 and 4 I think there is a fundamental analysis that is both economic and moral. If you replace a jury with an algorithm that we somehow know gets verdicts wrong 5% of the time, and we also know that juries get verdicts wrong 12.5% of the time, then we are really asking ourselves whether we are more accepting of flaws from people or from machines. The algorithm might mirror some of our own flaws. It could be racist, but people are also racist.
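To put the jury comparison in concrete terms, here is a small sketch of the arithmetic using the hypothetical error rates above (5% for the algorithm, 12.5% for juries). The caseload of 1,000 trials and the function name are assumptions I made up for illustration.

```python
def expected_wrong_verdicts(cases: int, error_rate: float) -> float:
    """Expected number of wrong verdicts for a given caseload and error rate."""
    return cases * error_rate


CASES = 1_000  # hypothetical caseload, not from any real data

jury_errors = expected_wrong_verdicts(CASES, 0.125)      # hypothetical jury rate
algorithm_errors = expected_wrong_verdicts(CASES, 0.05)  # hypothetical algorithm rate

print(f"Juries: about {jury_errors:.0f} wrong verdicts")
print(f"Algorithm: about {algorithm_errors:.0f} wrong verdicts")
print(f"Difference: about {jury_errors - algorithm_errors:.0f} per {CASES} cases")
```

The numbers favor the algorithm on paper, which is exactly why the question becomes a moral one about whose mistakes we are willing to tolerate rather than a purely economic one.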

Type 1 problems can fortunately be fixed, but they raise the question of whether the victims of the algorithm deserve some kind of remedy and, if so, whether the culpability for introducing the problem sits with the designer of the algorithm or with the algorithm itself. In the case of unsupervised deep learning there is little human input into the design of the algorithm’s behavior other than the raw data it is fed. In supervised learning there is more control, and the question can be raised of whether the algorithm designer thought about the issue or whether it should have been foreseen.

In tort law one of the fundamental concepts is negligence, so what we’re really going to look for is the negligence standard for the design of an algorithm. Certainly an algorithm that is designed to identify cats in YouTube videos is going to be much lower stakes than an algorithmic jury that can convict someone of a crime and impose the death penalty. There should certainly be a battery of testing involved.

I think an interesting regulatory model to follow is the Food and Drug Administration. The ethical issues raised by algorithms are similar to the ethical issues raised by testing new drugs. What are the side effects? Is the algorithm really doing what we need it to do? Is it OK to have this drug on the market if it cures cancer but ends up killing or debilitating a small subset of people? We do not regulate vitamin C supplements as heavily as painkillers because the stakes are completely different.

This work by Matt Zagaja is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.