October 14, 2011
Five questions with Patrick Ball
contributor: Rob Goodier
Patrick Ball may be one of the few people who do not have a succinct, dinner-party-ready answer to the question “what do you do?” As chief scientist and vice president at Benetech, a humanitarian technology firm, Ball has two related jobs: In one, he gathers data in war-torn regions to piece together pictures of alleged human rights violations. In the other, he builds technology to empower human rights workers. With his tools, people can safely gather stories, collect data and tell others about the progress and the atrocities in their regions.
He has worked in this field for 20 years, assembling portraits of wrongdoing in El Salvador, Ethiopia, Haiti, Chad, Sri Lanka, East Timor, Sierra Leone, South Africa, Kosovo, Liberia and Peru. We caught up with him by Skype after he rigged an internet connection through a cell phone from a room in the Democratic Republic of the Congo. He’s there doing work for the United Nations that he can’t say much about on the record. It was two weeks before his keynote address to IEEE’s Global Humanitarian Technology Conference in Seattle, Wash., on Nov. 1, and we had many more than just five questions for him. We distilled the essence of the interview into what follows. These are five (-ish) questions with Patrick Ball.
E4C: So, what are you doing in the DRC?
PB: I’m building a text-centric database.
It’s an interesting problem to think about what the database means to a human rights project. One important question is, What do we mean by useful information? A big part of the UN’s job is to know things. To know how to intervene to protect civilians. How do we verify things we hear out on the street and give them useful context? How do we determine what we should react to? There are [good and] bad ways to answer these questions.
E4C: What is one thing that people either don’t know or don’t understand about human rights data analysis?
PB: Selection bias! Data are not reality. We cannot simply count things and assume that what we can see has any necessary statistical relationship with the true patterns. You can just put any sort of numbers you want into Excel and you have a bar graph. The problem is that when we take data that are not from a representative sample of the population and make an inference, we think that the data are telling us something about the world and they might not be. The only way we can nail down the notion of a sample being representative is to draw the sample randomly.
‘How many killings were there?’ is not a useful question to ask. People usually have a vested interest in hiding their massive acts of violence. We put a lot of effort into observing, but the act of observation creates more data. You don’t see something until you look for it.
Say, for example, that your data show there are more killings in April than in March. How do we know the number was going up? Maybe in April we were just listening more closely, maybe we had more people in the field, or people trusted us more, or we fixed our radios. We have no idea. Maybe it’s because there was more violence, that’s possible, but there’s no mathematical reason why hearing about more violence in April is related to actually having more violence.
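The March-versus-April point can be made concrete with a small simulation. This is only an illustrative sketch: the number of true killings, the two reporting probabilities and the function name `observed_count` are all invented for the example and do not come from any real data set. It assumes the true number of events is identical in both months and that only the chance of hearing about each event changes.

```python
import random


def observed_count(true_events: int, p_observe: float, rng: random.Random) -> int:
    """Count how many of the true events happen to get reported."""
    return sum(1 for _ in range(true_events) if rng.random() < p_observe)


rng = random.Random(0)   # fixed seed so the run is repeatable
TRUE_KILLINGS = 200      # hypothetical, and the same in both months

# Hypothetical reporting rates: fewer field staff and broken radios in March,
# more staff and more trust in April.
march_reports = observed_count(TRUE_KILLINGS, 0.2, rng)
april_reports = observed_count(TRUE_KILLINGS, 0.5, rng)

print("true killings each month:", TRUE_KILLINGS)
print("reported in March:", march_reports)
print("reported in April:", april_reports)
```

Under these assumptions the reported count rises sharply from March to April even though the underlying violence is unchanged, which is the trap Ball describes: the raw counts track observation effort, not necessarily the violence itself.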
Bias is a big deal. It’s a big deal because the exact factors that create observability are generally the same things we want to measure. So [if we ignore bias], we can get the story wrong. And the bottom line for human rights is we have to be right. We really have to be right.
I have two responses: Let’s think about what the human rights community does really well, and let’s build tools to help them do that better. One thing they do well is listen to the victims. Let’s build technology that helps [us listen] better. Let’s try to protect those voices with secure technology, amplify those voices, build tools to flow information from the collection point to the rest of the world. And let’s use machine learning and machine intelligence [like good search software] to help analysts figure out qualitative patterns, not quantitative patterns.
Say we hear that there’s lots of violence against young boys because they’re the next generation of soldiers. When a qualitative researcher writes a conclusion and says there’s a specific type of violence that targets young boys, and I think this because I have 43 stories that recount that violence, then we’re rooting a claim of a pattern of violence in how we know it. That’s the opposite of a quantitative claim.
E4C: What is one of the promising trends that you see in your field today?
PB: There are lots of cool trends in tech. In statistical analysis, the trends are much longer term. There’s a whole lot of cool tech around mobile platforms that is super. Mobile platforms are becoming the place where stuff happens. But on the other hand, they are really, really insecure. That’s a crucial question: How are we going to secure data on mobile platforms? [That’s something Benetech is working on.]
I’m also excited about application of machine learning to massive amounts of qualitative information. To narrative. How can computers help us learn from giant piles of narrative without reducing it to meaningless numbers?
E4C: What has kept you awake at night or worried you about your work?
PB: Leaking sensitive data and being wrong have worried me forever. That worries me all the time. And I have been wrong.
E4C: For example?
PB: In Haiti in 1995, I worked for the truth commission. We got thousands of testimonies from people all over Haiti, and then some colleagues collected a data set from the morgue in the University Hospital in Port-au-Prince. We looked at all of the violent deaths at the morgue and compared the pattern to the pattern in the testimonies reported to the commission. They were similar. We were like, Aha!, that means that either both patterns are reflecting the true patterns in reality, or they share a bias. At the time, I thought it was unlikely that they would share the same bias. They were different kinds of data: People who talked to us versus bodies that showed up in the morgue. I argued that the fact that they were so closely correlated was evidence that they were true. That was awfully optimistic.
Now I don’t think there were grounds for rejecting the shared-bias argument. What if, during one period, the strategy was to make examples out of killing people, so they tossed the bodies into the street? That would generate lots of verbal reports and bodies in the morgue. What if they then changed their strategy and killed people quietly and dropped their bodies at sea? That would generate no verbal reports, and no bodies would be in the morgue. I’ve thought a lot about that conclusion in the last 16 years and I’ve thought that, Wow, there’s just no way to know. That’s hard. That’s hard.
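The shared-bias scenario can be sketched numerically. Everything below is hypothetical: the four periods, the true death counts, the "visibility" of killings and the capture rates for testimonies and the morgue are made-up numbers chosen only to illustrate the argument, and `pearson` is a small helper written for this example. The assumption is that both sources depend on the same visibility, which drops when killings move from the street to the sea.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical truth: killings rise steadily over four periods.
true_deaths = [100, 110, 120, 130]
# Hypothetical visibility: public killings at first, then quiet disposal at sea.
visibility = [0.8, 0.7, 0.2, 0.1]

# Both sources only capture visible deaths, at different (made-up) rates.
testimonies = [t * v * 0.5 for t, v in zip(true_deaths, visibility)]
morgue = [t * v * 0.3 for t, v in zip(true_deaths, visibility)]

r_sources = pearson(testimonies, morgue)
r_truth = pearson(testimonies, true_deaths)

print("testimonies vs. morgue:", round(r_sources, 3))
print("testimonies vs. truth: ", round(r_truth, 3))
```

In this toy setup the two sources agree with each other almost perfectly, yet both move in the opposite direction from the true pattern, because they inherit the same bias. Agreement between sources is therefore not, on its own, evidence that either one reflects reality.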
E4C: Five years from now, what improvements would you like to see in the technology that you and the people you work with use?
PB: A smartphone is a powerful computer, but how much more powerful could it be five years from now? How could we use that additional power to protect people rather than make it a powerful data collection device for the telco? The telco will always know where you are. And that’s a big surveillance problem. Oppressive governments used to have to follow dissidents around. Now they just have to ask the telco where they’ve been and who they’ve been talking to. How can we balance that kind of surveillance capacity against the other things that cell phones do for us?