“Is staring into my phone 24 hours a day going to make me blind and stupid?”
Ask that question and you’ll get lots of opinions back. But we need data if we’re going to get a definitive answer to that and many other questions about the impact of technology, says the Markup co-founder Julia Angwin.
“There isn’t data on some of these really important questions,” Angwin said on the latest episode of Recode Media. “We would like to collect data, sort of at scale, not just a few anecdotal interviews, but as best we can bigger data sets about these questions.”
That makes the not-yet-launched Markup’s form of “data journalism” different from outlets like FiveThirtyEight, which analyze existing data sets and make predictions about topics ranging from political elections to Major League Baseball. “That’s totally legitimate and awesome work,” Angwin said, but once the new site launches in early 2019 she and her team will be proactively building new data sets — something she and her co-founder Jeff Larson previously did at ProPublica — in the interest of investigative journalism.
“We will file public records requests,” she said. “We will use automated data collection across the internet. We will do crowdsourcing. One thing we did a lot at ProPublica was build tools that people could use to donate their data to us in very specific ways. So with the Facebook political ads we built a browser extension. People could add it to their browser and then when they were on Facebook it would identify which ads were political and send it to us.”
“We would build the tools so they didn’t have to actually do anything,” Angwin added. “The only thing they have to do is install the tool. So I imagine we’ll do a bunch of that.”
Below, we’ve shared a full transcript of Peter’s conversation with Julia.
Peter Kafka: This is Julia Angwin who I’ve known for ... we’re going to date ourselves ... at least a decade.
Yeah, let’s just go with a decade.
You used to work at the Wall Street Journal. I was part of AllThingsD, which was Journal-adjacent.
So we kind of competed. You were great. You got nominated for Pulitzers. I did not. You went to ProPublica. You were kicking ass there, covering Facebook and all sorts of abuses by tech giants and others and now you have founded a new company called?
Which is a publication-to-be.
Yes, it will begin publishing news in early 2019.
You announced this week that there’s a couple different ways to fund a new publication these days. You can go ask people for money, actual consumers for money. That’s, a lot of people who come on this show talk about their paywall or subscription strategy. You can find a very wealthy person to fund your company. You guys went with Strategy B for now.
Yes, we did. To be clear, we went with a non-profit, philanthropic model.
So, it’s $23 million in seed money?
That’s good seed. Twenty of that’s from Craig Newmark, who most of us know as Craig from Craigslist, and then some assorted good people from foundations.
OK. We set the table. This is very exciting. People are very excited about what you are doing. Do you want to describe just briefly what it is and then we can poke at it a little bit?
Yeah, absolutely. So, The Markup is a non-profit newsroom, as I said. We’re going to be investigating the impacts of technology on society. By technology-
Specifically looking at tech.
Yeah. Technology, we kind of look at two different ways. So one is the companies that you think of as tech. The big tech platforms. But also, technology is being used in all sorts of ways, right? The algorithm that decides who gets flagged for further screening at the airport, right? There’s technology that’s being used in all sorts of ways in our lives that’s-
This is kind of the Recode pitch, right? Tech is everything now.
And everything is tech.
I think Marc Andreessen’s “software is eating the world.” He may have started it before you guys did.
He’s probably better at it than we are too. So it’s tech-infused, but a specific focus. It seems like, or at least that’s what you were reporting at ProPublica, focused on the Facebooks of the world. Or are you going to spread it out?
I’d say equally focused on Facebooks and some of the work we did there, for instance, was about the software used for criminal sentencing or the algorithms used in insurance for redlining. So we kind of divided our time between the big tech giants and the use of tech in other parts of life.
And your premise is, and correctly — your premise is correct, up until now there really hasn’t been a lot of good consistently, skeptical, data-focused reporting on technology.
I think I would phrase it just slightly different, which is I think there are a lot of really good reporters doing skeptical, smart reporting. But what we have lacked as an industry, journalism just crippled financially, and we have lacked the resources for really intensive data investigations, which are expensive. So that’s how we’re going to approach the topic, with a staff that is half programmers, half journalists. So that’s very expensive, and the kinds of investigations we’re going to do may take a long time.
So explain what data-focused journalism is and how it differs from the FiveThirtyEight Nate Silver “data journalism.”
I’m struggling because I actually would like to come up with a new word to differentiate myself from what people think of as data journalism but I haven’t come up with that word yet. But I will let you know when I do.
Essentially data journalism has long meant looking at existing data sets, right? So if you look at FiveThirtyEight, they’re really good at statistical, meta-analyses of existing data. So that’s why they do polls, you know, baseball, the Fed, these are all data set-
Sets that exist. Someone else has collected the data.
They’re doing the smart work of analyzing it.
Yes. Right. Correct. And that’s totally legitimate and awesome work. What we want to do is collect our own data. And the reason is that many of the questions that are so important for society to answer, like, “Is staring into my phone twenty-four hours a day going to make me blind and stupid?” There’s not an existing data set.
You don’t really need to study that.
I feel like we do need some data on that. Right?
So my point is there isn’t data on some of these really important questions. We would like to collect data, sort of at scale, not just a few anecdotal interviews, but as best we can bigger data sets about these questions.
There is a strain of investigative journalism that has been data-focused for a long time.
I’m very old, but I do remember for years someone on a newspaper I would work with who would go to the ... I don’t know what the conference was called, but they’d come back saying, “I want to propose a data-focused” ... you know a lot of it would be going through phone books or whatever, but the idea of taking big clumps of data and using it to tell a story.
Absolutely. All I want to do is scale up that type of work and focus it towards tech and society.
This is work you’ve been doing.
Really well, at ProPublica.
The stuff I’m most familiar with is the stuff you’ve gone after Facebook about. Gone after is slightly the wrong verbiage.
But good. You held them to account multiple times. Tell us your greatest hits of your Facebook reportage.
So with Facebook there were a couple of different strands in our reporting. One of them was discriminatory advertising. So we realized that they had given advertisers the ability to target ads really granularly, and in fact they had this ability to block people from seeing your ad. So you could buy an ad and say, “never show it to a black person.” So there was a little drop down menu called “Exclude These Groups” and they had racial groups in there. And so-
This sounds bad.
It already sounds bad.
But to play devil’s advocate, this granular targeting is the idea that has made Facebook as powerful and successful as it is. The good part of their pitch is, “We’re going to deliver ads to people you want to deliver ads to.” That’s a good thing, not a bad thing.
Right, it just happens to be illegal in certain categories of advertising: Housing, employment, and credit. So employers and housing advertisers and credit advertisers cannot discriminate in advertising by race. So Facebook enabling this could well be illegal. These cases are now in the courts. So I’ll let the courts decide, but-
There’s a debate about the legality of this. And Facebook-
There’s a debate about the liability, right? So the question is really is Facebook as a platform liable or is it advertisers themselves who are liable? That’s the debate. The debate about whether you can do discriminatory advertising in those categories is settled law.
I think there was one ... you did a series of these, and on a bunch of them, Facebook said, “Yeah, sorry,” or “We’ll fix it.” ... there was one where they pushed back and said, “No, you’re wrong.”
On age discrimination in employment. So we found dozens of the leading companies in America were discriminating in their ads. They would target their employment ads just to younger workers. And Facebook itself was targeting their own employment ads to younger workers. And Facebook disputed, A, its liability as the platform, but, B, they said, “Look, you can have a multi-pronged employment strategy. Let’s say you’re using Facebook to reach the younger workers but you have another strategy for reaching older workers.” That’s something that people have to prove to the courts. The court would have to really feel clear that this doesn’t violate the age discrimination laws that we have on the books in both state and federal law.
So leaving aside the legality of it, this is good, important journalism. Shines a light. Lets people know what’s going on. Basic journalism, right? Basic in the-
... in the way that you would like journalism to be basic. You were doing it at ProPublica, which is also a non-profit. They have a model where they publish on their own site and then also distribute their stuff to places like the New York Times or work with the New York Times. So it got lots of attention again, like I said. It got Facebook to acknowledge its mistakes in many cases. So why not keep doing that at ProPublica?
ProPublica was a great place for me to do this work. But Jeff and I ... my partner, he’s been my partner on all these investigations, he’s sort of the programmer, I’m the journalist, although I think we’re both a little of both. I have a programming background as well.
Yeah, I grew up in Palo Alto. Steve Jobs was a neighbor and he was funding-
Like a neighbor-neighbor? Next door?
No, not next door. But everybody knew him.
You know what I mean, he was around. He funded a program for all fifth graders to learn to program. So I learned in fifth grade and my parents were both programmers, so I always was going to be a programmer. I studied math in college and my university didn’t have computer science, but I took computer science and I worked my summers at Hewlett-Packard. I was going to go back after graduation to Hewlett-Packard, but I fell in love with journalism.
This was not just “I was around tech and I learned BASIC.”
No, no, I did Pascal and Lisp. Lisp was an amazing language, I would like to say.
I’m nodding like I’ve heard of Lisp.
They’re all old languages and people now would laugh at me. So I haven’t coded for a long time. I guess the way you were Wall Street Journal-adjacent, I’m coding-adjacent now.
Got it. So you’re doing all that, again at ProPublica. They gave you the resources to do this good stuff. I’m just pushing a little bit because there’s got to be a story, right? It could be as basic as, “I wanted to do my own thing under my own auspices and not be part of something else. I wanted to do my own thing.”
No, so it’s a simple story, which is that-
I’m sure they’d like you to keep staying and doing journalism at ProPublica.
Yeah, we had a long conversation with them about what we wanted, Jeff and I. The thing is you are familiar maybe more recently with the ... we did about a year’s worth of work last year on Facebook. But the year before, we had done insurance algorithms and criminal risk scores and those algorithms that were biased and those investigations were really heavy-duty. I was feeling that I just didn’t like having to choose. I feel like there’s such important things happening in the criminal justice system with algorithms, and also Facebook. And so I guess my ambitions were just that I wanted more teams. I wanted there to be more of us. I wanted to have 20 reporters or 15.
And you figure you could do that better on your own than going to Craig Newmark and saying, “Can you donate an extra 20 million bucks for ProPublica so we can fund this within ProPublica?”
So I had conversations with ProPublica about whether I could do it internally and that was my first thought, was I’ll propose to them that I would do it internally. But for a whole bunch of reasons they weren’t into that idea. You know, they’re a young place themselves. So the idea of having already a startup within the startup-
Yeah, it’s a lot.
It’s a lot, right? So we all agreed in a very adult way that it was better for everyone to go their separate ways.
You should not apologize for being ambitious and saying, “I wanted to run my own thing,” and doing that-
I did want to run my own thing, but I actually was also really terrified of running my own thing. So I’m super lucky that I found a business partner. I probably wouldn’t have left if it was just Jeff and I. I think we were smart enough to know that journalists are not always the greatest at running a business.
I don’t know what you’re talking about.
So it wasn’t really until we found Sue Gardner and she agreed to join us that we found the confidence to do it on our own.
She’s from Wikimedia/Wikipedia.
I’m really curious about what it means that you’re doing this as a non-profit. At least initially you don’t intend to charge people for this content. Again, I’m really not exaggerating, but probably one out of every two guests that comes on this show is figuring out some way to charge people or is already charging people for their content and it’s now sort of, the new conventional wisdom is, “selling subscriptions, or some version of that, is not only a good idea for your business, but it’s inherently good for the journalism. People who value what you do will pay you for it and if you make stuff people value, it’s a virtuous cycle.” What does it mean this is going to be a non-profit? Does it mean that this kind of work can’t be supported by the market?
Well, I don’t know. I’m not a total expert on business models in journalism, but I know what I’ve experienced in journalism over the many, many years that I’ve been in it. What I’ve seen is that the for-profit model has led to the shrinking of resources for investigative work and long-form and the stuff that I do, which is resource-intensive. I certainly saw that at the Wall Street Journal. When I got to ProPublica, it was my first time at a non-profit journalism company. I had never in my life been in a newsroom that was expanding. It was incredible. There was no rounds of layoffs, there was no fear and dread all the time. It really convinced me that for the moment we’re in right now, this is the way I would like to be. I want to be in the part that seems to be thriving. I agree with you, there’s a lot of questions. How long are these rich people going to support us, right? I think for the kind of work we do, investigative, expensive work, it’s hard for me to imagine that readers definitely want to support it through paying directly through a paywall.
The older model for supporting investigative journalism, some combination of … you had a publication and it was ad-supported and/or maybe there were subscriptions like the Wall Street Journal, and papers that were wealthy enough ... and for a long time there were a bunch of them ... could do investigative journalism. Depending on your view of it, it was either a worthy thing to do, full stop. And/or it was a glamorous thing to do and it got you prizes and it made other journalists want to work there. Very often, you could see the bad version of this, which is a 10- or 12- or 50-part special published at the end of the year, built for an award committee and not for actual readers and very few people would actually get through the thing.
But now we’re in a world where you’re saying, “We want to do this stuff, it’s good. Full stop.” But it’s not tied to the business model. I guess I’m just sort of talking non-stop because I’m trying to figure out if it’s a bad thing that you’re not selling, you’re not asking people to pay for this directly.
Well, we are, first of all, going to ask people to donate. So we launched our static website, which literally is just an about us page and has a donate page. So we’re-
How’s it going? We’re four days into it.
Yeah. I mean there’s not a lot of donations yet.
But we haven’t shown the money, right? We haven’t published any news, but we are going to hope that our readers will donate. What we’re going to do is the trade-off that we’re going to make, and this is going to be our favorite topic because we like to fight about ad tech, is we’re not going to track them. A lot of people ask for money and they do all the surveillance and tracking and data mining, but we are going to have a really clean website and do as little as possible. There’s some things like the Stripe payment processor or MailChimp or something that might ... You can’t avoid some of it.
“We are going to be as virtuous tracking-wise as we possibly can.”
We’re going to respect the reader’s privacy as best we can and we hope that that encourages them to want to contribute to us.
There’s definitely a sizable community of people who are really intellectually, theologically opposed to ad tracking. It really upsets them. They’re very vocal on the internet.
It seems some of those are the former Facebook executives who all just left in the last week.
Well, they all get religion. They don’t return the money, though, I’ve noticed. They keep the money. Yeah. Those are good stories. The whole WhatsApp thing was so weird. I was just going back to the story.
Did you read the story today? It was so great.
Yeah. I read the story. I went back and found my story from 2016 which just tracked the WhatsApp guy’s comments from 2010. Advertising is terrible. They’re literally quoting Tyler Durden from whatever that dumb movie was with Brad Pitt. Fight Club. Had this very, “Ads are terrible, man, and we’ll never do it.” Then they sell the company for $22 billion. Then a few years later, “Oh, yeah. We sold our company to an advertising company.” That’s how that’s going to work.
I also love that Brian Acton ... I’m digressing here, but Brian Acton, who’s the source of the Forbes story today is putting money into-
Signal and that’s a privacy thing, but his co-founder Jan Koum, the line was, “Has left to pursue his collection of air-cooled Porsches.”
Air-cooled Porsches. Yeah. That was a fantastic detail.
That was great though.
I don’t even know what that means, but it sounds really glamorous.
It sounds fancy. Let’s talk about where the money’s coming from because there is no free money, right? Like Craig Newmark, my co-worker Kara Swisher, a lot of people like Craig Newmark. You had some line here, I’ll quote it back to you, saying, “He’s great because we’re perfectly aligned.” Let’s say everyone is working with the best intent, but how do you insulate yourself when Craig Newmark eventually says, “I don’t like that story you did,” or find some way to express disapproval or says, “You guys are doing great, but I’m not going to fund this anymore.” How do you buffer yourself in the money?
Yeah. That’s a good question. In our conversations with Craig leading up to this investment, what was so great about those conversations was that he said, “I know that I can never see a story before it’s published. I will not ever see one. I would like to email you if I think it needs a correction.” I said, “Yes. You can do that. That would be fine.” He also has said, “I’m not really going to pass along ... I could pass along some story ideas, but I may or may not and you don’t have to do any of them.” In terms of funders, that’s about as much as you can ask for. Now, did he sign a contract in blood saying that? No, but I believe him. He doesn’t seem to have an interest in meddling. Honestly, we wouldn’t let ourselves be meddled with, either.
Right. It’s not that there are no strings attached, you just can’t see the strings or you’re unaware. I mean you’re going into this eyes open, I guess, and you think he’s as good a steward as you could ask for.
Yeah. I mean I previously worked for Rupert Murdoch.
Yeah. I cashed some of those checks. Just to be clear, he was very hands-on in various ways.
He really likes journalism. He likes newspapers. He likes being in it more than he does making money from it I think.
Did you look at ... All right, I remember Pierre Omidyar had good intent and he founded The Intercept and then a couple other stuff. There was a mess there. Are there things we can learn from that or one rich guy is different from the other rich guy?
Yeah. I mean I don’t know enough about what has happened to The Intercept to know if that’s something to do with Pierre’s funding or not. The stories I know mostly have to do with sources getting burned inadvertently, which is terrible and tragic and that breaks my heart as a journalist. Whether those were avoidable, what led to that, I don’t know, but I guess I wouldn’t say that there’s something that came screaming out to me from that. I would say that when we were looking at funding, the most important thing to me was that we ... Just not have somebody who’d want to interfere. The foundations actually are notorious for wanting to interfere. They often wanted to fund just one beat or one topic.
Like I said, there’s no free money. Someone is always giving you money with something in mind, doesn’t mean it’s bad. It’s just there’s a reason they’re giving it to you.
Right, but it is interesting ... I don’t know if you know this. I spend a bunch of time now in the weeds in the foundations, but MacArthur and some other foundations actually put together a pledge that said that journalism funding should be unrestricted grants. A lot of them have signed it and committed to that. That is a great thing because I think there is a growing recognition that A, journalism isn’t going to fund itself at least until someone figures out the best pivot ever, and B, restricted grants are just a way of censorship or controlling the stories. I found that the big foundations, the ones whose names you know, are pretty good about that.
10 years ago was the rise. Craigslist has been growing for a long time, but Craigslist was really growing. At the same time, newspapers were falling off a cliff. Lots of people, including myself, connected those things. You feel comfortable working with Craig Newmark. He’s also donating to other journalistic endeavors, $20 million to CUNY. You don’t see this as penance for whatever he did or didn’t do to papers?
The truth is I don’t know what motivates him and I’m not going to speculate, right? I actually feel like generally as a journalist, I try not to focus on motivations because it’s probably true, for instance, that Facebook’s motivations in doing the drop-down menu for racial discrimination were fine, but the outcome might be they were breaking civil rights laws.
“I don’t care what you meant. I care what you did.”
Yes. I feel that way about funding too, which is I don’t know what motivates Craig to do this. I’m really grateful. We aren’t going to take corporate money. We wouldn’t take money from Google or Facebook, but we will take money from individuals in the tech industry or in other industries.
You would take money from a Facebook executive?
Yeah and that’s fine.
Maybe a former Facebook executive.
Maybe a former Facebook executive.
Who still has some money even though he feels bad about how he made it.
In between. He could just have one less Porsche.
I want to talk to you about Facebook and what they’re going through and how it’s connected to what you’re focused on. We’re talking after 2016 elections, Cambridge Analytica, and a lot of things get conflated about what Facebook is going through. Your reporting has contributed to that and I think in a positive way. Now it’s conventional wisdom that people are angry at Facebook and users are leaving in part connected to all this anger. Do you think that’s true?
Well, you know that’s the thing that is so hard about the tech companies, is we don’t have a lot of independent data. We actually don’t even have the equivalent of Nielsen for TV ratings. Facebook says, “This is-”
Those [Nielsen] are very crude user numbers.
Right, which is very crude user numbers. Thank you. Facebook tells us how many people they say are using it. They tell us how many people saw each ad. There’s no independent metrics. To me, the real question that I’m interested in is providing those independent metrics.
Right. Facebook measures this, supposedly very rigorously. They’re constantly asking users, “How do you feel?” But they’re not sharing that information.
Sure, but we don’t know what it is.
What I feel like my mission is, is to provide some counterpoint to that, some other data. It will never be as good as Facebook’s data, but perhaps I can provide a little bit of data. Because I don’t know if people are leaving Facebook. Truthfully, I feel like a bunch of techies got a little worked up about it, some elite people, but my regular friends are using it all the time, still.
Yeah. It’s definitely hard for me to reconcile. I’m smart enough to be wary of those narratives, but I don’t know how to replace them. I also have this gut that the Cambridge Analytica thing, which is supposedly, according to Facebook executives, a giant thing caused a lot of regular people to be upset with Facebook, is actually a proxy for Trump. If this was the exact same story but you replaced Trump with Clinton, it would be less of a big deal. If it was just a data breach and it was seemingly unconnected to elections, people would just have a generalized shrug.
I mean, I feel like I have some data to support your hypothesis because in 2010-
I led the series at the Journal, “What They Know.”
I was setting you up, but okay. Good.
Yes. I know. Thank you for setting me up. Basically we did a lot of these same stories. We found all of the top apps on Facebook were stealing user data. We found a company, RapLeaf, that was basically doing exactly what Cambridge Analytica did.
I stepped over this while you were telling me. This is important. You did a series for the Journal called “What They Know.” It was you, Emily Steel.
Very distinctive ...
Awesome group of people. The big idea was you were explaining to a broad audience how data collection works, specifically on the internet and internet advertising, but just broadly how people are tracking what you’re doing.
Yeah. We did a lot of these stories, similar to Cambridge Analytica, about how third parties were getting all the Facebook data and then Facebook was like, “Oh, I’m so sorry. We’ll fix it.” RapLeaf, which was doing political targeting using the voter list and finding people online using that information.
RapLeaf. I remember RapLeaf.
Yeah. It just didn’t have the resonance at the time. First of all, it was 2010, so we were maybe all not using our phones quite as much and so embedded in Facebook at the time. Also, it was just a little abstract and techie. I think you’re right though, the political moment, what happened at that election was everybody felt so surprised. Like, “Oh my God. What happened in 2016?”
Remember back in 2016. Yeah. Who can we blame? What happened? What went wrong?
Yeah. I feel like there’s ... The thing that makes me crazy about it is we don’t know. People say, “Oh, it’s all because of the dark ads on Facebook or the Russians.” We’ll never know because only Facebook has that data.
That is a crazy thing. That’s not true in any other part of our election process. You can see radio ads, TV ads, print ads. All of those things are available for the public to see.
It’s still hard to parse exactly why someone in Wisconsin who voted for Obama x number of years and turned around and voted for Trump, but at least you can see who bought ads there.
Yeah, right. Exactly. My point is you’re never going to really fully know, but I think there was also some question about whether it was some black magic online.
Yeah. I’m glad you teed it up, because I did want to ask you about that series because it’s a great series. I was reading it this week in preparation for this. A lot of it, you guys were approaching this as investigative journalists. You’re saying, “Hey, I want to show you something and something you should know about.” The insinuation is this is something, by the way, that’s wrong. A lot of it reads like a primer for “this is just how digital display advertising works. It’s how ad tech works.” I was wondering if you felt like, “Boy, I wish we’d found the devastating thing that really would’ve gotten people’s attention.” Because you’d say, “We showed this person how this company was able to track her down to the pixel and figure out her age and what she rented and all this stuff.” Again, like you said, there was no resonance for it.
Do you think there was ... You needed a Cambridge Analytica to make that pop?
Yeah. No. This is where you and I have some fights about ad tech.
Because when I first came out with that series, you were like, “That’s just how ad tech works.”
That is how it started, right? Our very first story was really only about online tracking.
Which, again, is a good thing that you were showing people how it works because by the way, they still don’t know how it works.
Correct. Very quickly, we did find many abuses, right? We found all those Facebook apps that were stealing data. We tested 100 iPhone and Android apps, many of which were taking user data. In fact, Apple came out with a new way of setting identifiers as a result of that type of abuse. We found Google was tricking the Safari web browser into allowing it to set cookies. They paid a $22 million fine to the FTC. We started off with more of an explanatory, but we really did end up in a place where we found things that really surprised us. In fact, one of my favorite stories was the one where we looked at when you log in to a website, put your username and password, who was sending that information to third parties. The Wall Street Journal was one of the top offenders of sending that to the third parties.
I mean I do remember talking to a lot of ad tech execs. I used to pay more attention to that business. They were all very nervous and upset about what you were doing and they were really worried you were going to find something terrible. Then they had a collective sigh of relief by the end of it. Their main defense was, for most of what you were reporting, was, “Well, yeah. That’s what we do and one, we need to be able to do this kind of advertising to support whatever we’re publishing, and two, it’s better than ...” Then they would name the credit card industry or something else. “We’re less invasive than x, y, or z.” I’m assuming you heard a lot of that.
Yeah. Do you think that the general consumer is more knowledgeable than they were about this stuff and/or do you think they care about this stuff?
I don’t know. I think people have a little bit more knowledge than they did then. I think people still don’t know how much to care. I think collectively as a society, we actually just don’t know how much to care because we don’t know how much it matters. If it swung the election, then it really matters.
If it was just margin at the edge, maybe it doesn’t matter or it’s something we can mitigate against. That’s why I feel that I’m just really committed to taking a data-driven approach to these questions because I feel like we just don’t have enough evidence to know how much they matter. I’m open to the idea that everything I’ve done is meaningless, but I would still like to do it to prove that.
Just so we’re clear, I’m not suggesting what you’re doing is meaningless, but I am saying it takes-
No, but I’m actually being serious about that.
It takes a certain kind of fortitude to go, “This is important. I’m going to keep doing it and eventually people are going to appreciate” — it sounds wrong, the way I’m framing it. Very often I think it’s hard to do this kind of work and not get pats on the back from regular people or to see people handing over their personal information for the equivalent of a candy bar. Also just fundamentally, I’m interested in the idea of whether people ... I think my gut is that when people say they care about privacy, they really mean pornography, health/insurance, and maybe voting. If any of that they think is being exposed, they’d be really worried. Beyond that, they probably don’t care that much.
Yeah. I think the reason is that we just don’t know how much it matters. Right?
We literally don’t know. If we were in China, where they have an algorithm that determines if you’ve gone in the Western region where there’s a Muslim group that they’re worried about them being terrorists. They have an algorithm that if you go to the gas station, then you do a certain social media post or whatever, and then they throw you in a re-education camp because you’re too risky. Now if that was happening here, people would suddenly be a lot more concerned about their personal data.
That’d be very bad.
Right. My point is we just don’t know. If that happened, then we would all retroactively like, “Oh my God. I wish I’d never signed up for any of those apps because the government would take that data.”
Right, but the idea that Google sees my email, I’m just not going to think about it and maybe I’ll notice that they’ve sent me an ad that seems to be reflective of my email, and that upsets me. Or you go all the way to the world of make-believe where — I know tech reporters who believe this — that Facebook is listening to what I’m saying on my phone and serving me ads, which is not true. And if you’re a tech person you should know better than that. But if you’re a regular person, that sounds like a reasonable thing to assume, that Facebook is that smart that they’re doing it.
Well, I have to say like that one has come to me so many times.
People say to me specifically about Instagram that they feel like it’s ... The ads they see there are so perfectly related to something they’ve just mentioned and they wonder if it’s voice related. Those are the kinds of things we want to test, right? Because I know that Facebook has said many times it’s not true. You know their record of truthiness is spotty and I would just like to test that premise, right?
And so what we’re going to do is test all such things, and half of them will never prove out and half of them will turn out to be true.
I mean to me what that says is people don’t understand how much they are collecting about you that even though you don’t think you’re giving them this information, you are either directly through Facebook or through any other web interaction. And they can knit together this portrait of you that knows you better than you think.
And yet every once in a while, it turns out to be true. Like Uber was totally taking data off your phone about your movements when you weren’t using Uber, right? Even though their thing said they weren’t. So every once in a while, those guys are cheating the system.
Oh yeah. I’m not saying they’re paragons of virtue. I’m just saying in this case the reason they’re able to deliver targeted advertising is because they’ve developed this really good targeted advertising.
Yeah. And also the other thing none of us want to admit is that we’re not as unique as we think. Like we’re just like kind of obvious and predictable.
This is data-driven journalism. So the way you want to collect the data is how?
So we’ll do-
You want to walk around with a clipboard and poll people?
Maybe. I hope it doesn’t come to that, but it might.
That’s a way, right?
Yeah. No. It is. We’re going to do all the usual things, right? We will file public records requests. We will use automated data collection across the internet. We will do crowdsourcing. One thing we did a lot at ProPublica was build tools that people could use to donate their data to us in very specific ways. So with the Facebook political ads we built like a browser extension. People could add it to their browser and then when they were on Facebook it would identify which ads were political and send it to us.
So you’re asking people to flag stuff for you.
Yeah. But we would build the tools so they didn’t have to actually do anything, right? Like the only thing they have to do is install the tool. So I imagine we’ll do a bunch of that. You know we’ve done crazy things like Jeff and Surya went on a boat outside of Mar-a-Lago and scanned the WiFi remotely to show that it was totally vulnerable to hackers.
That was you guys?
I thought it was Gizmodo.
It was a joint thing with Gizmodo, yeah.
It was a great story.
It was the most fun ever. So we are trying to find a way to do more boating investigations.
Yeah. I mean it’s so terrifying. So terrifying.
I mean if we could do it, like-
Exactly! That’s the point of you just got, literally got on a boat and were able to pick up like Trump, pornography, printer, WiFi, sex.
Also to open ports basically. Yeah. Yep.
... from Mar-a-Lago. That’s a great story. But one of the things you want to do is collect this stuff en masse. You’ve got a conflict with Facebook and some of the other tech platforms as well about the ways you want to do automated data collecting?
Well, yeah. I mean, so, automated data collection on the internet is a time-honored technique. Lots of journalists do it. But it technically can violate the terms of service.
So what is automated data collecting?
So basically you build a little thing. It’s called a spider or a crawler that goes out and basically ... The way Google indexes the web, you go out, you look at every web page and collect the information from it, and then they do it to index. We don’t usually crawl the whole web. We would do a crawl of something very specific. But we build tools like that to find data, right? So a kind of classic thing we would probably do is look for, you know if there’s a new creepy way that people are being tracked online. So there was a couple years back something called canvas fingerprinting. Like you couldn’t detect it in your browser. You could crawl the web, look for all the websites that are using that technique, right? And publish a list of those sites.
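As an illustration of the targeted crawl described here, the following is a minimal Python sketch, not the tool Angwin's team actually built. It assumes a simple heuristic: a page whose inline script both draws to a canvas and reads the pixels back is a candidate for canvas fingerprinting. The regular expressions, the `example-crawler/0.1` user agent, and the function names are all assumptions made for this example, and a real investigation would also need to inspect externally loaded scripts.

```python
import re
from typing import Callable, Iterable, List
from urllib.request import Request, urlopen

# Heuristic markers (an assumption for illustration): scripts that draw to a
# canvas and then read the pixels back are candidates for fingerprinting.
DRAW_CALLS = re.compile(r"\.(fillText|strokeText|fillRect)\s*\(")
READ_CALLS = re.compile(r"\.(toDataURL|getImageData)\s*\(")


def looks_like_canvas_fingerprinting(source: str) -> bool:
    """Return True if the page source both draws to and reads from a canvas."""
    return bool(DRAW_CALLS.search(source)) and bool(READ_CALLS.search(source))


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download one page as text (only inline scripts are inspected)."""
    request = Request(url, headers={"User-Agent": "example-crawler/0.1"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(urls: Iterable[str], fetcher: Callable[[str], str] = fetch) -> List[str]:
    """Visit each URL and return those whose source matches the heuristic."""
    flagged = []
    for url in urls:
        try:
            source = fetcher(url)
        except OSError:
            continue  # skip pages that cannot be reached
        if looks_like_canvas_fingerprinting(source):
            flagged.append(url)
    return flagged
```

Passing a custom `fetcher` lets the crawl be tried against saved pages rather than live sites, which also makes it easier to stay within a site's terms of service while testing.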
And Facebook says you can’t do this because?
So crawling on Facebook is prohibited by their terms of service. So specifically Facebook is a walled garden. You’re really only supposed to see your own feed.
And this is not built as an anti-journalism feature?
No. It’s basically so that other people don’t copy all the data and they can make a rival social network.
Right. It’s for competitive, commercial reasons.
Right. You know, they face a lot of competition.
Yeah. But that’s the idea, right? The thing that fuels us is this data. We aren’t going to monetize it. I mean geez. They still insist on this whole thing about they don’t have your data and somehow it magically appears in other sources. It’s so tiresome. But it’s what makes the thing go, right? So they want to protect that and limit the access to it.
Yeah. So they and others have limitations on what you can collect using automated means. And theirs are probably the most strict that I know of. And so we don’t do crawling across Facebook. It would break their terms of service and it would also actually be technically difficult to do, because they have very good technical measures against it. So for instance, when we do data collection on Facebook like the one we did with political ads, we just built a tool so that users could send us their data. So we feel like that’s much less intrusive because a user theoretically should have some right to have their data and send it to somebody.
So the cheap journalist in me wants to say, “Oh. Well, see there’s irony here. This is the sort of stuff Cambridge Analytica something, something, something was doing ...” And by the way, Facebook thought they were doing good by letting the Cambridge Analyticas of the world collect this data and now you can see why they want to, in addition to commercial reasons, why they want to be extra protective about this stuff.
I say the difference between us and Cambridge Analytica, although there are many, but one really key one is, so for instance, the Facebook political ad collector tool we built, it literally strips out every single bit of identifying information. So the only thing we received was the ad. We didn’t know which, your Facebook ID, we didn’t actually even know your IP address, what location, what country you were in. So we knew nothing about the people who used it and contributed [to] it, and we built the tool to comply with the European data protection standards. So we tend to be really targeted in our data collection. I wrote a book called Dragnet Nation, right? And the idea is dragnets, where they indiscriminately collect data are sort of, you know, they make you feel like it’s unfair. And so we deliberately try not to build dragnets to build really targeted data collection, just answer one specific question.
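The idea of stripping identifying information before anything is sent can be sketched as an allowlist filter. This is a hedged illustration only: the field names below are hypothetical, since the interview does not describe the schema of the actual ad collector, and the real tool ran as a browser extension rather than as Python.

```python
from typing import Any, Dict

# Hypothetical field names; the real tool's schema is not described in the interview.
ALLOWED_FIELDS = ("ad_text", "advertiser", "targeting_explanation")


def strip_identifiers(record: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only allowlisted ad fields; everything else is dropped."""
    return {key: record[key] for key in ALLOWED_FIELDS if key in record}
```

An allowlist is used here instead of a blocklist so that any field nobody anticipated, such as a new identifier, is dropped by default rather than passed through.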
Do you feel like, and I guess this is probably true of any journalist doing anything, frankly, that the asymmetry between what your resources are and the resources of a Facebook that you’re trying to learn about, stripping out the David and Goliath, sort of, morality play of it, that just like, you’re just perpetually at this giant disadvantage and you can never really get to the answer you want. And they have it all. And it’s behind a wall. You’ll never get to it.
Yeah. But I think that’s true of every kind of ... Covering private companies is like that because, I mean, I was a business reporter for a long time and covering Exxon is the same way. You know, you don’t get a lot of access into what’s going on.
Exxon doesn’t let you in and say, “Hey. Here’s all the oil spills stuff we’ve got.”
I mean who knows what they do, but I don’t think so.
I mean, to be fair, there are some really fun companies to cover, like a News Corp or a Myspace where it’s just full of characters who tell you all kinds of crazy shit.
Oh media is a great industry to cover because no one can keep their mouth shut.
Yes. Yes. Yes.
But I do feel like there’s a huge asymmetry but I also feel like that the fact that those businesses are so opaque means that sort of anything that you get is news. And it can probably open more things. And so I guess I see it more as an opportunity.
Yeah. Well, obviously if it’s hard, someone else has not done it probably.
Right. That’s what I like. Those kinds of things.
You must have a target list, right? You’re going to launch in 2019. You want to lead with a story on ... You’re not going to tell me.
I’m not going to tell you.
But with this stuff, and you talked about this as sort of ... This is one of the stories I saw, it talked about this being sort of scientific type of, you know ...
Thank you. It’s late in the day. The problem with the scientific method is very often you do this experiment and nothing comes of it, right?
Yeah. Right. No results.
It’s boring. And this is a problem with lots of investigative journalism and journalism, just lots of stuff just doesn’t amount to anything and if you’re good about it, you don’t publish it because you know there’s no story there. But you set an extra-high bar for yourself.
Yeah. No, it’s super hard, right? And that’s true with investigations. It’s definitely true with these types of data investigations. One thing we have been thinking about is whether we, we have to talk to our lawyers about this, but I would really like to publish a “no results” page where we have our sort of failed investigations, because I do think that sometimes our negative findings are actually interesting.
Show your work. “Here’s a bunch of data we collected. It doesn’t prove anything.”
But the only problem with that is it may be legally impossible to do because the question you asked is a little bit of an accusation, right? Like it’s like we were testing if Facebook’s doing a bad thing and then by showing our data and saying, “Well, it doesn’t seem like they are, but you look.”
Yeah. Yeah, yeah, yeah.
That might not be good with the lawyers, so I haven’t quite sorted out whether we can do that.
Could you just do it and say, “Here’s some data we’ve collected. You make of it what you want”?
Yeah. We’ll see how this evolves. I’ll come back in a year and we can talk about that.
And how important is the idea that, I mean we’ve been talking about it on and off throughout this interview and many others that you’ve done, the idea that “this is all data-driven” as opposed to “an interesting story.”
I mean we are not against interesting stories. And one thing that I do think is that we want people to come to us. We still want sources to come to us and we hope that our data savvy means that they can bring us sophisticated stories that they’re worried other people might not understand.
So you’re not going to turn down and say this is not a data story I want to do?
No. We are not going to turn it down. We are open for business.
But I’m not kidding because, right? You’re not doing the same thing as FiveThirtyEight, but there’s some parallels, right? And I think people loved Nate Silver’s work, but particularly when he was accurately forecasting the elections, right? And when he’s not doing that or, by the way, when he gets Trump wrong like everybody else, it’s less interesting, or baseball’s interesting but what’s the best burrito in the world? It’s a cute stunt that you can apply some sort of data-driven analysis to. But I don’t really care and I think they’ve struggled a bit with that. And I wonder if you’ve boxed yourself in? But you’re saying no. The box is open.
Yeah. No, I think we want to use data when it’s appropriate, and we also will do stories that will not be data-driven, right? Like that’s always going to be true because we want to do important work, and some important questions. You know I guess the way I would say it is that all journalism is really data-driven, right? Like this interview, this is data collection. You are collecting data about me, right?
And so it’s a sample size of one, but it still matters. And so I guess I would say all of our stories, they will vary in sample size from one to many.
This is like the VCs I talk to who say, “Our thesis is about network effects.” And then they can make anything in network effects.
That’s what I’m doing right now.
Very good. You should become a VC. This is a great conversation. What, you announced this Sunday night/Monday morning. We’re recording this Wednesday night. What is the best response, most surprising response you’ve gotten over the last couple days?
Oh, well I have to say the most heartwarming response is just people who in the past were mean to me are now sucking up to me.
You’re not looking at me, right?
I’m not looking at you.
Good. Okay. We’re good, right?
We’re all good.
Great. Okay good. This is great. Thank you for coming on. I appreciate it. I look forward to the launch. I look forward to you coming back and discussing what you got right and what you got wrong.
Yes. I’ll be happy to.
We’ll see a data dump of failed stuff.
Yes. We’ll do a quantitative analysis of it.
This article originally appeared on Recode.net.