Apple isn't collecting user data in Safari, it's collecting crappy website trends

Apple is using differential privacy in macOS High Sierra to figure out how best to tackle web sites that use excessive power or memory, or that crash the browser tab. Basically, it's looking for more trends it can address, like it's already doing by actively blocking third-party trackers.

From TechCrunch

Today's public release of macOS High Sierra brings with it some key updates to Safari — including the ability to disable cross-site cookie tracking and turn off autoplaying ads. Arriving alongside those features is a less publicized new addition to Apple's proprietary browser: data collection. The company is using its newly implemented differential privacy technology to gather information from user habits that will help it identify problematic websites.

This is true, but it's also being misinterpreted by those referencing the article as Apple harvesting your data to (oh, the irony!) protect your data from others. But that's not what's going on.

Here's a super-simple example of how differential privacy works:

You're at a large family dinner and a question comes up: Who likes Star Wars better and who prefers Star Trek? You want to know the split but you don't want to cause any long-standing feuds. So, here's what you do: Everyone flips a coin. Anyone who gets heads marks down the true answer. Everyone who gets tails marks down a lie. Then, when you collect the answers, knowing the odds of truth vs. lie, you map back to a fairly accurate ratio. But here's the thing: You have no way of knowing which individuals lied, which means you can't figure out who really likes which franchise. So, their privacy remains inviolate, and there are no ugly food fights over Vulcan Science vs. Jedi Academy at the dinner table.

Differential privacy takes it several steps further, though. For example, if you're answering more often, it'll throttle you down so there's no chance of an identifiable pattern emerging. Likewise, if there are too few samples (perhaps rural vs. downtown in some situations) it can preemptively opt you out to preserve privacy.

In other words, Apple wants the trends — the big picture. It doesn't want the individual details that make it up. It's about stats.

Companies that harvest your data want the details. They want you. They relentlessly record every scrap of data they can to build as precise a profile of you as possible, so that they can better target you for ads.

Apple doesn't care about any of that. All Apple wants to know, in powerfully anonymized aggregate, is which web sites are giving you a bad experience, so the company can do things like suppress their trackers.

It doesn't want you. It wants the websites.

And if you haven't deliberately opted in to Apple's device analytics system, the company doesn't even get that. It gets nothing.

But the irony of the fact that the company is collecting more browsing data in order to make its browser more secure won't be lost on some.

Apple using differential privacy to improve user experience and performance in Safari isn't the least bit ironic. It's not even poetic. But it's damn clever.

It's also yet another manifestation of the company's fierce belief in user privacy — or, if you're cynical, the privacy-first strategy Apple knows its competition simply can't compete with.

There are plenty of things you can take Apple to task over, like the abandonware status of Mac mini and Mac App Store, but using differential privacy to improve Safari isn't one of them.

Rene Ritchie

Rene Ritchie is one of the most respected Apple analysts in the business, reaching a combined audience of over 40 million readers a month. His YouTube channel, Vector, has over 90 thousand subscribers and 14 million views and his podcasts, including Debug, have been downloaded over 20 million times. He also regularly co-hosts MacBreak Weekly for the TWiT network and co-hosted CES Live! and Talk Mobile. Based in Montreal, Rene is a former director of product marketing, web developer, and graphic designer. He's authored several books and appeared on numerous television and radio segments to discuss Apple and the technology industry. When not working, he likes to cook, grapple, and spend time with his friends and family.

  • I'm interested in your take on how Apple anonymizes our data. I've read a few articles on Apple being called out for not being transparent on how they secure this information. You've been very vocal on privacy as it relates to other companies.
  • Interesting / completely unsurprising that Rene has completely failed to acknowledge the academics' view on this, instead tossing in a bizarre Star Wars / Trek analogy. If Apple are to be trusted with our data then they need to put more effort into protecting it.
  • Apple does a great job of protecting our privacy. It’s their business model. And unless you can offer proof of your assertion you need to stop the ad hominem accusations.
  • Read the 9to5 Mac post linked to in the OP.
  • That is an interesting read. It'll be interesting to see how Apple responds. Still, that's data they have. And they don't sell what they have to anyone else. At least not personally identifiable, though it would be better if they were more careful with the data in their possession. They are still exponentially better than Google and Microsoft. But that's no excuse for lax implementation of differential privacy. They need to get that number down to 1.
  • Can you actually prove that they do? If you asked them for the actual data file that they transmit back to the mothership that was specific to yourself, would they let you have it?
  • It takes me a while to digest, investigate, test, collect my thoughts, and write something I’m happy with. I’ll have it soon. And hugs to you too!
  • Working on an article on that today. Stay tuned!
  • FYI, flipping a coin to decide whether or not you lie (in your example above, where "heads" means you answer honestly and "tails" means you lie) is precisely the one case where this example fails. You need to divide the people into two groups which do *not* have equal probabilities. Have them roll a die, and answer honestly if they get a 1 or 2, and lie if they get 3-6. Or another common approach is to let them choose a card from a deck, where the card says whether to lie -- but the deck must have unequal proportions of "truth" and "lie" cards. The mathematics breaks down if you flip a fair coin, or have a 50-50 chance of being in either response-type group.
  • You are missing the point about the coin toss: it is not a matter of probability or equality, it is about anonymity. Rene did point out in the article that you could figure out the coin tosses but not which party-goers told the lies. So, if you could figure out that three people had lied, so what? You won't know which three had lied. Just because it's 50/50 does not mean that the outcomes will be equal. You know, you might have to toss that coin a billion times before you get 500M heads and 500M tails. You are also likely to find long strings of heads only and long strings of tails only. Probability does not imply possibility. If you were to flip a coin exactly 100 times it is possible to get 100 heads. It is also possible to get 50 heads and 50 tails. The probability, however, of doing either out of 100 flips is extremely low, to the point where you could bet everything you own on either not happening with the confidence that you already won. Anyways, I am just stoked that Rene did it right by using Star Wars vs. Star Trek and not Star Wars vs. Harry Potter like the kids do these days (er, a few years ago).
  • I'm not missing the point. I understand the point, I'm just saying his details are terribly wrong. Suppose you have people flip a biased coin which has probability P of heads and probability 1-P of tails. If they get heads, they answer the question "do you prefer Star Wars," and if they get tails, they answer the question "do you prefer Star Trek". You give n people the question, and you get m "yes" responses. The estimate of the proportion of people who prefer Star Wars is 1/(2P-1)*(P-1+m/n). But if P=1/2 (i.e., you are flipping an unbiased coin), then 2P-1 is zero, and you end up dividing by zero. You need unequal probabilities in order to tease out the proportions in the two groups through the anonymizing/randomizing from the coin toss. (Sorry for going off into such detail, but I am a mathematician.)
  • Oh, I don't know, does one exclude the other? I mean, every other security measure is as full of holes as the previous Mac OSs used to be, nobody even mentions the security anymore.
    I'm not entirely convinced by the official news about Safari 11; the reports all look suspiciously similar.
    Maybe I'm overthinking it, but this crappy site data collection seems rather useless. And it's still user data collection in a way. Not to go tinfoil hat on the matter, but I just don't see the use.
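
The estimator worked out in the comments above can be sketched in a few lines of Python. This is only an illustration of the classic randomized-response idea with a weighted coin, not Apple's actual implementation. The function names, the 75/25 weighting, and the 60 percent "true" Star Wars share are all assumptions made up for the example, and it assumes everyone prefers exactly one of the two franchises.

```python
import random


def randomized_response(prefers_star_wars: bool, p_truth: float,
                        rng: random.Random) -> bool:
    """Report the true answer with probability p_truth, else the opposite."""
    tells_truth = rng.random() < p_truth
    return prefers_star_wars if tells_truth else not prefers_star_wars


def estimate_proportion(yes_count: int, total: int, p_truth: float) -> float:
    """Recover the Star Wars share from noisy answers.

    Uses the formula from the comment thread: (P - 1 + m/n) / (2P - 1),
    where m is the number of "Star Wars" answers out of n responses.
    """
    if total <= 0:
        raise ValueError("need at least one response")
    if abs(2 * p_truth - 1) < 1e-12:
        # A fair coin makes every answer pure noise: division by zero.
        raise ValueError("p_truth = 0.5 carries no information")
    return (p_truth - 1 + yes_count / total) / (2 * p_truth - 1)


def simulate(true_share: float, n: int, p_truth: float, seed: int = 0) -> float:
    """Hypothetical dinner party of n guests; returns the estimated share."""
    rng = random.Random(seed)
    yes = 0
    for _ in range(n):
        prefers_star_wars = rng.random() < true_share
        if randomized_response(prefers_star_wars, p_truth, rng):
            yes += 1
    return estimate_proportion(yes, n, p_truth)


if __name__ == "__main__":
    # Assumed values: 60% truly prefer Star Wars, truth told 75% of the time.
    print(simulate(true_share=0.6, n=100_000, p_truth=0.75))
```

With a large enough group, the printed estimate should land close to the assumed 60 percent even though no single answer can be trusted. As the weighting approaches 50/50 the estimate gets noisier, and at exactly 50/50 the function refuses to answer, which is the division-by-zero case described in the comments.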