Don't believe the FUD

No, Apple's Machine Learning Engine can't surface your iPhone's secrets

Core ML is from Apple, and it's new and sci-fi-sounding. That means some people will try to stick it in a headline to get attention, even when doing so clearly hurts readers and users.

Core ML is Apple's framework for machine learning. It lets developers easily integrate artificial intelligence models converted from a wide variety of formats and use them for things like computer vision, natural language processing, and pattern recognition. It does all this on-device, so your data doesn't have to be harvested and stored on someone else's cloud first. That's great for privacy and security, but it doesn't prevent sensationalism:

Wired, in an article I'd argue should never have made it into publication:

With this advance comes a lot of personal data crunching, though, and some security researchers worry that Core ML could cough up more information than you might expect—to apps that you'd rather not have it.

It's less likely that anyone is genuinely worried and more likely that someone saw a new technology and figured they could stick it and Apple in a headline to get some attention, at the expense of consumers and readers.

"The key issue with using Core ML in an app from a privacy perspective is that it makes the App Store screening process even harder than for regular, non-ML apps," says Suman Jana, a security and privacy researcher at Columbia University, who studies machine learning framework analysis and vetting. "Most of the machine learning models are not human-interpretable, and are hard to test for different corner cases. For example, it's hard to tell during App Store screening whether a Core ML model can accidentally or willingly leak or steal sensitive data."

There's no data that an app can access through Core ML that it couldn't already access directly. Core ML doesn't make the screening process any harder from a privacy perspective, either. The app has to declare the entitlements it wants, Core ML or no Core ML.

This reads like complete FUD to me: fear, uncertainty, and doubt, designed to get attention and lacking any factual basis.

The Core ML platform offers supervised learning algorithms, pre-trained to be able to identify, or "see," certain features in new data. Core ML algorithms prep by working through a ton of examples (usually millions of data points) to build up a framework. They then use this context to go through, say, your Photo Stream and actually "look at" the photos to find those that include dogs or surfboards or pictures of your driver's license you took three years ago for a job application. It can be almost anything.

It could be everything. Core ML could make it more efficient for an app to find very specific data patterns to extract but, at that point, the app could extract that data, and all the rest of your data, anyway.
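To make that point concrete, here's a toy model in Python. It isn't Core ML or any Apple API; the names `Photo`, `accessible_photos`, and `ml_filter` are made up for illustration, and the "classifier" is just a label lookup. It's a sketch, assuming the only way to reach photos is through a permission gate, of why a model can only narrow down data the app already holds:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Photo:
    # Hypothetical stand-in for a photo plus whatever a model might "see" in it.
    name: str
    labels: frozenset


def accessible_photos(library, permission_granted):
    """Hypothetical permission gate: without the user's grant, the app gets nothing."""
    return list(library) if permission_granted else []


def ml_filter(photos, wanted_label):
    """Stand-in for an on-device classifier: it can only sort what it is handed."""
    return [p for p in photos if wanted_label in p.labels]


library = [
    Photo("beach.jpg", frozenset({"surfboard"})),
    Photo("id.jpg", frozenset({"license"})),
    Photo("pet.jpg", frozenset({"dog"})),
]

# With permission, the app already holds every photo before any model runs.
with_permission = accessible_photos(library, permission_granted=True)
targeted = ml_filter(with_permission, "license")

# Without permission, the model has nothing to look at.
without_permission = ml_filter(
    accessible_photos(library, permission_granted=False), "license"
)
```

In this sketch, `targeted` is always a subset of `with_permission`, and `without_permission` is always empty. The filter changes how quickly an app can pick out a photo, not which photos it can reach.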

Theoretically, finding and extracting a few photos might be easier to hide than simply pulling a large number of photos, or all of them. So could trickle uploading over time. Or selecting by specific metadata. Or any other sorting vector.

Just as theoretically, ML and neural networks could be used to detect and combat these kinds of attacks as well.

For an example of where that could go wrong, think of a photo filter or editing app that you might grant access to your albums. With that access secured, an app with bad intentions could provide its stated service, while also using Core ML to ascertain what products appear in your photos, or what activities you seem to enjoy, and then go on to use that information for targeted advertising.

Also nothing unique to Core ML. Smart spyware would try to convince you to give it all your photos right up front. That way it wouldn't be limited to preconceived models or be at risk of removal or restriction. It would simply harvest all your data and then run whatever server-side ML it wanted to, whenever it wanted to.

That's the way Google, Facebook, Instagram, and similar photo services that run targeted ads against those services already work.

Attackers with permission to access a user's photos could have found a way to sort through them before, but machine learning tools like Core ML—or Google's similar TensorFlow Mobile—could make it quick and easy to surface sensitive data instead of requiring laborious human sorting.

I get that putting Apple in a headline garners more attention, but mentioning Google's TensorFlow Mobile only once, and only as an aside, is curious.

"I suppose CoreML could be abused, but as it stands apps can already get full photo access," says Will Strafach, an iOS security researcher and the president of Sudo Security Group. "So if they wanted to grab and upload your full photo library, that is already possible if permission is granted."

Will is smart. It's great that Wired went to him for a quote and that it was included. It's disappointing that Will's quote was included so far down and unfortunate for all involved that it didn't get Wired to reconsider the piece entirely.

The bottom line here is that, while machine learning could theoretically be used to target specific data, it could only be used in situations where all data is already vulnerable.

Beyond that, Core ML is an enabling technology that can help make computing better and more accessible for everyone, including and especially those who need it the most.

Sensationalizing Core ML, and machine learning in general, makes people who are already fearful or worried about new technologies even less likely to use and benefit from them. And that's a real shame.
