Hey Siri, why are humans reviewing our voice files?

Digital assistants like Alexa, Google Assistant, and Siri use humans to help train them. This has been going on since the beginning, but it only hit the mainstream media and public consciousness this year. Siri, in particular, has caused controversy because, for the last few years, Apple has been pushing privacy as a top-down, front-line feature.

So, what's going on?

Matt Day, Giles Turner, and Natalia Drozdiak writing for Bloomberg on April 10th, 2019:

Amazon Workers Are Listening to What You Tell Alexa: A global team reviews audio clips in an effort to help the voice-activated assistant respond to commands. 

Obviously, the report focused primarily on Amazon:

Amazon.com Inc. employs thousands of people around the world to help improve the Alexa digital assistant powering its line of Echo speakers. The team listens to voice recordings captured in Echo owners' homes and offices. The recordings are transcribed, annotated and then fed back into the software as part of an effort to eliminate gaps in Alexa's understanding of human speech and help it better respond to commands. 

But the reporters did their jobs, asked the obvious follow-up question (what about other virtual assistants?), and answered that as well:

Apple's Siri also has human helpers, who work to gauge whether the digital assistant's interpretation of requests lines up with what the person said. The recordings they review lack personally identifiable information and are stored for six months tied to a random identifier, according to an Apple security white paper. After that, the data is stripped of its random identification information but may be stored for longer periods to improve Siri's voice recognition.

At Google, some reviewers can access some audio snippets from its Assistant to help train and improve the product, but it's not associated with any personally identifiable information and the audio is distorted, the company says.

Lente Van Hee, Ruben Van Den Heuvel, Tim Verheyden, Denny Baert, writing for VRT NWS on July 10:

Google employees are eavesdropping, even in your living room, VRT NWS has discovered

Putting the name of your publication in the title — and in almost every graf — is tight!

Not everyone is aware of the fact that everything you say to your Google smart speakers and your Google Assistant is being recorded and stored. But that is clearly stated in Google's terms and conditions. And what people are certainly not aware of, simply because Google doesn't mention it in its terms and conditions, is that Google employees can listen to excerpts from those recordings.

But they do offer this as well:

Most recordings made via Google Home smart speakers are very clear. Recordings made with Google Assistant, the smartphone app, are of telephone quality. But the sound is not distorted in any way.

Then, just this weekend, Alex Hern, writing for The Guardian:

Apple contractors 'regularly hear confidential details' on Siri recordings

Although Apple does not explicitly disclose it in its consumer-facing privacy documentation, a small proportion of Siri recordings are passed on to contractors working for the company around the world. They are tasked with grading the responses on a variety of factors, including whether the activation of the voice assistant was deliberate or accidental, whether the query was something Siri could be expected to help with and whether Siri's response was appropriate.

Also:

Apple differs from those companies in some ways, however. For one, Amazon and Google allow users to opt out of some uses of their recordings; Apple offers no similar choice short of disabling Siri entirely.

Although that's been disputed: There is an iCloud Analytics toggle in the privacy settings that says it includes Siri analytics. But none of these things are crystal clear and that's largely the problem.

Here's how Amazon responded to Bloomberg:

"We take the security and privacy of our customers' personal information seriously. We only annotate an extremely small sample of Alexa voice recordings in order [to] improve the customer experience. For example, this information helps us train our speech recognition and natural language understanding systems, so Alexa can better understand your requests, and ensure the service works well for everyone.

"We have strict technical and operational safeguards, and have a zero tolerance policy for the abuse of our system. Employees do not have direct access to information that can identify the person or account as part of this workflow. All information is treated with high confidentiality and we use multi-factor authentication to restrict access, service encryption and audits of our control environment to protect it."

And Google to VRT NWS:

"This happens by making transcripts of a small number of audio files", Google's spokesman for Belgium says. He adds that "this work is of crucial importance to develop technologies sustaining products such as the Google Assistant." Google states that their language experts only judge "about 0.2 percent of all audio fragments". These are not linked to any personal or identifiable information, the company adds.

And Apple to The Guardian:

"A small portion of Siri requests are analysed to improve Siri and dictation. User requests are not associated with the user's Apple ID. Siri responses are analysed in secure facilities and all reviewers are under the obligation to adhere to Apple's strict confidentiality requirements." The company added that a very small random subset, less than 1% of daily Siri activations, are used for grading, and those used are typically only a few seconds long.

Even though Bloomberg reported on it all back in April, and I'm fairly sure it's been talked about off and on for over a decade, the VRT NWS and now The Guardian pieces really caught fire. Especially the latter.

Maybe because, unlike the others, it front-loaded the part about sex, crime, and business. Or maybe just because it came out after Apple started putting big privacy billboards up in Las Vegas, Toronto, and Hamburg.

So, while some say Apple is being held to a higher or different standard here, it's not by anyone other than Apple themselves.

At issue is whether Amazon, Google, and Apple properly disclose the process (in other words, explicitly say that other humans are part of it), whether they effectively allow you to opt out, not just of the service itself but specifically of the human QA, and whether that should be a specific opt-in instead.

That's on top of the larger arguments about whether or not security white papers and privacy policies or terms of service agreements are even human discoverable and legible to begin with, and at the opposite extreme, whether the concept of privacy in the digital age is viable or beneficial.

And, even more broadly and, I'd argue, more importantly, we're all still only at the beginning of this debate.

It's about location, behavioral, voice, and video data capture and analysis right now but, soon enough, the entire world, including all of us, is going to be constantly ingested by a wide range of sensors for AR, VR, and autonomous technologies.

It won't be very different from living on the Grid or in the Matrix, and if we don't figure out how to handle personal privacy now it's going to be even more problematic with everything coming next.

To help me sort through all of this, I have voice-first expert Brian Roemmele on the line. Hit play on the video above to watch our discussion.

Apple is putting up billboards, literal billboards, saying how seriously they take our privacy, yet look at where we are with these Siri stories, both from back in April and just now, this weekend.

Facebook and Google say they're making their products much more private but so far they only seem to mean private from developers — developers who compete with them on their own platforms. And ongoing investigations and Facebook's recent $5 billion fine make exactly zero dent in their policies or with their investors.

Some find this unacceptable and demand changes and penalties severe enough to compel changes. Others find the very concept of privacy in the data age ludicrous, even constraining.

So, we've summarized the articles, the accusations, the concerns, the responses, and the dismissals, and we've talked about why it's currently done this way and how it could be done better in the future.

Now I want to hear from you. What do you think about all of this, especially privacy, the right or ridiculousness of it, now, today?

Rene Ritchie
Contributor

Rene Ritchie is one of the most respected Apple analysts in the business, reaching a combined audience of over 40 million readers a month. His YouTube channel, Vector, has over 90 thousand subscribers and 14 million views and his podcasts, including Debug, have been downloaded over 20 million times. He also regularly co-hosts MacBreak Weekly for the TWiT network and co-hosted CES Live! and Talk Mobile. Based in Montreal, Rene is a former director of product marketing, web developer, and graphic designer. He's authored several books and appeared on numerous television and radio segments to discuss Apple and the technology industry. When not working, he likes to cook, grapple, and spend time with his friends and family.

37 Comments
  • I am really angry.
    I missed the article in April, and so this was news to me. I feel incredibly betrayed by Apple.
    Maybe it was naive of me to take them at their word, but they *should* be held to a higher standard if they keep pushing "iPhone, that's Privacy."
    I don't care it's not connected to my Apple ID!! If you've got my voice, my location data, what app I'm using, and my contact info — then you've got me! That's not privacy.
    To be honest I'm a little upset here that you seem to be equivocating, like "should we expect privacy at all, Amazon and Google do it." I'm upset the top story on iMore is still about the Intel acquisition.
    Since when is the Amazon and Google model acceptable?? That's why I bought Apple products!! So that I wouldn't have anyone listening in. What's the big difference now, why should I refuse an Echo in my home if my HomePod is also spying on me?
    I turned off Siri on my phone when I read this. I'm suggesting to my friends and family to do the same.
    Shame on Apple.
  • I think listening to recordings is necessary for improving the voice assistant, it just needs to be done in the most secure and private way possible, which is difficult because the very nature of listening to someone’s request and voice is already telling you things about that person (or _a_ person, as the person should be anonymous aside from their voice). I know Alexa saves recordings even before you activate the trigger word, whereas Siri only saves recordings when you have activated it (e.g. by using the Siri button or using “Hey Siri”). Could Apple do this in a way without listening to recordings? Maybe. Or maybe they could allow people to “opt-in” for helping improve the service via recordings, rather than just taking them from anyone
  • "I know Alexa saves recordings even before you activate the trigger word, whereas Siri only saves recordings when you have activated it.." How so? Clearly each AI system must listen all the time to capture the trigger word. That can, and should, be done on device. That's one reason the trigger words are fixed and distinctive. It also makes sense that what caused the trigger would be buffered and sent with the request. It seems reasonable to me to do that so that false triggers can be analyzed. Of the three systems, it appears that Echo is the only one that provides me an optional audible indication that 'she' was triggered. I 'know' when Alexa is listening, whether I intended for her to or not. She beeps. She does seem to be the one 'falsely' triggered the most as TV and radio personalities are constantly using Alexa in stories. False trigger rejection does seem to be getting better. Both Siri and Cortana require me to be looking at their boxes to see if they are listening. If I wasn't speaking to them, I likely am not looking their way. More likely those systems would be listening unintentionally, without my knowledge, resulting in more unintended sound bites getting to their servers.
  • To my knowledge, the sound bites that Apple take from Siri, are from when the user activates Siri, whereas Alexa’s sound bites can be from before that. Of course, both have the microphone recording at all times to listen to the trigger word, but the device knows the difference between when its listening to give a response or listening to hear the trigger word, which can then be used to get the correct sound bites
  • That’s to your knowledge. You might want to have a read of this.....
    https://www.theguardian.com/technology/2019/jul/26/apple-contractors-reg... Before you say it’s not Apple, they have the oversight and should be policing it.
    If it was a small sales partner not abiding by the rules Apple would cut them dead.
  • The Guardian has a very bad reputation, so I can't take that article seriously
  • If you are dumb enough to put an internet-connected box in your house that listens to everything that goes on in the house, then you deserve whatever happens. Of COURSE it’s all being recorded. Of COURSE people will be listening to it. How GD naive are you? Why don’t you leave all your doors and windows open with all the lights in the house on, THEN whine about “privacy”? Sounds pretty stupid, right? That’s exactly how stupid you sound after installing listening devices in your house.
  • Do you have a phone? If you do, then you also have a listening device. If whomever wanted to listen to you, they could easily do it.
  • Well, I imagine that you also walk all day with an internet-connected little box that listens to you and knows where you are. Does that make you stupid?
  • A RR piece advocating whether or not we should expect privacy any more in an attempt to defend apple. Didn’t see that coming.
  • It’s really just about why voice assistants take recordings so they can be improved. If you’re really concerned, turn Siri off
  • It’s pretty simple: Apple can be trusted, the others cannot. This is well documented. Time to move on. Next story, please.
  • How do you get that? They got caught doing exactly the same thing Amazon and Google are doing.
  • VAVA, that does not matter to the crazed in here. Apple gives absolutely no ***** what so ever to their users privacy just like amazon and google. It's just they turn on the reality distortion field and bam, the flock comes running to defend. I use apple, but I am not stupid enough to think they are any more concerned with privacy. Remember, it was not android phones that had millions of so called "private" photos stolen off them. Its siri "engineers" listening in as well. activated or NOT it does not matter. So, the people ******** that the apple engineers caught on audio, they wanted to search something up in the heat of passion? hmmmm. sure!
  • I think @emjayess was simply being sarcastic. Maybe he should have added a '/s' at the end of it. Seemed pretty obvious to me at least.
  • “Remember, it was not android phones that had millions of so called "private" photos stolen off them” Neither did the iPhone, what are you talking about? “Its siri "engineers" listening in as well. activated or NOT it does not matter” How do you know it doesn’t matter whether it’s activated or not? You’re just assuming the worst. I’m sure Apple wants to get information from customers, like the rest of companies, it’s very valuable information, it’s just how you go about doing it, and also being transparent as well which is something Apple could work on
  • Are you really that far up apples ***? The entire fappening happening was from celebs iPhones dummy.
  • https://en.wikipedia.org/wiki/ICloud_leaks_of_celebrity_photos
    The images were initially believed to have been obtained via a breach of Apple's cloud services suite iCloud,[1][2][3] or a security issue in the iCloud API which allowed them to make unlimited attempts at guessing victims' passwords.[4][5] However, access was later revealed to have been gained via spear phishing attacks.[6][7] Nothing to do with iPhones, nothing to do with iCloud, nothing to do with Apple
  • Curious to hear how a breach of the cloud services or API has nothing to do with those three things.
  • Spear phishing attacks can happen with any service regardless of security. Kojackjku was trying to say Apple’s service is weaker than others, so it’s nothing to do with Apple
  • Let's stop with the comparisons to Google and Amazon. Apple are held to a different standard because of what they promise in their marketing. They use terms like "differential privacy" and phrases like "what happens on your iPhone stays on your iPhone", so we need to judge them with that, and (if the whistleblower's claims are true) they've clearly failed here. I disabled Siri (and dictation, along with certain location and analytics settings) pretty early on in my iOS life, partly because Siri was a halfwit, but mostly for privacy reasons (Siri queries don't respect the On-Demand mode in your VPN settings unless your iPhone is in full Supervised Mode or MDM-locked, in case you didn't know), but the average Joe/Jane on the street doesn't know privacy basics, nor do they realize how EULA/TOS documentation is deliberately obfuscated with mountains of legalese jargon in order to prevent them from understanding that whatever rights they think they might have are explicitly ceded in exchange for access to the software that they are licensing. They buy expensive iDevices because of what they're promised in Apple's marketing. I'm complaining on behalf of these folk, not myself. I hope this gets blown out of proportion, just like the battery issue last year, because that's the only way Apple will improve their process. Apple, like most other big tech companies, fixes their sh*t based on media coverage, not human decency.
  • I love to complain about Amazon and Google spying on us through their "smart devices." But in this context, I don't see what the problem is as long as the audio files are anonymized, and dumped after their initial use. Seriously, do people imagine why artificial intelligence isn't ready to work unassisted by humans? Seriously, I don't think they'll be able to do that for decades. Human brains may be slower, but their bandwidth is much wider.
  • Is this really a big deal? Not really. When you let in your home/office/school etc services that rely on things learning about you to give you more tailored results a certain level of privacy should be expected to be given up. The more accurate and tailored the results the more privacy is given up. For me it's all about tradeoffs and what is done with the information I give them. If they use that data about me to give me results that are relevant to me then I'm ok with it. If however they start selling that information (or something else I haven't given permission to do with) then that's where the problem lies. I know Apple, Alphabet & Amazon collect data on me and I expect that. For me the services I use from them I consider to be a good value for the "privacy" I give them.
  • Easy enough to turn off Siri. And that's the way its going to stay until Apple puts in an opt-out option. Even if its only one percent of the Siri users out there who are being recorded this way, that's still millions of people. Sooner or later something bad is going to happen with that data.
  • But doesn’t the data get destroyed after so long? Most businesses store personal data, but it’s about consent, transparency, and destroying the data as soon as it’s not needed anymore
  • Deflect and defend as always dannyjjk. **** you would know if you were getting paid by Apple to be their shill.
  • Well unless you can prove they’re not deleting data, but GDPR is pretty strict
  • Exactly and has utterly failed in being transparent here.
  • We need GDPR for the US, we are way past time. There is no way to hold corporations accountable for our privacy until it is the law. Until that time companies will not be transparent with what they do with our data like Apple has been.
  • It’s amazing how suddenly, privacy hardliner Rene wants to sit down and have a thoughtful, more nuanced discussion about technology and privacy. I wonder what caused him to soften a bit? Hmmmmmmmmmmmmm...
  • Really! Once apple is found out not to be the be all end all of privacy they claim to be, Danny and Rene "soften up" and defend the fruit to the end. Apple is NO BETTER than any other company. Just they have the Reality Distortion Field up!
  • How do you know they’re no better than any other company? You keep repeating things without any evidence
  • Um...?? They literally got caught...you know what? Forget it.
  • Yeah, forget it because you don’t understand
  • Let’s put down the marketing talk for a minute and let’s consider a situation: Two companies each have a recording of your voice. Company X tells you that all Assistant processing is done on device and tells you whatever you do on your device stays on your device, though Assistant doesn’t work in airplane mode. There is no record of what they have recorded in your account that’s available for you to access for yourself. You have no idea what they have recorded of you. The only way to suspend recording is to suspend your use of the service. They are a closed, opaque and secretive company in general. Company Y tells you that they process the data associated with your account on their servers. They give you a complete ledger of what’s been recorded for you to audit for yourself and it’s available to you 24/7 from practically any device in the world. They give you opt in/out choice at will, so if you want to suspend voice records at any time, you can. Their software is open sourced and they publish multiple white papers about their software activities that are available to anyone in the world at anytime. But Company X is on the "right side" of the industry and consumer privacy protections?
  • It's crazy. Was thinking about going back to ios but I think this has validated my choice to move off of it.
  • But move to where? Cortana, Alexa, Google Assistant and Siri all use voice recordings, although actually Apple has stopped doing that now, so you’d technically be best staying with Apple, unless you want to use some obscure voice assistant or make your own