
Thoughts on App Developers Reading User Emails

Yesterday, The Wall Street Journal reported that some email apps have quietly been reading users’ emails using APIs for services like Google’s Gmail and Microsoft’s Outlook.com. Users of those apps, unsurprisingly, have generally reacted with concern and outrage.

I’m not a user of any of the apps mentioned, but as someone who cares deeply about digital privacy, I wanted some insight into what led us to this point, where developers feel fine reading user emails.

How Email Works

For some context here, we first need to understand how emails have traversed the series of tubes for the past three decades and change. Emails mostly get from point A to point(s) B using three protocols: IMAP, POP, and SMTP. If you’ve manually set up an email account or dug around your email software’s configuration, you’ll probably recognize those initials.

Basically, those protocols tell servers how to send or receive your message. They do so, at a fundamental level, by establishing a direct connection between your device and the server. Whoever controls that server (Google, Microsoft, Yahoo!, your employer, etc.) can see the messages stored there, because they’re stored unencrypted. Even so, the only parties involved when you send or receive a message are you and whatever server you’re connected to.

But lately, there’s been a little change in how email works. Services like Gmail and Outlook.com are opening up APIs for developers to access users’ messages. Rather than speaking the three protocols directly, an app sends requests to the email service’s API, and the service handles each request on the app’s behalf.

The apps do this by registering with the service in question. An app that wants to integrate with Gmail’s API registers with Google, for example, and then when you sign in, you’ll see a browser page asking if you want to grant access to the app. When the app wants to take some action relating to your account, it obtains a token from the service that authorizes it to do whatever it needs to do.
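To make that sign-in step concrete, here is a minimal Python sketch of how an app might build the address of that browser page for Gmail. The endpoint and parameter names follow Google’s OAuth 2.0 flow as I understand it; the client ID and redirect URI are made-up placeholders, and I’m assuming the full-mailbox scope only because it fits the broad wording of the prompt. Which scope any particular app actually requests is something I can’t confirm.

```python
from urllib.parse import urlencode

# Google's OAuth 2.0 authorization endpoint.
AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"

# Full-mailbox Gmail scope (an assumption for this sketch; narrower scopes exist).
FULL_MAIL_SCOPE = "https://mail.google.com/"


def build_consent_url(client_id, redirect_uri, scope=FULL_MAIL_SCOPE):
    """Return the URL an app opens in the browser to ask for mailbox access."""
    params = {
        "client_id": client_id,        # issued when the app registers with Google
        "redirect_uri": redirect_uri,  # where the user lands after the prompt
        "response_type": "code",       # authorization-code flow
        "scope": scope,                # what the app is asking permission to do
        "access_type": "offline",      # also ask for a long-lived refresh token
    }
    return AUTH_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder values, not real credentials.
    print(build_consent_url("example-client-id", "https://example.com/callback"))
```

Notice that nothing in the request says what the developer plans to do with the access. The prompt only describes what the token will be capable of.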

But unfortunately, this route opens a door for the developer, or potentially someone with more malicious intent, to take these tokens and use them for purposes you don’t know about. Tokens are short-lived, usually expiring within minutes to an hour, but the app can simply regenerate them as needed. Ideally, a user could trust a developer to access their emails only in response to requests made in the software.
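Regenerating a token is a single background request. The sketch below builds, but deliberately doesn’t send, the kind of refresh request Google’s token endpoint accepts. The function name and the credential values are hypothetical, and I’m assuming the app holds a refresh token from the original sign-in.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Google's OAuth 2.0 token endpoint.
TOKEN_ENDPOINT = "https://oauth2.googleapis.com/token"


def build_refresh_request(client_id, client_secret, refresh_token):
    """Build the POST that trades a refresh token for a new access token."""
    body = urlencode({
        "grant_type": "refresh_token",
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,  # saved from the one-time consent
    }).encode("ascii")
    # Nothing is sent here; the caller decides when (and where) to send it.
    return Request(TOKEN_ENDPOINT, data=body, method="POST")


if __name__ == "__main__":
    # Placeholder credentials for illustration only.
    req = build_refresh_request("example-client-id", "example-secret", "example-refresh-token")
    print(req.get_method(), req.full_url)
```

Passing the result to urllib.request.urlopen from any machine that holds those three values should return a fresh access token, with no prompt shown to the user. That could be your phone, or it could be a server you’ve never heard of.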

When in Doubt, Opt Out

There’s nothing inherently more or less secure about accessing email through the three protocols compared to the more modern method of using an API. Your messages are still (normally) encrypted in transit and unencrypted wherever they’re stored. Your account is about as likely to be compromised either way.

But it does reveal an issue with the API-based method. Using the three protocols, third-party apps have no realistic way to access your email inbox except from devices where you’re signed in. They could send your password to a central server somewhere, but that’s a horrendously bad idea. With an API, though, a third-party app can access your messages as long as it has a valid token, and you don’t have to do a thing for a token to be obtained.

In the end, using the API-based method is a lot like giving an acquaintance (the developer) your email account information and trusting them not to go nosing around. Sadly, nosing around is exactly what Edison, the developer of one of the apps in the Journal’s report, did.

To test this out, I downloaded Edison tonight, before they’d have a chance to release an app update. When the app was installed, Edison Mail greeted me with “By continuing you agree to accept our terms of service and privacy policy.” Links are given to those documents. Below that, a single “Accept” button.

Then, you’re able to add, among other services, a Gmail account. This presents a screen with the following:

Email wants to access your Google Account

This will allow Email to:

  • Read, send, delete, and manage your email
  • Manage your contacts

Below that, two buttons: Allow and Cancel. Now, there’s no usage description given for exactly how Edison will use its access to your email. My assumption is that the ability to “read” emails applies to the app itself, that the app inherently has to have read access to my emails to show me my emails. After all, any app accessing the Gmail API gets that same description. Surely they’re not able to actually…read my emails…right? Right?!

By default, Edison collected users’ email messages for use in training a machine learning model for their Smart Replies feature. No option is immediately available to avoid this. This behavior, to be clear, is mentioned in their privacy policy, but we also know that most people don’t read privacy policies.

To disable this data collection, a user has to find the Settings link in the app’s hamburger menu, then scroll all the way to the bottom to find a “Manage Privacy” link. Then, at the bottom of that screen, you’re finally able to opt out of “data sharing.”

But wait, there’s more! You’re then pestered to make sure that you really want to turn off data sharing. It’s never really communicated that actual humans other than you are able to access messages in your account, either:

We use data shared with us (that does not identify you) to invent new app features and create research about national purchase trends. We never share your emails, or any data that can be used to track you personally for advertising.

You can opt-out of data sharing at any time. Keep in mind, our data practices allow us to offer you [...]

For some reason, the app cuts the text off there, but I assume they’re begging users to keep data sharing on because their exploitation of the data that they get allows them to make a “free” app. Oh, and the Cancel button is bolder than the button that actually allows you to opt out of data sharing.

(If you choose to opt back into data sharing, you’re unsurprisingly not pestered one bit.)

Also, remarkably, only here would a user who had not read the privacy policy (so basically, like, everyone) find out that their emails are being processed to identify “national purchase trends” and train machine learning models. Surprise!

Edison in Denial

Soon after this story broke, I decided to engage Edison on Twitter about this, just to get some insight—as a fellow developer, if nothing else—on where they were coming from with these decisions that, on the surface, seem appalling.

Their responses didn’t help a bit. Here’s what they said, tweet-by-tweet:

Most companies creating AI-based features like our Smart Reply use real data. We always strive to do the right thing with our users. Our employees will no longer read sample de-identified emails for creating new AI features.

This is true! It’s next to impossible to train a machine learning model without real data. But why did that data have to come from your users, and why were they not explicitly informed from the outset that their messages were being used for this purpose? And guess what, the last sentence is basically meaningless because, now that that machine learning model is trained, they don’t need to read any more emails.

Instead we’ll offer an opt-in for explicit consent in the app. We had shared that update with the WSJ but they decided not to include it in their story. It is worth noting that our Smart Reply feature runs only on the handset, not through our network.

So you’re doing what you should’ve done from day one. Here’s a gold star. The Wall Street Journal wrote an article about the mistake you made and didn’t include what you were doing to save face? Tragic. And cool, but emails are still leaving the device and falling into hands that users may not know about.

We have also always offered users the ability to easily opt-out of data sharing with no degradation to the app experience. We also never target users for ads, prevent other companies from tracking them while they are in our app, and always anonymize any user info.

Funny story. An Edison user replied to this thread with a screenshot of a support email exchange he had with Edison from April of this year, with Edison’s support staff saying that opting out would result in a degraded user experience. So that argument’s in the trash.

Now, here’s where the argument went off the rails. I pointed out that these responses actually created more questions than they answered, so I posed some directly:

  • How many users knew that Edison was quietly accessing their messages?
  • If they didn’t know, why not?
  • Was making this opt-out rather than opt-in an oversight or a deliberate choice?

And their response:

We are an email app so all users know we have access. We use that access to provide our services and to build new features. We communicate that clearly and users have always had to give us explicit permission. What we described above is a second reconfirmation of that permission.

As I alluded to in the How Email Works section, not all users know that the app has a back door into their emails. That isn’t how email apps have traditionally worked. Instead, most software just facilitates direct connections between you and mail servers. It’s these modern APIs into services like Gmail that allow developers to quietly read users’ emails.

Ultimately, this PR doublespeak-laden response is sorely lacking. When you authorize Edison Mail to access your Gmail account, it says the app can read and send emails. It doesn’t explicitly say that company employees might also be reading your emails, and at no point in the signup process is this communicated to the users.

And given that Edison has faced backlash from its users, regardless of how clearly they think they’ve communicated this behavior, they failed to do it clearly enough. Where they did describe it, they didn’t describe the extent. At no point did they actually say that an actual human being might be reading users’ emails.

Users deserve to know as much.

It’s Their Trust, Stupid!

In the United States, if you open someone else’s physical mail without their permission, you’ve committed a federal crime punishable by up to five years in prison. In many ways, Edison did the electronic equivalent, all the while failing to honestly tell users what they were up to. Exactly what they were up to.

Edison claims that they removed identifiable information from user emails before making them visible to employees. I don’t buy it. Why? Because it’s really hard to programmatically remove all identifiable information from emails.

I assume that their removal of “identifiable information” was limited to stripping email headers, meaning that they’d be unable to see the email addresses of the sender and recipients. Unfortunately, though, it’s not nearly that simple to remove all identifiable information from a message.

As an example, let’s imagine some newsletter that you’re subscribed to which provides an unsubscribe link at the bottom. For your convenience, it automatically fills in your email address on the unsubscribe form. That email address has to come from somewhere, and chances are the unsubscribe link contains your email address so that the autofill can occur.

Even though the email headers were stripped from the message, and the message is otherwise something that was sent out to hundreds or thousands of people, that link contains information that can directly identify you. That’s not all, though. Sensitive or identifiable information of all sorts changes hands via email millions of times daily: ID numbers, appointment notifications, purchase receipts, and more.
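Here’s a small Python sketch of that newsletter scenario. To be clear, this is my guess at what naive header stripping looks like, not Edison’s actual process, and the message, names, and addresses are all invented.

```python
import re
from email import message_from_string

# Hypothetical newsletter; every name and address here is made up.
RAW_MESSAGE = """\
From: News <news@example.com>
To: Jane Doe <jane.doe@example.org>
Subject: This week's newsletter

Hello! Here is this week's news.

Unsubscribe: https://news.example.com/unsubscribe?email=jane.doe%40example.org
"""

# Loose pattern for addresses written with a literal or URL-encoded "@".
EMAIL_PATTERN = re.compile(r"[\w.+-]+(?:@|%40)[\w-]+(?:\.[\w-]+)+", re.IGNORECASE)


def strip_headers(raw):
    """Naive 'de-identification': drop every header and keep only the body."""
    return message_from_string(raw).get_payload()


def leftover_addresses(body):
    """Return email addresses still recoverable from the header-free body."""
    return [match.replace("%40", "@") for match in EMAIL_PATTERN.findall(body)]


if __name__ == "__main__":
    body = strip_headers(RAW_MESSAGE)
    print(leftover_addresses(body))
```

The To: header is gone, yet the recipient’s address is still recoverable from the unsubscribe link. And this only catches email addresses; names, street addresses, and order details in the body would need far smarter handling.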

For another example, just look at an iTunes email receipt. Your full name, billing address, and itemized receipt are there for the taking. Are we to seriously believe that Edison found some magic code that automatically detects and removes identifiable information from messages? Or were they using humans for that, too? In reality, the messages they were reading almost certainly still contained troves of identifiable information.

I get that I’m stricter about digital privacy than many, but it’s just because I’d rather not have dozens of strangers poring over—or able to pore over—my daily life. Decades ago, we didn’t hand our grocery shopping receipts back to the companies so that they could track “national purchase trends,” so why start now? What happens with our data is stunning, and most users unfortunately don’t have a clue.

They’re starting to find out, though. A user’s trust is more important now than ever, and companies like Facebook and Edison continue to abuse it. While I’ve yet to see it, other companies’ abuse of trust will inevitably impact users’ ability to trust anyone with their data, even if they’re not doing anything shady.

And maybe, especially in the aftermath of the Cambridge Analytica scandal, it’s time for developers to have a serious talk about the ethics of wanton data collection. I know where I stand. Can developers like Edison say the same?