Account Deletion / Data Retention Policy

We have recently received our first account deletion request. This means we now need to iron out the specifics of how we should handle the data we store. @Salt and I discussed this in the team meeting on 2022-08-26. Here’s what we came up with:

Data is split between two silos: the database on the main website, and the info here in the Discourse forum. They share a login and are linked together (by ID), but are otherwise separate systems.

Main Website

The only personal data we store is the user’s email address and password hash.
We’ll obviously delete those.

Other data we store, and what we’ll do with it:

  • Date of account creation and last password reset → Keep it, so we know some account existed
  • Current pledge information:
    • Outstanding balance → Must be $0 already in order to allow account deletion. Leave it in place indefinitely.
    • Payment processor token (if one exists) → Remove it*, same as if the user had removed it through the site (a record of the removal remains, see below)
    • “Pledged since” (only exists if they are currently pledged) → If this exists, un-pledge them* and remove it.
  • Pledge information history:
    • Dates of pledges/unpledges to the snowdrift project → Keep indefinitely for our records
    • Dates and amounts of monthly donations (adding to balance) → Keep indefinitely for our records
    • Dates and amounts of payments → Keep indefinitely for our records
    • Dates of adding a payment processor (Stripe) token (includes a copy of the token) → For now: Keep indefinitely. But we should probably not do this. Needs research, see below.**

*Removing current pledge / payment information

Right now there’s no easy way for us to do these things on the backend. The workaround is either to ask the user to do it, or to reset the user’s password (to something we know) so we can log in to their account and do it ourselves. If we reset the password, we should first tell the user to ignore the reset email.

**Retaining payment processor token history

To us, a payment processor (Stripe) token is just a random string of characters. However, Stripe associates it with a credit card. That means it could potentially be used to de-anonymize the account if someone got hold of Stripe’s data to cross-reference. So, if we don’t need to hold on to it for auditability or legal reasons, we should probably get rid of it when the account is deleted. However, I don’t know what the requirements in this area are, so for now we keep it indefinitely, just in case. Legal/accounting advice would be helpful here.

Technically, this concern also applies to charge history, which could probably be cross-referenced with Stripe’s charge history. The amount, the timing and the fact that the charge came from Snowdrift are probably enough to identify the card. However, I suspect there’s no reasonable way for us to aggregate this data and still have sufficiently detailed numbers for our own records (especially for auditability). Aggregating might become reasonable after a very long time has passed. Again, accounting/legal advice would be helpful.

Discourse Forum

Discourse offers two options for removing accounts: anonymize and delete.

With Anonymize, posts stay visible and anyone can see that they were made by the same user, but any other information associated with the account itself is wiped. For example, here’s the profile of an account that was anonymized: anon73864480. Because posts stay visible, a member of the public could de-anonymize the account if any of the posts contain personal information. Many people include such information in their #welcome post.

With Delete, posts are disassociated from the account and are “deleted” as if you had clicked the delete button at the bottom of each post. In practice, that means they become visible only to moderators. Only moderators could then de-anonymize the specific posts that contained personal info, and even they could not link those posts to any other post.

Both options disassociate the forum account from the main website account, so the shared ID can no longer be used to re-link the two.

This is kind of an unfortunate situation. Our preference would be:

  • Keep posts visible. Many of them have valuable content, and deleting them makes old discussions hard to follow. Plus, this is the internet; the content is likely already archived somewhere else, and not truly private any more, anyway.
  • Don’t make it possible to see which posts came from the same (now-“anonymized”) user. Just because the info may be out there doesn’t mean we should make it easy to find.

For now, I think the best compromise would be to use “anonymize”. On top of that, we would manually delete or hide any intro posts in #welcome that would make it trivial to de-anonymize the account. When someone requests deletion, we’ll let them know that this is what we plan to do by default. If they ask, we will delete their account and posts instead.

Perhaps we could put in a feature request with Discourse for stronger anonymization. Although, now that I’m thinking about it, Discourse does have a feature to merge accounts. As a workaround, we could merge all the anonymized accounts into one. This would effectively disassociate the posts from their original accounts while leaving them visible. The downside is that while an account is only anonymized, we still have the option to delete all of its posts later if needed. Once we’ve merged it, that’s no longer possible. So a Discourse option for “weakly anonymized to the forum staff, strongly anonymized to the public” would still be useful.

We currently receive deletion requests by email. Unfortunately, an email’s sender can be spoofed. So we’ll need to reply to the request and get confirmation that it is legitimate before proceeding with deletion as outlined above.

Next steps: incorporate this into our privacy policy, or turn it into a new “Data retention policy” that the privacy policy links to. Either way, this is consistent with the current policy; it just adds detail that the policy is currently missing.

Some thoughts: anonymized posts could be made to look like they came from separate anonymous users, even though one person wrote them over the course of a conversation. I’d argue that this also makes historical conversations hard to follow.

The connection between posts from the same user is often a critical piece of context needed for successfully parsing posts - in English at least.

So the default anonymize option makes the most sense to me. It’s what people are used to on, e.g., Reddit, an established paradigm going back years. We can of course still have the forum display name be “Deleted User” or similar, as long as the auto-generated avatars keep the deleted users distinct.

Also, perhaps we should invite users who want to be deleted to proactively delete any posts themselves (such as their #welcome post) before deleting their account. Not every user who wants to get rid of their account also wants to be forgotten or disassociated from their history; sometimes they just don’t want an account anymore. We can of course also delete posts manually. Inviting them first, though, reminds users that they can anonymize their own contributions to the extent they care about, and lets them do it.

Yeah, there’s definitely a trade-off between privacy (for the person who has deleted their account) and usefulness. I think your conclusion is consistent with mine:

Anonymizing resets them to the default avatar, so they will all look the same. The usernames will differ, though, e.g. @anon73864480.