Why you shouldn’t use Diaspora if you care about privacy
The social network problem
Social networks like Facebook and Google+ have always been known as huge data-mining machines without very strict privacy policies, meaning that:
- you are not informed what happens with your data (what it is used for) when you enter it,
- you don’t have full control over your data (deletion is very hard or impossible, you can’t rely on “deleted” data really being erased, etc.),
- data may be given to third parties (like application providers) or wrong people without your (explicit) consent.
“Don’t use social networks” is not a solution in my opinion, because social networks are media like any other. They have advantages (which is why they should, and will, be used) and dangers (which should be minimized).
Diaspora – a solution?
So, I was very happy when I heard the first announcement of Diaspora*. It is open-source social networking software that aims to give you full control over your data.
Excited about this, I recently set up my own pod (as Diaspora nodes are called, since the architecture is decentralized) and began playing around. However, after a few experiments I found out that Diaspora provides far less privacy and data safety than even Facebook. Let me explain why:
Diaspora’s message system basically works like email: as soon as you publish something (not publicly, but to some of your contacts, the so-called aspects), everyone you are sharing with receives your update. That means that friends don’t see the status updates you posted before you added them.
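To make the email-like behaviour concrete, here is a minimal sketch of push-at-publish delivery. It is a toy model, not Diaspora’s code: the `User` class, its `publish` method and the `directory` lookup are all names invented for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    """Hypothetical user with a contact list and an inbox of pushed updates."""
    name: str
    contacts: set = field(default_factory=set)
    inbox: list = field(default_factory=list)

    def add_contact(self, other: "User") -> None:
        self.contacts.add(other.name)

    def publish(self, text: str, directory: dict) -> None:
        # The update is pushed only to people who are contacts right now;
        # nothing is replayed later to contacts added afterwards.
        for contact_name in self.contacts:
            directory[contact_name].inbox.append((self.name, text))


alice, bob, carol = User("alice"), User("bob"), User("carol")
directory = {u.name: u for u in (alice, bob, carol)}

alice.add_contact(bob)
alice.publish("first post", directory)   # only bob is a contact at this point
alice.add_contact(carol)
alice.publish("second post", directory)  # now both bob and carol receive it

print(bob.inbox)    # [('alice', 'first post'), ('alice', 'second post')]
print(carol.inbox)  # [('alice', 'second post')]
```

In this model carol never sees "first post", because delivery happens once, at publish time.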
The next relevant thing is Diaspora’s federation protocol. Diaspora users can be on different pods, and these pods are able to talk to each other and exchange messages. So, if your Diaspora ID lives on one pod and you share a status update with a contact whose ID is on another pod, say server2.org, the status update is sent to server2.org via the federation protocol. In my opinion, this behaviour causes severe problems:
From a user’s perspective, your data (which may be personal, private and sensitive) is sent to other servers without your consent. When you register at a node, you know that the node operators have access to your data and that you have to trust them. However, you don’t know that the data will also be sent to other servers: Diaspora pods will send your data to servers you don’t know without your explicit consent. (I think that operating a Diaspora* node in the current version without the explicit agreement of your pod users that their data is sent to other servers in other countries may even be a legal problem. Just think about the worst-case scenario in which someone you like has spied out some sensitive information about you and uses it against you: you can’t even know where (s)he got the information from. In the case of Facebook/Google+, you know who the operator is.)
You may argue that email works the same way and nobody cares about privacy when emailing. However, this isn’t true:
- Normally, mail servers store a message only until it is delivered to the recipient. Of course, that’s not the case when IMAP or webmail is used, but even then mail normally isn’t stored forever.
- There’s no illusion that you can delete an already sent email. You can’t take it back and you know that. In contrast, you can delete your status updates and your profile in Diaspora*.
- Email is only about messages and not about profile and other information.
- Emails may have more than one recipient, but most emails are sent to a limited number of recipients / recipient servers. In a social network, you will share your status updates with dozens or hundreds of friends, so the impact of data distribution for your privacy is much higher.
- Last but not least, there are people who care about privacy when e-mailing and use encryption etc.
If you have friends on different pods, your data will become scattered over all these servers. You don’t know which status updates are on which servers, and after some time you won’t even know which servers know something about you and what they know. So, after some time you will lose all control over the data you have already posted.
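The scattering effect can be sketched with a few lines of bookkeeping. This is only an illustration under the assumption that every remote recipient’s pod keeps a copy of each update shared with that recipient; the function names and the example IDs are made up for this sketch.

```python
from collections import defaultdict


def pod_of(diaspora_id: str) -> str:
    """Return the pod (host part) of an ID of the form user@pod."""
    return diaspora_id.rsplit("@", 1)[1]


def copies_by_pod(author_id: str, posts: list) -> dict:
    """Map each remote pod to the set of posts it would hold a copy of.

    `posts` is a list of (text, [recipient IDs]) pairs.
    """
    home = pod_of(author_id)
    held = defaultdict(set)
    for text, recipients in posts:
        for recipient in recipients:
            pod = pod_of(recipient)
            if pod != home:  # copies on the home pod are expected anyway
                held[pod].add(text)
    return dict(held)


# Hypothetical IDs on reserved example domains.
posts = [
    ("holiday photos", ["bob@pod-b.example", "carol@pod-c.example"]),
    ("new job", ["bob@pod-b.example", "dave@pod-a.example"]),
]
scattered = copies_by_pod("alice@pod-a.example", posts)
for pod in sorted(scattered):
    print(pod, sorted(scattered[pod]))
```

Even with two posts and three contacts, two foreign pods already hold different subsets of the author’s data, and the author would have to keep this table by hand to know who holds what.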
If you delete some of your content or even your whole profile, you don’t know whether the data is deleted from every server. For instance, I tried this scenario:
- Run pod A with ID user1@podA and pod B with ID user2@podB.
- Let user1 and user2 add and share with each other.
- For some reason, pod B goes down.
- user1 decides to delete her/his profile. Pod A sends the deletion message to pod B, but pod B is down, so the message can’t be delivered.
- Pod B goes up again.
- Now user1 believes her/his data is deleted, because the profile is gone. In reality, the data remains stored on pod B, can be read by its operators (or by hackers who compromise pod B) and may never be deleted.
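The scenario above can be replayed in a small simulation. This is a toy model and not Diaspora’s implementation: the `Pod` class and the message tuples are invented here, and the sketch assumes that an undeliverable deletion message is simply dropped (no queue, no retry).

```python
class Pod:
    """A toy pod that stores copies of posts per author (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.posts = {}  # author ID -> list of post texts

    def deliver(self, message):
        """Handle an incoming message; return False if the pod is unreachable."""
        if not self.online:
            return False  # assumption: the sender does not queue or retry
        kind, author, text = message
        if kind == "post":
            self.posts.setdefault(author, []).append(text)
        elif kind == "delete_profile":
            self.posts.pop(author, None)
        return True


pod_a, pod_b = Pod("podA"), Pod("podB")
user1 = "user1@podA"

# user1 shares an update with user2@podB: both pods now hold a copy.
for pod in (pod_a, pod_b):
    pod.deliver(("post", user1, "my private update"))

pod_b.online = False                                        # pod B goes down
pod_a.deliver(("delete_profile", user1, None))              # deleted at home
delivered = pod_b.deliver(("delete_profile", user1, None))  # message is lost
pod_b.online = True                                         # pod B is back

print(delivered)    # False
print(pod_a.posts)  # {}
print(pod_b.posts)  # {'user1@podA': ['my private update']}
```

From user1’s point of view the profile is gone, yet pod B still holds the update.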
Even if there were some kind of eventual consistency, this wouldn’t be a trustworthy architecture: if you have hundreds of friends on dozens of pods, you can’t know whether some pod operator disables this consistency feature, is unable to secure his/her server against hackers or misuses the collected data.
So, you don’t know if deleted data is really removed from all servers. In fact, some copies will stay scattered on servers you don’t know.
Also, you can’t defend yourself against pod operators who misuse your data (neither practically nor legally).
Suggestions for improvement
I think the only way to ensure an acceptable level of data and privacy protection is to store user data only on the node where the user has registered. Other nodes would have to fetch this data (after being notified, for example). For performance reasons, caching could be allowed, but every data transfer should come with an explicit definition of what the fetching node is allowed to do with the data and for how long. This provides no protection against operators who misuse data, but
- then misuse is clearly against the law (making the misusing person accountable),
- when there is no illegal misuse by operators, your data will be under your control, because once you delete something or change the permissions, the respective nodes are no longer allowed to fetch the data (meaning that the data will be completely removed after the well-defined cache period has expired),
- these well-defined permissions for data usage and transfer can be shown to and edited by the users (much like users can view app permissions on smartphones), improving knowledge and carefulness.
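A rough sketch of such a fetch-based design might look like the following. Everything in it is an assumption made for illustration (the class names, the `Grant` object, the 60-second default cache period), and it models only honest nodes that respect the cache period.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass
class Grant:
    text: str
    max_cache_seconds: int  # how long the fetching node may keep a copy


class HomeNode:
    """The only node that stores the user's data permanently."""

    def __init__(self) -> None:
        self._posts: Dict[str, str] = {}
        self._allowed: Dict[str, Set[str]] = {}  # post ID -> permitted nodes

    def publish(self, post_id: str, text: str, allowed_nodes: Set[str]) -> None:
        self._posts[post_id] = text
        self._allowed[post_id] = set(allowed_nodes)

    def delete(self, post_id: str) -> None:
        self._posts.pop(post_id, None)
        self._allowed.pop(post_id, None)

    def fetch(self, post_id: str, node: str, ttl: int = 60) -> Optional[Grant]:
        # Refuse if the post is gone or the node has no permission.
        if node not in self._allowed.get(post_id, set()):
            return None
        return Grant(self._posts[post_id], ttl)


class RemoteNode:
    """A well-behaved node that caches only for the granted period."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cache: Dict[str, Tuple[str, float]] = {}  # post ID -> (text, expiry)

    def read(self, post_id: str, home: HomeNode, now: float) -> Optional[str]:
        cached = self.cache.get(post_id)
        if cached is not None and now < cached[1]:
            return cached[0]
        self.cache.pop(post_id, None)  # drop the expired copy before refetching
        grant = home.fetch(post_id, self.name)
        if grant is None:
            return None
        self.cache[post_id] = (grant.text, now + grant.max_cache_seconds)
        return grant.text
```

With this model, a post deleted on the home node stays readable on a remote node only until its cached copy expires; after that the refetch is refused and no copy remains there. The `now` argument is passed in explicitly so the behaviour can be checked without waiting for real time to pass.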
Encryption would also be a good idea, but this is unrelated to the protocol discussed above.