In the data-driven marketing sector, linking data sources and centralizing data marks an important step towards making optimal use of collected data and providing the best possible service to customers. But according to Springbok’s Jimmy de Vreede, a new wind is blowing. A more open approach with decentralised data is on the horizon. In this article, de Vreede looks ahead to the future and shares several practical applications of decentralised data.
Jimmy de Vreede is Data Director at Springbok and a member of the DDMA Data, Decisions & Engagement Committee. This article previously appeared on ddma.nl.
In 2014, Facebook, now Meta, bought WhatsApp for $19 billion. At the time, it was a messaging app with 450 million active users that hardly made any money. Although Meta’s new purchase landed it an interesting user base and eliminated a competitor, the real reason was clear: data. At the time of the acquisition, Mark Zuckerberg promised that WhatsApp’s data would be left alone, a promise he reneged on in 2016 and again in early 2021 by changing the terms and conditions. After all, centralised data collected from different sources and from more users and apps is incredibly valuable.
Alphabet is pursuing a similar data strategy. Apart from Google, Alphabet also owns Nest (smart cameras, thermostats and smoke detectors), Waymo (self-driving cars), Verily (health and research) and, since this year, Fitbit (fitness trackers), among a host of other companies. Even Google Chromebooks, which are now used at 70% of Dutch elementary schools almost free of charge, are part of this data growth strategy. More centralised data from different sources and sectors help build more comprehensive user profiles, which makes them more interesting for advertisers, for instance.
Most companies have now started collecting, centralising and connecting data for a wide range of uses, even though actually valorising data and proving its value is easier said than done. Still, it sounds like an appealing prospect and I admit I also believe that data provides opportunities to build insights and drive more effective decision making.
Data growth cannot be stopped. People produce data, but so does everything around us, from our cars and smart thermostats to children’s toys. This factsheet outlining the European data strategy says that the global data volume will amount to 175 zettabytes by 2025. But where do all those data go, who do they belong to and who has the right to use them?
Currently, it’s fair to say that we easily grant permission and have pretty much relinquished control of our own data by accepting unilateral conditions when visiting websites or using smartwatch apps, for instance. So, what’s wrong with our current practices? After all, we have legislation like the GDPR to give us rights that protect against abuse. There is nothing inherently wrong with collecting and using data, provided it is done the right way. Indeed, given the major challenges we face these days, from climate change to inequality, we will need lots of data available to many.
We would, however, do well to take a step back and ask whether the situation we now find ourselves in is what we want: data being centralised in silos that can only be accessed and used by a single party, such as the Chinese government or Big Tech (FAANG) in the US. Beyond the damaging effects of concentrating power, keeping control of our data, which the GDPR promises us, is not actually feasible (just try to port your data!) and slows down the successful application of AI, for example. This issue is also highlighted by the WRR in its recent advisory report Opgave AI, which explains that for AI to work, it needs access to vast amounts of data. FAANG’s closed data silos and a Chinese data dictatorship have already materialised, but could we find a more democratic, public solution for data that reduces our dependency whilst making our data capabilities more effective?
A new trend has emerged in how we deal with, collect and apply data, with some even heralding it as a paradigm shift. Because it runs counter to today’s standard of centralised, siloed data, it is also referred to as “decentralised data”, but other names include “the new data economy” and “web3”, a term referring to the new “decentralised” phase of the Internet. For now, I will stick to decentralised data because the data is kept close to its source (the owner) and/or is distributed across a network of different locations.
To paint a picture of this new phase of data, here are three examples of interesting initiatives that use decentralised data:
Solid is a product of Inrupt, an organisation that is co-headed by none other than the inventor of the World Wide Web, Tim Berners-Lee. So what is it? Solid is an open source platform that allows users to have and keep full authority and control over data via personal data stores called pods. The personal data store is decentralised, which means users cannot “lose” data by sharing them and can always revoke a third party’s right to use their data. For example, to attract younger users, the BBC has committed itself to a greater degree of content personalisation, for which it is using Solid to connect user data from Netflix, BBC and Spotify for better personalised recommendations. Through the pod, the data remains local, i.e. decentralised, and under the user’s control.
Tim Berners-Lee: “Solid changes the current model where users have to hand over personal data to digital giants in exchange for perceived value. As we’ve all discovered, this hasn’t been in our best interests. Solid is how we evolve the web in order to restore balance – by giving every one of us complete control over data, personal or not, in a revolutionary way”.
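The core idea of a pod can be captured in a few lines. The sketch below is purely illustrative (it is not Inrupt’s actual Solid API, and the class and method names are my own): the data stays in the user’s store, apps read it in place, and the user can revoke access at any moment without any copy having been handed over.

```python
class Pod:
    """Illustrative personal data store: the user keeps the data,
    third-party apps only ever get revocable read access."""

    def __init__(self, owner):
        self.owner = owner
        self._data = {}       # the data never leaves the pod
        self._grants = set()  # apps currently allowed to read

    def write(self, key, value):
        self._data[key] = value

    def grant(self, app):
        self._grants.add(app)

    def revoke(self, app):
        # access ends immediately; no copy was ever handed over
        self._grants.discard(app)

    def read(self, app, key):
        if app not in self._grants:
            raise PermissionError(f"{app} has no access to {self.owner}'s pod")
        return self._data[key]


pod = Pod("alice")
pod.write("watch_history", ["Doctor Who", "Top Gear"])
pod.grant("bbc-recommender")
print(pod.read("bbc-recommender", "watch_history"))  # the app reads in place
pod.revoke("bbc-recommender")
# pod.read("bbc-recommender", "watch_history")  # would now raise PermissionError
```

The contrast with the centralised model is in the last two lines: in today’s web, revoking consent cannot claw back data a platform has already copied; in the pod model, there is nothing to claw back.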
Ocean Protocol is an open source protocol that allows individuals and companies to exchange, sell and use data. On the Ocean Market, you can buy a dataset of online (anonymous) click behaviour of 5,000 consumers for about €16,000. Although this sounds particularly privacy-sensitive, all of these consumers have agreed to share their behaviour. The only difference is that they now get tokens, or value, in return. Anyone can participate in this initiative by installing a browser plugin from Swash, the world’s first Data Union. These so-called “data unions” allow users to share and combine their data with others. After all, an individual’s data isn’t very interesting, but a dataset of 5,000 individuals is.
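The data-union mechanism is simple to sketch. The snippet below is a hypothetical illustration (not Swash’s or Ocean’s real API; the `DataUnion` class and its methods are invented for this example): individual records are pooled into one dataset, the pooled dataset is what gets sold, and the proceeds are split among contributors as tokens in proportion to what each contributed.

```python
class DataUnion:
    """Illustrative data union: members pool records, and sale proceeds
    are distributed proportionally to each member's contribution."""

    def __init__(self):
        self.records = {}  # member -> list of contributed records

    def contribute(self, member, record):
        self.records.setdefault(member, []).append(record)

    def dataset(self):
        # the pooled dataset carries the value, not any single record
        return [r for recs in self.records.values() for r in recs]

    def sell(self, price_tokens):
        # split the proceeds in proportion to each member's records
        total = len(self.dataset())
        return {member: price_tokens * len(recs) / total
                for member, recs in self.records.items()}


union = DataUnion()
union.contribute("user1", {"clicked": "ad_A"})
union.contribute("user1", {"clicked": "ad_B"})
union.contribute("user2", {"clicked": "ad_A"})
payouts = union.sell(price_tokens=300)
print(payouts)  # {'user1': 200.0, 'user2': 100.0}
```

This is the inversion the article describes: the same click behaviour that platforms today collect for free becomes an asset the contributors themselves are paid for.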
Early last year, the European Commission launched a data strategy to stave off fragmentation and ensure that Europe improves its data-driven decision-making capabilities in concert, improving the quality of life of all European citizens. One of the initiatives is the facilitation of so-called shared data spaces and data pools, organised according to European values and guidelines. Another is Gaia-X, a European cloud serving as a counterpart to the American cloud providers (Google, Amazon and Microsoft) that we now depend on. It is supposed to be a so-called federated cloud, set up by different member states, with a decentralised, open source infrastructure.
Solid and Ocean are examples of international cooperation and fit within the framework of the European data strategy. Ocean Protocol also contributes to developing use cases for Gaia-X. This is a positive sign, because the question remains whether these initiatives will have the necessary speed and traction to compete with China and the US, who are well ahead of Europe in terms of data and its application.
The decentralised data model is a fascinating development that goes hand in hand with the so-called new phase of the Internet (web3). This is not the metaverse that was recently announced by Meta, but an open, community-driven decentralised Internet, just as Tim Berners-Lee once envisioned it (web1). We are currently in the second, centralised, phase of the Internet with data silos owned by the likes of Google and Facebook. I am very curious to see how web3 and the new data economy will develop in the coming years.