Data Accumulation Prevention

We’ve got it all wrong

Over the past few weeks I’ve spent quite a bit of time on Data Loss Prevention (DLP) in Microsoft 365, improving our DLP policies to stop specific data types from being lost or transferred somewhere they shouldn’t. Information Security professionals spend a lot of time and money on controls like this to prevent the loss or disclosure of data.

This is all perfectly reasonable, because much of that data is personal data, yours and mine, and we would be impacted by its disclosure. However, I wonder what would happen if organisations devoted as much time and money to the ultimate preventative measure, which is not gathering that data in the first place.

Data is not an asset, it’s a liability

Many years ago I attended an information security conference at Murray Edwards College in Cambridge, where Lauri Love (a computer scientist, hacker and activist) made what seemed at the time to be a revolutionary point. “Data is not an asset,” he said, “it’s a liability.” His point was that if you absolutely have to gather personal data, you should treat it like toxic waste: put it in secure containers, and get rid of it as soon as you can.

At the time it seemed like an extreme position, but (like many others since), he was challenging the simplistic view that personal data = asset, an idea which is still prevalent today. Even in the Data Protection legislation designed to protect our personal data, the term used is ‘Information Asset’, with Information Asset Registers and Information Asset Owners. An asset is something of value, but that term can obscure the other aspect of that data, which is the obligations it places on the organisation storing it.

Do you really need it?

Last week I was enquiring about the availability of a short-term let, and was very surprised by the quantity of information requested by the agent before they would even confirm availability (let alone arrange a viewing). They wanted name, email address, mobile phone number (so far so good), my job title (a bit much) and my gross annual income (I’m sorry, what now?).

I challenged this, and pointed out that they only needed to know whether I could afford to pay the rent, not how much money I earned. They eventually concurred, and explained they had to take this approach because people frequently attended viewings but then turned out to be unable to afford the monthly rent on the property. Personally, I doubt the ‘frequency’ of this behaviour, but regardless I suggested some alternative ways of answering this question without gathering such sensitive personal information.

This simplistic approach to gathering data is far too common, and is ingrained in many organisations in both the public and private sectors. The problem is that it can get you into trouble.

The consequences of data accumulation

If you asked most organisations what the downsides of gathering lots of personal data are, they’d probably mutter something about GDPR, the difficulty of maintaining an accurate information asset register, or the time spent on subject access requests (SARs). Granted, these can be challenging and time-consuming activities, but this response indicates they may be missing the point. If those activities are difficult, is it because:

  1. the legislation is overbearing, or
  2. your organisation has only a limited grasp on what personal information you’ve gathered, where it is, when you obtained it and why?

I think you know which one it is.

That lack of insight or control over the personal data your organisation has accumulated over the years is now creating costs, and not just the cost of responding to SARs.

  • It can increase the cost of storing the data, backing it up and encrypting it.
  • It can increase the cost of processing it, because unmanaged personal data is difficult to extract value from.
  • If you’re trying to build a new information system which will process it, personal data accumulated over long periods is often unstructured or poorly sanitised, causing delays in the project whilst those issues are worked around.

There’s also the ever present threat of an information security breach.

  • Small scale breaches are often the consequence of personal data accumulated in the wrong place without good reason.
  • The financial and reputational cost of a large scale breach is amplified when it affects far more individuals than necessary because you’ve retained data for far too long without purpose.

If you absolutely have to gather data, treat it like toxic waste

Time to clean up

The principle of Data Minimisation has been enshrined in the Data Protection Act 2018 since its inception, although it may not have received the attention it deserves (possibly being filed under ‘too difficult’). I suspect it’s only a matter of time before an ICO judgement relating to a data breach cites the controller’s failure to minimise data as a contributory factor in the decision (and potentially in any financial penalties). It may look like too big a job compared to, say, developing an Information Asset Register or training your staff, but if you don’t make any effort to minimise data you run the risk of significant fines and reputational damage.

Coming back to the start of this post, this difficult task is where the tools in Microsoft 365 and Azure can come in useful, because you can use them to discover, identify and classify personal data, which then makes it far easier to begin the process of review and disposal. Think of those tools as the digital equivalent of Curtis Dowling, Marianne Cammack and Joanna Riley from Hoarder SOS, come to help you clean house to the point where you can finally see the couch again!
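To make the idea of discover, classify, then review a little more concrete, here is a minimal sketch in Python. It is not how the Microsoft 365 or Azure tools work; it only illustrates the principle. The patterns, the `Document` class, the `review_candidates` function and the two-year retention period are all hypothetical choices made for this example, and real classifiers are far more thorough than a couple of regular expressions.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical, deliberately simple patterns (assumptions for illustration only).
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "uk_mobile": re.compile(r"\b07\d{3} ?\d{6}\b"),
}


@dataclass
class Document:
    """A stand-in for a stored file: its name, contents and last modified date."""
    name: str
    text: str
    last_modified: date


def classify(text: str) -> set[str]:
    """Return the labels of every pattern that appears in the text."""
    return {label for label, pattern in PATTERNS.items() if pattern.search(text)}


def review_candidates(docs, today, retention=timedelta(days=365 * 2)):
    """List documents that contain personal data and are older than the
    (assumed) retention period, so a person can review them for disposal."""
    candidates = []
    for doc in docs:
        labels = classify(doc.text)
        if labels and today - doc.last_modified > retention:
            candidates.append((doc.name, sorted(labels)))
    return candidates


docs = [
    Document("old_enquiries.txt", "Contact jo@example.com or 07700 900123", date(2015, 1, 1)),
    Document("recent.txt", "Contact jo@example.com", date(2024, 1, 1)),
    Document("menu.txt", "Soup of the day", date(2010, 1, 1)),
]

for name, labels in review_candidates(docs, today=date(2024, 6, 1)):
    print(name, labels)
```

Note that the sketch only produces a list for a human to review; it deliberately does not delete anything. Deciding what can actually be disposed of still depends on why you gathered the data and whether that purpose still holds.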

Correction: In my original post I flagged the Data Minimisation Code coming into force in September 2021 as a reason to address minimisation urgently. That was incorrect: as part of the ICO’s ‘Age appropriate design: a code of practice for online services’, it only applies if your service is designed for use by children.
