BLOG | OFFICE OF THE CTO

Dirty Data is a Business Risk

Lori MacVittie
Published February 03, 2020

As we barrel ahead toward business models based largely on data and our ability to glean actionable insights from it, the issue of data integrity will increasingly impact business and, inevitably, its bottom line.

In the latter half of 2019, controversy erupted in the eSports industry over a Fortnite player. At the heart of the controversy was his age. You see, while you can play the game at any age, to play competitively one must be at least 13 years old. Prize winnings are not trivial; cash cups can net winners hundreds of thousands and even millions of dollars, giving talented children millions of reasons to lie.

The problem last year was that one of those winners admitted to doing just that. This set off a storm throughout the gaming community, and the player was subsequently banned from Twitter and the popular streaming platform, Twitch. This then led to what the Fortnite community dubbed "Age Gate." En masse, young players were forced to verify their age so that developer Epic Games could better ensure only eligible players competed in tournaments.

The problem is, of course, that if a player lied once, there's really no way to stop them from doing so again. That is problematic in general, because it illustrates how difficult it is to verify data provided by a user.

Data Integrity

The inability to verify the integrity (correctness) of data should be of significant concern to organizations approaching or entering the third phase of digital transformation. This phase relies heavily on data. That data will be used not only to conduct business but also to form the basis for pattern and behavior recognition. It will power advanced analytics that automatically make operational and business decisions without human intervention.

And it may be tainted.

The business implications of dirty data cannot be overstated. This is more than an SQL injection designed to exfiltrate or taint your business data. This is a significantly more subtle type of attack that takes the form of legitimate transactions with the intent to mislead the machines and people making business decisions.

It's a run on product X that leads analytics and business analysts to believe demand is higher than it is. Resources are shifted to ensure supply, and that shift subsequently reverberates throughout the entire supply and distribution chains.

But then … nothing. There's no real demand, because the demand was created under false pretenses. Orders are suddenly canceled or ignored, because they were placed by bots—not real people. The business lurches to respond, re-targeting resources and trying to deal with the fallout.

The impact of dirty data on the bottom line will be real and significant.

It is this type of attack, the business layer attack, that will be a priority to combat and prevent in the latter stages of digital transformation. To do that, we'll need to fight fire with fire, as it were: with data, more data, and even more data that helps advanced analytics recognize and reject attacks disguised as legitimate transactions.

This is more than a business problem. This is a data integrity challenge that will require the combined data sets from app services, infrastructure, and apps. Every insertion point that can be instrumented must be instrumented to emit the robust set of telemetry that will enable advanced analytics to thwart business layer attacks. The subtle signals and patterns that can identify an attempt to compromise the integrity of data can only be discovered through intense analysis of large data sets by employing machine learning and harnessing the significant compute power available from the cloud. From identifying bots to recognizing atypical behavior, the ability to differentiate between legitimate business and malicious attack will be critical to business not just succeeding but surviving.
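To make "recognizing atypical behavior" a little more concrete, here is a minimal sketch of one very simple statistical check. It is an illustration only, not the machine learning approach described above: it flags clients whose order volume sits far from the median, using a modified z-score based on the median absolute deviation. The function name, the `orders_per_client` mapping, and the sample data are all hypothetical; the 3.5 cutoff is a commonly used default, not a recommendation for any particular business.

```python
from statistics import median


def flag_atypical_clients(orders_per_client, threshold=3.5):
    """Return client IDs whose order count is unusually high.

    orders_per_client is a hypothetical mapping of client ID to the number
    of orders seen in some time window. Scores are modified z-scores:
    0.6745 * (count - median) / MAD.
    """
    counts = list(orders_per_client.values())
    med = median(counts)
    # Median absolute deviation: a spread estimate that a few extreme
    # values cannot easily distort.
    mad = median(abs(c - med) for c in counts)
    if mad == 0:
        # Limitation of this sketch: when most clients behave identically
        # there is no spread to compare against, so nothing is flagged.
        return []
    flagged = []
    for client, count in orders_per_client.items():
        score = 0.6745 * (count - med) / mad
        if score > threshold:
            flagged.append(client)
    return flagged


# Hypothetical telemetry: orders per client in the last hour.
sample = {"client-a": 2, "client-b": 3, "client-c": 1, "client-d": 2, "client-e": 250}
suspects = flag_atypical_clients(sample)
```

With the sample data above, only `client-e` is flagged. A real defense would combine many more signals from apps, app services, and infrastructure; a single volume check like this one is easy for a patient attacker to stay under.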

We may never be able to stop kids from lying about their age online—at least not with technology. But we will be able to use the lesson it imparts with respect to dirty data to build better app services and systems capable of providing the proper defense against business layer attacks.