What is Data Classification?
In its simplest form, Data Classification is the process of categorizing datasets (files, databases, and so on) into logical groupings that make what’s inside the data instantly understandable and contextual on the surface. In other words, it’s labeling. As a practical example, instead of having to open a file to examine its content and determine there’s sensitive data inside, a classification could be applied to one of the file’s available metadata tags to denote that the contents are “Sensitive” or contain data subject to “HIPAA” compliance.
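To make the idea concrete, here is a minimal Python sketch of classification-as-labeling. The tag names and the in-memory index are hypothetical stand-ins; real products write tags into file metadata fields (e.g. Office document properties or filesystem extended attributes):

```python
# Minimal illustration of classification-as-labeling.
# The tag vocabulary ("Sensitive", "HIPAA") and the in-memory index are
# hypothetical stand-ins for a real product's metadata tagging.

classification_index = {}  # file path -> set of classification tags

def classify(path, *tags):
    """Attach classification tags to a file without touching its contents."""
    classification_index.setdefault(path, set()).update(tags)

def tags_for(path):
    """Look up a file's classification instantly -- no content scan needed."""
    return classification_index.get(path, set())

classify("/shares/hr/benefits_2021.xlsx", "Sensitive", "HIPAA")
```

Once a file carries tags like these, any downstream tool can answer “is this sensitive?” with a lookup instead of opening and parsing the file.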
Sounds Great! So, Why Hasn’t Everybody Implemented a Data Classification System?
Based on this definition, one could surely think of a slew of reasons why classifying data would be important and beneficial. However, real-world adoption of the practice has been underwhelming at best. Among other valid reasons, the failure of Data Classification programs has largely been attributed to data creators and owners refusing to participate in the process. Traditional Data Classification solutions have focused on classifying data at the time of creation, requiring a fundamental change in business processes that users typically resist.
Additionally, traditional Data Classification solutions have ignored the troves of files that already exist within the environment (many terabytes in most medium and large businesses). This has drawn a proverbial “line in the sand” between new and old data, with new data representing only a small fraction of the overall volume of files.
So to summarize…
Process Change + Minimal Coverage = Non-Starter
Why Should You Classify Your Data?
But that was then, and this is now. Solutions now exist to examine and classify all of that legacy data automatically. There have also been drastic improvements to the “classification-at-creation” process that make it far less arduous for users to do what’s right for the organization without significantly changing their behavior. And it’s just in the nick of time, because what organization doesn’t want to do the following?
- Protect sensitive data from theft
- Move to the cloud
- Adopt Behavioral Analytics solutions that cut down the noise and produce meaningful alerts with context
How is any of this ever going to work if you don’t know what’s inside these files and you don’t provide a means by which other technologies can understand them as well? How can you protect your most sensitive data if you don’t know where it is? Conversely, how can you move all the “non-critical” data to the cloud if you don’t know which files are indeed “non-critical”? How will you be able to take advantage of the advanced capabilities of a User and Entity Behavior Analytics (UEBA) solution to effectively risk-rank your threats when it has no idea which files put you at risk in the first place?
Everyone agrees that you can’t manage what you don’t know about, so denying the need for data discovery and classification really just denies the successful implementation of any number of critical initiatives, now and in the future.
5 Tips to Help Justify a Data Classification Policy
If you’re convinced of (or at least curious about) the concept, here are some tips to get others on board with the program.
Tip #1 – Use Sensitive Data Findings to Demonstrate Security Risk
There is perhaps no more effective way to justify the need for a data classification policy than to demonstrate how little visibility there is into what’s inside the files you already have. Pick a small handful of file shares and scan them with a sensitive data discovery tool. It’s likely to light up like a Christmas tree with sensitive data that could be much more easily tracked and secured if it was tagged properly. Also, be sure to highlight how many individuals have access to the data and what level of permission they have.
Tip #2 – Tie Business-critical Data Exposure to Ransomware
Ransomware poses a persistent and serious threat to file system data, and the costs associated with a successful ransomware attack extend far beyond the ransom itself (e.g. backup software and storage, manpower, lost productivity). Because not all data is created equal, it’s particularly important to know the whereabouts of your most critical files in order to make proper decisions about how to recover from a ransomware scenario. Is it possible you’d handle the situation differently if you knew there was (or was not) critical data at risk?
Using the same sensitive data scan results as before, highlight the number of users that have Modify permissions or higher on the files containing business-critical data (which doesn’t just mean sensitive data). A map can quickly be assembled of where the greatest exposure is and which accounts pose the greatest risk in a ransomware context.
Tip #3 – Use Data Classification to Plug Gaps in Strategic Programs
Classifying data has immediate and substantial impact on a variety of strategic programs within any organization, from Data Loss Prevention (DLP) to cloud migration initiatives. Here are just a few examples of how data classification increases value in existing investments and accelerates high profile programs:
- Data Classification and DLP – Classifying files with clear, consistent metadata tags makes DLP’s job much less complicated. When a file’s contents are already known, determining whether it can leave or move around the organization becomes a binary decision. If you leave it all up to DLP, data will continue to slip through the cracks.
- Data Classification and Cloud Migration – One of the biggest hindrances to cloud adoption and all the efficiencies and cost savings that come with it is the fear of losing control of sensitive data. If your files are classified, it becomes a simple decision as to what will stay and what can go.
- Data Classification and UEBA – The whole promise of User and Entity Behavior Analytics (UEBA) revolves around machines being able to piece together large sums of activity data with the context that makes that activity meaningful. If the machine has no understanding of which files matter and which don’t (in addition to where they’re located), the usefulness of an otherwise groundbreaking technology is drastically diminished.
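The DLP point above can be sketched in a few lines. The `BLOCKED` policy set is hypothetical; the takeaway is that once tags exist, the egress decision becomes a simple set intersection rather than a content scan:

```python
# A DLP egress check reduced to a tag lookup. The BLOCKED policy set is
# hypothetical -- real DLP products make this configurable per channel.
BLOCKED = {"Sensitive", "HIPAA", "PCI"}

def may_leave_org(file_tags):
    """Binary decision: a file may leave only if none of its tags are blocked."""
    return not (set(file_tags) & BLOCKED)
```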
Tip #4 – Automate Compliance Fulfillment, especially General Data Protection Regulation (GDPR)
With a slew of compliance regulations like EU GDPR, HIPAA, SOX, and PCI-DSS to adhere to, knowing which files contain regulated data, where they are, who has access to them, and who is interacting with them becomes much simpler when proper data classification is in the mix. Knowing where the data “should” be is ultimately inconsequential when the data is assuredly contained in places it should not be. Proactively identifying data in the wrong location with simple file attribute searches is exponentially easier and faster than continually scanning file contents to achieve the same result.
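As a sketch of why attribute searches beat repeated content scans, consider a simple inverted index from classification tag to file paths (the index structure here is illustrative, not any particular product’s implementation):

```python
from collections import defaultdict

# Hypothetical tag index: once files are classified, compliance questions
# ("where is all the GDPR data?") become cheap attribute lookups instead
# of repeated full-content scans across the environment.
tag_index = defaultdict(set)  # classification tag -> set of file paths

def record_classification(path, tags):
    """Index a file under each of its classification tags."""
    for tag in tags:
        tag_index[tag].add(path)

def files_subject_to(regulation):
    """Instant answer to 'which files fall under this regulation?'"""
    return tag_index.get(regulation, set())
```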
Tip #5 – Reclaim Valuable Data Storage Space
The fact that every organization hoards data is only exacerbated by the myth that “storage is cheap”. In reality, storage is far from cheap, accounting for hundreds of thousands – and in many cases, millions – of dollars spent every year by a single organization.
Part of the reason so many organizations hold on to stale data that likely provides no meaningful business value is the fear of deleting data that must be retained for legal purposes. Data Classification makes it simple to identify stale files that contain no data subject to a retention hold, enabling organizations to confidently delete or retire troves of data that no longer need to consume valuable, costly storage space. The cost savings from storage reclamation alone are likely to justify the cost of a data classification program many times over.
Is Data Classification Worth the Effort?
Is Data Classification worth the effort? You be the judge. Are you concerned about security and compliance? Would you like to increase the ROI of multiple technology investments you or your organization have made? Do you think there’s untapped potential and intelligence to be derived from the terabytes of data withering away in your environment already?
Like anything else worth doing, Data Classification requires some tough decision-making and planning. It’s important to keep in mind, however, that every compliance standard, every breach, and virtually every initiative comes down to the same thing: your data. Knowing your data is knowing what to do with it.
To further the conversation, please follow Adam Laub on Twitter: @LaubAdam.
If you’re ready to find open shares that contain sensitive data for classification, please download our free trial: https://go.stealthbits.com/open-share-assessment.
Adam Laub is the Senior Vice President of Product Management at STEALTHbits Technologies. He is responsible for setting product strategy, defining future roadmap, driving strategic sales engagements, supporting demand generation activities, enabling the sales organization and all aspects of product evangelism.
Since joining STEALTHbits in 2005, Adam has held multiple positions within the organization, including Sales, Marketing, and Operational Management roles.
Adam holds a Bachelor of Science degree in Business Administration from Susquehanna University, Selinsgrove, PA.