Data Access Governance (DAG) has many use cases, most of which fall into three main categories: data security, regulatory compliance, and operational efficiency. Much has been written about security because of the increasing frequency of ransomware attacks, and much is being written about compliance, most recently around privacy – but we haven’t talked much about the operational efficiency use case.
A good DAG program allows organizations to manage more data with fewer people and to identify data that can be safely archived or deleted, freeing up storage resources. In this blog series, we will walk through best practices for storage reclamation: identifying and deleting data that is no longer in use across your organization.
The amount of unstructured data managed by organizations continues to grow at a rapid pace. File servers provide a simple, scalable approach for sharing documents; however, file systems are rarely cleaned up. Over the years, the data stored on these file servers builds up and becomes dated, leading to difficulties in finding files when they are needed, increased costs for storage space, and significant risk if sensitive information is stored within these files.
Eventually, it becomes necessary to clean up unnecessary files, which can be a daunting task when there are millions of files and thousands of users who may be accessing them. Most organizations fail to address this problem and eventually face the consequences, including increased storage costs or losses from a breach or insider threat.
Here are the five capabilities needed to clean up a file server efficiently with minimal impact on end users.
Setting the Priority
A file server cleanup can be performed for any number of reasons, but it is important to have a clear goal in mind before starting a cleanup project. Some of the scenarios where file cleanups may be necessary include:
- Storage Reclamation and Cost Avoidance – Inactive and stale data can consume a large amount of file server space. By removing unnecessary data, it is possible to repurpose this space for active content.
- Migration to New File Servers – There are a variety of reasons for migrating to a new file system. In Windows environments, migrating to a newer server OS provides valuable storage and security features. Moving to a more modern Network Attached Storage (NAS) or cloud storage solution can also offer cost and management benefits. In any case, a migration is a great opportunity to find out what is no longer being used and migrate only what’s necessary.
- Consolidation – Through organic growth, changes in the company, and improvements in storage technology, organizations often end up with an unnecessarily large number of file servers. Consolidating these servers makes it easier for employees to find and share data, improves the efficiency of managing the data, reduces cost, and tightens security.
- Risk Reduction – Unstructured data stored on file servers is one of the most valuable assets an organization has. Ransomware and other malware target file servers in order to steal, encrypt, or destroy this data, and attackers or malicious insiders target these files during data breaches. Undertaking a storage reclamation project to clean up file servers reduces the overall attack surface and makes it easier to manage and govern the data that is truly necessary.
The following capabilities will enable a successful file cleanup effort, regardless of the business driver.
Capability 1 – File Discovery
The simple question, “Where is business data stored?” can be unexpectedly hard to answer. Employees can store data wherever they have write access, and it can be difficult to enforce standards for what data goes where. To start a file server cleanup project, some basic visibility is needed.
Identifying all file servers within the organization is a necessary first step for storage reclamation. Ideally, this information is centrally managed in a Configuration Management Database (CMDB). If not, a discovery scan of the entire environment will be required. Beyond just identifying where servers exist, it is useful to understand the size of the file servers, their operating systems, the number of shared folders, and the sharing protocols used (CIFS/NFS), all of which will be needed to plan the cleanup.
Once the file servers have been identified, the next step is to investigate the files stored on them. Typically, this is done a small subset of servers at a time rather than across all file servers simultaneously. Some of the attributes that should be inspected are outlined below.
| Attribute | Why it matters |
|---|---|
| File Extension | File extensions show which file types are stored on the system and help distinguish application, business, and personal files. |
| File Size | Knowing where the largest files are helps target the greatest storage savings. |
| Owner / Author | The cleanup process may require contacting the person who created the file. |
| Date Modified | Indicates how stale a file's contents are. It does not reflect reads, so a file that is opened often but never edited can still look stale. |
| Date Accessed | Indicates the last time a file was accessed. This may not be accurate: many systems disable or defer last-access updates for performance, and some tools that read files, such as certain backup or antivirus scanners, can update it. |
| Date Created | Useful for finding where files are actively being created. |
| Tags / Keywords | Provides data classification details that show where sensitive or confidential data resides. This is only available if the organization uses a data classification solution. |
Capability 2 – Sensitive Data
Any files containing sensitive information such as personally identifiable information (PII), Payment Card Industry (PCI) data, protected health information (PHI), or other confidential data should be treated with special care during a cleanup campaign. These files should be managed under their own retention and security policies and monitored more closely than non-sensitive files. Common approaches for identifying sensitive data include:
- Full Text Search – Searching file contents for patterns and keywords that indicate sensitive data, such as credit card or Social Security number formats, is an effective way to find where the most important data resides.
- File Classification Tags – If a classification product tags documents when they are created, collecting these tags provides an efficient way to understand where confidential data is located.
- Data Loss Prevention (DLP) – Some DLP products can scan data at rest. If those scans have already been performed, it isn’t necessary to rescan those files; tapping into the DLP results may offer quick insight into where sensitive data can be found.
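To make the full text search approach concrete, the sketch below scans plain-text files for candidate payment card numbers (confirmed with the Luhn checksum) and US Social Security number formats. The patterns, the extension list, and the helper names are assumptions for illustration only; commercial classifiers and DLP engines use far richer rules and context checks, and binary formats such as `.docx` or `.pdf` would need text extraction first, which this sketch does not do.

```python
import re
from pathlib import Path

# Illustrative patterns only (assumed), not a complete sensitive-data ruleset.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")  # 13-19 digits, optional space/dash separators
SSN = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum used by payment card numbers; filters out most random digit runs."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_text(text: str) -> dict[str, int]:
    """Count likely card numbers and SSN-formatted strings in a block of text."""
    cards = 0
    for m in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            cards += 1
    return {"card_numbers": cards, "ssns": len(SSN.findall(text))}


def scan_tree(root: str, extensions=(".txt", ".csv", ".log")) -> dict[str, dict[str, int]]:
    """Hypothetical helper: return per-file hit counts for text files that contain matches."""
    hits = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in extensions:
            try:
                text = path.read_text(errors="ignore")
            except OSError:
                continue  # unreadable file; skip rather than abort the scan
            counts = scan_text(text)
            if any(counts.values()):
                hits[str(path)] = counts
    return hits
```

Pairing a broad regex with a checksum is a common way to cut false positives: long order or account numbers match the card pattern but usually fail the Luhn check.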
Awareness of sensitive data enables a much more secure cleanup campaign. Policies can be enacted for where sensitive data is stored, who has access to it, how long it is maintained, and how the data is monitored.
In the next post, we will continue working through the list, starting with monitoring activity and file usage and how they relate to storage reclamation.
As the VP of Product Marketing, Darin is responsible for product messaging and positioning as well as generating industry and market awareness for Stealthbits products. He is an experienced leader who has worked in software for over 21 years.
Prior to joining Stealthbits, he was VP of Marketing for Quorum and SecureAuth, and has held positions in product management and product marketing at Oracle and Quest Software.