Data Lifecycle Management (DLM) covers the stages that data passes through during its life, from inception to destruction. These stages encompass creation, utilization, sharing, storage, and deletion. Each stage of the data lifecycle is governed by its own set of policies covering data protection, resiliency, and regulatory compliance.
Companies rely on different types of data to generate and grow revenue, create new market opportunities, and compete in the marketplace. The potential of data can be harnessed by focusing on data protection, data security, data resiliency, and compliance. Data can be treated like any other asset in a company, and it plays a key role in the business decision-making process.
It is common to hear DLM referred to as Information Lifecycle Management (ILM); however, the two are not synonymous, and there is a subtle difference between them. DLM refers to raw data that is stored in either a relational database or a NoSQL database. It could be both structured and unstructured data. ILM refers to a tangible piece of information that is constructed from one or more pieces of data and their associated metadata. For example, the stages that a purchase order goes through, from the time it is created through being fulfilled, invoiced, archived, and finally destroyed, can be referred to as ILM. The multiple pieces of raw data from different data silos that constitute the purchase order, and the stages that this data will go through, are referred to as DLM. It is fair to say that ILM will drive various stages of DLM, as DLM cannot exist without ILM.
The cycle starts with the inception or generation of data. In today’s digital age, just about everything we do will result in the generation of some type of data. For example, Walmart collects 2.5 petabytes of unstructured data from 1 million customers every hour (DeZyre, 2015).
What is Hot, Warm, and Cold Data?
During its lifetime, data can be classified using a multi-temperature scale. Frequently accessed data is referred to as hot data, less-frequently accessed data is warm data and the least frequently accessed data is cold data. Data classification typically depends on business rules and can vary from company to company.
- Data that is required to perform day-to-day business activities and is frequently accessed is referred to as hot data. This is the most expensive phase in the data lifecycle as the data must be optimized for quick access by storing it on Tier 1 type storage.
- Data that is accessed infrequently but must remain online to satisfy certain business rules and regulatory requirements is referred to as warm data.
- Data that has served its business purpose and has no intrinsic value to the business is referred to as cold data. Cold data is typically archived or deleted.
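The temperature scale above can be sketched in a few lines of Python. This is only an illustration: it uses days since last access as a stand-in for the business rules mentioned above, and the function name and both thresholds (30 and 365 days) are assumptions made for this example, not recommended values.

```python
from datetime import date

# Hypothetical thresholds; real cut-offs come from each company's business rules.
HOT_MAX_AGE_DAYS = 30
WARM_MAX_AGE_DAYS = 365


def classify_temperature(last_accessed: date, today: date) -> str:
    """Return 'hot', 'warm', or 'cold' based on days since the last access."""
    age_days = (today - last_accessed).days
    if age_days <= HOT_MAX_AGE_DAYS:
        return "hot"
    if age_days <= WARM_MAX_AGE_DAYS:
        return "warm"
    return "cold"
```

In practice, access recency alone is rarely enough. A rule such as "must stay online for regulatory reasons" would likely override the age check and keep a data set warm.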
Three Main Goals of Data Lifecycle Management
The number one risk that companies face as they grow and amass data is a data breach, so data must be managed effectively throughout its lifecycle. The three most important data lifecycle management goals can be categorized as follows:
- Data Storage & Security – Once the data is acquired, it needs to be stored securely to limit its misuse. Structured data can be stored in on-premises databases or in the cloud, while unstructured data is typically stored on file servers, in the cloud, or both. Regardless of where it is stored, the data needs to be secured against unauthorized access and theft.
- Data Availability – Since the business is essentially driven by data, it’s crucial to ensure its availability to the business. Availability also includes processing and visualization of data as required by the business.
- Data Resiliency – As the data ages, it can morph over time due to modifications and cleansing activities. Such activities can also result in data sprawl, meaning the same data can exist in multiple locations in slightly different forms. Therefore, it’s necessary to put a process in place to ensure the integrity and resiliency of data.
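One way to spot the data sprawl described above is to fingerprint records after normalizing them, so that copies differing only in letter case or whitespace hash to the same value. The sketch below assumes records are simple dictionaries grouped by storage location; the function names and the normalization rules are hypothetical and would need tuning for real data.

```python
import hashlib
from collections import defaultdict


def fingerprint(record: dict) -> str:
    """Hash a record after normalizing key case, value case, and whitespace."""
    normalized = sorted(
        (str(key).strip().lower(), " ".join(str(value).split()).casefold())
        for key, value in record.items()
    )
    return hashlib.sha256(repr(normalized).encode("utf-8")).hexdigest()


def find_sprawl(records_by_location: dict) -> dict:
    """Map each fingerprint to the locations holding an equivalent record.

    Only fingerprints found in more than one location are returned.
    """
    locations = defaultdict(set)
    for location, records in records_by_location.items():
        for record in records:
            locations[fingerprint(record)].add(location)
    return {fp: sorted(locs) for fp, locs in locations.items() if len(locs) > 1}
```

Calling `find_sprawl` with a mapping such as `{"crm": [...], "warehouse": [...]}` returns only the records that appear in more than one place. Those are the candidates for consolidation or an integrity check.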
Different Phases of Data Lifecycle Management
Data will go through four key phases during its lifecycle. Each phase revolves around the purpose and value of the data and to whom the data is valuable. Other factors that will influence each phase include data privacy, data security, and data compliance.
- Generate & Collect – Just about any day-to-day activity results in the generation of some type of data. For example, buying groceries at a grocery store or withdrawing money from an ATM will generate data in one or more systems. Data is also created by on-premises IT systems to aid further analysis of the data generated by business actions. The data generated could be both structured and unstructured.
- Process & Manage – After creation, data is stored in relational databases, NoSQL databases, and file shares based on the nature of the data. Data could be further processed to suit business needs, such as finance, marketing, and customer relationship management. As part of data processing, data is also classified as internal, sensitive, restricted, or public. Data protection policies, including access control, data encryption, data masking, and data loss prevention, are also applied to the data as part of managing it. At this stage, depending on the age of the data and its relevance to business processes, it is classified as hot, warm, or cold.
- Analyze & Visualize – In this stage, data is cleansed and validated, after which it is shared with business users, consumers, and other third parties. Data protection policies, including access control, data encryption, data masking, and data loss prevention, are also applied to the data prior to sharing it. Enterprise Resource Planning (ERP), Human Resources (HR), Customer Relationship Management (CRM), Data Warehouse (DW), and inventory systems are some of the IT systems that are used to provide access to the data.
- Archival & Destruction – For long-term availability, cold data is typically archived to tape, disk, or cloud storage, preferably in an encrypted format. Archival can be either on-line or off-line. On-line means that the cold data is stored in exactly the same format as the hot and warm data, usually for regulatory compliance reasons. Off-line means that the data is stored as files, database dumps, or exports, which are typically encrypted. Depending on the nature of the data, all or most of the archived data can be targeted for deletion. This is especially true for derived data rather than raw collected data.
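The archival and destruction decision in the last phase can be expressed as a small rule set. The following Python sketch is illustrative only: the `Dataset` fields, the action labels, and both retention periods are assumptions made for this example, and real retention periods are set by policy and regulation. It gives derived data a shorter retention period than raw data, reflecting the point above that derived data is the more likely deletion target.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical retention periods in days; real values come from policy and regulation.
RAW_RETENTION_DAYS = 7 * 365
DERIVED_RETENTION_DAYS = 365


@dataclass
class Dataset:
    name: str
    temperature: str                    # "hot", "warm", or "cold"
    derived: bool                       # True if computed from other data
    archived_on: Optional[date] = None  # None until the data is archived


def next_action(ds: Dataset, today: date) -> str:
    """Suggest the next lifecycle step for a data set."""
    if ds.temperature != "cold":
        return "keep online"
    if ds.archived_on is None:
        return "archive encrypted"
    retention = DERIVED_RETENTION_DAYS if ds.derived else RAW_RETENTION_DAYS
    if (today - ds.archived_on).days > retention:
        return "delete"
    return "retain in archive"
```

Keeping the decision in one function makes the policy easy to review and to change when a retention requirement changes.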
Managing the Data Lifecycle
It is common for companies to generate petabytes of structured and unstructured data and have it stored in different data silos. Some of the issues that affect this data are privacy laws, data security, data ownership, data quality, legal liability, and public perception. Data governance and security solutions offered by Stealthbits can make it easy to manage the entire data lifecycle from a single pane of glass. StealthAUDIT can discover all the structured and unstructured data silos, both on-premises and in the cloud. It also provides detailed reports on data access permissions, sensitive data discovery and classification, and database vulnerability assessments. It will also help in remediating any adverse findings.
To learn more about how Stealthbits can help with your data lifecycle management needs, visit our website: https://www.stealthbits.com/solutions
Sujith Kumar has over 25 years of professional experience in the IT industry. Sujith has been extensively involved in designing and delivering innovative solutions for Fortune 500 companies in the United States and across the globe for disaster recovery and high availability preparedness initiatives. Most recently, after leaving Quest Software/Dell following 19 years of service, he worked at Cirro, Inc., focusing on database management and security. His main focus and area of interest is anything data related.
Sujith has a Master of Science in engineering degree from Texas A&M University and a Bachelor of Science in engineering degree from Bangalore University. He has published several articles in refereed journals and delivered presentations at several events.