In earlier entries, we discussed WHY it’s so important to have policies and procedures in place for proper data management. In our last article (Part 1), we covered WHY applying a data governance workflow is critical to business success and WHAT you should remove. In this article, we’ll look at the key factors in HOW that data needs to be disposed of.
Figuring out what data can and should be removed from your data stores can seem like a daunting task, especially if you have a lot of data and limited resources to tackle it. It’s reminiscent of the old analogy of trying to eat an elephant: if you sit there looking at the entire elephant, you might begin to think it’s impossible. But the key is this: take it one bite at a time. So, in this article, we look at what’s involved in tackling the project in front of you.
Part 2: Data Inventory and Classification
You can’t begin to clean your data stores if you don’t know what’s in them to start with.
- Inventory: The necessary first step is to compile an inventory of your data stores, then catalog the data stored inside of them. Cataloging that information into a central repository provides you with the foundation you need to begin the classification of data.
- Classification: Inventoried data should then be classified based on defined policies that cover its sensitivity, value to the business, regulatory requirements, and so on. Bear in mind, this is very likely not a one-person job. Often, we need to bring in the owners of that data (Subject Matter Experts, or SMEs) to provide their expertise on the value and necessity of the data. As a result, an organized communication system is critical to tackling the classification of data. We need to be able to communicate with team members, data owners and custodians, management, and others to make effective decisions on the data.
- Redundant, Obsolete, and Trivial (ROT) Data: Beyond identifying ownership and categorization, ROT data can be placed in a category of its own and either bulk deleted or set aside for a managed SME review. ROT files sometimes include backups of irrelevant data, generic Windows files, or data that is past its retention period.
Included in ROT is the identification of duplicate data (multiple copies of the same file) or near-duplicate data (iterations of the same file, like versions of a contract) that may require action. Remember that even if the duplicate or near-duplicate files are deleted, there may be a rationale to keep a log of where each of the copies or near-copies was stored in order to support auditing or technical support later.
- Defensible Workflow: Once the data has been classified and tagged for deletion or archival, it’s time to act on it. Having a methodology for how data is effectively and properly disposed of or transferred to archive storage is key. In future posts, we’ll discuss what goes into choosing a disposal method, what role security plays in the process, appropriate documentation, and compliance considerations as we take a proactive approach to our data lifecycle.
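To make the inventory and ROT steps above a little more concrete, here is a minimal Python sketch. It assumes the data store is an ordinary file system folder; the function names (`build_inventory`, `find_duplicates`, `past_retention`) and the seven-year `RETENTION_DAYS` value are hypothetical placeholders for illustration, not a recommended policy or any particular product's approach.

```python
import hashlib
import time
from collections import defaultdict
from pathlib import Path

# Hypothetical retention period; a real value comes from your own policies.
RETENTION_DAYS = 7 * 365


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(root):
    """Walk a folder and catalog basic metadata for every file found."""
    records = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            stat = path.stat()
            records.append({
                "path": str(path),
                "size": stat.st_size,
                "modified": stat.st_mtime,  # seconds since the epoch
                "sha256": file_sha256(path),
            })
    return records


def find_duplicates(records):
    """Group paths by content hash; keep only hashes seen more than once."""
    groups = defaultdict(list)
    for record in records:
        groups[record["sha256"]].append(record["path"])
    return {h: paths for h, paths in groups.items() if len(paths) > 1}


def past_retention(records, now=None, retention_days=RETENTION_DAYS):
    """Return paths whose last-modified time is older than the retention period."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400
    return [r["path"] for r in records if r["modified"] < cutoff]
```

Running `build_inventory` on a folder and passing the result to `find_duplicates` and `past_retention` produces candidate lists for SME review rather than anything that should be deleted automatically. The duplicate groups also double as the log of where each copy was stored. Note that content hashing only catches exact duplicates; near-duplicates such as contract versions would need a different comparison.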
Look for upcoming posts to continue expanding your knowledge about data, retention, and optimizing data workflows.