With the exponential growth of data, businesses today face the challenge of successfully creating, structuring, and managing the content within their organizations so that it is meaningful, accessible, compliant, and secure. As we’ve continued to look at where customers struggle the most with maintaining proper information governance, the root cause has often been the lack of a system for defining data by purpose, ownership, and applicable regulatory restrictions: a schema and strategy for data classification.
The Business Side of Things
Regulatory needs aside, classifying data by attaching metadata to content, or embedding it within content, can help tremendously in creating a structure that is meaningful to the business. Data classification makes it easier to manage the content lifecycle by helping to ensure that relevant content stays available and non-relevant content is retired according to business rules. It also helps users find content quickly. For example, Microsoft SharePoint is a collaboration platform adopted by organizations large and small around the world, and its search functionality relies on data being properly classified to return meaningful results. Data that still lives on file shares also benefits: once properly tagged, it can not only be found more effectively but also be properly protected.
To maintain classification over the long term, you’ll need not only a schema that aligns your IT, business, and compliance teams but also a scalable method for execution. That’s where technology comes into play.
How Do You Automate Data Classification?
Data classification is one of the key areas of information governance we address with AvePoint Compliance Guardian. To handle classification for the volume of data being created every day without slowing down productivity, we built automation into the areas where automating the decision makes the most sense, and placed the remaining decisions in the hands of those who actually know the data best.
First, to prevent sensitive information from being overlooked, our software detects certain types of information, such as Personally Identifiable Information (PII), and automatically attaches or embeds mandatory metadata tags of your choosing. Additionally, there are options to prompt business users to apply suggested or custom tags as files are created or uploaded. That way, you collect and attach as much relevant information as possible to each file as it enters the system, so you can feel confident that all of your content is properly classified. Let’s take a closer look with the following example.
Auto-Tagging (Classification) of Data using Compliance Guardian
Using Compliance Guardian, we can add tags to an item that become metadata, which helps classify the data where it lives. We can scan content within and across a number of different systems:
File systems
Microsoft SharePoint
Microsoft Office 365
Box
Lync (now Skype for Business)
Yammer
Structured databases (e.g., Oracle and SQL)
Website content
Data Classification in File Systems
Historically, IT has used file systems to store an enterprise’s data. The business accesses these storage systems both for day-to-day work and for safekeeping. The data itself can carry meaningful metadata that can be used to create structure for classifying content. Here is an example of how metadata can be added to a file in a file system.
First, we create the connection to the file system.
Next, we define the matching criteria for scanning the content. Here we are searching for payment card industry (PCI) content. When matching content is found, we add a tag (metadata) to the file, which creates our classification. Once classified, the metadata can support data analysis for further actions such as migration: if a file contains sensitive information, you may need or prefer to store it on-premises rather than in the cloud.
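The PCI matching step can be illustrated with a generic technique. The sketch below is not Compliance Guardian’s implementation or API; it assumes a common approach to card-number detection (a regular expression for candidates of 13 to 19 digits, confirmed with the Luhn checksum) and uses a plain dictionary, `metadata`, as a hypothetical stand-in for wherever file tags are actually stored. The functions `luhn_valid`, `find_card_numbers`, and `classify` and the tag name `PCI` are illustrative only.

```python
import re
from typing import Dict, List, Set

# Candidate card numbers: 13-19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum used by payment cards."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> List[str]:
    """Return Luhn-valid card-number candidates found in the text, as bare digits."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


def classify(path: str, text: str, metadata: Dict[str, Set[str]], tag: str = "PCI") -> bool:
    """Attach `tag` to the file's entry in the (hypothetical) metadata store if PCI content is found."""
    if find_card_numbers(text):
        metadata.setdefault(path, set()).add(tag)
        return True
    return False
```

For example, calling `classify("invoice.txt", "Card: 4111 1111 1111 1111", metadata)` records the `PCI` tag for `invoice.txt`, while text without a checksum-valid number leaves `metadata` unchanged. The Luhn step is there to reduce false positives from arbitrary long digit runs such as order numbers.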
Data Classification for Microsoft SharePoint (and Office 365)
Now let’s take a look at content stored in SharePoint. Keep in mind that Compliance Guardian supports both SharePoint and SharePoint Online in Office 365, so it can cover hybrid environments.
We first configure the type of tag to add as well as its location. With a Static tag named “SensitiveInfo” set up for file system and SharePoint content, any file containing content that you specify as sensitive (whether that’s PII, credit card information, health information, etc.) will have the tag “SensitiveInfo” added to its metadata. Scans can be customized at the Check level, where you set a specific type of information to look for, or at the Test Suite level, where you combine multiple checks that together may indicate a specific type of content. Compliance Guardian can also scan newly created content in real time, or existing content on a schedule, so that nothing slips through the cracks.
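The difference between the Check level and the Test Suite level can be sketched as follows. This is a hypothetical model for illustration, not Compliance Guardian’s configuration format: a `Check` looks for one type of information, a `Suite` matches only when at least `min_matches` of its checks hit, and `tag_if_sensitive` adds the static `SensitiveInfo` tag when either kind of rule matches. The SSN pattern and health keywords are assumptions chosen for the example.

```python
import re
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Check:
    """Hypothetical 'Check': looks for one specific type of information."""
    name: str
    pattern: re.Pattern

    def matches(self, text: str) -> bool:
        return self.pattern.search(text) is not None


@dataclass
class Suite:
    """Hypothetical 'Test Suite': several checks that together indicate a content type."""
    name: str
    checks: List[Check]
    min_matches: int  # number of checks that must hit before the suite matches

    def matches(self, text: str) -> bool:
        hits = sum(1 for check in self.checks if check.matches(text))
        return hits >= self.min_matches


Rule = Union[Check, Suite]

# Example rules (assumed patterns, for illustration only).
SSN_CHECK = Check("US SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"))
HEALTH_CHECK = Check("Health terms", re.compile(r"\b(?:patient|diagnosis|prescription)\b", re.IGNORECASE))
PHI_SUITE = Suite("Health record", [SSN_CHECK, HEALTH_CHECK], min_matches=2)


def tag_if_sensitive(tags: List[str], text: str, rule: Rule, tag: str = "SensitiveInfo") -> bool:
    """Add the static tag to `tags` when `rule` matches; return whether it matched."""
    if not rule.matches(text):
        return False
    if tag not in tags:  # avoid duplicate tags on repeated scheduled scans
        tags.append(tag)
    return True
```

Requiring two checks in `PHI_SUITE` means a lone SSN-like number does not trigger the suite, while a document that also mentions a patient or diagnosis does. Adding the tag only when it is absent keeps repeated scans from duplicating it.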
Once the scan runs, the new metadata classification is applied to content in the SharePoint library.
Because the classification is applied and managed at the content level, you can be sure that your data classification standards carry through for all of your content throughout its lifespan.
Data Classification for Yammer
Lastly, I’ll use Yammer as an example, since enterprise social networking is an increasingly popular medium for collaboration.
Here we add hashtags to Yammer posts based on the content matches found.
In this example, matching content was found and referenced in a SharePoint list.
From there, the associated metadata was added to the Yammer posts as hashtags, so the posts can later be looked up for historical review and cross-referencing.
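Turning content matches into hashtags is mostly a text-formatting step. The sketch below assumes the matched classification labels have already been retrieved (for example, from the SharePoint list mentioned above) and only shows how they might be converted into hashtags and appended to a post body. It does not call Yammer, and `to_hashtag` and `add_hashtags` are hypothetical helpers.

```python
import re
from typing import Iterable, List


def to_hashtag(label: str) -> str:
    """Turn a classification label such as 'Credit Card' into '#CreditCard'."""
    words = re.findall(r"[A-Za-z0-9]+", label)
    return "#" + "".join(w[0].upper() + w[1:] for w in words)


def add_hashtags(post_body: str, matched_labels: Iterable[str]) -> str:
    """Append one hashtag per matched label, skipping empty tags and ones already present."""
    # Hashtags already in the post, compared case-insensitively.
    existing = {t.lower() for t in re.findall(r"#\w+", post_body)}
    new_tags: List[str] = []
    for label in matched_labels:
        tag = to_hashtag(label)
        if tag == "#" or tag.lower() in existing:
            continue
        existing.add(tag.lower())
        new_tags.append(tag)
    if not new_tags:
        return post_body
    return post_body + " " + " ".join(new_tags)
```

For instance, `add_hashtags("Q3 invoices attached", ["Credit Card", "PII"])` returns `"Q3 invoices attached #CreditCard #PII"`. Tags already in the post are skipped case-insensitively, so re-running the step does not pile up duplicates.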
What Can You Do with Classified Data?
Perform compliant migration
Automate permissions management in SharePoint and Office 365
Recertify metadata for SharePoint and Office 365 workspaces
Automate records retention in SharePoint and Office 365