I’ve had the opportunity over the last several years at Microsoft events to watch customers go from a state of distrust in the security of the cloud to cautiously optimistic adoption. Industries that used to place nearly all their stock in on-premises intellectual privacy protection are engaging with Office 365, even in some of our customer accounts with more than 100,000 users. It wasn’t just a matter of convenience winning out over IT overhead, or the famous capital expense to operational expense shift – I believe that a lot of this had to do with the incredible job Microsoft did of pushing the security of its cloud. Microsoft has hosted regular sessions showcasing everything from “Fort Knox” encrypted storage to the robust layers of security and monitoring added for their data centers.

Of course, many still had the same thought: “I’m glad you’re protecting my data, but I still don’t trust what my end-users will do once they get their hands on this technology!” As if anticipating this last objection, Microsoft announced its new Data Loss Prevention (DLP) suite for Office 365. It’s their way of pushing the question back to you: “We’ve done everything we can to physically protect your content – what are you doing to protect it?”

The goal of this article is to give you an understanding of what Microsoft is offering, and to explain how AvePoint Compliance Guardian can help you strengthen your DLP strategy even more while extending it beyond Office 365 to support other business-critical information systems as well.

Content-aware DLP

Regardless of what platform you’re on, whether Office 365 or on-premises software, you’re going to want to take steps that both ascribe value to your content and protect it accordingly. We need to go beyond simple access control lists (ACLs) managed by end-users and implement DLP software. In Office 365, we’re likely thinking of content-aware DLP. Gartner defines content-aware DLP technologies as those that perform content inspection of data at rest or in motion, and can execute responses — ranging from simple notification to active blocking — based on policy settings.

At its heart, we’re looking to identify content based on some sort of business intelligence and decide what governance policy belongs with each category of data we find. See the chart below for examples:

Examples of types of content and surrounding governance policies

Making DLP Content-Aware

The methodology used for content-aware DLP remains very similar across the industry, and we can use this same approach when examining Microsoft’s offering:

Identify the regulations, audits, and controls by which your organization is governed.
Define appropriate zones where this content should exist.
Implement a strategy for dealing with exceptions.
Monitor compliance over time.

Identify your regulations

This section is strongly linked with defining your governance policy – the subject covered in great detail in the definitive guide on governance written by Jeremy Thake, Randy Williams, and Richard Harbridge. To answer the first question in our checklist, start making a list of:

Audits you have to complete internally, typically for privacy officers (who has access to what, and who did what with their access, where does a certain type of content live, etc.)
Information you store fundamental to how you do business (trade secrets, financial information)
Regulations you know that govern your industry:
- Healthcare: HIPAA
- Financial Services: SEC 17a-4, Gramm-Leach-Bliley (GLB) Act
Geographies in which you do business, especially when you’re collecting and storing information about employees or customers:
- States within the US that require strict storage regulations
- Countries with data storage and privacy regulations
Fines levied against other companies in your industry, along with why the data was lost (this one should not be hard to find these days)
- Consumer credit card information leaked
- Protected Health Information (PHI) shared
- Military design shared overseas

Microsoft Exchange has a set of templates out of the box that ties back to many of the region- and industry-specific checks listed above. You can augment this list by creating your own, or using a third-party DLP solution to provide an additional level of checks. Compliance Guardian, for instance, comes with dozens of checks and dictionaries tailored for many regions. A few examples include:

HIPAA and HITECH
International Traffic in Arms Regulations (ITAR)
Federal Information Security Management Act (FISMA)

When implementing Microsoft’s solution, you would select these policy templates either in your Exchange DLP policy or in your eDiscovery search for SharePoint and OneDrive. The new Office 365 interface provides the ability to leverage this same policy across all systems on that platform. Many organizations are relying on systems beyond Office 365, however, such as file shares, structured databases, websites, and other cloud platforms. These should not be forgotten, as the sensitive information within them is held to the same regulations as the information in Office 365. To apply your DLP strategy beyond Office 365, you can use Compliance Guardian to reuse these templates for your other business-critical systems.

Narrowing the scope of regulations to those that affect your business is a critical step. I recently worked on a new implementation of DLP in a customer environment, and we ran a check against every possible violation for every industry in every corner of their environment. More than seven days into the initial scan, we were still not ready to generate a report against even a single system. A few questions that can help you narrow this list might be:

Which types of data that we collect carry the biggest fines?
What types of information do we carry that have the biggest impact on sales? (Ex: product roadmap, design, etc.)
What types of data do we collect that affect the largest pool of users? (Ex: lead lists, employee information, etc.)

You can answer many of these through Privacy Impact Assessments or other types of sensitive information inventories. Know your governance policies, but make sure you refine your use to get the best results out of a scan. AvePoint and the International Association of Privacy Professionals (IAPP) teamed up to provide the AvePoint Privacy Impact Assessment (APIA) system, which helps you automate the process of evaluating, assessing, and reporting on the privacy implications of your enterprise IT systems. APIA is available as a free download from IAPP.

Get granular on your policy

One of the main reasons we recommend focusing on the right regulation is that DLP scans can be time-consuming. To use the example from Microsoft’s Channel 9 series on DLP, when searching for credit card numbers in a system, the checks need to:

Get the content to the server that has the classification rules
Run any regex checks (such as 16 digit numbers)
Perform functional analysis to be sure the match isn’t a false positive (such as a 16-digit number that reads 1234 1234 1234 1234)
Check for additional evidence (proximity searches for other keywords, like “Visa” or “Mastercard” or an expiration date)
Provide a verdict, including the confidence of the match and the number of violations identified.
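
The steps above can be sketched in a few lines of Python. This is an illustration only, not Microsoft’s implementation: the Luhn checksum stands in for the functional analysis step, and the keyword list, the 100-character proximity window, and the 0.65 and 0.85 confidence values are assumptions chosen for the example.

```python
import re
from dataclasses import dataclass

# Regex check: 16 digits, optionally grouped in fours by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")

# Additional evidence: an illustrative keyword list, not a vendor dictionary.
EVIDENCE_PATTERN = re.compile(r"visa|mastercard|expir|\b\d{2}/\d{2}\b", re.IGNORECASE)


def passes_luhn(digits: str) -> bool:
    """Functional analysis: the Luhn checksum rejects most made-up numbers."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


@dataclass
class Verdict:
    count: int         # number of violations identified
    confidence: float  # highest confidence among the matches (assumed scale 0 to 1)


def scan_for_cards(text: str, window: int = 100) -> Verdict:
    count = 0
    confidence = 0.0
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if not passes_luhn(digits):
            continue  # e.g. 1234 1234 1234 1234 is discarded here
        count += 1
        # Proximity search: look for supporting keywords near the match.
        nearby = text[max(0, match.start() - window): match.end() + window]
        score = 0.85 if EVIDENCE_PATTERN.search(nearby) else 0.65
        confidence = max(confidence, score)
    return Verdict(count, confidence)
```

Each stage is cheaper than the one after it, so ordering them this way means the expensive proximity search only runs on candidates that have already survived the regex and the checksum.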

The more detailed the check, the longer it will take. A check for PHI, for instance, can contain upwards of 18 regular expressions to find common identifiers.

One of the best ways to reduce the scope of your scans to what matters in your industry is to look at what each of these policies covers. Microsoft, like most DLP vendors, gives you the ability to customize which checks are included in each policy you create. Take a look at the list of built-in sensitive information types that Microsoft DLP publishes, such as PHI, Personally Identifiable Information (PII), and financial information. If you know that you neither collect nor process social security numbers, for example, you could uncheck those regular expressions, which are typically more expensive than a basic keyword check.

Leverage examples whenever possible

If you’re able to find samples of common documents you’re looking to protect, such as an invoice, an HR application, a legal contract, or a personnel record, you’re also off to a great start toward optimizing your DLP scans.

Both Microsoft and Compliance Guardian allow you to leverage templates, or sample documents, as your criteria for matching documents and information. This is essential when you’re searching for intellectual property or other corporate documents – you might not be able to predict this month’s project code names, keywords, or whether a piece of financial data is simply an invoice or a revenue projection – but the context of that information in a document can tell you almost every time. Example templates include:

Board meeting minutes or report templates
Tax forms and transaction records
Prescription forms and patient history forms

In this case, you’re going to get much faster performance when you don’t have to evaluate each file against a regex, keyword, or other combinations of checks. Simply identifying whether you have found information based on the template’s content gives you better performance and “blanket” coverage against all the types of information that might be in that document, without the cost of checking for each item.
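
One common way to approximate this kind of template matching is to compare overlapping word sequences (“shingles”) between the template and each document. The sketch below assumes that technique for illustration; it is not a description of how Microsoft or Compliance Guardian fingerprint documents, and the shingle size and 0.6 threshold are arbitrary choices.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def template_coverage(template: str, document: str, size: int = 3) -> float:
    """Fraction of the template's shingles that also appear in the document."""
    template_shingles = shingles(template, size)
    if not template_shingles:
        return 0.0
    shared = template_shingles & shingles(document, size)
    return len(shared) / len(template_shingles)


def matches_template(template: str, document: str, threshold: float = 0.6) -> bool:
    # The threshold is an assumption; filled-in values break up some of the
    # template's shingles, so requiring full coverage would miss real forms.
    return template_coverage(template, document) >= threshold
```

A single coverage score replaces a whole series of regex and keyword checks, which is where the performance benefit described above comes from.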

Define acceptable access

If you were to unleash a DLP policy or eDiscovery scan against Exchange or SharePoint at this point, you would most certainly find sensitive information in your environment. The majority of data privacy regulations are not meant to prevent you from collecting data, but are rather intended to govern the acceptable use of it.

Taking a prohibitive posture toward content is the fastest way to get sensitive information out of your system – and it won’t matter what DLP system you have in place. Users who are prevented from doing their job will leave your services for more liberal ones, even those outside the company. One of our customers recently revealed that during an investigation of DLP solutions for their Lotus Notes Mail platform, they found that most of the critical conversations were happening through personal email services where their users felt they could actually get their work done.

Before creating a DLP policy, ensure that every check and regulation listed in the section above has an “acceptable use” column filled in. This is to make sure the scope of acceptable information matches what our intended use is. Examples:

Flag any PII violations unless you found the information in HR’s SharePoint site.
Prevent customer ID numbers from being e-mailed unless they are sent within accounting.
Allow content regarding the development of new intellectual property only when the information is shared with users in the corporate domain.

Microsoft approaches this with the first conditions of transport rules. Example: Apply this rule for content sent outside your organization, except when the sender is from the accounting department.

Creating this filter policy is going to greatly reduce the false positives in your policy checks and keep your users from hitting any walls when doing the right thing with content.
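
To show how a condition and its exception combine, here is a minimal Python sketch of the example rule above. The `Message` fields and the `example.com` corporate domain are hypothetical names invented for the illustration; real transport rules are configured in Exchange rather than written as code.

```python
from dataclasses import dataclass
from typing import List

CORPORATE_DOMAIN = "example.com"  # hypothetical corporate domain


@dataclass
class Message:
    sender_department: str
    recipient_domains: List[str]
    contains_customer_ids: bool


def rule_applies(message: Message) -> bool:
    """Condition: sensitive content sent outside the organization.
    Exception: the sender is in the accounting department."""
    sent_outside = any(
        domain.lower() != CORPORATE_DOMAIN for domain in message.recipient_domains
    )
    sender_is_exempt = message.sender_department.lower() == "accounting"
    return message.contains_customer_ids and sent_outside and not sender_is_exempt
```

Because the exception is evaluated before any action is taken, accounting staff never see a warning for doing their normal work, which is exactly the reduction in false positives we are after.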

Plan for violations

Now that we know where acceptable information should be, we need to tell the transport rule what to do when the information is found outside those acceptable boundaries. Unless we’re planning to take action from the start, all we’re doing is an eDiscovery. Microsoft transport rules provide an adjustable response as part of the overall DLP policies set in Office 365, which include the actions shown below:

Each tier provides a more invasive response to violations. The majority of these actions can be taken for Exchange today, with future plans of adding them to SharePoint and OneDrive. Compliance Guardian, however, includes additional steps beyond this spectrum, including the ability to quarantine, adjust security, redact, tag, and encrypt documents in place as they’re uploaded or stored to SharePoint or Yammer and moved from File Shares or even structured databases.

It’s worth noting that while you’re still working out the false positives in your system – especially while you’re tailoring the rules and regulations you’ve identified above – Microsoft makes it possible to set these policies in test mode only.

This feature gives you an incremental approach to rolling out DLP: from testing, to non-invasive notifications, to fully functional enforcement.
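
A staged rollout like this can be pictured as a single policy whose mode decides how much of the response is actually carried out. The following Python sketch is a simplified model under that assumption; the mode and action names are hypothetical and do not correspond to actual Office 365 settings.

```python
from enum import Enum
from typing import List


class PolicyMode(Enum):
    TEST = "test"        # record matches only, invisible to users
    NOTIFY = "notify"    # non-invasive: record and show a policy tip
    ENFORCE = "enforce"  # fully functional: record, notify, and block


def respond(mode: PolicyMode, violation_found: bool) -> List[str]:
    """Return the actions taken for one scanned item in the given mode."""
    if not violation_found:
        return []
    actions = ["log_incident"]  # every mode keeps an audit trail
    if mode in (PolicyMode.NOTIFY, PolicyMode.ENFORCE):
        actions.append("show_policy_tip")
    if mode is PolicyMode.ENFORCE:
        actions.append("block_message")
    return actions
```

Since every mode logs the incident, you can review false positives collected in test mode before any user is notified or blocked.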

Identify content that needs immediate response

This is typically known as “real-time” scanning and monitoring. Here we have to consider how well any DLP system is going to be able to stand between an end-user and a mistake. For organizations in the defense sector, for instance, the mere presence of classified materials in certain networks is enough to trigger a full and costly breach investigation.

Microsoft’s focus for immediate response centers around end-user education with in-context policy tips. As a strategy, this makes sense: They plan to educate the user before sensitive content is even authored or sent.

Microsoft in-context policy notifications

Outlook and Outlook Web App (OWA) give you just-in-time notification of potential violations as the information is being authored. In the case of sharing content (such as in OneDrive, SharePoint, or Exchange), there is a clear scope of content with a clear intended recipient. We have the server in place, which can check for any potential violations on behalf of the user as they are sending, and this is why they are called transport rules. The rules can be triggered on everything from a measure of how confident the scans are that sensitive information has been found to the number of violations found in a document.
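
As a rough illustration of triggering on confidence and violation count, the Python sketch below maps a scan verdict to progressively more invasive responses. The thresholds and action names are assumptions made for the example, not values taken from Microsoft’s product.

```python
def choose_action(confidence: float, count: int) -> str:
    """Map a scan verdict to a response tier (all thresholds are illustrative)."""
    if count == 0 or confidence < 0.65:
        return "allow"                # nothing found, or too uncertain to act on
    if confidence >= 0.85 and count >= 10:
        return "block"                # bulk disclosure with strong evidence
    if confidence >= 0.85:
        return "block_with_override"  # user may justify and send anyway
    return "policy_tip"               # educate the user, do not interrupt


# A message with one likely credit card number only triggers a tip.
print(choose_action(confidence=0.7, count=1))
```

Keeping the least invasive response as the default for mid-confidence matches fits the end-user education strategy described above.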

Microsoft’s planned policy tips for OneDrive documents

Here is another area where we start seeing the limits in relying solely upon the Microsoft implementation, however. What if I’m not sharing directly with a user, but simply uploading a document via OneDrive? What if we’re authoring a blog post or wiki page in SharePoint? In these cases Microsoft is not scanning the document before upload. Microsoft DLP leverages the SharePoint search index to help identify what it considers sensitive in these systems. That means your exposure exists on content until the next incremental crawl triggers in SharePoint.

In many cases, breaches of personal information are not the result of a simple email containing personal information, but rather of the discovery of existing content that had already been published to file shares and SharePoint sites. Fortunately, Compliance Guardian provides a real-time solution that allows you to take action on content before it enters an exposed network – whether it’s Office 365, Yammer, SharePoint, or Lync – preventing the company from taking on any risk at all.

Example SharePoint upload with AvePoint’s Compliance Guardian providing real-time feedback to users based on document classification

Scanning after the incident

In the previous cases, during a send, a share, or an upload, I have a clear audience that I can check against. In the case of a content repository (SharePoint, File Share, Yammer, etc.), I do not have a clear audience. While the ACL can be read and checked at the time of sharing, we’re not completely clear on who will end up with access after the information is posted, and how that access list will change over time. Content that is shared appropriately today may not stay that way.

Given this approach, Microsoft allows you to leverage the eDiscovery center to run recurring searches against content that has already been posted to a SharePoint site or OneDrive location.

Microsoft’s eDiscovery interface for identifying sensitive content

Combined approach

What this requires is a two-step implementation of DLP:

Proactive measures – including both just-in-time notifications of policy violations as content is being authored, and as it is being shared.
Reactive measures – or scheduled check-ups that will ensure violations are not recurring simply due to age. This is especially important when violations have to do with the expiration of content (mandatory de-classification or mandatory disposal) which would not be apparent as content is initially being created.
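
The reactive half can be sketched as a scheduled job that re-checks stored content against retention limits. In the Python example below, the classifications and retention periods are invented for illustration and are not regulatory guidance.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class StoredItem:
    path: str
    classification: str
    created: date


# Hypothetical retention limits per classification.
RETENTION = {
    "invoice": timedelta(days=7 * 365),
    "job_application": timedelta(days=365),
}


def find_expired(items: List[StoredItem], today: date) -> List[str]:
    """Return paths of items that have outlived their retention period."""
    expired = []
    for item in items:
        limit = RETENTION.get(item.classification)
        if limit is not None and today - item.created > limit:
            expired.append(item.path)
    return expired
```

None of these items would have raised a flag when they were first created, which is why a proactive check alone cannot catch them.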

Microsoft’s system for this appears fragmented in the current implementation of DLP in Office 365 – their roadmap does indicate unification of these areas in their Compliance Center, but it is currently divided system-by-system in terms of reactive capabilities. Compliance Guardian was designed with unification in mind – including the ability to assign scans, policies, and reports across all Microsoft platforms, whether online or on-premises. Compliance Guardian also provides the ability to scan across your non-Microsoft platforms, including content stored on file shares and in structured databases.

Active monitoring

The last piece of the DLP puzzle is reporting on and monitoring for violations. As we indicated, violations will happen, and your ability to both respond and provide an audit trail will greatly reduce your company’s overall risk, including the ability to prove controls. Microsoft does provide reporting through its compliance center:

Microsoft’s incident management dashboard

This gives a dashboard with high-level information to help identify whether issues exist, where they are generated most heavily, and what is causing the violation. In addition, when forensic investigation is required, Microsoft’s system provides alerts in the form of emails that summarize what occurred, what controls were put in place, and any other audit information.

Microsoft’s incident management report, sent via email

What is severely lacking in Microsoft’s solution is a workflow for incident management. Simply finding sensitive information is the easy part – but there are many follow-up questions that need to be asked:

Is this incident a false positive?
Did we take the right action here based on what we found?
- Was quarantine necessary?
- Was encryption necessary?
- Did the right tag get applied?
Who could have seen this information?
What did they do with the information they saw?

This investigation workflow is a key part of DLP systems, and is the foundation of your incident management reporting requirements. This is truly an important component of successfully implementing a DLP solution because context as well as content is critical. Every organization will have sensitive data within it – whether this data causes an issue is all about the context. Who can access it? Who has accessed it? What have they done with it?

With Compliance Guardian, you can monitor and track violations via an intuitive interface that helps compliance officers view and manage incidents as well as their automated responses, including options to override automated actions taken (such as quarantining offending data). Tracking reports can also be easily exported and sent to external business users for review so that your organization can make informed decisions and take action.

AvePoint’s Compliance Guardian incident response interface, with the ability to take action on sensitive content

AvePoint’s Compliance Guardian incident response reports, including real-time actions taken against new content

The history of incidents in Compliance Guardian’s management interface is also directly tied to audit and historic data to answer the following questions (which Microsoft’s solution will not):

Which version of the document introduced the violation?
What did the ACLs look like at the time of the breach?
Who accessed or viewed this information after the exposure happened?

Having the complete picture of an incident and all relevant activities surrounding it is fundamental. For instance, if sensitive information is posted either to a blog or wiki, an immediate audit can be performed to generate a list of who could have viewed that information. Microsoft will give you a 90-day report on violations including audit data – but what do you do when the information was posted 6 months ago? One AvePoint customer uses our products to process more than 3 years and 300 million audit records to identify data spillage in their environment, with response times typically within 24 hours.
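
To illustrate the kind of question an audit trail can answer, the Python sketch below replays a simplified event log to rebuild who had access at a given moment and who viewed the content afterwards. The `AuditEvent` structure and its action names are hypothetical; they do not reflect the audit schema of Compliance Guardian or SharePoint.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Set


@dataclass
class AuditEvent:
    timestamp: datetime
    user: str    # for grant/revoke, the user whose access changed
    action: str  # "grant", "revoke", or "view" (assumed event types)


def acl_at(events: List[AuditEvent], moment: datetime) -> Set[str]:
    """Replay grants and revokes up to `moment` to rebuild the access list."""
    members: Set[str] = set()
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.timestamp > moment:
            break
        if event.action == "grant":
            members.add(event.user)
        elif event.action == "revoke":
            members.discard(event.user)
    return members


def viewers_after(events: List[AuditEvent], moment: datetime) -> Set[str]:
    """Users who viewed the content at or after the exposure."""
    return {e.user for e in events if e.action == "view" and e.timestamp >= moment}
```

Both functions depend on the log reaching back to before the exposure, which is why the length of audit retention matters so much here.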

Enterprise level risk assessments

Microsoft’s DLP is focused on eDiscovery (e.g. where do specific types of information live today?) and in-context policy notifications for end-users. The incident response provides insight on a policy-by-policy basis, and only for the scope that they currently support (which does not include Lync and Yammer).

Compliance Guardian, on the other hand, offers cross-platform risk monitoring that enables you to implement a DLP strategy across your business-critical information systems. It doesn’t matter whether your investments span Office 365, Lync, Yammer, websites, SharePoint, file shares, or structured databases – the risk score in Compliance Guardian can be tallied and tracked over time across all of these systems to ensure compliance throughout the enterprise.
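
As a simple model of what tallying risk across systems means, the Python sketch below sums weighted incidents per system. The severity weights and system names are assumptions for illustration; this is not Compliance Guardian’s actual scoring formula.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical weights: one high-severity incident outweighs many low ones.
SEVERITY_WEIGHTS = {"low": 1, "medium": 5, "high": 20}


def risk_by_system(incidents: List[Tuple[str, str]]) -> Dict[str, int]:
    """Sum weighted incidents per system; each incident is (system, severity)."""
    scores: Dict[str, int] = defaultdict(int)
    for system, severity in incidents:
        scores[system] += SEVERITY_WEIGHTS[severity]
    return dict(scores)


def total_risk(incidents: List[Tuple[str, str]]) -> int:
    """Enterprise-wide score: the sum of all per-system scores."""
    return sum(risk_by_system(incidents).values())
```

Recording the total after each scan gives you a trend line, so you can see whether a new policy is actually lowering risk across the whole environment rather than in one system only.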

Compliance Guardian’s risk dashboard showcasing HIPAA information across platforms

Conclusion

Microsoft is taking steps forward in pushing the DLP technology for Exchange across the rest of its Office 365 platform. But when defining your strategy, remember that you need to be evaluating your DLP choices based on the completeness of protection and the solution’s ability to lower your risk scores (and therefore save you money). While using Microsoft’s platform for in-context policy tips and end-user education may help train users on sensible collaboration practices, it needs to be augmented by better real-time preventative controls, cross-platform risk assessments, and incident management workflows, which Compliance Guardian is designed to provide.

To learn more about how Compliance Guardian can help your organization safeguard its sensitive data across information gateways, please visit our product page and request a demo.

Have a specific question about DLP or compliance? Leave a comment on this blog post or join us in our product discussion forums.

What is Your Data Loss Prevention Strategy?
