I recently presented a webinar with Digital Guardian VP of Product Management Tony Themelis called “Why Data Classification Should Drive your Security Strategy.” At the end of the webinar our audience had some really interesting questions for Tony, and I wanted to share those in a Q&A format. You can watch the full webinar on demand here.
Should data classification be done differently for cloud applications?
With our approach, the same technical infrastructure that is used to discover structured internal repositories like SharePoint or databases is also used to connect to enterprise cloud solutions like Box, Office 365, etc. Data discovery is done in both locations, and when you view the results you can slice and dice them however you’d like – you can look at your data in total and you can determine how much of it is internal versus how much is in the cloud. As a result, you can classify data the same way, regardless of where it is hosted.
Is it okay to only classify some data or should all data have some type of classification category assigned?
It’s kind of like set theory – you have overlapping combinations of things. You have the whole world or population that you’re looking at and within that you might have classified data and unclassified data. In essence, even if you have unclassified data, it’s classified as non-sensitive by default. So going through the exercise of having to classify everything is not necessarily appropriate, because by definition anything that is not classified you don’t have to worry about, although we would argue that you do have to continuously worry about it to make sure you’re not missing anything. In short, I think that what’s really important is to define the classification hierarchy, to keep it simple so it’s clear what is classified, and to continuously monitor the rest of your data for things that you might have missed.
What are the major challenges of beginning classification of data at rest without examining related data flows?
If you want to classify your data, there’s no challenge – you can do it either way (automatically or manually). It really depends on the context of the company, their compliance requirements, and the environment that they’re in. If you know that you need to be PCI compliant, then discovering all PCI data is a no brainer – there’s no challenge to doing it and there’s no reason to look at data flows because you just want to scan everything and find all the data that you need to classify and protect.
The challenge stems from the common misbelief that you can’t engage in a security program without first classifying your data. That isn’t going to work if you don’t know what classification hierarchy and methodology to use, right? The only way to do it is to examine data flows so that you can learn about the organization and define what is sensitive. The scenario where you don’t know or you’re not subject to compliance makes starting with a discovery program a challenge.
What does data flow visibility look like in Digital Guardian? Is it some kind of report or real data flow diagrams?
We provide reports and data flow diagrams. One of the things we look at is defining sources and destinations. Sources might be structured repositories – SharePoint sites, file shares, users’ hard drives on laptops or desktops, etc – while destinations might be any of those things but could also include emails, copies to USB or CD/DVD, or uploads to cloud storage. If you think about the matrix you have, you have a list of sources and a list of destinations and then you have files that flow in between. Basically what we do is we visualize that for you in a report. The report allows you to select the sources and destinations that interest you, so you can look at all sources and all destinations or you can say “Show me all database extracts that are burned to CDs.” Then when you look at that subset, you can slice and dice by user, by business unit, by file type, by classified data vs. unclassified data, by application, etc.
That insight shows you what information is important and what isn’t – where the big flows are and where the small flows are. We do a lot of statistical analysis on that as well, so we look at those flows and we can define baselines and variations – by user, computer, application, etc – so that on an ongoing basis you get this set of resources that you can look at in a visual way and those allow you to start managing risk over time. By looking at them on an ongoing basis and observing differences you can inform your policy, your governance, and your security direction. It’s real time information that informs you and the results can help answer questions like “How can I better classify? How can I better control? How can I better report? How can I make sure that I eliminate false positives and negatives?”
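The source/destination matrix and the baselining described above can be pictured as a set of flow records that you filter and then compare against simple statistics. Here is a minimal sketch in Python with entirely hypothetical field names and events – this is an illustration of the idea, not Digital Guardian's actual data model or analytics:

```python
# Illustrative sketch only -- models file flows as (source, destination)
# records that can be sliced by any field, plus a toy per-user baseline
# that flags unusually high flow volumes.
from collections import Counter
from statistics import mean, stdev

# Hypothetical flow events: each is one file moving from a source to a destination.
flows = [
    {"source": "database", "destination": "cd_burn", "user": "alice", "classified": True},
    {"source": "file_share", "destination": "usb", "user": "bob", "classified": False},
    {"source": "database", "destination": "cd_burn", "user": "alice", "classified": True},
    {"source": "sharepoint", "destination": "email", "user": "carol", "classified": True},
    {"source": "database", "destination": "cloud_upload", "user": "bob", "classified": False},
]

def slice_flows(flows, **criteria):
    """Return only the flows matching every given field, e.g.
    slice_flows(flows, source="database", destination="cd_burn")."""
    return [f for f in flows if all(f.get(k) == v for k, v in criteria.items())]

# "Show me all database extracts that are burned to CDs."
db_to_cd = slice_flows(flows, source="database", destination="cd_burn")

# Toy baseline: flag any user whose flow count is more than two
# standard deviations above the mean across all users.
per_user = Counter(f["user"] for f in flows)
counts = list(per_user.values())
threshold = mean(counts) + 2 * stdev(counts)
outliers = [u for u, n in per_user.items() if n > threshold]
```

The same filtering pattern extends to any of the dimensions mentioned above (business unit, file type, classified vs. unclassified, application) by adding fields to the records.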
How does manual user classification work? Is there a special user interface provided by Digital Guardian or does it require some kind of expert user input?
We provide a special user interface that integrates with Microsoft products like Outlook, Word, Excel, and PowerPoint, with other productivity applications like Adobe Acrobat, with CAD solutions like AutoCAD, etc. When you launch the program, it provides your users with an added component in the “ribbon” that offers assignable classification tiers. Our solution also allows you to require users to assign a classification. To relate that to the earlier question about whether everything has to be classified, you can have a choice in the dropdown that says “I choose not to classify” along with your typical classifications such as “Top secret” or “Internal only.” Based on what a user chooses in the interfaces of these commonly-used applications, Digital Guardian can then apply policy-based controls to those files according to their classifications.
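Conceptually, the required-choice dropdown amounts to a tier-to-policy lookup: the user picks a tier, and controls follow from that choice. A minimal sketch, with made-up tier names and control labels rather than Digital Guardian's actual policies:

```python
# Illustrative sketch only -- a toy model of a required classification
# choice (including an explicit "choose not to classify" option) mapped
# to hypothetical policy controls.
from enum import Enum

class Tier(Enum):
    TOP_SECRET = "Top secret"
    INTERNAL_ONLY = "Internal only"
    UNCLASSIFIED = "I choose not to classify"

# Hypothetical policy map: controls applied per tier once a user saves a file.
POLICY = {
    Tier.TOP_SECRET: ["block_usb", "block_cloud_upload", "encrypt"],
    Tier.INTERNAL_ONLY: ["block_cloud_upload"],
    Tier.UNCLASSIFIED: [],  # still monitored, but no blocking controls
}

def controls_for(choice: str) -> list[str]:
    """Look up the policy controls for the tier a user picked in the ribbon."""
    return POLICY[Tier(choice)]
```

Note how “I choose not to classify” is itself a valid, auditable choice – which mirrors the earlier point that unclassified data is effectively classified as non-sensitive by default.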
Do you provide any support with Digital Guardian’s data discovery and classification tools, or do the tools just find and classify data on their own?
It’s a combination of both. When you install our platform, it shows you the data flows out of the box. Those data flows are represented and can be broken out by your Active Directory structure, file type, user group, or egress channel. All of that happens automatically, so in the sense that the tool can observe and find the data on its own, yes.
If you’re conducting a discovery exercise and you’re looking for certain patterns, then again the answer is yes. What you need to do is define those patterns; sometimes they’re available out of the box, but other times they may be specific to your company or industry. Once they’re defined you can just let the tool do the discovery for you.
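As an example of what a defined pattern might look like, here is a toy PCI-style discovery pass: a deliberately simplified card-number regex plus a Luhn checksum to cut down false positives. The regex, function names, and sample text are all illustrative assumptions, not Digital Guardian's actual detection rules:

```python
# Illustrative sketch only -- a toy pattern-based discovery pass for
# card-like numbers, combining a simplified regex with the standard
# Luhn checksum to filter out number runs that merely look card-shaped.
import re

# Matches 13-16 digits optionally separated by spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def luhn_valid(number: str) -> bool:
    """Standard Luhn checksum used to validate card-like numbers."""
    digits = [int(d) for d in number if d.isdigit()]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0

def discover_pci(text: str) -> list[str]:
    """Scan text for candidate card numbers and keep only Luhn-valid hits."""
    return [m.group() for m in CARD_RE.finditer(text) if luhn_valid(m.group())]

doc = "Invoice paid with card 4111 1111 1111 1111; order ref 1234 5678."
hits = discover_pci(doc)  # the Visa test number passes Luhn; the order ref does not
```

A company-specific pattern – a part-number format, a project codename convention – would slot in the same way: define the pattern once, then let the discovery scan apply it everywhere.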
We provide support for these processes through our Managed Security Program. One of the things we’ve found is that, as we’ve built expertise doing this over the years, a lot of our customers want to use the tool but don’t want to be involved in its technical specification – they want to outsource that to us and simply guide us to deliver the right results. If you subscribe to our Managed Security Program, you get a full level of support that allows you to use us to help you find the data and classify it.