A Definition of File Fingerprinting
Document fingerprinting, also called file or data fingerprinting, uses algorithms that map data such as documents and files to much shorter strings known as fingerprints. Each fingerprint acts as a practically unique identifier for its source data, much as a human fingerprint identifies an individual person. Document fingerprinting is traditionally a feature of network-based data loss prevention (DLP) products.
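As a rough illustration of mapping a file to a shorter string, the sketch below uses a cryptographic hash as the fingerprint. The choice of SHA-256 and the function name fingerprint are assumptions made for this example; commercial DLP products may use different and more tolerant algorithms.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a fixed-length hex string that identifies the given bytes.

    SHA-256 is an assumption for this sketch, not a statement about
    what any particular DLP product uses.
    """
    return hashlib.sha256(data).hexdigest()


original = b"Quarterly revenue forecast: confidential"
copy = b"Quarterly revenue forecast: confidential"
edited = b"Quarterly revenue forecast: public"

# The fingerprint length is constant no matter how large the input is.
print(len(fingerprint(original)))                    # 64
# Identical content yields an identical fingerprint.
print(fingerprint(original) == fingerprint(copy))    # True
# Any change to the content yields a different fingerprint.
print(fingerprint(original) == fingerprint(edited))  # False
```

A whole-file hash like this only recognizes exact copies, which is why fingerprinting for forms usually works on patterns within the content rather than on the file as a single unit.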
Document fingerprinting is used to identify sensitive information so that it can be tracked and protected accordingly. As employees continue to produce and handle various types of sensitive data and information in increasingly large quantities, data fingerprinting provides a scalable technique for identifying, monitoring, and applying protective controls to data as it moves across the corporate network.
Document fingerprinting is especially useful for identifying sensitive data within forms, including government forms such as tax documents, Health Insurance Portability and Accountability Act (HIPAA) and other regulatory compliance forms, employee documentation forms used by finance or human resources, and other proprietary forms that a business may use, such as customer order forms or contracts. By extending data fingerprinting to forms, traditional DLP solutions can detect sensitive data such as social security numbers, credit card numbers, and healthcare information within those forms. Recognizing when these documents contain pieces of sensitive data enables DLP solutions to secure those documents during transmission.
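To make the idea of detecting sensitive values inside a form more concrete, here is a minimal sketch that scans text for US social security numbers and payment card numbers. The regular expressions, the Luhn checksum filter, and the function names are assumptions chosen for illustration; real DLP detectors are considerably more thorough.

```python
import re

# Hypothetical patterns for this sketch: NNN-NN-NNNN for SSNs, and 13 to 19
# digits (optionally separated by single spaces or hyphens) for card numbers.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Check a digit string against the Luhn checksum used by payment cards."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> dict:
    """Return SSN-like strings and Luhn-valid card numbers found in text."""
    cards = []
    for match in CARD_RE.findall(text):
        digits = re.sub(r"\D", "", match)
        if luhn_valid(digits):  # discard digit runs that fail the checksum
            cards.append(digits)
    return {"ssn": SSN_RE.findall(text), "card": cards}


form = (
    "Name: Jane Doe\n"
    "SSN: 123-45-6789\n"
    "Card: 4111 1111 1111 1111\n"
    "Order ref: 1234 5678 9012 3456\n"
)
print(find_sensitive(form))
```

In this example the order reference has the same shape as a card number but fails the checksum, so it is not reported.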
Methods of Document Fingerprinting
Different documents and data types often follow distinctive word patterns. Traditional DLP solutions identify those word patterns within the documents and then create document fingerprints based on them. From there, the DLP solution uses the document fingerprint to detect network data transmissions featuring the same pattern.
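One common way to turn word patterns into a fingerprint is to hash overlapping runs of words (often called shingles) and compare the resulting sets. The sketch below assumes that approach; the window size of four words, the truncated SHA-1 hashes, and the function names are illustrative choices rather than a description of any specific product.

```python
import hashlib
import re


def shingle_hashes(text: str, k: int = 4) -> set:
    """Hash every run of k consecutive words in the text.

    Lowercasing and stripping punctuation keeps the fingerprint stable
    under trivial formatting changes.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    shingles = (" ".join(words[i:i + k]) for i in range(len(words) - k + 1))
    return {hashlib.sha1(s.encode("utf-8")).hexdigest()[:16] for s in shingles}


def containment(fingerprint: set, candidate: set) -> float:
    """Fraction of the protected document's shingles found in the candidate."""
    if not fingerprint:
        return 0.0
    return len(fingerprint & candidate) / len(fingerprint)


protected = ("This agreement is confidential and may not be "
             "disclosed to third parties")
outgoing = ("Hi Bob, see below. This agreement is confidential and may not "
            "be disclosed to third parties. Thanks!")
unrelated = "Lunch is at noon in the main cafeteria today"

fp = shingle_hashes(protected)
print(containment(fp, shingle_hashes(outgoing)))   # 1.0
print(containment(fp, shingle_hashes(unrelated)))  # 0.0
```

Because the comparison is made on sets of small word runs, the protected text is still recognized when it is pasted into a longer message.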
One of the most efficient ways of creating document fingerprints is to upload a form or template first. This enables a DLP solution to detect and match file fingerprints even after a form has been modified or completed, as the original elements of the template will still be present. The DLP solution then can determine the sensitivity of the document based on its fingerprint and secure it if needed.
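The template workflow can be sketched as follows. This example assumes a simple form layout in which each field is a "Label: value" line, fingerprints only the fixed labels, and treats a document as a match when at least 80% of a template's labels are present. The TemplateRegistry class, the threshold, and the sensitivity labels are all hypothetical.

```python
import hashlib


def label_fingerprint(text: str) -> set:
    """Hash the fixed part of each line: the text before the first colon."""
    hashes = set()
    for line in text.splitlines():
        label = line.split(":", 1)[0].strip().lower()
        if label:
            hashes.add(hashlib.sha256(label.encode("utf-8")).hexdigest()[:16])
    return hashes


class TemplateRegistry:
    """Stores template fingerprints and classifies documents against them."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self._templates = {}  # name -> (fingerprint, sensitivity)

    def register(self, name: str, template_text: str, sensitivity: str) -> None:
        self._templates[name] = (label_fingerprint(template_text), sensitivity)

    def classify(self, document_text: str):
        """Return (name, sensitivity, score) for the best match, or None."""
        doc_fp = label_fingerprint(document_text)
        best = None
        for name, (fp, sensitivity) in self._templates.items():
            score = len(fp & doc_fp) / len(fp) if fp else 0.0
            if score >= self.threshold and (best is None or score > best[2]):
                best = (name, sensitivity, score)
        return best


template = ("EMPLOYEE EXPENSE CLAIM\nEmployee name:\nEmployee ID:\n"
            "Bank account number:\nTotal amount claimed:\n")
completed = ("EMPLOYEE EXPENSE CLAIM\nEmployee name: Jane Doe\n"
             "Employee ID: 48213\nBank account number: 00112233\n"
             "Total amount claimed: 412.50\n")

registry = TemplateRegistry()
registry.register("expense_claim", template, "confidential")
print(registry.classify(completed))
print(registry.classify("Team lunch menu\nStarter: soup\n"))
```

The completed form still matches because filling in values leaves the template's labels untouched, while an unrelated document shares none of them.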
Benefits of Document Fingerprinting
The primary benefit of file fingerprinting is the ability to automate and scale the process of identifying and tagging sensitive information on a network. After a DLP solution creates data fingerprints and associates files with their appropriate DLP policies, the program detects network traffic – such as emails, TCP and FTP traffic, or web uploads – that contains documents matching fingerprint data in order to apply protections based on those DLP policies. Protections deployed based on document fingerprints may include blocking transmissions, preventing file access, or encrypting sensitive data. Most businesses and organizations have restricted-use or restricted-access policies so that only specific users have access to sensitive information; this can be made possible by data fingerprinting used in conjunction with DLP policies and exceptions.
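The relationship between fingerprints, policies, and exceptions described above might look something like the sketch below. The PolicyEngine class, the three actions, and the exact-match SHA-256 fingerprint are assumptions for illustration only; they do not reflect the interface of any real DLP product.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    ENCRYPT = "encrypt"
    BLOCK = "block"


@dataclass(frozen=True)
class Policy:
    name: str
    action: Action
    exempt_users: frozenset = frozenset()  # users covered by an exception


def fingerprint(data: bytes) -> str:
    # Exact-match fingerprint; chosen for brevity in this sketch.
    return hashlib.sha256(data).hexdigest()


class PolicyEngine:
    """Maps document fingerprints to policies and evaluates transmissions."""

    def __init__(self):
        self._policies = {}  # fingerprint -> Policy

    def protect(self, data: bytes, policy: Policy) -> None:
        self._policies[fingerprint(data)] = policy

    def evaluate(self, sender: str, payload: bytes) -> Action:
        policy = self._policies.get(fingerprint(payload))
        if policy is None or sender in policy.exempt_users:
            return Action.ALLOW
        return policy.action


design_doc = b"Product X design specification"
engine = PolicyEngine()
engine.protect(
    design_doc,
    Policy("intellectual-property", Action.BLOCK,
           frozenset({"cto@example.com"})),
)

print(engine.evaluate("intern@example.com", design_doc))  # Action.BLOCK
print(engine.evaluate("cto@example.com", design_doc))     # Action.ALLOW
print(engine.evaluate("intern@example.com", b"Lunch?"))   # Action.ALLOW
```

Keeping exceptions on the policy rather than on the fingerprint means the same protected document can be handled differently depending on who is sending it.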
Document fingerprinting is an important safeguard against insiders intentionally or inadvertently leaking sensitive information to outside or otherwise unauthorized contacts. Intellectual property, as well as government forms and business records, can be protected through document fingerprinting.
Challenges of Data Fingerprinting
The effectiveness of document fingerprinting can be limited in certain instances. Fingerprinting does not always work for certain kinds of content, including documents that are encrypted or password protected, images and videos, and data in which the text does not perfectly match a predefined document fingerprint. Document fingerprinting also may not be enough to defend against external cyber attacks and malware. As attackers and malware become more sophisticated, it becomes more difficult for businesses to defend against them using traditional methods.
Additionally, businesses need malware detection that is dynamic enough to quickly and accurately detect malware, including new variants. The behavioral analysis featured in dynamic endpoint detection and response solutions is better able to detect malware than traditional signature-based or fingerprinting methods. Also, many advanced endpoint DLP solutions have more effective methods of classifying and protecting sensitive data based on contextual factors such as user, file source and destination, and file or application type. Traditional classification solutions using document fingerprinting can also be time-consuming, as they require a lengthy process of developing fingerprint models before they can detect sensitive data based on those fingerprint templates. Finally, because file fingerprinting is typically performed by a network DLP appliance, these solutions are only viable when users are connected to the corporate network. Endpoint DLP solutions, by contrast, are installed on the devices themselves and can protect data on or off the network.
The Role of File Fingerprinting in a Comprehensive Security Program
While document fingerprinting is useful as an automated, scalable solution for securing data and files on a corporate network, it is not a standalone solution to data security for enterprises today and should rather be used as a component of a defense-in-depth strategy. Network DLP solutions that employ document fingerprinting are a viable solution for many regulatory compliance requirements as well, but the contextual protection and device-level security capabilities offered by endpoint DLP solutions often make endpoint DLP the answer for businesses with more stringent data protection requirements.