Among the vast array of solutions promising data discovery and identification, there is a wide variety of data scanning approaches. Each varies in its depth and completeness and comes with its own benefits and drawbacks.

Some basic definitions

Before we look at some of the main scanning approaches used for data discovery, it’s important to have a basic understanding of what we mean by a full scan and what makes a scan ‘partial.’

A full scan delivers a complete scan of data targets, which may include structured and unstructured repositories and file stores. Not only do these scans analyze every record or file in these locations, they also scan each one in full, metadata and contents alike, regardless of its size.
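
As a rough illustration only, here is a minimal Python sketch of what a full scan does conceptually: walk an entire directory tree and check the complete contents of every file against a detection pattern. The /data root and the card-number-style regex are hypothetical placeholders, not any particular product’s detection logic.

```python
import os
import re

# Hypothetical detector: a simple regex for 16-digit, card-like numbers.
PATTERN = re.compile(rb"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b")

def full_scan(root):
    """Walk every directory under root and scan every file in full."""
    findings = []
    for dirpath, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    data = f.read()  # the entire file, regardless of size
            except OSError:
                continue  # unreadable file; a real tool would log this
            if PATTERN.search(data):
                findings.append(path)
    return findings

print(full_scan("/data"))  # hypothetical target location
```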

A partial scan is one that limits the extent of its scanning, typically as a result of technological or time constraints. This may mean that the solution scans only a sample of systems, a sample of files, the first few bytes of a file, or its metadata.

Understanding different partial scan approaches

Sample systems scanning

Tools that provide sample systems scanning are configured to select a reduced set of systems to scan from the full network environment. These tools can be quick to run, because both the number of systems and the overall volume of data scanned are reduced. However, they also create data blind spots, because systems outside the sample are never examined.
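
A minimal Python sketch of the idea, assuming a hypothetical flat inventory of hostnames and a placeholder per-host scan step:

```python
import random

# Hypothetical inventory of systems in the environment.
inventory = ["db01", "db02", "fs01", "fs02", "mail01", "web01"]

def sample_systems(hosts, fraction=0.3, seed=None):
    """Select a random subset of systems; everything else is skipped."""
    rng = random.Random(seed)
    k = max(1, int(len(hosts) * fraction))
    return rng.sample(hosts, k)

for host in sample_systems(inventory):
    print(f"scanning {host} ...")  # stand-in for a real per-host scan
# Hosts not selected are never examined in this scan cycle.
```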

Sample file scanning

Sample file scanning works by targeting a subset of files within a target location and scanning those for sensitive information. Like sample systems scanning, this can be a useful technique for quickly gauging the data types within a location or environment. However, the accuracy and completeness of the results are limited, since the majority of files are overlooked on each pass. As with sample systems scanning, this approach can produce a misleading picture of data risk and exposure.
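
Conceptually, this might look like the following sketch, again with a hypothetical /data location and a placeholder per-file scan step:

```python
import os
import random

def sample_files(root, k=100, seed=None):
    """Collect every file path under root, then keep only a random sample."""
    paths = [os.path.join(dirpath, name)
             for dirpath, _subdirs, names in os.walk(root)
             for name in names]
    rng = random.Random(seed)
    return rng.sample(paths, min(k, len(paths)))

for path in sample_files("/data"):  # hypothetical target location
    print(f"scanning {path} ...")   # all unsampled files are skipped
```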

Partial file scanning

In partial file scanning, only the first few bytes of a file, or the header rows of tables in a database, are scanned to get a broad understanding of its content. While this approach can again speed up the scanning process, it carries a risk of false negative findings, where repositories are deemed ‘clean’ or ‘low risk’ but actually host significant quantities of sensitive or confidential information. This can create a false sense of security and lead to unintentional data exposure.
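
A minimal sketch of the prefix-read behavior, reusing the same hypothetical card-number regex:

```python
import re

# Hypothetical detector, as before: 16-digit, card-like numbers.
PATTERN = re.compile(rb"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b")

def scan_file_prefix(path, limit=4096):
    """Inspect only the first `limit` bytes; later content is never read."""
    with open(path, "rb") as f:
        head = f.read(limit)
    return bool(PATTERN.search(head))

# Any match that sits beyond byte 4096 is a false negative for this scan.
```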

Metadata scanning

Metadata scanning provides a map of data attributes across structured and unstructured data stores. These scans don’t analyze the content of files and records, but read the “data about the data” stored alongside them. This includes information like file or record size, creation date, last modified date, owner and access permissions. It may also include classification labels.
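
As a rough sketch, assuming a POSIX-style filesystem, a metadata scan might stat each file and record attributes like these without ever opening the file:

```python
import os
import stat
import time

def read_metadata(path):
    """Collect 'data about the data' without reading the file's contents."""
    st = os.stat(path)
    return {
        "path": path,
        "size_bytes": st.st_size,
        "last_modified": time.ctime(st.st_mtime),
        "owner_uid": st.st_uid,  # resolvable to a user name on POSIX systems
        "permissions": stat.filemode(st.st_mode),
    }

print(read_metadata(__file__))  # the file itself is never opened
```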

Metadata scans can be useful for organizational data mapping, identifying the general types and locations of data across the digital estate. They can also be used to prioritize more comprehensive file scanning of higher risk locations or environments. 

Incremental scanning

Incremental scanning is an approach that follows an initial baseline scan, which must be full and comprehensive. In an incremental scan, only files that are new or have been modified since the baseline are scanned. These scans rely on periodic full scanning to provide dependable results: without it, incremental scans will miss new and unknown data stores that were absent from the last baseline.
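
Conceptually, an incremental pass might compare each file’s modification time against the timestamp of the last baseline, as in this hypothetical sketch:

```python
import os
import time

def incremental_scan(root, baseline_ts):
    """Yield only files created or modified since the last full baseline scan."""
    for dirpath, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) > baseline_ts:
                    yield path  # new or changed since the baseline
            except OSError:
                continue  # file vanished or is unreadable

# Hypothetical usage: treat a scan run 24 hours ago as the baseline.
for path in incremental_scan("/data", time.time() - 86400):
    print(f"re-scanning {path} ...")
```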

Some discovery solutions combine two or more of these approaches when scanning. For example, sample file scanning may be combined with partial file scanning, so only the first few KB of a small number of files in a location are scanned.
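
Combining those two behaviors, a hypothetical sketch of such a scan might look like this:

```python
import os
import random
import re

# Hypothetical detector and target location, as in the earlier sketches.
PATTERN = re.compile(rb"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b")

def combined_scan(root, k=100, limit=4 * 1024, seed=None):
    """Randomly sample up to k files, reading only the first `limit` bytes of each."""
    paths = [os.path.join(dirpath, name)
             for dirpath, _subdirs, names in os.walk(root)
             for name in names]
    sample = random.Random(seed).sample(paths, min(k, len(paths)))
    hits = []
    for path in sample:
        try:
            with open(path, "rb") as f:
                head = f.read(limit)  # everything past the first 4 KB is ignored
        except OSError:
            continue
        if PATTERN.search(head):
            hits.append(path)
    return hits  # two layers of sampling mean two layers of potential blind spots

print(combined_scan("/data"))
```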

The problem with partial scans

Even when combined, partial scanning approaches may fail to identify sensitive data, leaving organizations with unmanaged data risk and exposed to breaches. 

This is because partial scans only ever tell part of the story. They cannot analyze the full content of every file across every target system and environment. As a result, there are always blind spots where sensitive data may be hidden.

Full scanning benefits and common pitfalls

Unlike partial scans, full scans are deep-level: they analyze the full content and metadata of each file, all files or records within target systems, and all target locations. These scans provide complete and comprehensive information about an organization’s data landscape, its risks and its exposure.

While full scans can be complex to set up and lengthy to complete, the effort is critical for effective data security posture management (DSPM), data management and data governance. 

The primary challenge with full scans is ensuring not only that all of the digital estate is included in the scan scope, but also that the scans are run continuously to identify and interrogate new files, systems and locations as they are created. This is why some solutions will combine periodic full scans with incremental partial scans, in an attempt to balance coverage, speed and accuracy over time. 

Another common pitfall of full scanning is the alert fatigue that comes with solutions unable to accurately identify and suppress false positive findings. Addressing false positives can be time-consuming and diverts focus away from true positives and potential exposure.

Further, full scanning can be resource intensive and affect operational performance while scans are running. In most organizations, any process that causes operational disruption that can’t be constrained to a short, planned timeframe is unacceptable. 

Depth and performance combined: The Ground Labs approach

With Ground Labs, you can get both scan depth and performance without having to compromise. Our DSPM and data discovery solutions offer fast, accurate discovery using full file scanning, with comprehensive estate coverage of structured and unstructured data across on-premises and cloud environments.

Using proprietary GLASS Technology™, Ground Labs solutions combine contextual analysis, which reduces false positive rates, with deep file scanning, which ensures all sensitive data is uncovered. GLASS enables deep-level scanning without impacting system performance, so operations are unaffected while scans are running. This enables organizations to run scans frequently and across the entire digital estate, for maximum visibility and control of sensitive data wherever it is stored.

To find out how Ground Labs can support your business, arrange a demo or book a call with one of our experts today.