Data Discovery for AWS

By Anne Turner | 13 July 2023

In this post, we discuss why it’s important to perform regular data discovery across the different storage types available in AWS.

The agility and flexibility of networking and storage services such as Amazon Web Services (AWS) are hugely attractive to businesses. However, numerous data breaches have been traced to insecure S3 storage buckets and other exposed storage locations.

With data protection and privacy legislation worldwide making data security a core business obligation, and security standards such as PCI DSS v4.0 depending on accurate scoping, effective segmentation and sound data management, it’s more important than ever to identify, manage and mitigate potential data risks across every part of the network. That includes on- and off-premises physical networks and systems as well as cloud environments such as AWS.

AWS Security Starts With Data Discovery

While there is plenty of guidance on configuring and architecting secure AWS environments from scratch, many organizations still have exposed storage locations left over from legacy deployments.

These locations pose the greatest risk because, in many cases, the business knows neither that they exist nor what data they hold.

Data discovery scanning for AWS comprises two initial steps:

Knowing where to scan
Knowing what you’re scanning for

Where to scan

In cloud environments like AWS, where business units can set up new services independently, it can be tricky to keep track of every service and storage location running at any one time. This step matters, though, because it determines where you need to scan for your data.

There are a couple of places to start:

Running a billing report from the management account (formerly called the master payer account). The report can be broken down by member account to identify the storage services in use within each one.
Using read-only security reader privileges set up in each AWS account to view every service running under that account. This lets you identify S3 buckets, RDS (managed database) instances, EC2 instances and more.
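The billing route can be scripted. The Python sketch below is one way to do it, assuming boto3 is installed and the caller has Cost Explorer access in the management account. It requests costs from Cost Explorer's get_cost_and_usage grouped by linked account and service, then keeps the services whose names look storage-related. STORAGE_HINTS is a hypothetical keyword list, not an AWS classification, so treat the output as a starting point rather than a complete inventory.

```python
"""Sketch: list storage-related services per linked account from Cost Explorer.

Assumes boto3 is installed and the caller has Cost Explorer access in the
management account. STORAGE_HINTS is a hypothetical, hand-picked keyword list.
"""
from typing import Any, Dict, List, Set

# Hypothetical substrings used to flag storage/database services by name.
STORAGE_HINTS = ("Storage", "Database", "DynamoDB", "File System", "EC2", "Elastic Compute Cloud")


def storage_services_by_account(results_by_time: List[Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Map account ID -> storage-like service names seen in the cost data."""
    found: Dict[str, Set[str]] = {}
    for period in results_by_time:
        for group in period.get("Groups", []):
            # With two GroupBy dimensions, Keys is [account_id, service_name].
            account, service = group["Keys"]
            if any(hint in service for hint in STORAGE_HINTS):
                found.setdefault(account, set()).add(service)
    return found


def fetch_results_by_time(session, start: str, end: str) -> List[Dict[str, Any]]:
    """Live call: `session` is a boto3.Session; dates are 'YYYY-MM-DD', end exclusive."""
    ce = session.client("ce")
    kwargs: Dict[str, Any] = {
        "TimePeriod": {"Start": start, "End": end},
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [
            {"Type": "DIMENSION", "Key": "LINKED_ACCOUNT"},
            {"Type": "DIMENSION", "Key": "SERVICE"},
        ],
    }
    results: List[Dict[str, Any]] = []
    # Cost Explorer pages its results; follow NextPageToken until it is absent.
    while True:
        response = ce.get_cost_and_usage(**kwargs)
        results.extend(response.get("ResultsByTime", []))
        token = response.get("NextPageToken")
        if not token:
            return results
        kwargs["NextPageToken"] = token
```

For example, passing the output of fetch_results_by_time(boto3.Session(), "2023-06-01", "2023-07-01") to storage_services_by_account would give a per-account view of storage-related spend for June. Keeping the parsing separate from the API call makes the filtering easy to check against saved billing data.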

Once you have your list of storage services, you can then scan them to understand the data they host.
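Before scanning, each service on the list has to be resolved into specific locations. A minimal Python sketch, assuming boto3 and a session whose credentials map to a read-only role in a single account, might use the standard list_buckets, describe_db_instances and describe_instances calls. The s3://, rds: and ec2: target labels are a hypothetical naming convention, and because RDS and EC2 are regional services, fetch_inventory would need to run once per region in use.

```python
"""Sketch: turn one account's services into concrete scan targets.

Assumes boto3 and read-only credentials for the account. The parse_* helpers
are pure; fetch_inventory() makes live, read-only API calls.
"""
from typing import Any, Dict, Iterable, List


def parse_s3(list_buckets_response: Dict[str, Any]) -> List[str]:
    # list_buckets() returns {"Buckets": [{"Name": ..., ...}, ...], ...}
    return [f"s3://{b['Name']}" for b in list_buckets_response.get("Buckets", [])]


def parse_rds(pages: Iterable[Dict[str, Any]]) -> List[str]:
    # Each describe_db_instances page holds a "DBInstances" list.
    return [
        f"rds:{db['DBInstanceIdentifier']} ({db['Engine']})"
        for page in pages
        for db in page.get("DBInstances", [])
    ]


def parse_ec2(pages: Iterable[Dict[str, Any]]) -> List[str]:
    # describe_instances nests instances inside "Reservations".
    return [
        f"ec2:{inst['InstanceId']}"
        for page in pages
        for res in page.get("Reservations", [])
        for inst in res.get("Instances", [])
    ]


def fetch_inventory(session) -> Dict[str, List[str]]:
    """Live, read-only calls; `session` is a boto3.Session for one account and region."""
    s3 = session.client("s3")
    rds = session.client("rds")
    ec2 = session.client("ec2")
    return {
        "s3": parse_s3(s3.list_buckets()),
        "rds": parse_rds(rds.get_paginator("describe_db_instances").paginate()),
        "ec2": parse_ec2(ec2.get_paginator("describe_instances").paginate()),
    }
```

For instance, fetch_inventory(boto3.Session(profile_name="security-reader", region_name="eu-west-1")) would return the targets for one account and region; the profile name here is hypothetical.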

Scanning for data discovery

This is where data discovery comes in.

Discovery tools scan data storage environments to identify any high-risk data types. For privacy compliance, this would be personal data; for PCI DSS, this is payment card information; and for software development houses, this might be pre-release and proprietary source code.
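To illustrate what scanning for a specific data type can involve, here is a simplified Python check for payment card numbers. It pairs a pattern match for 13 to 19 digit sequences with the Luhn checksum that valid card numbers satisfy, which filters out many random digit strings. This is a common textbook heuristic offered as an illustration, not a description of any particular product's detection logic; real discovery tools use far more context to limit false positives.

```python
"""Sketch: flag likely payment card numbers (PANs) in text.

Illustrative heuristic only: a 13-19 digit pattern plus the Luhn checksum.
"""
import re
from typing import List

# 13-19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> List[str]:
    """Return separator-free digit strings that match the pattern and pass Luhn."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

Here, 4111 1111 1111 1111 is a widely used test card number, so find_card_numbers("card 4111 1111 1111 1111 exp 12/25") returns ['4111111111111111'], while the same number with a changed final digit fails the Luhn check and is ignored.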

Data discovery is a crucial step, not only to identify and manage high-risk data but also to highlight the storage repositories in which it’s located so that they can be secured with appropriate resource management controls.
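Once findings come back, grouping them by location shows which repositories need controls first. The Python sketch below assumes a hypothetical Finding record holding a repository label and a data type; an actual discovery tool's report format will differ.

```python
"""Sketch: roll discovery findings up to the repositories that hold them.

The Finding record and its field values are hypothetical examples.
"""
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Finding:
    repository: str  # e.g. "s3://legacy-exports" (hypothetical)
    data_type: str   # e.g. "PAN", "personal data", "source code"


def repositories_to_secure(findings: Iterable[Finding]) -> List[Tuple[str, Dict[str, int]]]:
    """Count findings per repository and data type, most findings first."""
    per_repo: Dict[str, Counter] = defaultdict(Counter)
    for f in findings:
        per_repo[f.repository][f.data_type] += 1
    # Sort by total findings (descending), then by name for a stable order.
    ranked = sorted(per_repo.items(), key=lambda item: (-sum(item[1].values()), item[0]))
    return [(repo, dict(counts)) for repo, counts in ranked]
```

Ranking by raw count is one simple choice; weighting by data type, for example treating cardholder data as higher priority, would be a natural refinement.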

Ground Labs’ highly customizable solutions make cloud-based discovery simple. We help organizations discover their data across AWS services including EC2, S3 storage buckets and AWS-managed databases such as MySQL, PostgreSQL, Oracle and more.

Find out how data discovery helps organizations minimize data risk during digital transformation in our free e-guide, A Complete Guide to Minimizing Digital Transformation Risk With Data Discovery.

