Data sprawl is everywhere, and it is becoming a bigger problem by the second as we move into a near real-time world. It affects both people and organizations, but organizations are on the leading edge of responding to it. Data sprawl refers to the ever-growing amount of data produced, handled, or aggregated from new contexts, events, and patterns. It is often labeled "big data," but I prefer to call it monster data because of its sheer size and speed of propagation. It is usually spread over multiple data storage types, networks, and applications, and it grows as new technologies and data types are introduced. This short blog will cover the significant sources of data sprawl, the considerable effects of the sprawl, and the meaningful ways of dealing with it.
Primary Sources of Data Sprawl
Operational Applications: Data sprawl is often built into organizations through application data redundancy. It is common to have many data sources for the main subject areas, such as customers, products, services, vendors, and partners, in base operational data, along with archives that have been building for years or decades. As these systems struggle to keep up with change, they take on still more sources of data.
Analytic Potential: Organizations are constantly collecting data for future analysis to pick up on strategic trends, adjust tactical policies and rules, or find operational tweaks that improve performance. Automation opportunities are often hidden in the data generated by signals, events, and patterns occurring in typical contexts and, in some cases, divergent ones. The wide variety of data types and the need to cross contexts require new views and new data sources, so data warehouses, data lakes, and even data oceans are being built in the hope of valuable future analysis. The picture is further complicated by new and emerging data types such as voice, image, and video.
Edge Requirements: As organizations are driven to make decisions earlier, at the edge of the organization, the data that supports those more immediate decisions must be gathered there as well. That data must also be archived for future audits and management review of edge actions. IoT complicates an organization's data management strategy further because it generates data faster than traditional edge applications. Over time, the outcomes of edge decisions feed back into analytic and operational data needs.
Significant Effects of Data Sprawl
Complexity: For many organizations, data sprawl compromises the value of the data. Business and technology professionals have to deal with data from multiple sources in multiple formats, which makes both operations and analysis difficult. In addition, data can be misinterpreted or, worse yet, corrupted as it is leveraged, rendering the effort worthless or just plain wrong.
Security: An ever-growing data monster is challenging to keep tabs on, which increases the likelihood of data breaches and other security risks. It also puts organizations at risk of strict non-compliance penalties under governance regimes such as GDPR, CCPA, and further data protection legislation.
Management/Costs: Keeping all this data is costly and challenging to manage. Data professionals, owners, and stewards have their hands full keeping on top of morphing, emerging data sources, all while supporting the many uses of proven data, not to mention data whose potential value is unknown at any point in time.
Significant Solutions to Data Sprawl
Shifting to the Cloud: The data discovery and classification that occur in a cloud migration strategy help organizations get their arms around what they have. Alongside that, a consolidated cloud repository should be built and leveraged so that users and applications can store and access data files with ease. Silos of data can be reduced significantly by removing duplicate and irrelevant data, and a security audit can be done at the same time.
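As a small illustration of that duplicate-removal step, the sketch below groups files by content hash so exact copies can be flagged. The ./data directory and the find_duplicates helper are hypothetical assumptions for the example; a real migration would lean on managed discovery and classification tooling rather than a one-off script.

```python
# A minimal sketch, assuming files live under a hypothetical ./data directory.
# It only shows the idea of flagging exact duplicates by content hash.
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(root: str) -> dict[str, list[Path]]:
    """Group files by SHA-256 content hash; any group larger than one is a set of exact copies."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    # Keep only hashes shared by more than one file -- those are the duplicates.
    return {h: paths for h, paths in groups.items() if len(paths) > 1}


if __name__ == "__main__":
    for digest, paths in find_duplicates("./data").items():
        print(f"{len(paths)} copies share hash {digest[:12]}: {paths}")
```

Hash-based grouping only catches exact copies; near-duplicates and irrelevant data still need human or tool-assisted review.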
Building a Data Mesh/Fabric: A data fabric is an architecture and a set of data services that provide consistent capabilities across data sources on-premises and in multiple cloud environments. The fabric simplifies and integrates data management across cloud and on-premises resources, accelerating long-term digital transformation while serving immediate uses. Meshes and fabrics also make it easier and quicker to build whatever data view is needed. A significant first step is acquiring a DBMS that supports operational and analytical uses on the same data, which is now a real possibility that many organizations are acting on.
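To make the idea of consistent capabilities across a choice of data sources a little more concrete, here is a rough sketch of a single customer view served over two backends. The OperationalDB, CloudStore, and unified_customer_view names are hypothetical stand-ins (SQLite and an in-memory list take the place of real systems), not a prescribed fabric implementation.

```python
# Minimal fabric-style access layer: one contract, many sources (a sketch).
import sqlite3
from typing import Iterable, Protocol


class CustomerSource(Protocol):
    """Common contract every source behind the fabric satisfies."""
    def customers(self) -> Iterable[dict]: ...


class OperationalDB:
    """On-premises operational source (SQLite used here as a stand-in)."""
    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute("CREATE TABLE IF NOT EXISTS customers (id INTEGER, name TEXT)")
        self.conn.execute("INSERT INTO customers VALUES (1, 'Acme Corp')")  # demo seed row

    def customers(self) -> Iterable[dict]:
        for cid, name in self.conn.execute("SELECT id, name FROM customers"):
            yield {"id": cid, "name": name, "origin": "on_prem"}


class CloudStore:
    """Cloud-resident source (a simple list stands in for an object store)."""
    def __init__(self) -> None:
        self.rows = [{"id": 2, "name": "Globex Inc"}]

    def customers(self) -> Iterable[dict]:
        for row in self.rows:
            yield {**row, "origin": "cloud"}


def unified_customer_view(sources: list[CustomerSource]) -> list[dict]:
    """One consistent view over every registered source -- the fabric's job."""
    return [record for source in sources for record in source.customers()]


if __name__ == "__main__":
    fabric = [OperationalDB(), CloudStore()]
    for record in unified_customer_view(fabric):
        print(record)
```

The point is the single contract: consumers ask the fabric for a view and never need to know where each record lives.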
Building a Metadata Catalog: You can't manage what you can't see or measure. Organizations need a data catalog with meaningful descriptors (metadata) for their data and information resources, one that is kept up to date and is customizable. The data discovery and classification required by cloud migration can be leveraged to build the varied catalog needed for effective data management in the digital world.
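To give a feel for what those descriptors might be, here is a minimal sketch of a single catalog entry. The CatalogEntry fields, the example bucket path, and the owner address are illustrative assumptions, not a specification of any particular catalog product.

```python
# A sketch of one catalog entry with a handful of assumed descriptors.
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CatalogEntry:
    """Descriptive metadata kept up to date for each data resource."""
    name: str
    location: str         # where the data lives (path, bucket, table)
    owner: str            # accountable data owner or steward
    classification: str   # e.g. "public", "internal", "restricted"
    refresh_cadence: str  # how often the source is updated
    tags: list[str] = field(default_factory=list)
    last_reviewed: date = field(default_factory=date.today)


catalog = [
    CatalogEntry(
        name="customer_master",
        location="s3://example-bucket/customers/",
        owner="data-stewardship@example.com",
        classification="restricted",
        refresh_cadence="daily",
        tags=["customer", "operational"],
    )
]

# A catalog you can query is a catalog you can manage.
restricted = [entry.name for entry in catalog if entry.classification == "restricted"]
print(restricted)
```

Even a simple structure like this makes questions such as "which restricted sources haven't been reviewed lately?" answerable.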
Net-Net:
We all know that getting ahead of data sprawl is ideal, with policies and procedures in effect as new data sources are gathered. Unfortunately, the reality is that it's often too late for the data sources collected in the past, and organizations need to take steps now because it's only going to get worse. New digital technologies are data-hungry, so getting data into shape for consumption is an essential business competency that, in most cases, still needs to be grown. We also know that the hybrid work environment will generate volumes of new unstructured data to manage while change accelerates, so efforts to manage and govern our growing data sources have to get top priority. If data is the energy source for digital progress, we have to get going now.