Jim Sinur: Reducing Data Sprawl

Data sprawl is everywhere, and it is becoming a bigger problem by the second as we move into a near real-time world. It hits both people and organizations, but organizations are at the leading edge of responding to it. Data sprawl refers to the ever-growing amount of data produced, handled, or aggregated from various new contexts, events, and patterns. It is often referred to as "big data," but I prefer calling it monster data because of its sizable growth and speed of propagation. It is usually spread over multiple data storage types, networks, and applications that grow as new technologies and data types are introduced. This short blog will cover the significant sources of data sprawl, its considerable effects, and meaningful ways of dealing with it.

Primary Sources of Data Sprawl

Operational Applications: Data sprawl is often built into organizations through application data redundancy. It's common to have many data sources for the main subject areas, such as customers, products, services, vendors, and partners, in base operational data, including archives that have been building for years or decades. As these systems struggle to keep up with change, they need additional sources of data.

Analytic Potential: Organizations are constantly collecting data for future analysis to pick up on strategic trends, adjust tactical policies/rules, or look for operational tweaks for performance improvement. Often automation opportunities are hidden in the data generated by signals, events, and patterns occurring in typical contexts and, in some cases, divergent contexts. The wide variety of data types and context crossing requires emergent views and new data sources. Many data warehouses, data lakes, and oceans are being generated for hopefully valuable future analysis. It's complicated by new and emerging data types such as voice, image, and video.

Edge Requirements: As organizations are driven to make decisions earlier, at the edge of their operations, they must gather the data that supports those more immediate decisions. That data must also be archived for future audits and management review of edge actions. IoT can complicate an organization's data management strategy because it generates data issues faster than traditional edge activity does. Often the outcomes of edge decisions feed the analytic and operational data needs over time.

Significant Effects of Data Sprawl

Complexity:

For many organizations, data sprawl compromises the value of the data. For example, business and technology professionals have to deal with data from multiple sources in multiple formats, making operations and analysis difficult. In addition, data can be misinterpreted or, worse yet, corrupted as it is put to use, rendering the efforts worthless or just plain wrong.

Security:

This ever-growing data monster will be challenging to keep tabs on, which increases the risk of data breaches and other security problems. In addition, it exposes organizations to strict penalties for non-compliance with emerging governance efforts such as GDPR, CCPA, or further data protection legislation.

Management/Costs:

Keeping all this data is costly and challenging to manage. The data professionals, owners, and stewards have their hands full keeping on top of the morphing and emerging data sources. They must do this while supporting all the various uses of proven data, not to mention the data with potential, whose value is unknown at any point in time.

Significant Solutions to Data Sprawl

Shifting to the Cloud: The data discovery and classification that occur in a cloud migration strategy help organizations get their arms around what they have. The migration also creates a need to build and leverage a consolidated cloud repository where users and applications can access and store data files with ease. Along the way, silos of data can be reduced significantly by removing duplicate and irrelevant data, and a security audit can be done as part of the same effort.
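As a rough illustration of the duplicate-removal step, here is a minimal Python sketch that groups files with identical contents by hashing them. It assumes the data lives as files under a single directory and that "duplicate" means byte-for-byte identical; the function names `file_digest` and `find_duplicates` are hypothetical, and near-duplicates or redundant database records would need a different approach.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose contents are byte-for-byte identical."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Only digests shared by two or more files indicate duplicates.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Each returned group is a set of candidates for consolidation; deciding which copy to keep is a policy question the sketch leaves open.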

Building a Data Mesh/Fabric: A data fabric is an architecture and a set of data services that provide consistent capabilities across a choice of data sources, both on-premises and in multiple cloud environments. The fabric simplifies and integrates data management across cloud and on-premises data resources to accelerate long-term digital transformation while serving immediate uses. In addition, meshes/fabrics make building any needed data view easier and quicker. A significant first step in building a data mesh/fabric is acquiring a DBMS that supports operational and analytical uses with the same data. This is now a real possibility, and many organizations are acting on it at this moment.

Building a Meta-Data Catalog: You can't manage what you can't see or measure. That means organizations need a data catalog, with significant descriptors for data and information resources (meta-data), that is up to date and customizable. The data discovery and classification required by cloud migration can be leveraged to build the varied catalog needed for effective data management in the digital world.
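To make the idea of a catalog concrete, here is a minimal Python sketch that records a few descriptors per data file and lets stewards attach their own tags. It is an illustration under assumptions, not a description of any catalog product: the `CatalogEntry` fields, `build_catalog`, and `find_by_tag` are hypothetical names, and a real catalog would also cover databases, streams, ownership, and lineage.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class CatalogEntry:
    """Descriptors (meta-data) for one data resource."""
    location: str
    data_format: str
    size_bytes: int
    modified_utc: datetime
    tags: set[str] = field(default_factory=set)  # customizable labels


def build_catalog(root: Path) -> dict[str, CatalogEntry]:
    """Discover files under root and record basic descriptors for each."""
    catalog: dict[str, CatalogEntry] = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        stat = path.stat()
        key = path.relative_to(root).as_posix()
        catalog[key] = CatalogEntry(
            location=key,
            data_format=path.suffix.lstrip(".").lower() or "unknown",
            size_bytes=stat.st_size,
            modified_utc=datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
        )
    return catalog


def find_by_tag(catalog: dict[str, CatalogEntry], tag: str) -> list[CatalogEntry]:
    """Return every entry that carries the given tag."""
    return [entry for entry in catalog.values() if tag in entry.tags]
```

Rebuilding the catalog on a schedule is one simple way to keep it up to date, and the free-form tags are where classification results (for example, a label marking personal data) could be recorded.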

Net: Net:

We all know that the ideal is to get ahead of data sprawl, with policies and procedures in effect as new data sources are gathered. Unfortunately, the reality of the day is that it's often too late for the data sources collected in the past. Organizations need to take steps now, as it's only going to get worse. We all know that new digital technologies are data-hungry, so getting data in shape for their consumption is an essential business competency that needs to be grown in most cases. We also know that the hybrid work environment will generate volumes of new unstructured data to manage while change accelerates. To manage and govern our growing data sources, efforts around data will have to get top priority. If data is the energy source for digital progress, we have to get going now.

Jim Sinur

Wednesday, September 8, 2021
