Data downtime nearly doubles as professionals wrestle with quality issues, survey finds

Join top executives in San Francisco on July 11-12 to hear how leaders are integrating and optimizing AI investments for success. Learn more

Data is critical to every business, but as the volume of information and the complexity of pipelines grow, things are bound to fail.

According to a new survey of 200 data professionals working in the U.S., instances of data downtime (periods in which a company's data is missing, inaccurate, or inaccessible) have nearly doubled year over year, driven by an increase in the number of quality incidents and the time teams take to resolve them.

The survey, commissioned by data observability company Monte Carlo and conducted by Wakefield Research in March 2023, highlights a critical gap that must be addressed as organizations race to collect as many data assets as they can to build downstream AI and analytics applications for business-critical functions and decision-making.

“More data plus more complexity equals more opportunities for data to break. A higher proportion of data incidents are also being detected as data becomes more integral to organizations’ revenue-generating operations. This means that business users and data consumers are more likely to detect incidents that data teams miss,” Lior Gavish, co-founder and CTO of Monte Carlo, tells VentureBeat.



The drivers of data downtime

The survey attributes the increase in data downtime to three key factors: a growing number of incidents, more time to detect them and more time to resolve them.

Of the 200 respondents, 51% said they see one to 20 data incidents in a typical month, 20% reported 20 to 99 incidents and 27% said they see at least 100 data incidents each month. These figures are consistently higher than last year’s, with the average number of monthly incidents witnessed by an organization growing to 67 this year from 59 in 2022.

As bad data cases increase, teams are also taking more time to find and fix problems. Last year, 62% of respondents said it typically took them four hours or more on average to detect a data incident, while this year the number rose to 68%.

Similarly, to resolve incidents after discovery, 63% said it typically takes four hours or more, up from 47% last year. Here, the average time to resolution for a data incident has gone from 9 hours to 15 hours year over year.

Manual approaches are to blame, not the engineers

While it’s easy enough to blame data engineers for failing to ensure quality and taking too long to fix things, it’s important to understand that the problem is not the talent but the task at hand. As Gavish points out, engineers are faced with not only vast amounts of fast-moving data, but also constantly changing approaches to how it is emitted from sources and consumed by the organization, which cannot always be controlled.

“The most common mistake teams make in that regard is to rely solely on manual testing of static data. It’s the wrong tool for the job. That kind of approach requires your team to anticipate and write a test for all the ways data can be corrupted in each data set, which is time consuming and doesn’t help with resolution,” he explains.

Instead of these tests, the CTO said, teams should consider automating data quality by deploying machine learning monitors that detect data freshness, volume, schema and distribution issues wherever they occur in the pipeline.

This can give enterprise data analysts a holistic view of data reliability for critical business and data product use cases in near real time. Additionally, when something goes wrong, monitors can send alerts, allowing teams to address the issue not only quickly but long before it has a significant business impact.
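To make the idea concrete, here is a minimal sketch of the kind of volume monitor described above: it flags a table's latest row count when it deviates sharply from recent history. The function name, threshold, and row counts are all hypothetical, and production monitors (such as Monte Carlo's) are far more sophisticated.

```python
import statistics

def detect_volume_anomaly(history, latest, threshold=3.0):
    """Flag `latest` if it deviates more than `threshold`
    standard deviations from the historical mean row count."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return latest != mean
    z = abs(latest - mean) / stdev
    return z > threshold

# Hypothetical daily row counts for a table over two weeks.
history = [10_120, 9_980, 10_340, 10_050, 9_870, 10_200, 10_110,
           10_290, 9_940, 10_180, 10_060, 10_230, 9_910, 10_150]

print(detect_volume_anomaly(history, 10_100))  # normal load -> False
print(detect_volume_anomaly(history, 3_200))   # sudden drop -> True
```

The same statistical pattern extends to freshness (hours since last update), schema (column count or types) and distribution (null rates, value ranges) checks.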

Sticking to the basics is still important

In addition to ML-based monitors, teams must also stick to certain basics to prevent data downtime, starting with focus and prioritization.

“Data generally follows the Pareto principle: 20% of the data sets provide 80% of the business value, and 20% of those data sets (not necessarily the same ones) cause 80% of the data quality problems. Make sure you can identify those high-value and problematic data sets, and be aware of when they change over time,” Gavish said.
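Gavish's 80/20 heuristic can be sketched in a few lines: given incident counts per data set, find the smallest group of data sets that accounts for a target share of all incidents. The function and the incident figures below are illustrative, not from the survey.

```python
def pareto_slice(counts, share=0.8):
    """Return the smallest set of keys, largest contributors first,
    whose values cover at least `share` of the total."""
    total = sum(counts.values())
    picked, running = [], 0
    for name, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        picked.append(name)
        running += n
        if running / total >= share:
            break
    return picked

# Hypothetical monthly incident counts per data set.
incidents = {"orders": 42, "clickstream": 31, "billing": 9,
             "inventory": 6, "hr": 4, "marketing": 3}

print(pareto_slice(incidents))  # -> ['orders', 'clickstream', 'billing']
```

Here, three of six data sets generate over 80% of incidents, so they are the natural place to concentrate monitoring and remediation effort.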

In addition, tactics such as creating data SLAs (service level agreements), establishing clear lines of ownership, writing documentation and conducting post-mortems can also be helpful, he added.
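A data SLA of the kind mentioned above can be as simple as a machine-checkable promise about freshness and quality with a named owner. The sketch below, with hypothetical table names, owners and thresholds, shows one way to express and evaluate such a promise.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical data SLA: the owner promises data no more than
# 6 hours stale, with at least 99% of rows passing validation.
sla = {
    "table": "analytics.orders",
    "owner": "data-platform@example.com",
    "max_staleness": timedelta(hours=6),
    "min_valid_ratio": 0.99,
}

def sla_breaches(sla, last_updated, valid_ratio, now=None):
    """Return a list of human-readable SLA violations."""
    now = now or datetime.now(timezone.utc)
    breaches = []
    if now - last_updated > sla["max_staleness"]:
        breaches.append("stale: last update exceeds max_staleness")
    if valid_ratio < sla["min_valid_ratio"]:
        breaches.append("quality: valid-row ratio below target")
    return breaches

now = datetime(2023, 6, 1, 12, 0, tzinfo=timezone.utc)
print(sla_breaches(sla, now - timedelta(hours=8), 0.995, now=now))
```

Codifying the agreement this way gives both the data team and its consumers an unambiguous definition of "healthy," which also makes post-mortems easier to write.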

Currently, Monte Carlo and Bigeye sit as major players in the rapidly maturing AI-powered data observability space. Other players in the category include upstarts such as Databand, Datafold, Validio, Soda and Acceldata.

That said, it’s worth noting that teams don’t necessarily need a third-party ML observability solution to ensure quality and reduce data downtime. They can also choose to build in-house if they have the time and resources. According to the Monte Carlo-Wakefield survey, it takes an average of 112 hours to develop such a tool in-house.

Although the market for dedicated data observability tools is still developing, research from Future Market Insights suggests that the broader market for observability platforms will grow from $2.17 billion in 2022 to $5.55 billion in 2032, at a CAGR of 8.2%.

VentureBeat’s mission is to be a digital public square for technical decision makers to gain insights into transformative business technology and transact. Discover our informative sessions.
