Ten CEOs describe what goes into turning a world of data into a data-driven world

  • By 2025, there will be 175 zettabytes of data in the global datasphere.

  • In 2022, more than 20.5 quintillion bytes of data were created every day.

  • The amount of data in the world was estimated to be 44 zettabytes at the dawn of 2020.

  • At the beginning of 2020, the number of bytes in the digital universe was 40 times bigger than the number of stars in the observable universe. 

  • By 2025, the amount of data generated each day will reach 463 exabytes globally.

Data is the new oil of the 21st century

The dynamics of the data avalanche

What was previously unknown can now be learned with a few questions. Decision-makers no longer have to rely on intuition alone; instead, they have access to more thorough and accurate information.

The core of this change is the use of new data sources that feed into systems powered by artificial intelligence and machine learning. The amount of information circulating in both the physical world and the global economy is astounding, and its sources are countless: sensors, satellite imagery, web traffic, digital apps, videos, and credit card transactions. Data of this kind can transform decision making. A packaged-food firm, for example, might once have used focus groups and surveys to create new items. To determine where the company should focus, it can now turn to sources such as social media, transaction data, search data, and foot traffic. All of these sources may suggest that Americans have developed a love for Korean BBQ.

Every day, the potential is being realized—not just in the corporate sector, but also in the fields of public health and safety, where epidemiologists and government authorities have used data to identify the factors that contribute to the spread of COVID-19 and how to safely restore economies.
But for most firms, the sheer volume of data and a lack of experience with next-generation analytics technologies can be overwhelming. Five significant findings emerged from an in-depth discussion with these CEOs.

Five takeaways: what kinds of new insights are possible and how the landscape is changing

New forms of data are giving organizations unprecedented speed and transparency

When a CEO wants an answer to a complex question, a team might be able to get it in a couple of months—but that may not be good enough in a world where competition is accelerating. One of the biggest advantages of an automated, data-driven AI system is the ability to answer strategic questions quickly. “We want to take that down to an hour or so when it’s about something going on in the physical world,” says Orbital Insight founder James Crawford.

Data and AI are not only finding answers faster but creating transparency around issues that have always been murky. Consider a multinational’s desire to ensure sustainability in its supply chain. An input like palm oil is produced on millions of farms in developing nations, and it goes through thousands of refineries and mills before it reaches one of that multinational’s factories. That’s a difficult supply chain to trace. But Orbital Insight has been able to use geolocation data and satellite imagery to track the physical supply chain—not based on paperwork that may not be accurate but based on real-time snapshots of where trucks are driving and where deforestation is occurring.

Specialist firms are refining and connecting data

Since the universe of data is so broad, service providers are carving out specialized niches in which they refine a variety of complex and even messy raw sources, feeding the data into machine learning– or AI-powered tools for analysis.

Consider SafeGraph, a start-up focused exclusively on geospatial data. It specializes in gathering, cleaning, and updating data on points of interest, building footprints, and foot traffic to make it quickly usable by apps and analytics teams. Further, to get around the issue of the many quirky permutations in the way addresses are assigned around the globe, the company has introduced Placekey, a free and open universal identifier that gives every physical location a standard ID. This enables everyone to use a recognizable string when they interact—a step that will ease the merging of data sets. In the first six months after its rollout in October, more than 1,000 organizations began using and contributing to the initiative.
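To illustrate why a standard location ID eases the merging of data sets, here is a minimal Python sketch. The Placekey-style strings, visit counts, and store attributes below are made-up placeholders for illustration, not real Placekeys or SafeGraph data:

```python
# Hypothetical example: joining two data sets on a shared location ID.
# All IDs and values here are invented placeholders.

foot_traffic = {
    "226@5vg-82n-kzz": 1450,   # weekly visits (made-up figure)
    "227@5vg-82n-pgk": 980,
}

store_attributes = {
    "226@5vg-82n-kzz": {"category": "coffee shop"},
    "227@5vg-82n-pgk": {"category": "grocery"},
}

# With a shared standard ID, a join is a simple key lookup instead of
# fuzzy address matching ("123 Main St." vs. "123 Main Street, Suite 4").
merged = {
    pk: {"visits": visits, **store_attributes[pk]}
    for pk, visits in foot_traffic.items()
    if pk in store_attributes
}

print(merged["226@5vg-82n-kzz"])  # {'visits': 1450, 'category': 'coffee shop'}
```

Without a universal identifier, each provider's quirky address formats would have to be reconciled before any two data sets could be combined; a shared key removes that step entirely.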

“We’re just an ingredient in any one solution,” says SafeGraph CEO Auren Hoffman. “It’s like selling high-quality butter to pastry chefs. The end consumer of the croissant may not even know that there’s butter in the pastry. And they certainly don’t know it’s SafeGraph butter. But the chef knows how important the ingredient is.”

Another example is Orbital Insight’s compilation of data from satellites, mobile devices, connected cars, aerial imagery, and tracking of ships at sea. All of this information feeds into an integrated platform, giving users the ability to pull out whatever is in satellite imagery and even count objects of interest automatically and connect it with other data on the platform. “We can deliver counts so you don’t have to look at every cornfield in Iowa or every road in China to figure out what the agricultural harvest is going to look like or whether people are back on the road after COVID,” says founder James Crawford.

Most non-tech companies are lagging, but new tools can get them in the race

Adapting to an era of more data-driven or even automated decision making is not always a simple proposition for people or organizations. The companies that have been fastest out of the gate already have data science chops. But according to Devaki Raj, CEO of CrowdAI, most non-tech Fortune 500 companies are stuck in pilot purgatory when it comes to sophisticated uses of systems such as computer vision and AI. “It starts with a lack of understanding of where all of their data is.”

Now a growing range of available tools and platforms can help them catch up. The number of companies working with data today is sharply higher than it was even five years ago. Back then, it took a world-class engineer to extract value from that information, and non-tech companies had difficulty attracting the few at the cutting edge of data science. But new platforms and analytics tools are leveling the playing field—as is the vast array of data that is free, open, or available at relatively low cost. Now, according to SafeGraph’s Hoffman, “People are going to be able to dive into data and analyze it in a way that just a few years ago only the most advanced engineer could do.”

For example, CrowdAI's platform, which lets non-data scientists build custom computer vision models, makes it possible for organizations at all levels of technological maturity to benefit from advances in AI. “The critical test for our product team has always been the ease of use by someone who works on a factory floor, who looks at the imagery day in and day out but has likely never heard of Python,” notes Raj.

It takes domain experts to extract the real value from data

Data science teams can build models with miraculous capabilities, but it’s unlikely that they can solve highly specific business problems on their own. Data engineers and scientists may not understand the subtleties of what to look for—and that’s why it’s critical to pair them with domain experts who do. “To be effective, automation needs to be informed by those closest to the problem,” says CrowdAI’s Devaki Raj.

On-the-ground business knowledge is especially important when it comes to interpreting data from other countries. “As a transactional data provider for emerging markets, we cover places like Southeast Asia, Brazil, and Greater China,” says Measurable AI’s Heatherm Huang. “You need to adopt different languages and compliance standards in different regions. You need to know that people in China don’t use email that much, for instance, or credit card adoption in Indonesia is still pretty low at this moment.” Even if the data provider accounts for those nuances, the end consumer of that information has to go deeper into the local business logic of different cultures to avoid coming away with mistaken conclusions.

Companies need to build in privacy safeguards and AI ethics from the start

The utility of data versus the right to personal privacy is one of the biggest balancing acts facing society. There is enormous value in using personal data such as health indicators or geolocation tracking for understanding trends. But people have a legitimate desire to not be tracked. Companies that work with data typically promise that it is anonymized and aggregated, but not all of them have the same standards and cybersecurity protections.
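One common form of the "anonymized and aggregated" safeguard is to publish only group-level counts and to suppress groups too small to hide an individual. The sketch below is a simplified illustration of that idea; the threshold, field names, and records are assumptions, not any particular company's method:

```python
# Illustrative sketch of aggregation with small-cell suppression.
# MIN_GROUP_SIZE is an arbitrary threshold chosen for this example.

MIN_GROUP_SIZE = 5

def aggregate_visits(records, key):
    """Group raw visit records by `key`, publish only group counts,
    and drop groups small enough to risk re-identification."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[key], []).append(rec)
    return {
        k: len(v)                      # publish the count, not the records
        for k, v in groups.items()
        if len(v) >= MIN_GROUP_SIZE    # suppress small, identifiable cells
    }

records = [{"neighborhood": "downtown"}] * 7 + [{"neighborhood": "suburb"}] * 2
print(aggregate_visits(records, "neighborhood"))  # {'downtown': 7}
```

The two-person "suburb" group is dropped entirely: releasing a count that small could reveal whether a specific individual was present.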

“The mantra for us is institutional transparency and individual privacy,” says Orbital Insight’s James Crawford. “We created a privacy statement on our website and put it into the terms of use of our platform. And we actually put monitoring into the platform so that we can stop users from tracking individuals.”

Heatherm Huang of Measurable AI approaches the issue by asking consumers to opt in—and giving them an explicit incentive to do so. “If the alternative data economy is to be sustainable, it has to value the people who contribute the data.” His company’s Measurable Data Token rewards users in cryptocurrency for sharing their data points. It’s built on blockchain, which also helps to verify but anonymize transactions.

SafeGraph’s Auren Hoffman is optimistic that technology itself can address this issue, noting recent advances in areas such as differential privacy, homomorphic encryption, and synthetic data. These technologies could conceivably make it possible to connect individual-level data, analyze it, and then use it in a way that doesn’t give away any individual-level information. “It’s going to yield an incredible amount of innovation. Over the next few years, we’ll be able to have our cake and eat it, too.”
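Of the techniques Hoffman mentions, differential privacy is the most widely deployed. A minimal sketch of its classic building block, the Laplace mechanism for a counting query, follows; the epsilon value and visit count are arbitrary example inputs, not drawn from any company in this article:

```python
import math
import random

def dp_count(true_count: int, epsilon: float) -> float:
    """Return a differentially private count via the Laplace mechanism.

    A counting query has sensitivity 1 (one person changes the count
    by at most 1), so adding Laplace(1/epsilon) noise gives
    epsilon-differential privacy: the published figure barely depends
    on any single individual's presence or absence.
    """
    scale = 1.0 / epsilon
    # Sample Laplace noise by inverse-transform on a uniform draw.
    u = random.random() - 0.5
    noise = -scale * (1 if u >= 0 else -1) * math.log(1 - 2 * abs(u))
    return true_count + noise

# Example: publish a noisy foot-traffic count instead of the exact one.
noisy = dp_count(1450, epsilon=1.0)
```

Smaller epsilon means more noise and stronger privacy; the aggregate signal survives because the noise averages out across many published statistics, which is the "have our cake and eat it, too" trade-off Hoffman describes.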