We should all be worried about synthetic data

Making up the world through made-up data

A stylized digital sculpture of a humanoid head, composed of blue and cyan geometric patterns, set against a black background with various abstract shapes and designs.

20th May 2022

Mikkel Krenchel

| Partner at ReD Associates, a social science-based consulting firm and a leader of technology, communications, and media practice.

Maria Cury

| Partner at ReD Associates, a social science-based consulting firm and a leader on technology practice, studying the role of new technology in daily life.

2,622 words

Read time: approx. 13 mins

Synthetic data – the use of AI to create datasets that mimic real world data – is rapidly becoming a much bigger part of our daily lives. But this form of data raises critical philosophical and ethical questions that will shape the future for all of us, write Mikkel Krenchel and Maria Cury.

There’s a data revolution happening and nobody is talking about it. It revolves around synthetic data. Unless you work in the field of artificial intelligence (AI), you may have never heard of it. But this rapidly growing form of data raises critical philosophical and ethical questions that will shape the future for all of us. First, what is synthetic data? There are many types, but the basic premise is the use of AI to create datasets that mimic real world data. These datasets can then be used to feed the insatiable need for data that trains machine learning algorithms to make better predictions. Instead of training algorithms on messy, expensive real-world data riddled with privacy issues and bias, now one can supplement or supplant real-world data with “better,” “cheaper,” or “bigger” datasets constructed using AI. Put simply, synthetic data is artificial data feeding artificial intelligence. It’s similar to deep fakes, yet used for less nefarious purposes, and applied to not only videos and images but any type of data under the sun, from insurance data, to army intelligence, self-driving vehicles, or even patient health care records. It is as awe-inspiring as it is terrifying.

___

Synthetic data is not a new concept, but what’s new is the surging demand for it and the AI capabilities to support it. Organizations across the world are investing massively in training new AI systems in hopes of changing how we learn, heal, trade, drive, buy, wage war and much more. To train these systems, they will need ever expanding quantities of data. Yet, good data is harder than ever to come by, as concerns and regulations around privacy, bias, and questions around responsible AI are finally creating some constraints on data collection. As such, Gartner predicts that by 2024 no less than 60% of all data used for AI will be synthetic. Already, 96% of teams working on computer vision rely on synthetic data and another analysis suggests that the number of companies focused on supplying synthetic data nearly doubled between 2019 and 2020 alone. It’s not hyperbole to think the role of ‘synthetic data engineer’ will be the most in-demand profession one day.

That means we are standing on the brink of a world where many of the technologies that surround us might not be built in response to reality, but to what a machine imagines that reality to be. This begs the questions: What happens if and when there are gaps between the real world the AI operates in, and the synthetic world it was trained in? How do we narrow those gaps, and what are the ethical and safety guardrails we need to put in place? If data is the new oil, as some argue, what happens if large-scale datasets become a cheap commodity that anyone with the right AI can build? What might that mean for the business models of big tech companies centered around their unique access to real-world data? And what will happen to empirical disciplines like the social sciences if we increasingly rely on data that isn’t collected in the real world?

Mikkel Krenchel
Maria Cury
20th May 2022

Want to continue reading?

Get unlimited access to insights from the world's leading thinkers.

Browse our subscription plans and subscribe to read more.

Start Free Trial

Already a subscriber? Log in

Latest Releases

Join the conversation

Anna Taylor 1 30 May 2022

AI is really the technology trend of the future, it makes us think about alternative technology.

APA	Krenchel, M. Cury, M. (2022, May 20). We should all be worried about synthetic data.IAI News. https://iai.tv/articles/we-should-all-be-worried-about-synthetic-data-auid-2138
MLA	Krenchel, Mikkel. Cury, Maria. "We should all be worried about synthetic data." IAI News, 20 May 2022. https://iai.tv/articles/we-should-all-be-worried-about-synthetic-data-auid-2138

news

We should all be worried about synthetic data

Making up the world through made-up data

Mikkel Krenchel

Maria Cury

Related Posts:

The Japanese philosophy that could save us from AI

Water, not silicon, has to be the basis of true AI

Nuclear waste heat can turn deserts into farmland

AI, Moloch, and the race to the bottom

Related Videos:

Technology and freedom

AI and the end of humanity

Competition destroys collective intelligence

Environmental science needs new politics

Want to continue reading?

Unelected elites and the limits of democracy

The age of mania and conformity, with Lionel Shriver

Morality is nothing but a story we tell ourselves

Join the conversation

We should all be worried about synthetic data

Making up the world through made-up data

Love the iai?

Related Posts:

Related Videos:

Want to continue reading?

Related Videos

Join the conversation