6 reasons why Data matters for AI


Datasphere Initiative



If you work in the world of digital tech, Artificial Intelligence (AI) has been on your mind for quite a while. And even if you don’t work in the digital tech world, we know you have been hearing a lot about AI recently.

Indeed, in the last two years, policy initiatives, political attention, investments, and products and services across sectors, from health to agriculture, have boomed, fueled by both excitement and fear around AI developments. While the discussion would benefit from more nuance, all this attention is driving much-needed policy and technical work on AI.

However, more often than not, the debate and the actions that follow sideline what sits at the core of AI itself: data. How we collect, process, and use data shapes AI tremendously. The data we feed into and train AI technologies with heavily determines the outcomes they can produce, both positive and negative. Data governance therefore not only shapes AI governance but, ultimately, also determines how fairly we distribute the benefits and mitigate the risks associated with AI.

This blog shares six reasons why data and data governance matter for AI and what this could mean for people and the planet.

1. Acquisition and Quality 

Without properly collected and cleaned data, AI algorithms lack the inputs they need to learn and make accurate predictions. This can be problematic, for instance, when AI is used to predict extreme weather events: with incomplete or poorly cleaned data sets, we risk false extrapolations or missed warnings, leading to agricultural losses or unpreparedness for natural disasters.

The process known as data labeling helps AI algorithms recognize patterns and make informed predictions based on the labeled data. In the medical sector, for example, precise labeling ensures AI systems can effectively differentiate between healthy and diseased tissue. That precision is critical for accurate diagnoses and treatment recommendations that enhance healthcare outcomes and patient well-being. More broadly, high-quality data underpins the reliability and trustworthiness of AI systems, enhancing their effectiveness across applications. AI health applications depend on trust and integrity to provide assurances to patients and their families; if the data is of low quality or does not provide the full picture, disease treatments and cures can be compromised.
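To make the point concrete, here is a minimal sketch (not from the post) of the kind of pre-training audit described above, run on a hypothetical labeled medical-imaging dataset. The field names and label set are assumptions for illustration; the idea is simply that incomplete records and inconsistent labels are caught before they reach a model.

```python
# Illustrative sketch: audit a hypothetical labeled dataset for the two
# data-quality problems discussed above -- incomplete records and
# invalid (e.g. misspelled) labels -- before any training happens.

ALLOWED_LABELS = {"healthy", "diseased"}  # assumed label set

def audit_labeled_data(records):
    """Return two lists: records missing required fields, and records
    whose label is not in the allowed set."""
    incomplete, bad_labels = [], []
    for rec in records:
        if rec.get("image_id") is None or rec.get("label") is None:
            incomplete.append(rec)
        elif rec["label"] not in ALLOWED_LABELS:
            bad_labels.append(rec)
    return incomplete, bad_labels

sample = [
    {"image_id": "img-001", "label": "healthy"},
    {"image_id": "img-002", "label": "diseased"},
    {"image_id": None, "label": "healthy"},      # incomplete record
    {"image_id": "img-004", "label": "helthy"},  # mislabeled (typo)
]

incomplete, bad_labels = audit_labeled_data(sample)
print(len(incomplete), len(bad_labels))  # 1 1
```

Real labeling pipelines add many more checks (inter-annotator agreement, class balance, provenance), but even a simple gate like this prevents obviously broken records from silently degrading a model.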

2. Representation and Fairness

Well-structured representative data and carefully engineered features enable AI models to extract meaningful insights and make better predictions. For example, comprehensive datasets containing diverse socioeconomic and geographic information can help AI algorithms understand urban development patterns and identify areas where infrastructure improvements are needed, resulting in more equitable and sustainable cities.

Addressing biases in training data promotes equitable outcomes and helps AI systems avoid perpetuating existing societal inequalities. In hiring processes, for example, mitigating bias in resume data ensures fair consideration for all applicants regardless of their demographic background, fostering fairer decision-making and reducing disparities in opportunities. In some cases, however, the data is so biased that the best option is to start again from scratch, making sure these issues are considered when data collection is first designed for a given purpose.
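One simple way to surface the kind of bias described above is a demographic-parity check: compare selection rates across groups in historical hiring data. The sketch below is illustrative only, with made-up data and hypothetical group labels; a large gap between groups is one signal that the data, or a model trained on it, may perpetuate existing inequalities.

```python
# Illustrative sketch: a demographic-parity check on hypothetical
# hiring data, comparing the fraction of applicants selected per group.

from collections import defaultdict

def selection_rates(applications):
    """Selection rate (fraction hired) per demographic group."""
    totals, hired = defaultdict(int), defaultdict(int)
    for group, was_hired in applications:
        totals[group] += 1
        hired[group] += int(was_hired)
    return {g: hired[g] / totals[g] for g in totals}

# (group, was_hired) pairs -- fabricated example data
data = [("A", True), ("A", True), ("A", False), ("A", True),
        ("B", False), ("B", True), ("B", False), ("B", False)]

rates = selection_rates(data)
gap = max(rates.values()) - min(rates.values())
print(rates, round(gap, 2))  # {'A': 0.75, 'B': 0.25} 0.5
```

A gap this large would warrant investigation; fairness auditing in practice uses richer metrics and domain judgment, but the measurement step always starts with disaggregating outcomes by group.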

3. Privacy and Security

Protecting data from unauthorized access and misuse is crucial for building trust and ensuring compliance with privacy regulations in AI systems. AI is increasingly applied on social media platforms, where the training data used to build algorithms can include information about internet users, their browsing history, and whether they click on certain advertisements. Robust data governance practices and transparent data handling procedures can help safeguard users’ personal information, improve their online experience, and bolster their trust in AI-driven content recommendations and interactions, mitigating risks and fostering accountability.
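As one example of the safeguards such procedures can include, the sketch below pseudonymizes user identifiers before browsing data is used for training. Keyed hashing is a common technique, though not a sufficient protection on its own; the field names and salt-handling here are assumptions for illustration.

```python
# Illustrative sketch: replace raw user identifiers with a keyed hash so
# records remain linkable across a training dataset without exposing the
# original identifier. The salt would be stored separately from the data.

import hashlib
import hmac

SECRET_SALT = b"rotate-me-regularly"  # assumed: kept out of the dataset

def pseudonymize(user_id: str) -> str:
    """Deterministic keyed hash of a user ID (truncated for readability)."""
    return hmac.new(SECRET_SALT, user_id.encode(), hashlib.sha256).hexdigest()[:16]

record = {"user_id": "alice@example.com", "clicked_ad": True}
safe_record = {**record, "user_id": pseudonymize(record["user_id"])}
print(safe_record["user_id"] != record["user_id"])  # True
```

Pseudonymization alone does not make data anonymous in the legal sense, which is precisely why it works best as one layer inside a broader data governance framework.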

4. Visualization and Interpretability

Transparent representations of data-driven information foster understanding and trust, enabling effective collaboration between AI systems and human users. In financial analysis, AI-generated visualizations of risk assessments and market trends, paired with straightforward explanations, enable investors to make informed decisions.

5. Data-Driven Decision-Making

AI systems leverage vast amounts of data to provide insights and support informed decision-making across various domains. In environmental conservation, for example, AI could analyze diverse datasets to support habitat restoration efforts. Thus, AI could enable policymakers to develop evidence-based policies, enhancing resilience, sustainability, and global efforts against climate change.

6. Data Governance

Ensuring transparency, accountability, and ethical data use in AI development and deployment promotes trust, fairness, and responsible innovation. Solid data governance frameworks safeguard user privacy and preserve data integrity in the tech sector, fostering trust in AI-driven technologies. Sandboxes for data offer enormous potential to enhance data governance for AI. In fact, sandboxes present a unique opportunity to explore the intersection of AI technologies and regulatory frameworks, such as the AI Act in the European Union, through policy experimentation. By creating controlled environments for testing AI systems within specified regulatory parameters, sandboxes enable policymakers to assess the effectiveness of proposed regulations, identify potential loopholes, and iteratively refine policies to ensure they align with ethical standards, legal requirements, and data governance frameworks.

Approaching data and AI governance through agile methods like sandboxes could further foster governance innovation while ensuring that AI development remains accountable, transparent, and responsible.


We need to talk about data in its own right. Data is a critical element we need to get right if we are to address the most urgent challenges of the 21st century. Understanding its nature and being purposeful about its governance is crucial to ensure that benefits materialize and are fairly distributed, and that harm is curtailed. This is not a simple task, and siloed discussions are wrongly reducing the debate to a simplistic binary of all closed or all open.

In the spirit of global cooperation, a cross-sectoral global effort is needed to steer humanity towards a future where data serves as a catalyst for collective advancement and for responsible AI that works for people and the planet. At the UN level, an International Decade for Data for People and Planet could help countries recognize the potency of data and its pivotal role in fueling advancements in data-intensive technologies. In other fora, such as the G20, an immediate starting point could be establishing a Data20 (D20) to bridge the stakeholder groups and countries that engage across the G20 around their shared concern for data, and to develop a common approach to responsibly unlock the value of data for all.

See what else DI is doing on data and AI.


The Datasphere Initiative is a global network of stakeholders fostering a holistic and innovative approach to data governance to build agile frameworks to responsibly unlock the value of data for all.

©2024 Datasphere Initiative. All rights reserved.
Privacy Policy