Center for Data Innovation: “AI has a data quality problem. In a survey of 179 data scientists, over half identified addressing issues related to data quality as the biggest bottleneck in successful AI projects. Big data is so often improperly formatted, lacking metadata, or “dirty,” meaning incomplete, incorrect, or inconsistent, that data scientists typically spend 80 percent of their time on cleaning and preparing data to make it usable, leaving them with just 20 percent of their time to focus on actually using data for analysis. This means organizations developing and using AI must devote huge amounts of resources to ensuring they have sufficient amounts of high-quality data so that their AI tools are not useless. As policymakers pursue national strategies to increase their competitiveness in AI, they should recognize that any country that wants to lead in AI must also lead in data quality.
Collecting and storing data may be getting cheaper, but creating high-quality data can be costly—potentially prohibitively so for small organizations or teams of researchers, forcing them to make do with bad data and thus unreliable or inaccurate AI tools, or preventing them from using AI entirely. The private sector will of course invest in data quality, but policymakers should view increasing the amount of high-quality data as a valuable opportunity to accelerate AI development and adoption, as well as reduce the potential economic and social harms of AI built with bad data. There are three avenues for policymakers to increase the amount of high-quality data available for AI: require the government to provide high-quality data; promote the voluntary provision of high-quality data from the private and non-profit sectors; and accelerate efforts to digitize all sectors of the economy to support more comprehensive data collection…”
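The three failure modes the quoted passage calls "dirty" data (incomplete, incorrect, and inconsistent) are the kind of defects data scientists audit for before any modeling. A minimal sketch of such an audit, using a hypothetical dataset and made-up validation rules purely for illustration:

```python
# Illustrative only: the records, field names, and validity rules below are
# hypothetical, not from the article. Each check maps to one of the three
# "dirty data" categories the passage names.

records = [
    {"name": "Ada",    "age": 36,   "country": "US"},
    {"name": "Grace",  "age": None, "country": "USA"},  # incomplete: missing age
    {"name": "Alan",   "age": -5,   "country": "US"},   # incorrect: impossible age
    {"name": "Edsger", "age": 41,   "country": "usa"},  # inconsistent: nonstandard code
]

def audit(rows, valid_countries=frozenset({"US"})):
    """Count how many records fail each data-quality check."""
    issues = {"incomplete": 0, "incorrect": 0, "inconsistent": 0}
    for r in rows:
        # Incomplete: any field is missing a value.
        if any(v is None for v in r.values()):
            issues["incomplete"] += 1
        # Incorrect: a present value is impossible (here, an out-of-range age).
        if r["age"] is not None and not (0 <= r["age"] <= 120):
            issues["incorrect"] += 1
        # Inconsistent: a value does not match the standard coding scheme.
        if r["country"] not in valid_countries:
            issues["inconsistent"] += 1
    return issues

print(audit(records))  # → {'incomplete': 1, 'incorrect': 1, 'inconsistent': 2}
```

Audits like this are the first step of the cleaning work the passage describes; the expensive part is resolving each flagged record, which is why preparation consumes so much of a data scientist's time.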