When tackling an Artificial Intelligence project, which matters more: the quality or the quantity of data? We explain it in this article.
The data boom and the possibilities it offers
About 2.5 quintillion bytes of data are generated every day in the world. This figure has been growing for years, driven by the hyper-connectivity that digitalization, the Internet of Things and social networks have brought about. Big Data ecosystems can capture, store and manage these large volumes of data, which is the basis for analyzing them and extracting their value. This is a real gold mine for companies, which can use that value to improve processes, minimize costs or maximize profits.
According to the Dell EMC Global Data Protection Index, the amount of data organizations manage grew by 569% between 2016 and 2018. This wealth of available information allows data analysis to support better business decision-making.
Advanced analytics and Artificial Intelligence techniques help us better understand business processes. They tell us what happened (Descriptive Analytics), why it happened (Diagnostic Analytics), what will happen in the future (Predictive Analytics), and which of all the possible decisions is the best one to make (Prescriptive Analytics).
Data quality is decisive for results
However, this abundance of information is also a challenge: nearly 80% of the data generated is erroneous or incomplete, and therefore of no value for business decision-making.
Data quality is decisive when applying Artificial Intelligence techniques, because these solutions can only be as good (or as bad) as the data used to build them.
Feeding erroneous or biased data into a system carries risks. The algorithms that power Artificial Intelligence systems have no choice but to assume that the data they analyze is reliable. If it is not, the results will be misleading and the decision-making process will be compromised.
Which is better, then: quality or quantity of data?
In general, more data leads to more reliable models and therefore better results, but only as long as the data is real and representative. It is preferable to use less data of good quality than a larger volume of poor quality. That said, sometimes there is simply not enough quality data to model the problem at hand and train a solution based on Data Analytics and Artificial Intelligence.
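To make the point about representativeness concrete, here is a minimal Python sketch on synthetic data. The population, the sample sizes and the "under 40" collection bias are all assumptions chosen for illustration. It estimates the average age of a hypothetical customer base twice: once from a small random sample, and once from a sample 40 times larger that was gathered through a skewed channel.

```python
import random
import statistics

# Hypothetical population: 100,000 customer ages, uniformly spread over 18..80.
random.seed(42)
population = [random.randint(18, 80) for _ in range(100_000)]
true_mean = statistics.mean(population)

# Small but representative: 500 records drawn uniformly at random.
clean_sample = random.sample(population, 500)

# Large but biased (assumed scenario): 20,000 records collected only from
# customers under 40, e.g. through a single channel that skews young.
young_customers = [age for age in population if age < 40]
biased_sample = random.sample(young_customers, 20_000)

clean_error = abs(statistics.mean(clean_sample) - true_mean)
biased_error = abs(statistics.mean(biased_sample) - true_mean)

print(f"true mean age:            {true_mean:.1f}")
print(f"error, 500 random rows:   {clean_error:.1f}")
print(f"error, 20,000 biased rows: {biased_error:.1f}")
```

Under these assumptions, the 500-record random sample lands close to the true mean, while the 20,000-record biased sample misses it by roughly 20 years. Adding more unrepresentative data shrinks the noise but does nothing to correct the bias.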
Another recurring problem is that, even when the available data set is already sufficient to take full advantage of Artificial Intelligence systems, there is a tendency to keep collecting more data because storage and processing power are cheap. This trend toward generating and storing ever larger volumes of information shows no sign of slowing down. That is why companies need to establish a set of rules and procedures that define and regulate how data is handled. Such rules facilitate data governance and help ensure the success of advanced analytics and AI solutions.
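As an illustration of what such rules can look like once they are made executable, the following Python sketch applies a small set of declarative checks for completeness and validity to a handful of records. The field names, records and thresholds (`customer_id`, `age`, `email`, the 0 to 120 age range) are hypothetical. Real governance rules would come from each organization's own policies.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical records; field names and values are illustrative only.
records = [
    {"customer_id": "C001", "age": 34, "email": "ana@example.com"},
    {"customer_id": "C002", "age": None, "email": "luis@example.com"},
    {"customer_id": "C003", "age": 212, "email": "marta@example.com"},
    {"customer_id": "C004", "age": 51, "email": "not-an-email"},
    {"customer_id": "C005", "age": 27, "email": "pablo@example.com"},
]


@dataclass
class Rule:
    """A named data quality check that returns True when a record passes."""
    name: str
    check: Callable[[dict], bool]


# Assumed rule set: one completeness rule and two validity rules.
RULES = [
    Rule("age is present", lambda r: r.get("age") is not None),
    Rule("age in plausible range",
         lambda r: r.get("age") is None or 0 <= r["age"] <= 120),
    Rule("email contains @", lambda r: "@" in (r.get("email") or "")),
]


def validate(records, rules):
    """Split records into those passing every rule and those failing any."""
    valid, rejected = [], []
    for record in records:
        failures = [rule.name for rule in rules if not rule.check(record)]
        if failures:
            rejected.append((record["customer_id"], failures))
        else:
            valid.append(record)
    return valid, rejected


valid, rejected = validate(records, RULES)
print(f"{len(valid)} valid, {len(rejected)} rejected")
for customer_id, failures in rejected:
    print(customer_id, failures)
```

Keeping the rules as data (a list of named checks) rather than as scattered `if` statements is one way to make them easy to review, version and report on, which fits the idea of governance as an explicit, shared set of rules.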