What is data mining?
Data mining (DM) is a computer-assisted process of finding patterns in large datasets. It applies sophisticated algorithms to bring these patterns to the surface so they can be used to solve real-world problems.
Although there are several types of data mining, they usually fall into two general categories: exploratory and predictive.
Exploratory and predictive data mining
Exploratory data mining has been known for more than 50 years. In the 20th century, it was widely used in statistics to determine whether particular data analysis techniques were applicable. In practical terms, it can serve as a tool for detecting fraudulent insurance claims, such as the same photographs of damaged goods being submitted for multiple insurance cases. Another example is exposing flawed sampling: for instance, a survey in which 90% of respondents were women instead of the required 50%. In general, exploratory data analysis (EDA) describes how data is distributed, helping to identify anomalies or verify hypotheses through graphical or non-graphical summaries of large datasets.
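The checks described above can be sketched in a few lines of Python. This is a minimal illustration, not a fraud-detection system: the claim amounts, survey labels and photo bytes are invented, the helper names (`iqr_outliers`, `sample_share`, `duplicate_files`) are hypothetical, and outliers are flagged with Tukey's IQR fences as one common EDA rule of thumb rather than the only option.

```python
import hashlib
import statistics
from collections import Counter


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def sample_share(labels, category):
    """Fraction of records whose label equals `category`."""
    return Counter(labels)[category] / len(labels)


def duplicate_files(blobs):
    """Group byte blobs by SHA-256 digest; groups of size > 1 are exact duplicates."""
    groups = {}
    for name, data in blobs.items():
        groups.setdefault(hashlib.sha256(data).hexdigest(), []).append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical claim amounts, survey answers and photo contents, invented for illustration.
claims = [1200, 950, 1100, 1300, 1050, 980, 15000, 1150]
respondents = ["F"] * 9 + ["M"] * 1
photos = {
    "claim_17.jpg": b"\x89fake-image-A",
    "claim_42.jpg": b"\x89fake-image-A",
    "claim_51.jpg": b"\x89fake-image-B",
}

print("median claim:", statistics.median(claims))
print("outlying claims:", iqr_outliers(claims))
print(f"share of women: {sample_share(respondents, 'F'):.0%} (target 50%)")
print("duplicate photos:", duplicate_files(photos))
```

Hashing only catches byte-identical files: a photo that has been resized or re-compressed produces a different digest, so near-duplicate detection in practice would need something like perceptual hashing.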
Predictive data mining is a 21st-century technology that has been in use for roughly two decades. The field grew out of 1980s artificial intelligence research into how computers can learn from large amounts of data. To return to the insurance example: by feeding all records (policy numbers, addresses, etc.) into an algorithm, you can detect specific patterns, such as an anomalously high number of claims from a particular organization or person, or irregularities in individual cases. Irregularities in policy renewals, for example, can signal low customer satisfaction.
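To make the predictive side concrete, here is a minimal sketch of learning a non-renewal signal from policy records. Everything in it is an assumption for illustration: the three features (claims filed last year, months late on the previous renewal, number of complaints), the ten labelled records, and the choice of plain logistic regression trained by batch gradient descent. A real model would need far more data and validation on held-out records.

```python
import math


def sigmoid(z):
    """Logistic function mapping a score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Estimated probability that a policyholder will NOT renew, given features x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.1, epochs=5000):
    """Fit logistic-regression weights with batch gradient descent (no regularization)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of log-loss w.r.t. the score
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical records: [claims last year, months late on last renewal, complaints]
X = [
    [0, 0.0, 0], [1, 0.0, 0], [0, 0.1, 0], [1, 0.2, 0], [2, 0.0, 1],  # renewed
    [3, 1.5, 2], [2, 2.0, 3], [4, 1.0, 2], [1, 2.5, 2], [3, 1.2, 4],  # did not renew
]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

w, b = train_logistic(X, y)
for customer in ([0, 0.0, 0], [3, 2.0, 3]):
    print(customer, f"risk of non-renewal: {predict_proba(w, b, customer):.2f}")
```

The learned weights show which hypothetical factors push the risk up; with real data, that ranking is what would turn a vague hunch about renewals into a testable signal.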
To clarify the difference between exploratory data analysis and predictive data mining: the former is a general, more abstract evaluation of raw data. It checks whether the collected data contains anomalies or discrepancies and whether it follows a normal distribution or some other distribution law. This helps avoid working with incomplete data samples or applying statistical methods that are valid only for normally distributed data.
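The distribution check mentioned above can be approximated with the Jarque-Bera statistic, which combines sample skewness and kurtosis. The sketch below is one possible approach under simplifying assumptions: missing values are represented as `None`, the two samples are idealized ones built deterministically from normal and exponential quantiles rather than real data, and the 5% critical value 5.991 comes from the chi-square distribution with two degrees of freedom, which the statistic follows only asymptotically. For small samples, a test such as Shapiro-Wilk (for example `scipy.stats.shapiro`) is usually preferred.

```python
import math
from statistics import NormalDist, fmean


def missing_ratio(values):
    """Share of entries that are None (treated here as missing)."""
    return sum(v is None for v in values) / len(values)


def jarque_bera(values):
    """Jarque-Bera statistic: JB = n/6 * (S^2 + (K - 3)^2 / 4)."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # equals 3 for a normal distribution
    return n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)


# 95th percentile of the chi-square distribution with 2 degrees of freedom.
CRITICAL_5PCT = 5.991

n = 1000
normal_like = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
skewed = [-math.log(1 - (i + 0.5) / n) for i in range(n)]  # exponential quantiles

print("missing share:", missing_ratio([1.2, None, 3.4, None, 2.2]))
for name, data in (("normal-like", normal_like), ("skewed", skewed)):
    jb = jarque_bera(data)
    verdict = "rejected" if jb > CRITICAL_5PCT else "not rejected"
    print(f"{name}: JB = {jb:.2f}, normality {verdict} at the 5% level")
```

A rejection does not say which distribution the data follows, only that methods assuming normality should be applied with caution.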
In predictive DM, the goal is to uncover non-obvious, multi-factor correlations between variables, especially where classical statistical methods are not applicable.
![](https://codelido.com/assets/files/2022-12-27/1672130915-954313-image.png)
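The point about non-obvious, multi-factor correlations can be illustrated with a toy case in which two factors each show zero linear correlation with the outcome, yet together determine it completely (an XOR pattern). The variables `x1`, `x2` and `y` are hypothetical, and the combined feature `|x1 - x2|` is constructed by hand here; in predictive data mining, the idea is that algorithms able to model interactions between features can find such combinations without being told where to look.

```python
# Hypothetical binary factors for 100 records, with all four combinations equally common.
x1 = [0, 0, 1, 1] * 25
x2 = [0, 1, 0, 1] * 25
y = [a ^ b for a, b in zip(x1, x2)]  # outcome is 1 when exactly one factor is present


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)


combined = [abs(a - b) for a, b in zip(x1, x2)]  # interaction of both factors

print("corr(x1, y):", pearson(x1, y))
print("corr(x2, y):", pearson(x2, y))
print("corr(|x1 - x2|, y):", pearson(combined, y))
```

A screen based on pairwise correlations alone would discard both factors in this example, even though together they explain the outcome fully.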