Introduction to Data Mining

Parteek Bhatia

2019

Chapter Objectives

✓ To learn about the concepts of data mining.
✓ To understand the need for, and the applications of, data mining.
✓ To differentiate between data mining and machine learning.
✓ To understand the process of data mining.

Introduction to Data Mining

In the age of information, an enormous amount of data is available in different industries and organizations. This massive volume of data is of no use unless it is transformed into valuable information. Otherwise, we are sinking in data but starving for knowledge. The solution to this problem is data mining, which is the extraction of useful information from the huge amount of data that is available.

Data mining is defined as follows:

‘Data mining is a collection of techniques for efficient automated discovery of previously unknown, valid, novel, useful and understandable patterns in large databases. The patterns must be actionable so they may be used in an enterprise's decision making.’

From this definition, the important takeaways are:

• Data mining is a process of automated discovery of previously unknown patterns in large volumes of data.
• This large volume of data is usually the historical data of an organization, known as the data warehouse.
• Data mining deals with large volumes of data, in gigabytes or terabytes, and sometimes as much as zettabytes (in the case of big data).
• Patterns must be valid, novel, useful and understandable.
• Data mining allows businesses to determine historical patterns in order to predict future behaviour.
• Although data mining is possible with smaller amounts of data, the bigger the data, the better the accuracy of prediction.
• There is considerable hype about data mining at present, and the Gartner Group has listed data mining as one of the top ten technologies to watch.

Need of Data Mining

Data mining is a recent buzzword in the field of Computer Science. It is a computing process that uses intelligent mathematical algorithms to extract relevant data and compute the probability of future actions. It is also known as Knowledge Discovery in Data (KDD).
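
As a rough illustration of what "determining historical patterns to predict future behaviour" can mean in practice, the sketch below estimates, from a handful of past shopping baskets, how likely one item is to be bought given that another was bought. It is not taken from the chapter: the basket data, the item names and the `pair_confidence` function are all hypothetical, and real data mining works on far larger data with more refined algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy "historical" data: each set is one customer's basket.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]


def pair_confidence(baskets):
    """Return {(a, b): estimated P(b in basket | a in basket)} for item pairs.

    Only pairs that occur together at least once appear in the result.
    """
    item_counts = Counter()  # number of baskets containing each item
    pair_counts = Counter()  # number of baskets containing both items
    for basket in baskets:
        item_counts.update(basket)
        for a, b in combinations(sorted(basket), 2):
            # Record both directions, since P(b | a) and P(a | b) differ.
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return {
        (a, b): count / item_counts[a]
        for (a, b), count in pair_counts.items()
    }


confidence = pair_confidence(transactions)

# Print the discovered patterns, strongest first.
for (a, b), value in sorted(confidence.items(), key=lambda kv: -kv[1]):
    print(f"bought {a} -> also bought {b}: {value:.2f}")
```

In this toy data, three of the four baskets that contain bread also contain butter, so the estimated probability for that pair is 0.75. With so few baskets such an estimate is unreliable, which is one reason larger volumes of historical data tend to give better predictions.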
