Avoiding False Discoveries: A completely new addition in the second edition is a chapter on how to avoid false discoveries and produce valid results, a topic that is novel among contemporary textbooks on data mining. It supplements the other chapters with a discussion of the relevant statistical concepts (statistical significance, p-values, false discovery rate, permutation
Introduction
1. Discuss whether or not each of the following activities is a data mining task.
(a) Dividing the customers of a company according to their gender. No. This is a simple database query.
(b) Dividing the customers of a company according to their profitability. No. This is an accounting calculation, followed by the application of a
Chapters 1, 2 from the book "Introduction to Data Mining" by Tan, Steinbach, Kumar. Lecture 2: Data, preprocessing (ppt, pdf) Chapter 3 from the book "Mining Massive Datasets" by Anand Rajaraman and Jeff Ullman. Chapter 8 from the book "Introduction to Data Mining" by Tan, Steinbach, Kumar. Lecture 7: Hierarchical clustering, DBSCAN, Mixture models and the EM algorithm (ppt, pdf
Data Mining: Data. Lecture Notes for Chapter 2. © Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004
Politecnico di Torino, DataBase and Data Mining Group (DBMG), Clustering fundamentals. Limitations of K-means: Differing Sizes. [Figure: panels "Original Points" and "K-means (3 Clusters)". From: Tan, Steinbach, Kumar, Introduction to Data Mining, McGraw Hill 2006]
