Data mining can be defined as the process of selection, exploration and modelling of large databases, in order to discover models and patterns. The increasing availability of data in the current information society has led to the need for valid tools for its modelling and analysis. Data mining and applied statistical methods are the appropriate tools to extract such knowledge from data. Applications occur in many different fields, including statistics, computer science, machine learning, economics, marketing and finance. This book is the first to describe applied data mining methods in a consistent statistical framework, and then show how they can be applied in practice. All the methods described are either computational, or of a statistical modelling nature. Complex probabilistic models and mathematical tools are not used, so the book is accessible to a wide audience of students and industry professionals. The second half of the book consists of nine case studies, taken from the author's own work in industry, that demonstrate how the methods described can be applied to real problems.
- Provides a solid introduction to applied data mining methods in a consistent statistical framework
- Includes coverage of classical, multivariate and Bayesian statistical methodology
- Includes many recent developments such as web mining, sequential Bayesian analysis and memory-based reasoning
- Each statistical method described is illustrated with real-life applications
- Features a number of detailed case studies based on applied projects within industry
- Incorporates discussion on software used in data mining, with particular emphasis on SAS
- Supported by a website featuring data sets, software and additional material
- Includes an extensive bibliography and pointers to further reading within the text
- Author has many years' experience teaching introductory and multivariate statistics and data mining, and working on applied projects within industry
A valuable resource for advanced undergraduate and graduate students of applied statistics, data mining, computer science and economics, as well as for professionals working in industry on projects involving large volumes of data, such as in marketing or financial risk management.