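Book Information
Data Mining: Concepts and Techniques (English edition)
![Data Mining: Concepts and Techniques (English edition)](https://www.shukui.net/cover/46/31439996.jpg)
- Authors: Jiawei Han, Micheline Kamber
- Publisher: Beijing: Higher Education Press
- ISBN: 704010041X
- Publication year: 2001
- Stated page count: 550 pages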
Table of Contents
Chapter 1 Introduction
1.1 What Motivated Data Mining? Why Is It Important?
1.2 So, What Is Data Mining?
1.3 Data Mining: On What Kind of Data?
1.3.1 Relational Databases
1.3.2 Data Warehouses
1.3.3 Transactional Databases
1.3.4 Advanced Database Systems and Advanced Database Applications
1.4 Data Mining Functionalities: What Kinds of Patterns Can Be Mined?
1.4.1 Concept/Class Description: Characterization and Discrimination
1.4.2 Association Analysis
1.4.3 Classification and Prediction
1.4.4 Cluster Analysis
1.4.5 Outlier Analysis
1.4.6 Evolution Analysis
1.5 Are All of the Patterns Interesting?
1.6 Classification of Data Mining Systems
1.7 Major Issues in Data Mining
1.8 Summary
Exercises
Bibliographic Notes
Chapter 2 Data Warehouse and OLAP Technology for Data Mining
2.1 What Is a Data Warehouse?
2.1.1 Differences between Operational Database Systems and Data Warehouses
2.1.2 But, Why Have a Separate Data Warehouse?
2.2 A Multidimensional Data Model
2.2.1 From Tables and Spreadsheets to Data Cubes
2.2.2 Stars, Snowflakes, and Fact Constellations: Schemas for Multidimensional Databases
2.2.3 Examples for Defining Star, Snowflake, and Fact Constellation Schemas
2.2.4 Measures: Their Categorization and Computation
2.2.5 Introducing Concept Hierarchies
2.2.6 OLAP Operations in the Multidimensional Data Model
2.2.7 A Starnet Query Model for Querying Multidimensional Databases
2.3 Data Warehouse Architecture
2.3.1 Steps for the Design and Construction of Data Warehouses
2.3.2 A Three-Tier Data Warehouse Architecture
2.3.3 Types of OLAP Servers: ROLAP versus MOLAP versus HOLAP
2.4 Data Warehouse Implementation
2.4.1 Efficient Computation of Data Cubes
2.4.2 Indexing OLAP Data
2.4.3 Efficient Processing of OLAP Queries
2.4.4 Metadata Repository
2.4.5 Data Warehouse Back-End Tools and Utilities
2.5 Further Development of Data Cube Technology
2.5.1 Discovery-Driven Exploration of Data Cubes
2.5.2 Complex Aggregation at Multiple Granularities: Multifeature Cubes
2.5.3 Other Developments
2.6 From Data Warehousing to Data Mining
2.6.1 Data Warehouse Usage
2.6.2 From On-Line Analytical Processing to On-Line Analytical Mining
2.7 Summary
Exercises
Bibliographic Notes
Chapter 3 Data Preprocessing
3.1 Why Preprocess the Data?
3.2 Data Cleaning
3.2.1 Missing Values
3.2.2 Noisy Data
3.2.3 Inconsistent Data
3.3 Data Integration and Transformation
3.3.1 Data Integration
3.3.2 Data Transformation
3.4 Data Reduction
3.4.1 Data Cube Aggregation
3.4.2 Dimensionality Reduction
3.4.3 Data Compression
3.4.4 Numerosity Reduction
3.5 Discretization and Concept Hierarchy Generation
3.5.1 Discretization and Concept Hierarchy Generation for Numeric Data
3.5.2 Concept Hierarchy Generation for Categorical Data
3.6 Summary
Exercises
Bibliographic Notes
Chapter 4 Data Mining Primitives, Languages, and System Architectures
4.1 Data Mining Primitives: What Defines a Data Mining Task?
4.1.1 Task-Relevant Data
4.1.2 The Kind of Knowledge to be Mined
4.1.3 Background Knowledge: Concept Hierarchies
4.1.4 Interestingness Measures
4.1.5 Presentation and Visualization of Discovered Patterns
4.2 A Data Mining Query Language
4.2.1 Syntax for Task-Relevant Data Specification
4.2.2 Syntax for Specifying the Kind of Knowledge to be Mined
4.2.3 Syntax for Concept Hierarchy Specification
4.2.4 Syntax for Interestingness Measure Specification
4.2.5 Syntax for Pattern Presentation and Visualization Specification
4.2.6 Putting It All Together: An Example of a DMQL Query
4.2.7 Other Data Mining Languages and the Standardization of Data Mining Primitives
4.3 Designing Graphical User Interfaces Based on a Data Mining Query Language
4.4 Architectures of Data Mining Systems
4.5 Summary
Exercises
Bibliographic Notes
Chapter 5 Concept Description: Characterization and Comparison
5.1 What Is Concept Description?
5.2 Data Generalization and Summarization-Based Characterization
5.2.1 Attribute-Oriented Induction
5.2.2 Efficient Implementation of Attribute-Oriented Induction
5.2.3 Presentation of the Derived Generalization
5.3 Analytical Characterization: Analysis of Attribute Relevance
5.3.1 Why Perform Attribute Relevance Analysis?
5.3.2 Methods of Attribute Relevance Analysis
5.3.3 Analytical Characterization: An Example
5.4 Mining Class Comparisons: Discriminating between Different Classes
5.4.1 Class Comparison Methods and Implementations
5.4.2 Presentation of Class Comparison Descriptions
5.4.3 Class Description: Presentation of Both Characterization and Comparison
5.5 Mining Descriptive Statistical Measures in Large Databases
5.5.1 Measuring the Central Tendency
5.5.2 Measuring the Dispersion of Data
5.5.3 Graph Displays of Basic Statistical Class Descriptions
5.6 Discussion
5.6.1 Concept Description: A Comparison with Typical Machine Learning Methods
5.6.2 Incremental and Parallel Mining of Concept Description
5.7 Summary
Exercises
Bibliographic Notes
Chapter 6 Mining Association Rules in Large Databases
6.1 Association Rule Mining
6.1.1 Market Basket Analysis: A Motivating Example for Association Rule Mining
6.1.2 Basic Concepts
6.1.3 Association Rule Mining: A Road Map
6.2 Mining Single-Dimensional Boolean Association Rules from Transactional Databases
6.2.1 The Apriori Algorithm: Finding Frequent Itemsets Using Candidate Generation
6.2.2 Generating Association Rules from Frequent Itemsets
6.2.3 Improving the Efficiency of Apriori
6.2.4 Mining Frequent Itemsets without Candidate Generation
6.2.5 Iceberg Queries
6.3 Mining Multilevel Association Rules from Transaction Databases
6.3.1 Multilevel Association Rules
6.3.2 Approaches to Mining Multilevel Association Rules
6.3.3 Checking for Redundant Multilevel Association Rules
6.4 Mining Multidimensional Association Rules from Relational Databases and Data Warehouses
6.4.1 Multidimensional Association Rules
6.4.2 Mining Multidimensional Association Rules Using Static Discretization of Quantitative Attributes
6.4.3 Mining Quantitative Association Rules
6.4.4 Mining Distance-Based Association Rules
6.5 From Association Mining to Correlation Analysis
6.5.1 Strong Rules Are Not Necessarily Interesting: An Example
6.5.2 From Association Analysis to Correlation Analysis
6.6 Constraint-Based Association Mining
6.6.1 Metarule-Guided Mining of Association Rules
6.6.2 Mining Guided by Additional Rule Constraints
6.7 Summary
Exercises
Bibliographic Notes
Chapter 7 Classification and Prediction
7.1 What Is Classification? What Is Prediction?
7.2 Issues Regarding Classification and Prediction
7.2.1 Preparing the Data for Classification and Prediction
7.2.2 Comparing Classification Methods
7.3 Classification by Decision Tree Induction
7.3.1 Decision Tree Induction
7.3.2 Tree Pruning
7.3.3 Extracting Classification Rules from Decision Trees
7.3.4 Enhancements to Basic Decision Tree Induction
7.3.5 Scalability and Decision Tree Induction
7.3.6 Integrating Data Warehousing Techniques and Decision Tree Induction
7.4 Bayesian Classification
7.4.1 Bayes Theorem
7.4.2 Naive Bayesian Classification
7.4.3 Bayesian Belief Networks
7.4.4 Training Bayesian Belief Networks
7.5 Classification by Backpropagation
7.5.1 A Multilayer Feed-Forward Neural Network
7.5.2 Defining a Network Topology
7.5.3 Backpropagation
7.5.4 Backpropagation and Interpretability
7.6 Classification Based on Concepts from Association Rule Mining
7.7 Other Classification Methods
7.7.1 k-Nearest Neighbor Classifiers
7.7.2 Case-Based Reasoning
7.7.3 Genetic Algorithms
7.7.4 Rough Set Approach
7.7.5 Fuzzy Set Approaches
7.8 Prediction
7.8.1 Linear and Multiple Regression
7.8.2 Nonlinear Regression
7.8.3 Other Regression Models
7.9 Classifier Accuracy
7.9.1 Estimating Classifier Accuracy
7.9.2 Increasing Classifier Accuracy
7.9.3 Is Accuracy Enough to Judge a Classifier?
7.10 Summary
Exercises
Bibliographic Notes
Chapter 8 Cluster Analysis
8.1 What Is Cluster Analysis?
8.2 Types of Data in Cluster Analysis
8.2.1 Interval-Scaled Variables
8.2.2 Binary Variables
8.2.3 Nominal, Ordinal, and Ratio-Scaled Variables
8.2.4 Variables of Mixed Types
8.3 A Categorization of Major Clustering Methods
8.4 Partitioning Methods
8.4.1 Classical Partitioning Methods: k-Means and k-Medoids
8.4.2 Partitioning Methods in Large Databases: From k-Medoids to CLARANS
8.5 Hierarchical Methods
8.5.1 Agglomerative and Divisive Hierarchical Clustering
8.5.2 BIRCH: Balanced Iterative Reducing and Clustering Using Hierarchies
8.5.3 CURE: Clustering Using REpresentatives
8.5.4 Chameleon: A Hierarchical Clustering Algorithm Using Dynamic Modeling
8.6 Density-Based Methods
8.6.1 DBSCAN: A Density-Based Clustering Method Based on Connected Regions with Sufficiently High Density
8.6.2 OPTICS: Ordering Points To Identify the Clustering Structure
8.6.3 DENCLUE: Clustering Based on Density Distribution Functions
8.7 Grid-Based Methods
8.7.1 STING: STatistical INformation Grid
8.7.2 WaveCluster: Clustering Using Wavelet Transformation
8.7.3 CLIQUE: Clustering High-Dimensional Space
8.8 Model-Based Clustering Methods
8.8.1 Statistical Approach
8.8.2 Neural Network Approach
8.9 Outlier Analysis
8.9.1 Statistical-Based Outlier Detection
8.9.2 Distance-Based Outlier Detection
8.9.3 Deviation-Based Outlier Detection
8.10 Summary
Exercises
Bibliographic Notes
Chapter 9 Mining Complex Types of Data
9.1 Multidimensional Analysis and Descriptive Mining of Complex Data Objects
9.1.1 Generalization of Structured Data
9.1.2 Aggregation and Approximation in Spatial and Multimedia Data Generalization
9.1.3 Generalization of Object Identifiers and Class/Subclass Hierarchies
9.1.4 Generalization of Class Composition Hierarchies
9.1.5 Construction and Mining of Object Cubes
9.1.6 Generalization-Based Mining of Plan Databases by Divide-and-Conquer
9.2 Mining Spatial Databases
9.2.1 Spatial Data Cube Construction and Spatial OLAP
9.2.2 Spatial Association Analysis
9.2.3 Spatial Clustering Methods
9.2.4 Spatial Classification and Spatial Trend Analysis
9.2.5 Mining Raster Databases
9.3 Mining Multimedia Databases
9.3.1 Similarity Search in Multimedia Data
9.3.2 Multidimensional Analysis of Multimedia Data
9.3.3 Classification and Prediction Analysis of Multimedia Data
9.3.4 Mining Associations in Multimedia Data
9.4 Mining Time-Series and Sequence Data
9.4.1 Trend Analysis
9.4.2 Similarity Search in Time-Series Analysis
9.4.3 Sequential Pattern Mining
9.4.4 Periodicity Analysis
9.5 Mining Text Databases
9.5.1 Text Data Analysis and Information Retrieval
9.5.2 Text Mining: Keyword-Based Association and Document Classification
9.6 Mining the World Wide Web
9.6.1 Mining the Web's Link Structures to Identify Authoritative Web Pages
9.6.2 Automatic Classification of Web Documents
9.6.3 Construction of a Multilayered Web Information Base
9.6.4 Web Usage Mining
9.7 Summary
Exercises
Bibliographic Notes
Chapter 10 Applications and Trends in Data Mining
10.1 Data Mining Applications
10.1.1 Data Mining for Biomedical and DNA Data Analysis
10.1.2 Data Mining for Financial Data Analysis
10.1.3 Data Mining for the Retail Industry
10.1.4 Data Mining for the Telecommunication Industry
10.2 Data Mining System Products and Research Prototypes
10.2.1 How to Choose a Data Mining System
10.2.2 Examples of Commercial Data Mining Systems
10.3 Additional Themes on Data Mining
10.3.1 Visual and Audio Data Mining
10.3.2 Scientific and Statistical Data Mining
10.3.3 Theoretical Foundations of Data Mining
10.3.4 Data Mining and Intelligent Query Answering
10.4 Social Impacts of Data Mining
10.4.1 Is Data Mining a Hype or a Persistent, Steadily Growing Business?
10.4.2 Is Data Mining Merely Managers' Business or Everyone's Business?
10.4.3 Is Data Mining a Threat to Privacy and Data Security?
10.5 Trends in Data Mining
10.6 Summary
Exercises
Bibliographic Notes
Appendix A An Introduction to Microsoft's OLE DB for Data Mining
A.1 Creating a DMM Object
A.2 Inserting Training Data into the Model and Training the Model
A.3 Using the Model
Appendix B An Introduction to DBMiner
B.1 System Architecture
B.2 Input and Output
B.3 Data Mining Tasks Supported by the System
B.4 Support for Task and Method Selection
B.5 Support of the KDD Process
B.6 Main Applications
B.7 Current Status
Bibliography
Index