Book Description
This book offers comprehensive coverage of Big Data tools, terminology, and technologies for researchers, business professionals, and graduates. It begins with an overview of what Big Data is and covers the key concepts end to end. Big Data concepts, technologies, and terminology, along with storage, processing, and analysis techniques, are logically organized and reinforced by diagrams and case studies. In-depth analysis of key concepts refines readers' understanding of Big Data, and the case studies give insight into those concepts. The initial chapters shed light on the characteristics of Big Data that distinguish it from traditional database management systems. Big Data analytics is covered in detail in a separate chapter. Hadoop, the heart of Big Data, is handled in the Big Data processing chapter, which provides a deep understanding of its concepts.
Table of Contents

Big Data - Concepts, Technology and Architecture
Book Description
1.1 Understanding Big Data
1.2 Evolution of Big Data
1.3 Failure of Traditional database in handling Big Data
1.3 (a) Data Mining Vs Big Data
1.4 3 V’s of Big Data
1.4.1 Volume
1.4.2 Velocity
1.4.3 Variety
1.5 Sources of Big Data
1.6 Different Types of Data
1.6.1 Structured Data
1.6.2 Unstructured Data
1.6.3 Semi-Structured Data
1.7 Big Data Infrastructure
1.8 Big Data Life Cycle
1.8.1 Big Data Generation
1.8.2 Data Aggregation
1.8.3 Data Preprocessing
1.7.3 Big Data Analytics
1.7.4 Visualizing Big Data
1.8 Big Data Technology
1.8.1 Challenges faced by Big Data technology
1.8.1 Heterogeneity and incompleteness
1.8.2 Volume and velocity of the Data
1.8.3 Data Storage
1.8.4 Data Privacy
1.9 Big Data Applications
1.10 Big Data Use Cases
1.9.1 Healthcare
1.9.2 Telecom
1.9.3 Financial Services
Chapter 1 refresher
Conceptual short Questions with answers
Frequently asked Interview questions

Chapter Objective
Big Data Storage Concepts
2.1 Cluster computing
2.1.1 Types of cluster
2.1.1.1 High availability cluster
2.1.1.2 Load balancing cluster
2.1.2 Cluster structure
2.3 Distribution Models
2.3.1 Sharding
2.3.2 Data Replication
2.3.2.1 Master-Slave model
2.3.2.2 Peer-to-Peer model
2.3.3 Sharding and Replication
2.4 Distributed file system
2.5 Relational and Non Relational Databases
CoursesOffered
Figure 2.12 Data divided across multiple related tables
2.4.2 RDBMS Databases
2.4.3 NoSQL Databases
2.4.4 NewSQL Databases
2.5 Scaling Up and Scaling Out Storage
Chapter 2 refresher
Conceptual short questions with answers

Chapter Objective
3.1 Introduction to NoSQL
3.2 Why NoSQL
3.3 CAP theorem
3.4 ACID
3.5 BASE
3.6 Schemaless Database
3.7 NoSQL (Not Only SQL)
3.7.1 NoSQL Vs RDBMS
3.7.2 Features of NoSQL database
3.7.3 Types of NoSQL Technologies
3.7.3.1 Key-Value store database
3.7.3.2 Column-store database
3.7.3.3 Document Oriented Database
3.7.3.4 Graph-oriented Database
3.7.4 NoSQL Operations
3.9 Migrating from RDBMS to NoSQL
Chapter 3 refresher
Conceptual short questions with answers

Chapter Objective
4.1 Data Processing
4.2 Shared Everything Architecture
4.2.1 Symmetric multiprocessing architecture
4.2.2 Distributed Shared memory
4.3 Shared nothing architecture
4.4 Batch Processing
4.5 Real-Time Data Processing
4.6 Parallel Computing
4.7 Distributed Computing
4.8 Big Data Virtualization
4.8.1 Attributes of Virtualization
4.8.1.1 Encapsulation
4.8.1.2 Partitioning
4.8.1.3 Isolation
4.8.2 Big Data Server Virtualization
4.9 Introduction
4.10 Cloud computing types
4.11 Cloud Services
4.12 Cloud Storage
4.12.1 Architecture of GFS
4.12.1.1 Master
4.12.1.2 Client
4.13 Cloud Architecture
Cloud Challenges
Chapter 4 Refresher
Conceptual short questions with answers

Chapter Objective
5.1 Apache Hadoop
5.1.1 Architecture of Apache Hadoop
5.1.2 Hadoop Ecosystem Components Overview
5.2 Hadoop Storage
5.2.1 HDFS (Hadoop Distributed File System)
5.2.2 Why HDFS?
5.2.3 HDFS Architecture
5.2.4 HDFS Read/Write Operation
5.2.5 Rack Awareness
5.2.6 Features of HDFS
5.2.6.1 Cost-effective
5.2.6.2 Distributed storage
5.2.6.3 Data Replication
5.3 Hadoop Computation
5.3.1 MapReduce
5.3.1.1 Mapper
5.3.1.2 Combiner
5.3.1.3 Reducer
5.3.1.4 JobTracker and TaskTracker
5.3.2 MapReduce Input Formats
5.3.3 MapReduce Example
5.3.4 MapReduce Processing
5.3.5 MapReduce Algorithm
5.3.6 Limitations of MapReduce
5.4 Hadoop 2.0
5.4.1 Hadoop 1.0 limitations
5.4.2 Features of Hadoop 2.0
5.4.3 Yet Another Resource Negotiator (YARN)
5.4.3 Core components of YARN
5.4.3.1 ResourceManager
5.4.3.2 NodeManager
5.4.4 YARN Scheduler
5.4.4.1 FIFO scheduler
5.4.4.2 Capacity Scheduler
5.4.4.3 Fair Scheduler
5.4.5 Failures in YARN
5.4.5.1 ResourceManager failure
5.4.5.2 ApplicationMaster failure
5.4.5.3 NodeManager Failure
5.4.5.4 Container Failure
5.3 HBASE
5.4 Apache Cassandra
5.5 SQOOP
5.6 Flume
5.6.1 Flume Architecture
5.6.1.1 Event
5.6.1.2 Agent
5.7 Apache Avro
5.8 Apache Pig
5.9 Apache Mahout
5.10 Apache Oozie
5.10.1 Oozie Workflow
5.10.2 Oozie Coordinators
5.10.3 Oozie Bundles
5.11 Apache Hive
Hive Architecture
Hadoop Distributions
Chapter 5 refresher
Conceptual short questions with answers
Frequently asked Interview Questions

Chapter Objective
6.1 Terminologies of Big Data Analytics
Data Warehouse
Business Intelligence
Analytics
6.2 Big Data Analytics
6.2.1 Descriptive Analytics
6.2.2 Diagnostic Analytics
6.2.3 Predictive Analytics
6.2.4 Prescriptive Analytics
6.3 Data Analytics Lifecycle
6.3.1 Business case evaluation and Identify the source data
6.3.2 Data preparation
6.3.3 Data Extraction and Transformation
6.3.4 Data Analysis and visualization
6.3.5 Analytics application
6.4 Big Data Analytics Techniques
6.4.1 Quantitative Analysis
6.4.3 Statistical analysis
6.4.3.1 A/B testing
6.4.3.2 Correlation
6.4.3.3 Regression
6.5 Semantic Analysis
6.5.1 Natural Language Processing
6.5.2 Text Analytics
6.7 Big Data Business Intelligence
6.7.1 Online Transaction Processing (OLTP)
6.7.2 Online Analytical Processing (OLAP)
6.7.3 Real-Time Analytics Platform (RTAP)
6.6 Big Data Real Time Analytics Processing
6.7 Enterprise Data Warehouse
Chapter 6 Refresher
Conceptual short questions with answers

Chapter Objective
7.1 Introduction to Machine learning
7.2 Machine learning use cases
7.3 Types of Machine learning
7.3.1 Supervised machine learning algorithm
7.3.1.1 Classification
7.3.1.2 Regression
Support vector machines (SVM)
Big Data Analytics Practical Application
Chapter 7 Refresher
Conceptual short questions with answers

Chapter Objective
8.1 Itemset Mining
8.2 Association Rules
8.3 Frequent itemset generation
8.4 Itemset Mining Algorithms
8.4.1 Apriori Algorithm
8.4.1.2 Frequent Itemset generation using Apriori Algorithm
8.4.2 Eclat Algorithm - Equivalence Class Transformation Algorithm
8.4.3 FP growth algorithm
8.5 Maximal and Closed Frequent Itemset
Mining Closed Frequent Itemsets: Charm Algorithm
CHARM Algorithm implementation
Data Mining Methods
8.8 Prediction
8.8.2 Classification techniques
8.8.2.1 Bayesian Network
8.8.2.2 K-Nearest Neighbor Algorithm
8.8.2.2.1 The Distance metric
8.8.2.2.2 The parameter selection - cross validation
8.8.2.3 Decision tree classifier
Density based clustering algorithm
DBSCAN
Kernel Density Estimation
8.9.3 Artificial Neural Network
The Biological Neural Network
8.11 Mining Data Streams
Time Series Forecasting

9.1 Clustering
Application of Hierarchical methods
Kernel k-means clustering
Expectation Maximization Clustering Algorithm
Methods of determining the Number of clusters
Outlier detection
Types of Outliers
Outlier detection techniques
Training dataset based outlier detection
Assumption based outlier detection
Applications of outlier detection
9.6.3 Optimization Algorithm
Choosing the Number of Clusters
Bayesian Analysis of Mixtures
Fuzzy Clustering

10.1 Big Data Visualization
10.2 Conventional Data Visualization Techniques
10.2.1 Line Chart
10.2.2 Bar Chart
10.2.3 Pie Chart
10.2.4 Scatter Plot
10.2.5 Bubble plot
Tableau
Connecting to data
Connecting to data in Cloud
Connect to a file
Scatter plot in Tableau
Histogram using Tableau
Bar chart in Tableau
Line Chart
Pie chart
Bubble chart
Box Plot
Tableau Use Cases
Airlines
Office Supplies
Sports
Science - Earthquake Analysis
Tableau is used to analyze the magnitude of earthquakes and the frequency of occurrence over the years.

Installing R and Getting Ready
R Basic commands
Assigning value to a variable
Data Structures in R
Vector
Coercion
Length, Mean and median
Matrix
Arrays
Data frames
Lists
Importing data from a file
Importing data from a delimited text file
Control Structures in R
If-else
Nested if-else
for loops
Example
while loops
Break
Basic Graphs in R
Pie Charts
3D - Pie Charts
Bar Charts
Boxplots
Histograms
Line charts
Scatter plots
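Trade Policy (Notes for Buyers)
- About the products:
- ● Authenticity guarantee: This website belongs to China International Book Trading Corporation, and all books are guaranteed to be 100% genuine.
- ● Eco-friendly paper: Most imported books are printed on eco-friendly lightweight paper, which is slightly yellowish and relatively light in weight.
- ● Deckle edge: The page edges are deliberately left uneven. This is typically found on hardcover editions and adds collectible value.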
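About returns and exchanges:
- Because of the special nature of pre-ordered products, once the purchase order has been formally placed, the buyer may not cancel all or part of the order without a valid reason.
- Because of the special nature of imported books, in any of the following cases please refuse delivery so that the courier returns the package:
- ● Damaged outer packaging / wrong item shipped / items missing / damage to the book's exterior / incomplete accessories (e.g., discs)
and contact us by phone at 400-008-1110 on a working day.
- After signing for the delivery, if any of the following occurs, please contact customer service within 5 working days of signing to arrange a return or exchange:
- ● Missing pages / misordered pages / misprints / loose binding threads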
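About shipping time:
- Under normal circumstances:
- ● [In stock] Shipped by courier from the Beijing warehouse within 48 hours of ordering.
- ● [Pre-order] [Pre-sale] Shipped from overseas after ordering; estimated arrival is about 5-8 weeks. The store ships via ZTO Express (中通) by default; SF Express (顺丰) is available on request, with shipping paid on delivery.
- ● For customers who need an invoice, shipping may be delayed by a further 1-2 working days (for urgent invoice requests, please contact 010-68433105/3213);
- ● If other special circumstances affect shipping time, we will post a notice on the website as soon as possible; please watch for it.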
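About delivery time:
- Because imported books are shipped by third-party couriers after they enter the country and the warehouse, we can only guarantee dispatch within the stated time and cannot guarantee an exact delivery date.
- ● Major cities: typically 2-4 days
- ● Remote areas: typically 4-7 days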
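About phone inquiry hours:
- Calls to 010-68433105/3213 are answered Monday to Friday, 8:30 a.m. to 5:00 p.m. The line is closed on Saturdays, Sundays, and public holidays, and calls cannot be answered then; we appreciate your understanding.
- At other times you can also reach us by email at customer@readgo.cn; messages are handled with priority on working days.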
About couriers:
- ● Paid orders: Delivered mainly by ZTO Express (中通) and ZJS Express (宅急送); to check order progress, call 010-68433105/3213.