Mining of Massive Datasets, Jure Leskovec

PDF, 513 pages, 3.69 MB, 2022-09-27
Mining of Massive Datasets

Jure Leskovec
Stanford Univ.

Anand Rajaraman
Milliway Labs

Jeffrey D. Ullman
Stanford Univ.

Copyright © 2010, 2011, 2012, 2013, 2014 Anand Rajaraman, Jure Leskovec, and Jeffrey D. Ullman

Preface

This book evolved from material developed over several years by Anand Rajaraman and Jeff Ullman for a one-quarter course at Stanford. The course CS345A, titled "Web Mining," was designed as an advanced graduate course, although it has become accessible and interesting to advanced undergraduates. When Jure Leskovec joined the Stanford faculty, we reorganized the material considerably. He introduced a new course CS224W on network analysis and added material to CS345A, which was renumbered CS246. The three authors also introduced a large-scale data-mining project course, CS341. The book now contains material taught in all three courses.

What the Book Is About

At the highest level of description, this book is about data mining. However, it focuses on data mining of very large amounts of data, that is, data so large it does not fit in main memory. Because of the emphasis on size, many of our examples are about the Web or data derived from the Web. Further, the book takes an algorithmic point of view: data mining is about applying algorithms to data, rather than using data to "train" a machine-learning engine of some sort. The principal topics covered are:

1. Distributed file systems and map-reduce as a tool for creating parallel algorithms that succeed on very large amounts of data.

2. Similarity search, including the key techniques of minhashing and locality-sensitive hashing.

3. Data-stream processing and specialized algorithms for dealing with data that arrives so fast it must be processed immediately or lost.

4. The technology of search engines, including Google's PageRank, link-spam detection, and the hubs-and-authorities approach.

5. Frequent-itemset mining, including association rules, market-baskets, the A-Priori Algorithm and its improvements.

6. Algorithms for clustering very large, high-dimensional datasets.