Mining of Massive Datasets
Anand Rajaraman
Kosmix, Inc.
Jeffrey D. Ullman
Stanford Univ.
Copyright © 2010, 2011 Anand Rajaraman and Jeffrey D. Ullman
Preface
This book evolved from material developed over several years by Anand Rajaraman
and Jeff Ullman for a one-quarter course at Stanford. The course,
CS345A, titled “Web Mining,” was designed as an advanced graduate course,
although it has become accessible and interesting to advanced undergraduates.
What the Book Is About
At the highest level of description, this book is about data mining. However,
it focuses on data mining of very large amounts of data, that is, data so large
it does not fit in main memory. Because of the emphasis on size, many of our
examples are about the Web or data derived from the Web. Further, the book
takes an algorithmic point of view: data mining is about applying algorithms
to data, rather than using data to “train” a machine-learning engine of some
sort. The principal topics cov