Vol 45, No 2 (2017), 179-184


On a book Algorithms for data science by Brian Steele, John Chandler and Swarn Reddy

Krzysztof J. Szajowski

Digital Object Identifier (DOI): 10.14708/ma.v45i2.4369

Abstract

The book under review gives a comprehensive presentation of data science algorithms: as a text on practical data analytics, it unites fundamental principles, algorithms, and data. Algorithms are the keystone of data analytics and the focal point of this textbook. Data science, as the authors claim, has been a discipline since 2001; informally, however, it was practiced before that date (cf. Cleveland (2001)). Graphical presentation of data, as a visualization of the knowledge hidden in the data, played a crucial role. The discipline covers data mining as a tool or an important topic. The escalating demand for insights into big data requires a fundamentally new approach to architecture, tools, and practices. This is why the term data science is useful. It underscores the centrality of data in an investigation, because data are a store of potential value in the field of action. The label science invokes certain very real concepts, such as the notion of public knowledge and peer review. From this point of view, data science is not a new idea: it is part of a continuum of serious thinking that dates back hundreds of years. A good example of a result of data science is Benford's law (see Arno Berger and Theodore P. Hill (2015, 2017)). In an effort to identify some of the best-known algorithms widely used in the data mining community, the IEEE International Conference on Data Mining (ICDM) identified the top 10 algorithms in data mining for presentation at ICDM '06 in Hong Kong, where a panel announced them and discussed the impact of, and further research on, each of them. In the present book, clear and intuitive explanations of the mathematical and statistical foundations make the algorithms transparent. Most of the algorithms announced by IEEE in 2006 are included.
Practical data analytics, however, requires more than just the foundations. Problems and data are enormously variable, and only the most elementary algorithms can be used without modification. Programming fluency and experience with real and challenging data are indispensable, so the reader is immersed in Python and R and in real data analysis. By the end of the book, the reader will have gained the ability to adapt algorithms to new problems and carry out innovative analysis.

Keywords: Algorithms; Associative Statistics; Computation; Computing Similarity; Cluster Analysis; Correlation; Data Reduction; Data Mapping; Data Dictionary; Data Visualization; Forecasting; Hadoop; Histogram; k-Means Algorithm; k-Nearest Neighbor Prediction

Subject classification: 62-02; 62-07; 62Pxx; 68-XX

References

[1] A. Berger and T. P. Hill. An introduction to Benford's law. Princeton, NJ: Princeton University Press, 2015. ISBN 978-0-691-16306-2/hbk; 978-1-400-86658-8/ebook. doi: 10.1515/9781400866588.
[2] A. Berger and T. P. Hill. What is... Benford's law? Notices Am. Math. Soc., 64(2):132-134, 2017. ISSN 0002-9920; 1088-9477/e. doi: 10.1090/noti1477.
[3] J. M. Chambers, W. S. Cleveland, B. Kleiner, and P. A. Tukey. Graphical methods for data analysis. The Wadsworth Statistics/Probability Series. Belmont, California: Wadsworth International Group; Boston: Duxbury Press, 1983. XIV, 395 p.
[4] W. S. Cleveland. Data science: an action plan for expanding the technical areas of the field of statistics. Int. Stat. Rev., 69(1):21-26, 2001. ISSN 0306-7734; 1751-5823/e. doi: 10.1111/j.1751-5823.2001.tb00477.x.
[5] T. Hastie, R. Tibshirani, and J. Friedman. The elements of statistical learning. Data mining, inference, and prediction. Springer Series in Statistics. Springer-Verlag, New York, 2001. ISBN 0-387-95284-5. doi: 10.1007/978-0-387-21606-5. URL http://dx.doi.org/10.1007/978-0-387-21606-5.
[6] B. Steele, J. Chandler, and S. Reddy. Algorithms for data science. Cham: Springer, 2016. ISBN 978-3-319-45795-6/hbk; 978-3-319-45797-0/ebook. doi: 10.1007/978-3-319-45797-0.
[7] X. Wu and V. Kumar, editors. The top ten algorithms in data mining. Papers based on the presentations at the IEEE international conference on data mining (ICDM 2006), Hong Kong, December 18-22, 2006. Boca Raton, FL: CRC Press, 2009. ISBN 978-1-4200-8964-6/hbk; 978-1-4200-8965-3/ebook. doi: 10.1201/9781420089653.
[8] X. Wu, V. Kumar, J. R. Quinlan, J. Ghosh, Q. Yang, H. Motoda, G. J. McLachlan, A. Ng, B. Liu, P. S. Yu, Z.-H. Zhou, M. Steinbach, D. J. Hand, and D. Steinberg. Top 10 algorithms in data mining. Knowl. Inf. Syst., 14:1-37, 2008. doi: 10.1007/s10115-007-0114-2. URL http://dx.doi.org/10.1007/s10115-007-0114-2.

Pages: 179 - 184

Full Text: PDF (in Polish)

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.

Creative Commons License
Mathematica Applicanda by Polish Mathematical Society is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Based on a work at http://ma.ptm.org.pl/.
Permissions beyond the scope of this license may be available at http://www.ptm.org.pl/.

The journal is sponsored by

Ministry of Science and Higher Education

 

Print ISSN: 1730-2668; Online ISSN: 2299-4009


The journal is abstracted and indexed in:

Print (1973-1999) ISSN: 0137-2890