Learning Data Mining from scratch
This repository is used to record me learning Data Mining from scratch.
Data mining is a process to discover, by automatic or semi-automatic means, interesting, previously unknown, implicit, potentially useful, and non-trivial patterns or knowledge from large quantities of data.
NumPy, short for Numerical Python, is the foundational package for scientific com- puting in Python.
pandas provides rich data structures and functions designed to make working with structured data fast, easy, and expressive.
matplotlib is the most popular Python library for producing plots and other 2D data visualizations.
IPython is the component in the standard scientific Python toolset that ties everything together. It provides a robust and productive environment for interactive and explor- atory computing. It is an enhanced Python shell designed to accelerate the writing, testing, and debugging of Python code. It is particularly useful for interactively working with data and visualizing data with matplotlib. IPython is usually involved with the majority of my Python work, including running, debugging, and testing code.
SciPy is a collection of packages addressing a number of different standard problem domains in scientific computing. Here is a sampling of the packages included:
scipy.integrate: numerical integration routines and differential equation solvers
scipy.linalg: linear algebra routines and matrix decompositions extending be-
yond those provided in numpy.linalg.
scipy.optimize: function optimizers (minimizers) and root finding algorithms
scipy.signal: signal processing tools
scipy.sparse: sparse matrices and sparse linear system solvers
scipy.special: wrapper around SPECFUN, a Fortran library implementing many common mathematical functions, such as the gamma function
scipy.stats: standard continuous and discrete probability distributions (density functions, samplers, continuous distribution functions), various statistical tests, and more descriptive statistics
scipy.weave: tool for using inline C++ code to accelerate array computations
Together NumPy and SciPy form a reasonably complete computational replacement for much of MATLAB along with some of its add-on toolboxes.
JupyterLab is the next generation of the Jupyter Notebook. It aims at fixing many usability issues of the Notebook, and it greatly expands its scope. JupyterLab offers a general framework for interactive computing and data science in the browser, using Python, Julia, R, or one of many other languages.
If you are using both Python2.7 and Python3, you may have dependency packet conflict problems using pip
, so I recommend installing conda and then installing jupyterlab with it
JupyterLab can be installed using conda
or pip
. For more detailed instructions, consult the installation guide.
If you use conda, or install miniconda
conda install -c conda-forge jupyterlab
We recommend installing Anacoda3 to help integrate R into Jupyterlab
If you use pip
, you can install it with:
pip install jupyterlab
If installing using pip install --user
, you must add the user-level bin
directory to your PATH
environment variable in order to launch jupyter lab
.
jupyter --version
If the fllowing results appear, that means it works.
jupyter core : 4.6.1
jupyter-notebook : 6.0.3
qtconsole : not installed
ipython : 7.11.1
ipykernel : 5.1.3
jupyter client : 5.3.4
jupyter lab : 1.2.5
nbconvert : 5.6.1
ipywidgets : not installed
nbformat : 5.0.4
traitlets : 4.3.3
jupyter lab
You may access JupyterLab by entering the notebook server’s URL into the browser. JupyterLab sessions always reside in a workspace. The default workspace is the main /lab
URL:
http(s)://<server:port>/<lab-location>/lab
The open-source Anaconda Distribution is the easiest way to perform Python/R data science and machine learning on Linux, Windows, and Mac OS X.
More detail Click Here
Enjoy playing with JupyterLab!!!
[1] J. Han et al., “Data Mining: Concepts and Techniques”, Chapter 1
[2] P. Tan et al., “Introduction to Data Mining”, Chapter 1
[3] McKinney W. Python for data analysis: Data wrangling with Pandas, NumPy, and IPython. “ O’Reilly Media, Inc.”; 2012 Oct 8.