Project author: kevinzakka

Project description: Python code for t-SNE visualization
Language: Python
Repository: git://github.com/kevinzakka/tsne-viz.git
Created: 2017-10-05T06:28:32Z
Project community: https://github.com/kevinzakka/tsne-viz


t-SNE Visualization

This repository is an easy-to-run t-SNE visualization tool for your dataset of choice. It currently supports 2D and 3D plots, as well as an optional overlay of the original images on top of the 2D points.



Installation

Ubuntu Installation

First clone this repository, then install the TkInter package by running:

  sudo apt-get install python3-tk

Optionally create a virtualenv for this project:

  cd tsne-viz
  virtualenv -p python3 venv
  source venv/bin/activate

Then install the Python 3 dependencies:

  cd tsne-viz
  pip install -r requirements.txt

Usage

Example Command

  python main.py --num_samples=5000 --num_dimensions=2 --compute_embeddings=False --with_images=False

This produces a 2D t-SNE plot with no image overlay. Note that the example code uses the Fashion-MNIST dataset, which you can download by running:

  chmod +x download_data.sh
  ./download_data.sh

To use your own dataset, you only need to modify the load_data method. Make sure it returns a set of numpy arrays: for example, if you are embedding grayscale images, you’ll want to return an array of images and an array of their associated labels, with shapes such as:

  X: (100, 32, 32)
  y: (100,)
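
As a rough sketch of that contract, a replacement load_data could look like the following. The function signature, the `num_samples` and `seed` parameters, the 10-class labels, and the random stand-in data are all assumptions for illustration, not the repository's actual code; only the returned shapes follow the description above.

```python
import numpy as np


def load_data(num_samples=100, seed=0):
    """Hypothetical stand-in for the repository's load_data.

    Returns random grayscale "images" and integer labels with the
    shapes described above. Replace the body with real loading code.
    """
    rng = np.random.default_rng(seed)
    # X: one 32x32 grayscale image per sample, values in [0, 1)
    X = rng.random((num_samples, 32, 32), dtype=np.float32)
    # y: one integer class label per sample (10 classes assumed)
    y = rng.integers(0, 10, size=num_samples)
    return X, y


X, y = load_data()
print(X.shape, y.shape)

# t-SNE operates on flat feature vectors, so images are typically
# reshaped to (num_samples, height * width) before embedding.
X_flat = X.reshape(len(X), -1)
print(X_flat.shape)
```

Whether the flattening happens inside load_data or later in the pipeline is a design choice; returning the unflattened images keeps them available for the image overlay.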

To see all possible command line options, run

  python main.py --help

which will print:

  usage: main.py [-h] [--num_samples NUM_SAMPLES]
                 [--num_dimensions NUM_DIMENSIONS] [--shuffle SHUFFLE]
                 [--compute_embeddings COMPUTE_EMBEDDINGS]
                 [--with_images WITH_IMAGES] [--random_seed RANDOM_SEED]
                 [--data_dir DATA_DIR] [--plot_dir PLOT_DIR]

  t-SNE Visualizer

  optional arguments:
    -h, --help            show this help message and exit

  Setup:
    --num_samples NUM_SAMPLES
                          # of samples to compute embeddings on. Becomes slow if
                          very high.
    --num_dimensions NUM_DIMENSIONS
                          # of tsne dimensions. Can be 2 or 3.
    --shuffle SHUFFLE     Whether to shuffle the data before embedding.
    --compute_embeddings COMPUTE_EMBEDDINGS
                          Whether to compute embeddings. Do this once per sample
                          size.
    --with_images WITH_IMAGES
                          Whether to overlay images on data points. Only works
                          with 2D plots.
    --random_seed RANDOM_SEED
                          Seed to ensure reproducibility

  Path Params:
    --data_dir DATA_DIR   Directory where data is stored
    --plot_dir PLOT_DIR   Directory where plots are saved
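
For readers who want to reproduce a similar interface in their own scripts, the help text above is consistent with an argparse parser along these lines. This is only a sketch: the `str2bool` helper, the `build_parser` function, and every default value are assumptions, and the repository's own parser may be organized differently. The explicit boolean converter is there because `bool("False")` evaluates to `True` in Python, so flags passed as `--with_images=False` need custom parsing.

```python
import argparse


def str2bool(value):
    """Convert 'True'/'False'-style strings to bool (hypothetical helper)."""
    lowered = value.lower()
    if lowered in ("true", "t", "yes", "1"):
        return True
    if lowered in ("false", "f", "no", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Build a parser mirroring the help text; all defaults are assumed."""
    parser = argparse.ArgumentParser(prog="main.py", description="t-SNE Visualizer")

    setup = parser.add_argument_group("Setup")
    setup.add_argument("--num_samples", type=int, default=5000,
                       help="# of samples to compute embeddings on.")
    setup.add_argument("--num_dimensions", type=int, default=2, choices=[2, 3],
                       help="# of tsne dimensions. Can be 2 or 3.")
    setup.add_argument("--shuffle", type=str2bool, default=True,
                       help="Whether to shuffle the data before embedding.")
    setup.add_argument("--compute_embeddings", type=str2bool, default=True,
                       help="Whether to compute embeddings.")
    setup.add_argument("--with_images", type=str2bool, default=False,
                       help="Whether to overlay images on data points.")
    setup.add_argument("--random_seed", type=int, default=42,
                       help="Seed to ensure reproducibility")

    paths = parser.add_argument_group("Path Params")
    paths.add_argument("--data_dir", type=str, default="./data/",
                       help="Directory where data is stored")
    paths.add_argument("--plot_dir", type=str, default="./plots/",
                       help="Directory where plots are saved")
    return parser


# Parse the example command from the Usage section.
args = build_parser().parse_args([
    "--num_samples=5000",
    "--num_dimensions=2",
    "--compute_embeddings=False",
    "--with_images=False",
])
print(args)
```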

Image Overlay

The overlay option only works for 2D plots and relies on matplotlib’s AnnotationBbox class. Here’s an example of what it outputs:


[Image: example 2D t-SNE plot with image overlay]
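
To illustrate the general technique, here is a minimal sketch of placing image thumbnails at 2D coordinates with `AnnotationBbox` and `OffsetImage` from `matplotlib.offsetbox`. The random points and images, the zoom factor, and the output file name are placeholders chosen for this example; this is not the repository's plotting code.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.offsetbox import AnnotationBbox, OffsetImage

rng = np.random.default_rng(0)
points = rng.normal(size=(20, 2))   # stand-in for 2D t-SNE embeddings
images = rng.random((20, 28, 28))   # stand-in for the original images

fig, ax = plt.subplots(figsize=(6, 6))
# Zero-size scatter so the axis limits are fitted to the points.
ax.scatter(points[:, 0], points[:, 1], s=0)

boxes = []
for (x, y), img in zip(points, images):
    # Wrap the image in an OffsetImage and anchor it at the data point.
    thumb = OffsetImage(img, cmap="gray", zoom=0.8)
    box = AnnotationBbox(thumb, (x, y), frameon=False)
    ax.add_artist(box)
    boxes.append(box)

# Write the figure to the system temp directory (placeholder location).
out_path = os.path.join(tempfile.gettempdir(), "overlay_sketch.png")
fig.savefig(out_path)
```

With many samples the thumbnails overlap heavily, so a smaller `zoom` or a subsample of the points usually gives a more readable figure.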