This is part 1 of a two-part series about dimensionality reduction. This part covers the basics; part 2 is a bit more advanced and explores machine learning models in t-SNE embedded spaces.
Dimensionality reduction is handy in many situations. It is often used to visualize high-dimensional datasets in two or three dimensions. In this post we will plot 64-dimensional data in two dimensions with t-SNE and PCA.
Dimensionality reduction is also important in predictive analytics and machine learning. Predictive models are prone to the so-called curse of dimensionality. Reducing the number of dimensions lets you tackle this curse and can improve performance. Training models in lower dimensions is also computationally cheaper.
I will be using the digits dataset from Scikit-Learn. The dataset consists of 1797 observations, each an 8×8 image of a handwritten digit labeled from 0 to 9. This dataset is a classic in pattern recognition.
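Loading the dataset takes only a couple of lines; a minimal sketch (the variable names `X` and `y` are my own):

```python
# Load the digits dataset: 1797 samples, each an 8x8 image
# flattened into 64 features, with labels 0..9.
from sklearn.datasets import load_digits

digits = load_digits()
X, y = digits.data, digits.target

print(X.shape)         # (1797, 64)
print(sorted(set(y)))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```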
Dimensionality reduction techniques
I chose to use t-SNE, a.k.a. t-Distributed Stochastic Neighbor Embedding, and PCA, or Principal Component Analysis. Both techniques create a new, lower-dimensional space in which the embedded data represents the original data as well as possible. Notice that this is not the same as simply choosing N features from the original space.
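To see that the new space is not a subset of the original features, consider this sketch with PCA: each principal component is a linear combination of all 64 original features, not a selection of them.

```python
# PCA builds each new axis as a weighted combination of ALL 64
# original features -- it does not just pick a subset of them.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X_2d.shape)             # (1797, 2): data in the new 2-D space
print(pca.components_.shape)  # (2, 64): each axis weights all 64 features
```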
t-SNE and PCA visualizations
I visualized the original dataset in four different t-SNE embedded spaces (from now on just “t-SNE”) with different perplexity parameters. The perplexity parameter is related to the number of nearest neighbors that the algorithm considers during training. One PCA was also fitted to compare against the t-SNEs.
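The experiment can be sketched roughly as below. The post names perplexities 2 and 5; the other two values (30 and 100) are my own picks for illustration, as is the plotting layout.

```python
# Embed the 64-D digits into 2-D with four t-SNE models (different
# perplexities) and one PCA, then scatter-plot each embedding
# colored by digit label. Perplexities 30 and 100 are illustrative
# choices, not necessarily the ones used in the original post.
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

digits = load_digits()
X, y = digits.data, digits.target

embeddings = {
    f"t-SNE, perplexity={p}": TSNE(
        n_components=2, perplexity=p, random_state=0
    ).fit_transform(X)
    for p in (2, 5, 30, 100)
}
embeddings["PCA"] = PCA(n_components=2).fit_transform(X)

fig, axes = plt.subplots(1, 5, figsize=(25, 5))
for ax, (title, X_2d) in zip(axes, embeddings.items()):
    ax.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap="tab10", s=8)
    ax.set_title(title)
fig.savefig("embeddings.png")
```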
I didn’t experiment with any PCA parameters, because PCA has no parameter as “interesting” to tune as t-SNE’s perplexity.
When you look at the plots, think about separability. Are the classes separable? Could you draw a separating decision boundary between classes? IMHO it would be hard to draw a decision boundary, at least in t-SNE with perplexity of 2 or 5. Classes in the PCA plot also look quite overlapping and hard to separate. What do you think?
The goal of this post was to explore the potential of various t-SNEs and PCA. I have used these techniques before, but only while writing this post did I really understand how much they have to offer.
Especially the t-SNE part of the experiment was an eye-opener. Before this experiment I always thought that all t-SNE models were more or less the same. However, the parameters really change the way that the data is shown in lower dimensions.
In the next post you will learn how machine learning models perform in t-SNE with different parameters. Heatmaps and deeper insights on the way!