In part 1 of this series you saw how 64-dimensional data can be plotted in 2 dimensions. In this part I look at how machine learning models perform in different t-SNE embedded spaces.
The lower-dimensional visualizations made me curious, so I set up a small experiment. I had two questions in mind:
- Does a machine learning model perform better if the dataset “looks” more separable in a given t-SNE embedding?
- If a model performs well in a 2-dimensional t-SNE, does it also perform well in t-SNEs with more dimensions?
Nothing scientific here: just one dataset and one machine learning model. There are no definite rules, but you might find the insights interesting if you work with high-dimensional data.
Experimental Setup and Results
To answer these questions, or at least to understand the issue a bit better, I did the following:
- Cross-validated (5-fold) a baseline estimator (a benchmark for sanity checking)
- Created 12 t-SNE embeddings (perplexities 2, 5, 30 and 50; t-SNE dimensions 2, 4 and 8)
- Cross-validated (5-fold) a gradient boosted classifier on each t-SNE embedding
- Collected the accuracy results in a heatmap
The classifier had 100 estimators and otherwise default parameters. The defaults might have favored some t-SNEs over others, so it would be interesting to see how individually optimized models would perform.
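If you want to try something similar, here is a minimal sketch of the t-SNE part of the setup. It assumes scikit-learn, which is my guess at a reasonable toolkit rather than a record of the exact code behind the numbers in this post. The function name `evaluate_tsne_grid` and the `seed` argument are made up for illustration, and the baseline estimator is left out. Note that scikit-learn's `TSNE` has no separate `transform` step, so the embedding is fitted on the whole dataset before cross-validation.

```python
from itertools import product

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.manifold import TSNE
from sklearn.model_selection import cross_val_score


def evaluate_tsne_grid(X, y, perplexities, dimensions, n_estimators=100, seed=0):
    """Return {(perplexity, n_dims): (mean_accuracy, std_accuracy)}.

    Hypothetical helper: embeds X once per (perplexity, n_dims) pair and
    cross-validates a gradient boosted classifier on each embedding.
    """
    results = {}
    for perplexity, n_dims in product(perplexities, dimensions):
        # scikit-learn's default Barnes-Hut solver only supports fewer than
        # 4 output dimensions, so larger embeddings fall back to "exact".
        method = "barnes_hut" if n_dims < 4 else "exact"
        embedding = TSNE(
            n_components=n_dims,
            perplexity=perplexity,
            method=method,
            random_state=seed,
        ).fit_transform(X)

        # 100 estimators, everything else left at the defaults.
        clf = GradientBoostingClassifier(n_estimators=n_estimators, random_state=seed)
        scores = cross_val_score(clf, embedding, y, cv=5, scoring="accuracy")
        results[(perplexity, n_dims)] = (scores.mean(), scores.std())
    return results
```

Calling `evaluate_tsne_grid(X, y, [2, 5, 30, 50], [2, 4, 8])` would cover the 12 combinations listed above. Expect it to be slow, because the "exact" solver scales quadratically with the number of samples.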
Model Performance: Mean Accuracies
How did the models perform? Below you can see a heatmap of the mean accuracies. The horizontal axis shows the t-SNE dimensions and the vertical axis shows the t-SNE perplexity. Each box is annotated with the measured mean accuracy.
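For readers who want to build a similar chart from their own results, a heatmap with this layout could be drawn roughly like this. It is a sketch that assumes matplotlib. The function name `plot_accuracy_heatmap` and its arguments are hypothetical, and `grid` is expected to hold one row per perplexity and one column per t-SNE dimension.

```python
from matplotlib.figure import Figure


def plot_accuracy_heatmap(grid, perplexities, dimensions, title="Mean accuracy"):
    """Draw a perplexity x dimension heatmap and annotate each box.

    grid[i][j] is the value for perplexities[i] and dimensions[j].
    """
    fig = Figure(figsize=(5, 4))
    ax = fig.subplots()
    image = ax.imshow(grid, cmap="viridis", aspect="auto")

    # Columns are t-SNE dimensions, rows are perplexities.
    ax.set_xticks(range(len(dimensions)))
    ax.set_xticklabels(dimensions)
    ax.set_yticks(range(len(perplexities)))
    ax.set_yticklabels(perplexities)
    ax.set_xlabel("t-SNE dimensions")
    ax.set_ylabel("t-SNE perplexity")
    ax.set_title(title)

    # Write the measured value into each box on a light background.
    for row, row_values in enumerate(grid):
        for col, value in enumerate(row_values):
            ax.text(
                col, row, f"{value:.3f}", ha="center", va="center",
                bbox=dict(facecolor="white", alpha=0.7, edgecolor="none"),
            )

    fig.colorbar(image, ax=ax)
    return fig
```

The returned figure can be written to disk with `fig.savefig("accuracies.png")`. The same function works for standard deviations if you pass a different `title`.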
The baseline estimator achieved an accuracy of 0.974. Some of the classifiers did better than the baseline and some failed miserably. Take a look at the classifier in the 8-dimensional t-SNE with a perplexity of 2. Overall, higher perplexity gives higher accuracy (with one exception). In this case, fewer dimensions always yield higher accuracy.
Model Performance: Accuracy Standard Deviations
What about the standard deviation of the accuracy? How consistent were the models?
The standard deviation for the baseline was 0.0095. The standard deviations are in line with the mean accuracies: the models with higher mean accuracies were also more consistent and stable.
It was really enlightening to explore dimensionality reduction. It used to feel a bit fuzzy and shady to me, but now it makes much more sense. During this experiment, t-SNE and PCA earned their places in my machine learning toolbox.
This series was intended to introduce different dimensionality reduction techniques. I hope you learned something about the importance of parameter tuning and about model performance in embedded spaces.
Thank you for reading!