CIFAR10 classification using Machine Learning and Deep Learning Models


by 
Team 21
Vyshnavi Chalechimala, Apurva Mandalika, Priyanka Verma, Vaishnavi Chakradeo, Jawahar Sai Nathani


Image recognition is at the heart of modern machine learning, fueling innovations in fields like self-driving cars, healthcare, and augmented reality. In this blog post, we’ll dive into the world of image classification using the CIFAR-10 dataset—a benchmark dataset of 60,000 tiny images spread across 10 distinct categories.

What Is CIFAR-10?

The CIFAR-10 dataset is among the most well-known benchmarks for image classification tasks, offering researchers and practitioners a comprehensive playground for experimenting with various machine learning and deep learning algorithms. This dataset is a labeled subset of the 80 Million Tiny Images dataset, collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.

Overview

  • Total Images: 60,000
  • Image Size: 32x32 pixels (color images)
  • Number of Classes: 10
  • Training Images: 50,000
  • Test Images: 10,000

Classes

The CIFAR-10 dataset is divided into ten mutually exclusive classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck, with 6,000 images per class.
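If you want to explore the dataset yourself, the snippet below is a minimal sketch using torchvision (one common way to obtain CIFAR-10; the data directory is just a placeholder). It downloads the data and prints the split sizes and class names listed above.

```python
# Minimal sketch: download CIFAR-10 with torchvision and inspect it.
from torchvision import datasets

train_set = datasets.CIFAR10(root="./data", train=True, download=True)
test_set = datasets.CIFAR10(root="./data", train=False, download=True)

print(len(train_set), len(test_set))  # 50000 10000
print(train_set.classes)
# ['airplane', 'automobile', 'bird', 'cat', 'deer',
#  'dog', 'frog', 'horse', 'ship', 'truck']
```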

Model Selection

We chose three models:

  • Random Forest: Random Forest is a traditional machine learning model that combines the predictions of multiple decision trees to reduce overfitting and improve robustness. It is particularly effective for structured data but faces challenges with high-dimensional data like images.
    • Key Features:
      • Handles outliers well.
      • Provides feature importance insights.
      • Robust against overfitting but lacks the ability to capture spatial relationships in image data.
  • CNNs: CNNs revolutionized image processing by introducing convolutional layers that extract spatial features like edges, textures, and patterns. They excel at hierarchical feature learning, enabling robust classification of complex datasets.
    • Key Features:
      • Automatically learns features from raw pixel data.
      • Effectively captures spatial hierarchies and patterns.
      • Requires computational resources and careful tuning of hyperparameters.

  • ResNet: ResNet builds upon CNNs by introducing residual connections, which solve the vanishing gradient problem and enable the training of deeper networks. This architecture ensures effective feature learning across layers, making it suitable for complex tasks.
    • Key Features:
      • Deeper architecture with residual connections.
      • Preserves essential features across layers.
      • Outperforms simpler models on large datasets.

By comparing these models, we aimed to understand their relative strengths, weaknesses, and suitability for CIFAR-10 classification.

Data Preparation

Data Statistics after Train, Valid and Test split

To prepare the data for our models:

  1. Train-Validation Split: We split the training data into 80% training and 20% validation.
  2. Normalization: Pixel values were normalized to ensure uniformity.
  3. Augmentation: Applied auto-augmentation techniques to increase the diversity of the training images.
  4. Dimensionality Reduction: Used Principal Component Analysis (PCA) to reduce dimensionality for the Random Forest model.

These steps ensured that the data was ready for both machine learning and deep learning pipelines.
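The sketch below illustrates how steps 1-4 could look with torchvision and scikit-learn. It is not our exact training code: the normalization statistics, the AutoAugment CIFAR-10 policy (standing in for "auto-augmentation"), and the 100-component PCA are illustrative assumptions rather than the values we actually tuned.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms
from sklearn.decomposition import PCA

# Augmentation + normalization for the deep-learning pipeline
# (mean/std are commonly used CIFAR-10 channel statistics).
train_tf = transforms.Compose([
    transforms.AutoAugment(transforms.AutoAugmentPolicy.CIFAR10),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

full_train = datasets.CIFAR10("./data", train=True, download=True, transform=train_tf)

# 80/20 train-validation split
n_train = int(0.8 * len(full_train))  # 40,000 images
train_set, val_set = random_split(
    full_train, [n_train, len(full_train) - n_train],
    generator=torch.Generator().manual_seed(0))

# Dimensionality reduction for the Random Forest pipeline:
# flatten each 32x32x3 image and project it with PCA.
raw = datasets.CIFAR10("./data", train=True, download=True)
X = raw.data.reshape(len(raw), -1) / 255.0  # shape (50000, 3072)
pca = PCA(n_components=100)                 # component count is an assumption
X_reduced = pca.fit_transform(X)
```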

Visualizations and Insights

Data Analysis:

  • PCA reduced the dataset’s dimensionality, enabling efficient training of Random Forest.
  • t-SNE provided a 2D visualization of class clusters, revealing distinct groupings and separability in the CIFAR-10 dataset.

Confusion Matrices:

  • Random Forest exhibited frequent misclassifications across similar classes.
  • CNN and ResNet showed consistent improvements in distinguishing challenging pairs (e.g., cats vs. dogs).

These visualizations gave us a feel for how separable the classes are and supported the decision to feed PCA-reduced features into the Random Forest model.
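For readers who want to reproduce plots like these, here is a rough sketch using scikit-learn and matplotlib. The 5,000-image subsample and the t-SNE settings are assumptions chosen to keep the run time reasonable, not the exact configuration behind our figures.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from torchvision import datasets

raw = datasets.CIFAR10("./data", train=True, download=True)
X = raw.data.reshape(len(raw), -1) / 255.0
y = np.array(raw.targets)

# Subsample: t-SNE is slow on the full 50,000 images.
idx = np.random.default_rng(0).choice(len(X), 5000, replace=False)
X_sub, y_sub = X[idx], y[idx]

# 3D PCA projection of the flattened images
X_pca3 = PCA(n_components=3).fit_transform(X_sub)
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(X_pca3[:, 0], X_pca3[:, 1], X_pca3[:, 2], c=y_sub, s=2, cmap="tab10")
ax.set_title("PCA 3D visualization")

# 2D t-SNE embedding of the same subsample
X_tsne = TSNE(n_components=2, init="pca", random_state=0).fit_transform(X_sub)
plt.figure()
plt.scatter(X_tsne[:, 0], X_tsne[:, 1], c=y_sub, s=2, cmap="tab10")
plt.title("t-SNE 2D visualization")
plt.show()
```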

Sample images from each class and PCA-reconstructed sample images from each class

PCA visualizations: PCA 3D visualization and t-SNE 2D visualization

Model Selection and Implementation

Random Forest

  • Framework: Scikit-learn
  • Enhancements: PCA was used to reduce dimensionality, and hyperparameters (e.g., max depth, min samples per leaf) were optimized.
  • Training Process: Fit on preprocessed data, followed by validation to fine-tune performance.
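A minimal sketch of this pipeline is shown below. The PCA component count and the Random Forest hyperparameters (number of trees, max depth, min samples per leaf) are illustrative placeholders rather than the tuned values.

```python
import numpy as np
from torchvision import datasets
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.pipeline import Pipeline

# Flatten the images so they can be fed to PCA and the Random Forest.
train = datasets.CIFAR10("./data", train=True, download=True)
test = datasets.CIFAR10("./data", train=False, download=True)
X_train = train.data.reshape(len(train), -1) / 255.0
X_test = test.data.reshape(len(test), -1) / 255.0
y_train, y_test = np.array(train.targets), np.array(test.targets)

clf = Pipeline([
    ("pca", PCA(n_components=100)),            # dimensionality reduction
    ("rf", RandomForestClassifier(
        n_estimators=200,                      # number of trees (assumed)
        max_depth=20,                          # tuned on the validation split
        min_samples_leaf=2,
        n_jobs=-1,
        random_state=0)),
])
clf.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```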

Random Forest on Test dataset - Metrics
Random Forest on Test dataset - Confusion Matrix


Convolutional Neural Network (CNN)

CNN Architecture

  • Framework: PyTorch
  • Architecture:
    • Three convolutional layers, each followed by batch normalization, an activation function (ReLU, Sigmoid, or Tanh), and max pooling.
    • Two fully connected layers for classification, with a dropout layer between them for regularization.
  • Optimization: Adam optimizer with a learning rate scheduler (StepLR) to gradually reduce the learning rate. Batch sizes of 32, 64, and 128 were tested for their impact on performance.
  • Training: Trained for 20 epochs with cross-entropy loss.

CNN - Confusion Matrix


CNN Metrics
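The sketch below shows one way to write the three-block CNN described above in PyTorch, assuming ReLU activations; the channel widths, hidden size, dropout rate, and scheduler settings are illustrative rather than our exact configuration.

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Three conv blocks (conv -> BN -> ReLU -> max pool) followed by two
    fully connected layers with dropout between them."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2),   # 32x32 -> 16x16
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64), nn.ReLU(), nn.MaxPool2d(2),   # 16x16 -> 8x8
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128), nn.ReLU(), nn.MaxPool2d(2),  # 8x8 -> 4x4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 4 * 4, 256),
            nn.ReLU(),
            nn.Dropout(0.5),                  # dropout rate is an assumption
            nn.Linear(256, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = SimpleCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)
```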

ResNet

ResNet Architecture

  • Framework: PyTorch (ResNet-18)
  • Enhancements:
    • Customized initial convolution layer with a 3x3 kernel.
    • Residual connections to ensure effective learning across layers.
    • Experimented with optimizers (Adam, SGD) and activation functions (ReLU, Sigmoid).
  • Optimization: Applied batch normalization and global average pooling for stability and generalization.
  • Training: Trained for 20 epochs using cross-entropy loss and a StepLR scheduler.

ResNet - Confusion Matrix

ResNet - Metrics
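Below is a rough sketch of how torchvision's ResNet-18 can be adapted for 32x32 CIFAR-10 inputs (swapping the initial 7x7 convolution for a 3x3 one), together with a generic 20-epoch training loop using cross-entropy loss, Adam, and a StepLR scheduler. The learning rate, scheduler settings, and `train_loader` are assumptions; `train_loader` is expected to be a DataLoader built from the training split in the data-preparation step.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import models

# Start from the standard ResNet-18 and adapt it to 32x32 CIFAR-10 images.
model = models.resnet18(num_classes=10)
model.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
model.maxpool = nn.Identity()  # the early max pool is too aggressive for 32x32 inputs

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)

# train_set is assumed to be the 80% split from the data-preparation sketch.
train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=2)

for epoch in range(20):
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
    print(f"epoch {epoch + 1}: last batch loss = {loss.item():.4f}")
```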

Experimental Results

The three models (Random Forest, CNN, and ResNet) exhibited varying levels of performance on the CIFAR-10 dataset, reflecting their differing capabilities.

  • Random Forest achieved a testing accuracy of 44.97%. Despite its ability to handle structured data well, it struggled to generalize on high-dimensional image data due to its inability to capture spatial relationships. This limitation highlights the gap between traditional machine learning models and deep learning approaches for image classification tasks.
  • CNN improved significantly over Random Forest, achieving a testing accuracy of 81.1%. The convolutional layers allowed the CNN to extract spatial features from the images and learn hierarchical patterns, leading to much better classification performance.
  • ResNet delivered the best performance among the three models, with a testing accuracy of 83.6%. Its residual connections address the vanishing gradient problem and enable effective training of deeper networks, so important features are retained across layers, resulting in better generalization on the test set.
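For reference, the testing accuracies above come from a standard evaluation pass along the following lines, where `model` and `test_loader` stand for whichever trained network and test DataLoader are being scored.

```python
import torch

@torch.no_grad()
def test_accuracy(model, test_loader, device="cpu"):
    """Fraction of test images whose predicted class matches the label."""
    model.eval()
    correct = total = 0
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    return correct / total
```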

Conclusion

This study showcased the progression from traditional machine learning to deep learning for image classification. The results demonstrate the following:

  • Random Forest serves as a baseline but is not ideal for image data due to its lack of spatial feature learning.
  • CNNs deliver substantial improvements by leveraging hierarchical feature extraction, achieving 81.1% accuracy on the test set.
  • ResNet sets the benchmark with 83.6% accuracy, thanks to its deeper architecture and residual connections.

Code and Resources

You can find the complete implementation of this project on GitHub.

REFERENCES
1. A. Krizhevsky, "Learning Multiple Layers of Features from Tiny Images," Chapter 3 (CIFAR-10 dataset), 2009.
2. Z. Li, F. Liu, W. Yang, S. Peng and J. Zhou, "A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects," IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 12, pp. 6999-7019, Dec. 2022, doi: 10.1109/TNNLS.2021.3084827.
3. K. He, X. Zhang, S. Ren and J. Sun, "Deep Residual Learning for Image Recognition," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016.
4. G. Huang, Z. Liu, L. Van Der Maaten and K. Q. Weinberger, "Densely Connected Convolutional Networks," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 2017.
5. G. Louppe, "Understanding Random Forests: From Theory to Practice," arXiv:1407.7502, 2014.



