2024-10-13

Introducing the CIFAR-10 dataset

The CIFAR-10 dataset is a widely used image classification dataset of 60,000 32×32 color images in 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. It is split into 50,000 training images and 10,000 test images. The classes are balanced: each class has 6,000 images, 5,000 in the training set and 1,000 in the test set.

Loading the CIFAR-10 dataset in Python

To load the CIFAR-10 dataset in Python, we will use the cifar10 module from the Keras library. Recent versions of Keras also need a backend such as TensorFlow. If you don’t have them installed, you can install both using the following command:
pip install keras tensorflow

Once you have installed Keras, you can load the CIFAR-10 dataset using the following code:
from keras.datasets import cifar10
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

The cifar10.load_data() function returns two tuples: (x_train, y_train) and (x_test, y_test). x_train and x_test are NumPy arrays holding the input images, with shapes (50000, 32, 32, 3) and (10000, 32, 32, 3). y_train and y_test are NumPy arrays holding the corresponding integer class labels (0 to 9), with shapes (50000, 1) and (10000, 1).
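
A quick way to confirm what load_data() returned is to inspect the array shapes, dtypes, and per-class counts. The sketch below is only illustrative: it uses small random stand-in arrays laid out like CIFAR-10 so that it runs without downloading anything, and the helper name summarize_split is hypothetical.

```python
import numpy as np

def summarize_split(x, y, num_classes=10):
    """Return shape, dtype, and per-class counts for one data split."""
    # Labels arrive as a column vector of shape (n, 1); flatten before counting.
    counts = np.bincount(y.ravel(), minlength=num_classes)
    return {
        "x_shape": x.shape,
        "x_dtype": str(x.dtype),
        "y_shape": y.shape,
        "class_counts": counts.tolist(),
    }

# Stand-in data (assumption): 20 random images laid out like CIFAR-10.
rng = np.random.default_rng(0)
x_demo = rng.integers(0, 256, size=(20, 32, 32, 3), dtype=np.uint8)
y_demo = np.repeat(np.arange(10), 2).reshape(-1, 1)  # two images per class

summary = summarize_split(x_demo, y_demo)
print(summary)
```

With the real data you would call summarize_split(x_train, y_train) instead; for the training split the class counts should all be 5,000.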

Preprocessing the data for SVM training

In this section, we will first flatten each input image from a 32×32×3 array into a single vector of 3,072 values, so that each image set becomes a 2D matrix with one row per image. We will then normalize the pixel values to lie between 0 and 1. Finally, we will convert the class labels to one-hot encoded vectors.

The reshape() function flattens the image arrays, so x_train goes from shape (50000, 32, 32, 3) to (50000, 3072). The first argument keeps one row per image, and the -1 argument tells NumPy to infer the remaining dimension from the total number of elements (32 × 32 × 3 = 3,072 per image):
# Reshape the input images
x_train = x_train.reshape(x_train.shape[0], -1)
x_test = x_test.reshape(x_test.shape[0], -1)
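
To see what the -1 argument does, it can help to try the reshape on a tiny batch. The following is a minimal sketch that assumes a stand-in array of four random images shaped like CIFAR-10 images; the variable names are illustrative only.

```python
import numpy as np

# Stand-in batch (assumption): 4 random images shaped like CIFAR-10 images.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

# Keep one row per image and let NumPy infer the row length (32 * 32 * 3).
flat = images.reshape(images.shape[0], -1)
print(flat.shape)  # (4, 3072)

# Flattening is lossless: reshaping back recovers the original images.
restored = flat.reshape(-1, 32, 32, 3)
print(np.array_equal(restored, images))  # True

# Each row is one image read in row-major order, so the first three
# values are the channel values of the top-left pixel.
top_left = flat[0, :3]
```

Because the flattening only changes how the values are indexed, no pixel information is lost; what is dropped is the explicit 2D layout of the pixels.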

The pixel values are stored as 8-bit integers from 0 to 255, so dividing by 255, the maximum pixel value, scales them to between 0 and 1:
# Convert pixel values to between 0 and 1
x_train = x_train / 255
x_test = x_test / 255
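
One detail worth knowing is that dividing an integer array by 255 produces float64 values, which take eight bytes per pixel. The sketch below, run on a small stand-in array rather than the real data, compares that with casting to float32 first; whether the smaller dtype is worth using depends on your memory budget and is a suggestion here, not part of the steps above.

```python
import numpy as np

# Stand-in data (assumption): 4 flattened images with random 8-bit pixels.
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(4, 3072), dtype=np.uint8)

scaled64 = pixels / 255                      # true division promotes to float64
scaled32 = pixels.astype(np.float32) / 255   # stays float32, half the memory

print(scaled64.dtype, scaled32.dtype)
print(scaled64.nbytes, scaled32.nbytes)

# Both versions land in the [0, 1] range.
print(scaled32.min() >= 0.0, scaled32.max() <= 1.0)
```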

The to_categorical() function is used to convert the class labels to one-hot encoded vectors. The num_classes variable is set to 10, which is the number of classes in the CIFAR-10 dataset:
# Convert class labels to one-hot encoded vectors
from keras.utils import to_categorical
num_classes = 10
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)
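
To make the encoding concrete, here is a minimal NumPy-only sketch that mirrors the effect of to_categorical() on a few stand-in labels and then recovers the integer labels again. The label values are made up for illustration. The reverse step matters because some SVM implementations, scikit-learn's SVC among them, expect a 1-D array of integer labels rather than one-hot vectors; this section does not name the SVM library, so treat that as an assumption about your setup.

```python
import numpy as np

num_classes = 10
# Stand-in labels (assumption), shaped (n, 1) like the arrays Keras returns.
labels = np.array([[3], [0], [9], [3]])

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes, dtype=np.float32)[labels.ravel()]
print(one_hot.shape)  # (4, 10)

# argmax along each row recovers the integer labels as a 1-D array.
recovered = one_hot.argmax(axis=1)
print(recovered)  # [3 0 9 3]
```

If the labels have not been one-hot encoded yet, y_train.ravel() gives the same 1-D integer form directly.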
