Recognizing Handwritten Digits

Lumbini Inkar
3 min read · Dec 25, 2020

The hypothesis to be tested: The scikit-learn library provides numerous datasets that are useful for testing data analysis and prediction methods. The Digits dataset is one of them, and we use it here to test how accurately handwritten digits can be recognized.

Libraries used: scikit-learn, matplotlib

Dataset: Optical recognition of handwritten digits dataset. The dataset consists of 10 classes, where each class refers to a digit from 0 to 9. It contains 1,797 images, each 8x8 pixels in size.
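
The figures quoted above can be checked directly against the loaded data. This is a minimal sketch, assuming scikit-learn and NumPy are installed. The variable names (n_images, classes, counts) are chosen for illustration only.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

# Number of images and the height/width of each image
n_images, height, width = digits.images.shape
print(n_images, height, width)  # 1797 8 8

# Distinct labels and how many images carry each label
classes, counts = np.unique(digits.target, return_counts=True)
print(classes.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(dict(zip(classes.tolist(), counts.tolist())))

# Pixel intensities are small non-negative values, not 0-255
print(digits.images.min(), digits.images.max())  # 0.0 16.0
```

The per-class counts show that the ten digits are represented in roughly equal numbers, which is why plain accuracy is a reasonable metric in the rest of the article.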

Figure 1: An example 8x8 image from the Digits dataset
#Importing the Library
from sklearn import datasets
#Loading the Dataset in a variable named digits
digits = datasets.load_digits()
#Description of the dataset
print(digits.DESCR)
# digits.images array contains images of handwritten digits
digits.images[0]
array([[ 0., 0., 5., 13., 9., 1., 0., 0.],
[ 0., 0., 13., 15., 10., 15., 5., 0.],
[ 0., 3., 15., 2., 0., 11., 8., 0.],
[ 0., 4., 12., 0., 0., 8., 8., 0.],
[ 0., 5., 8., 0., 0., 9., 8., 0.],
[ 0., 4., 11., 0., 1., 12., 7., 0.],
[ 0., 2., 14., 5., 10., 12., 0., 0.],
[ 0., 0., 6., 13., 10., 0., 0., 0.]])
# matplotlib is needed to display an image
import matplotlib.pyplot as plt
plt.imshow(digits.images[1010], cmap=plt.cm.gray_r, interpolation='nearest')
Figure 2: The 1011th image in the Digits dataset
# digits.target array contains labels of handwritten digits
digits.target
array([0, 1, 2, ..., 8, 9, 8])
#Shape of arrays
digits.images.shape
(1797, 8, 8)
digits.target.shape
(1797,)
# Flatten digits.images so that each 8x8 image becomes one row of 64 values
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))
data.shape
(1797, 64)
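
The flattening step can be verified with a short check. The sketch below, assuming NumPy is available, confirms that reshaping digits.images row by row yields the same array that scikit-learn already ships as digits.data, which is why the two can be used interchangeably.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

# Reshape (1797, 8, 8) into (1797, 64): one row of 64 pixels per image
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))

print(data.shape)                          # (1797, 64)
print(np.array_equal(data, digits.data))   # True

# Going the other way recovers the original 8x8 image
first_image = data[0].reshape(8, 8)
print(np.array_equal(first_image, digits.images[0]))  # True
```
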
# Splitting the dataset into train and test sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size = 0.2)
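
Because train_test_split shuffles the data randomly, each run produces a different split and therefore a slightly different accuracy. A minimal sketch of a reproducible split follows. The random_state value of 42 and the use of stratify are assumptions added here for illustration and are not part of the original code.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()

# random_state fixes the shuffle; stratify keeps the digit proportions
# similar in the train and test sets (both are illustrative choices)
x_train, x_test, y_train, y_test = train_test_split(
    digits.data,
    digits.target,
    test_size=0.2,
    random_state=42,
    stratify=digits.target,
)

print(x_train.shape, x_test.shape)  # (1437, 64) (360, 64)
```

With test_size = 0.2, 360 of the 1,797 images are held out for testing and the remaining 1,437 are used for training.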

In this article, we will be classifying the digits dataset using 3 algorithms.

  1. KNeighbors Classifier
  2. Support Vector Machine (SVM)
  3. Logistic Regression

1. KNeighbors Classifier Implementation:

#Import the necessary libraries
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
# Splitting ratio: 0.2
x_train, x_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size = 0.2)
model = KNeighborsClassifier(n_neighbors = 5)
model.fit(x_train, y_train)
y_pred = model.predict(x_test)
accuracy_score(y_test, y_pred)*100
99.62962962962963
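
The value n_neighbors = 5 used above is the library default. One hedged way to see how sensitive the result is to that choice is 5-fold cross-validation over a few candidate values. The candidate list and the variable names below are assumptions made for this sketch.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()

# Mean 5-fold cross-validated accuracy for several values of k
scores_by_k = {}
for k in (1, 3, 5, 7, 9):
    model = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(model, digits.data, digits.target, cv=5)
    scores_by_k[k] = scores.mean()

for k, score in scores_by_k.items():
    print(f"k={k}: {score * 100:.2f}%")

# The k with the highest mean accuracy on this dataset
best_k = max(scores_by_k, key=scores_by_k.get)
print("best k:", best_k)
```

Cross-validated scores are usually a little lower than a single lucky split, so they give a more cautious estimate than one call to accuracy_score.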

2. Support Vector Machine (SVM) Implementation:

#Import the necessary libraries
from sklearn import svm, datasets
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
# Splitting ratio: 0.2
x_train, x_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size = 0.2)
# Create the classifier; default hyperparameters are assumed here
# because the original settings are not shown
svc = svm.SVC()
svc.fit(x_train, y_train)
y_pred = svc.predict(x_test)
accuracy_score(y_test, y_pred)*100
99.44444444444444
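
A single accuracy number hides which digits get confused with each other. The sketch below prints a confusion matrix and a per-class report for an SVM. It assumes default SVC hyperparameters and a fixed random_state of 0, neither of which comes from the original code.

```python
from sklearn import svm
from sklearn.datasets import load_digits
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

digits = load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0
)

# Default SVC (RBF kernel); hyperparameters are an assumption of this sketch
svc = svm.SVC()
svc.fit(x_train, y_train)
y_pred = svc.predict(x_test)

# Rows are true digits, columns are predicted digits
cm = confusion_matrix(y_test, y_pred, labels=list(range(10)))
print(cm)

# Precision, recall and F1 score for each digit
print(classification_report(y_test, y_pred))
```

Off-diagonal entries in the matrix point to the specific digit pairs the model mixes up.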

3. Logistic Regression Implementation:

#Import the necessary libraries
from sklearn import datasets
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
digits = datasets.load_digits()
# Splitting ratio: 0.2
x_train, x_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size = 0.2)
logisticRegr = LogisticRegression()
logisticRegr.fit(x_train, y_train)
y_pred = logisticRegr.predict(x_test)
accuracy_score(y_test, y_pred)*100
95.83333333333334
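
With default settings, LogisticRegression may stop at its iteration limit on this dataset and emit a convergence warning. A common remedy, sketched here as an assumption rather than as the method used above, is to standardize the features and raise max_iter inside a pipeline. The random_state of 0 and max_iter of 1000 are illustrative values.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

digits = load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0
)

# Scaling is fitted on the training data only, then applied to the test data
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipeline.fit(x_train, y_train)

accuracy = accuracy_score(y_test, pipeline.predict(x_test))
print(f"{accuracy * 100:.2f}%")
```

Wrapping both steps in one pipeline keeps the test set out of the scaling statistics, so the reported accuracy stays an honest estimate.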

Comparison of the 3 Algorithms

Figure 3: Comparison of the various algorithms
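
For a like-for-like comparison, all three classifiers can be trained and scored on the same split. The following is a minimal sketch under stated assumptions: a fixed random_state of 0, default hyperparameters for KNN and SVC, and max_iter = 5000 for logistic regression to give the solver room to converge. It will not reproduce the exact figures above, which came from separate random splits.

```python
from sklearn import svm
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0
)

# Hyperparameters here are illustrative assumptions
models = {
    "KNeighbors Classifier": KNeighborsClassifier(n_neighbors=5),
    "Support Vector Machine": svm.SVC(),
    "Logistic Regression": LogisticRegression(max_iter=5000),
}

# Fit every model on the same training set and score on the same test set
results = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(x_test)) * 100

for name, score in sorted(results.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.2f}%")
```

The results dictionary can be passed to plt.bar(list(results.keys()), list(results.values())) to draw a chart in the style of Figure 3.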

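Conclusion: In this article, we implemented and compared 3 algorithms on various train and test sets to recognize handwritten digits using the scikit-learn library. In the runs shown above, the KNeighbors Classifier and the SVM each exceeded 99% accuracy, while Logistic Regression reached about 95.8%.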

I am thankful to mentors at https://internship.suvenconsultants.com for providing awesome problem statements and giving many of us a Coding Internship Experience. Thank you www.suvenconsultants.com
