Recognizing Handwritten Digits
The hypothesis to be tested: The scikit-learn library provides numerous datasets, the Digits dataset among them, that are useful for testing data analysis and prediction methods.
Libraries used: scikit-learn, matplotlib
Dataset: Optical recognition of handwritten digits dataset. The dataset consists of 10 classes, where each class refers to a digit from 0 to 9. It contains 1,797 images, each 8x8 pixels in size.
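The figures quoted above can be checked directly against the data. The following is a minimal sketch, assuming scikit-learn and NumPy are installed; the variable names `classes` and `counts` are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

# images has shape (n_samples, 8, 8); target holds one label per image.
n_images, height, width = digits.images.shape

# Count how many images belong to each digit class.
classes, counts = np.unique(digits.target, return_counts=True)

print(f"{n_images} images of {height}x{width} pixels")
for digit, count in zip(classes, counts):
    print(f"digit {digit}: {count} images")
```

The per-class counts show whether the ten digits are represented in roughly equal numbers, which matters when accuracy is used as the only metric.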
# Importing the library
from sklearn import datasets

# Loading the dataset into a variable named digits
digits = datasets.load_digits()

# Description of the dataset
print(digits.DESCR)

# The digits.images array contains the images of the handwritten digits
digits.images[0]
array([[ 0.,  0.,  5., 13.,  9.,  1.,  0.,  0.],
       [ 0.,  0., 13., 15., 10., 15.,  5.,  0.],
       [ 0.,  3., 15.,  2.,  0., 11.,  8.,  0.],
       [ 0.,  4., 12.,  0.,  0.,  8.,  8.,  0.],
       [ 0.,  5.,  8.,  0.,  0.,  9.,  8.,  0.],
       [ 0.,  4., 11.,  0.,  1., 12.,  7.,  0.],
       [ 0.,  2., 14.,  5., 10., 12.,  0.,  0.],
       [ 0.,  0.,  6., 13., 10.,  0.,  0.,  0.]])

# Displaying one of the images (matplotlib has to be imported first)
import matplotlib.pyplot as plt
plt.imshow(digits.images[1010], cmap=plt.cm.gray_r, interpolation='nearest')
# The digits.target array contains the labels of the handwritten digits
digits.target
array([0, 1, 2, ..., 8, 9, 8])

# Shape of the arrays
digits.images.shape
(1797, 8, 8)

digits.target.shape
(1797,)

# Flattening digits.images: each 8x8 image becomes a vector of 64 values
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))
# digits.data already holds the same flattened array
digits.data.shape
(1797, 64)

# Splitting the dataset into train and test sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size = 0.2)
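Because train_test_split shuffles the data randomly, the call above produces a different split on every run, and the accuracy values change with it. A minimal sketch of a reproducible split follows. The seed value 42 is an arbitrary choice for illustration, and the stratify argument is an optional addition that is not part of the original call.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()

# random_state=42 is an arbitrary seed chosen for illustration.
# stratify keeps the ten digit classes in similar proportions in both sets.
x_train, x_test, y_train, y_test = train_test_split(
    digits.data,
    digits.target,
    test_size=0.2,
    random_state=42,
    stratify=digits.target,
)

print(x_train.shape, x_test.shape)
```

With a test size of 0.2, the 1,797 samples are divided into 1,437 training samples and 360 test samples.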
In this article, we will be classifying the digits dataset using 3 algorithms.
- KNeighbors Classifier
- Support Vector Machine (SVM)
- Logistic Regression
1. KNeighbors Classifier Implementation:
#Import the necessary libraries
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

# Splitting ratio: 0.2
x_train, x_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size = 0.2)

model = KNeighborsClassifier(n_neighbors = 5)
model.fit(x_train, y_train)
y_pred = model.predict(x_test)

# Accuracy in percent; the exact value depends on the random split
accuracy_score(y_test, y_pred)*100
99.62962962962963
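The value n_neighbors = 5 used above is one choice among many. One possible way to compare several values is k-fold cross-validation, sketched below. The candidate values and the use of 5 folds are assumptions made for illustration and are not part of the original experiment.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

digits = datasets.load_digits()

# Candidate values of k are arbitrary choices for illustration.
candidate_k = [1, 3, 5, 7, 9]
mean_scores = {}
for k in candidate_k:
    model = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validation: each score is the accuracy on one held-out fold.
    scores = cross_val_score(model, digits.data, digits.target, cv=5)
    mean_scores[k] = scores.mean()

# Pick the k with the highest mean accuracy across the folds.
best_k = max(mean_scores, key=mean_scores.get)
for k, score in mean_scores.items():
    print(f"k={k}: mean accuracy {score * 100:.2f}%")
print("best k:", best_k)
```

Averaging over several folds gives a steadier estimate than a single random split, at the cost of fitting the model several times.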
2. Support Vector Machine (SVM) Implementation:
#Import the necessary libraries
from sklearn import svm, datasets
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Splitting ratio: 0.2
x_train, x_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size = 0.2)

# Assumption: an SVC with default hyperparameters (the classifier settings are not specified)
svc = svm.SVC()
svc.fit(x_train, y_train)
y_pred = svc.predict(x_test)

# Accuracy in percent; the exact value depends on the random split
accuracy_score(y_test, y_pred)*100
99.44444444444444
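A single accuracy figure does not show which digits are confused with one another. A confusion matrix makes that visible, as in the sketch below. It assumes an SVC with gamma=0.001, the value used in scikit-learn's own digits example and not necessarily the setting behind the result above, together with an arbitrary seed of 0 for the split.

```python
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0
)

# gamma=0.001 is an assumed setting, chosen for illustration.
svc = svm.SVC(gamma=0.001)
svc.fit(x_train, y_train)
y_pred = svc.predict(x_test)

# Rows are the true digits, columns are the predicted digits.
cm = confusion_matrix(y_test, y_pred, labels=digits.target_names)
print(cm)

# Correct predictions lie on the diagonal, so trace / total is the accuracy.
accuracy = cm.trace() / cm.sum()
print(f"accuracy: {accuracy * 100:.2f}%")
```

Any non-zero entry off the diagonal identifies a specific pair of digits that the classifier mixed up.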
3. Logistic Regression Implementation:
#Import the necessary libraries
from sklearn import datasets
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression

digits = datasets.load_digits()

# Splitting ratio: 0.2
x_train, x_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size = 0.2)

logisticRegr = LogisticRegression()
logisticRegr.fit(x_train, y_train)
y_pred = logisticRegr.predict(x_test)

# Accuracy in percent; the exact value depends on the random split
accuracy_score(y_test, y_pred)*100
95.83333333333334
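With its default settings, LogisticRegression may stop before the solver has converged on unscaled pixel values, in which case scikit-learn emits a ConvergenceWarning. A common remedy is to standardize the features and allow more iterations. The sketch below assumes a StandardScaler step, max_iter=1000 and a seed of 0, none of which are part of the original experiment.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

digits = datasets.load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0
)

# The scaler is fitted on the training data only, then applied to the test data.
# max_iter=1000 is an assumed value that gives the solver more room to converge.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipeline.fit(x_train, y_train)
y_pred = pipeline.predict(x_test)

accuracy = accuracy_score(y_test, y_pred) * 100
print(f"accuracy: {accuracy:.2f}%")
```

Wrapping both steps in a pipeline keeps the test set out of the scaling statistics, so the reported accuracy is not inflated by information leaking from the test data.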
Comparison of the 3 Algorithms
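For the comparison to be fair, all three classifiers should be trained and evaluated on the same split. The following is a minimal sketch under stated assumptions: the seed of 0, the default SVC settings and max_iter=5000 for logistic regression are illustrative choices, and the `models` and `results` dictionaries are hypothetical helper names.

```python
from sklearn import datasets, svm
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = datasets.load_digits()

# One shared split, so every model sees exactly the same train and test data.
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0
)

# Hyperparameters below are assumptions made for illustration.
models = {
    "KNeighbors Classifier": KNeighborsClassifier(n_neighbors=5),
    "Support Vector Machine": svm.SVC(),
    "Logistic Regression": LogisticRegression(max_iter=5000),
}

results = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    results[name] = accuracy_score(y_test, y_pred) * 100

for name, accuracy in results.items():
    print(f"{name}: {accuracy:.2f}%")
```

Changing random_state or test_size in one place reruns the whole comparison on a different split, which is a quick way to see how much the ranking depends on the particular split.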
Conclusion: In this article, we implemented and compared 3 algorithms on various train and test splits to recognize handwritten digits using the scikit-learn library.
I am thankful to mentors at https://internship.suvenconsultants.com for providing awesome problem statements and giving many of us a Coding Internship Experience. Thank you www.suvenconsultants.com