A CNN Digit Recognition Project for Beginners: Digit Recognizer with CNN for beginner

2022-08-01 15:03:41

Preparation

Introduction to the MNIST dataset

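MNIST ("Modified National Institute of Standards and Technology") is the de facto "hello world" dataset of computer vision. Since its release in 1999, this classic dataset of handwritten images has served as the basis for benchmarking classification algorithms. As new machine learning techniques emerge, MNIST remains a reliable resource for researchers and learners alike. Our goal is to correctly identify digits from a dataset of tens of thousands of handwritten images.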

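The data files train.csv and test.csv contain grayscale images of hand-drawn digits, from zero through nine.

Each image is 28 pixels high and 28 pixels wide, for a total of 784 pixels. Each pixel has a single pixel value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker. This pixel value is an integer between 0 and 255, inclusive.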

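The training dataset (train.csv) has 785 columns. The first column, called "label", is the digit that was drawn by the user. The remaining columns contain the pixel values of the associated image.

Each pixel column in the training set has a name like pixelx, where x is an integer between 0 and 783, inclusive. To locate this pixel on the image, suppose that we have decomposed x as x = i * 28 + j, where i and j are integers between 0 and 27, inclusive. Then pixelx is located on row i and column j of a 28 x 28 matrix (indexing from zero).

For example, pixel31 is the pixel in the fourth column from the left and the second row from the top, as in the ASCII diagram below.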

Visually, if we omit the "pixel" prefix, the pixels make up the image like this:

000 001 002 003 ... 026 027
028 029 030 031 ... 054 055
056 057 058 059 ... 082 083
 |   |   |   |  ...  |   |
728 729 730 731 ... 754 755
756 757 758 759 ... 782 783 
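
The index arithmetic described above can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `pixel_position` is hypothetical and is not part of the dataset or of any library.

```python
def pixel_position(x, width=28):
    """Return the zero-indexed (row, column) of pixel number x in a width-wide image."""
    if not 0 <= x < width * width:
        raise ValueError("pixel index out of range")
    # divmod gives (i, j) such that x == i * width + j
    return divmod(x, width)

row, col = pixel_position(31)
print(row, col)  # 1 3: second row from the top, fourth column from the left
```

Because the indices are zero-based, row 1 is the second row and column 3 is the fourth column, which agrees with the pixel31 example.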

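The test dataset (test.csv) is the same as the training set, except that it does not contain the "label" column.

Your submission file should be in the following format: for each of the 28000 images in the test set, output a single line containing the ImageId and the digit you predict. For example, if you predict that the first image is a 3, the second image is a 7, and the third image is an 8, then your submission file would look like this: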

ImageId,Label
1,3
2,7
3,8 
(27997 more lines)

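The evaluation metric for this competition is classification accuracy, that is, the proportion of test images that are correctly classified. For example, a classification accuracy of 0.97 means that you have correctly classified all but 3% of the images.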
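
As a minimal sketch of how this metric is computed, the snippet below counts matching labels and divides by the total. The function name `accuracy` and the label lists are made up for illustration; they are not taken from the competition data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Made-up labels for illustration only, not taken from the dataset.
true_labels = [3, 7, 8, 1, 0]
predicted = [3, 7, 8, 1, 9]
print(accuracy(true_labels, predicted))  # 0.8
```

Four of the five made-up predictions are correct, so the accuracy is 0.8.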

Dataset download: https://wwp.lanzoub.com/iIUFY08t575a

Importing packages

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

Loading the dataset

train = pd.read_csv('../input/digit-recognizer/train.csv')
test = pd.read_csv('../input/digit-recognizer/test.csv')

Inspecting the data

train.head()
label pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 ... pixel774 pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 pixel781 pixel782 pixel783
0 1 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 1 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 4 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

5 rows × 785 columns

train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 42000 entries, 0 to 41999
Columns: 785 entries, label to pixel783
dtypes: int64(785)
memory usage: 251.5 MB

train.isnull().sum()

label 0
pixel0 0
pixel1 0
pixel2 0
pixel3 0
..
pixel779 0
pixel780 0
pixel781 0
pixel782 0
pixel783 0
Length: 785, dtype: int64

sum(train.isnull().sum())

0

Preprocessing the training and test sets

# y_train holds the digit labels
y_train = train['label'].copy()
# X_train holds the pixel intensity values
X_train = train.drop('label',axis=1)
y_train.value_counts()

1 4684
7 4401
3 4351
9 4188
2 4177
6 4137
0 4132
4 4072
8 4063
5 3795
Name: label, dtype: int64
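
To see how evenly the ten digits are represented, the counts printed above can be turned into percentages. This is a small sketch that copies the numbers by hand from the `value_counts()` output rather than reading them from the DataFrame.

```python
# Label counts copied from the value_counts() output above.
counts = {1: 4684, 7: 4401, 3: 4351, 9: 4188, 2: 4177,
          6: 4137, 0: 4132, 4: 4072, 8: 4063, 5: 3795}

total = sum(counts.values())
print(total)  # 42000, matching the number of rows in train.csv

# Share of each digit, as a percentage of all training images.
shares = {digit: 100 * n / total for digit, n in sorted(counts.items())}
for digit, share in shares.items():
    print(digit, f"{share:.2f}%")
```

Every digit accounts for roughly 9% to 11% of the training images, so the classes are fairly balanced.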

y_train = pd.get_dummies(y_train,prefix='Num')
y_train.head()
Num_0 Num_1 Num_2 Num_3 Num_4 Num_5 Num_6 Num_7 Num_8 Num_9
0 0 1 0 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 0 0 0
2 0 1 0 0 0 0 0 0 0 0
3 0 0 0 0 1 0 0 0 0 0
4 1 0 0 0 0 0 0 0 0 0
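
What `pd.get_dummies` does to the labels can be reproduced by hand. The following is a plain-Python sketch with a hypothetical helper `one_hot`; it is not how pandas implements the encoding.

```python
def one_hot(label, num_classes=10):
    """Return a list with a 1 at position `label` and 0 elsewhere."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    return [1 if k == label else 0 for k in range(num_classes)]

# The first five labels shown by train.head() above.
for label in [1, 0, 1, 4, 0]:
    print(label, one_hot(label))
```

Each printed row matches the corresponding row of `y_train.head()`. One-hot targets of this form are what the `categorical_crossentropy` loss expects.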
# 28×28 gives 784 pixels in total; each value is an intensity in [0, 255]
X_train.describe()
pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 pixel9 ... pixel774 pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 pixel781 pixel782 pixel783
count 42000.0 42000.0 42000.0 42000.0 42000.0 42000.0 42000.0 42000.0 42000.0 42000.0 ... 42000.000000 42000.000000 42000.000000 42000.00000 42000.000000 42000.000000 42000.0 42000.0 42000.0 42000.0
mean 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.219286 0.117095 0.059024 0.02019 0.017238 0.002857 0.0 0.0 0.0 0.0
std 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 6.312890 4.633819 3.274488 1.75987 1.894498 0.414264 0.0 0.0 0.0 0.0
min 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.0 0.0 0.0 0.0
25% 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.0 0.0 0.0 0.0
50% 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.0 0.0 0.0 0.0
75% 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.0 0.0 0.0 0.0
max 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 254.000000 254.000000 253.000000 253.00000 254.000000 62.000000 0.0 0.0 0.0 0.0

8 rows × 784 columns

#from sklearn.preprocessing import Normalizer
X_train = X_train/255
X_train.head()
pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 pixel9 ... pixel774 pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 pixel781 pixel782 pixel783
0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

5 rows × 784 columns

X_train.describe()
pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 pixel9 ... pixel774 pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 pixel781 pixel782 pixel783
count 42000.0 42000.0 42000.0 42000.0 42000.0 42000.0 42000.0 42000.0 42000.0 42000.0 ... 42000.000000 42000.000000 42000.000000 42000.000000 42000.000000 42000.000000 42000.0 42000.0 42000.0 42000.0
mean 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000860 0.000459 0.000231 0.000079 0.000068 0.000011 0.0 0.0 0.0 0.0
std 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.024756 0.018172 0.012841 0.006901 0.007429 0.001625 0.0 0.0 0.0 0.0
min 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0
25% 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0
50% 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0
75% 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0
max 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.996078 0.996078 0.992157 0.992157 0.996078 0.243137 0.0 0.0 0.0 0.0

8 rows × 784 columns
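
Dividing by 255 maps every pixel value into the range [0, 1]. As a quick sanity check, the sketch below rescales the raw maxima from the first `describe()` output and compares them with the scaled table. The helper name `scale` is hypothetical.

```python
def scale(value, max_value=255):
    """Map an integer pixel value in [0, max_value] to a float in [0, 1]."""
    return value / max_value

# Raw maxima taken from the first describe() output, before scaling.
for raw in (254, 253, 62):
    print(raw, round(scale(raw), 6))
# 254 0.996078
# 253 0.992157
# 62 0.243137
```

These values agree with the `max` row of the scaled table above.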

X_train = X_train.values.reshape(-1,28,28,1)
X_train

array([[[[0.],
[0.],
[0.],
...,
[0.],
[0.],
[0.]],

test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 28000 entries, 0 to 27999
Columns: 784 entries, pixel0 to pixel783
dtypes: int64(784)
memory usage: 167.5 MB

test.isnull().sum()

pixel0 0
pixel1 0
pixel2 0
pixel3 0
pixel4 0
..
pixel779 0
pixel780 0
pixel781 0
pixel782 0
pixel783 0
Length: 784, dtype: int64

sum(test.isnull().sum())

0

test = test/255
test.head()
pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 pixel9 ... pixel774 pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 pixel781 pixel782 pixel783
0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

5 rows × 784 columns

test.describe()
pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 pixel9 ... pixel774 pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 pixel781 pixel782 pixel783
count 28000.0 28000.0 28000.0 28000.0 28000.0 28000.0 28000.0 28000.0 28000.0 28000.0 ... 28000.000000 28000.000000 28000.000000 28000.000000 28000.000000 28000.0 28000.0 28000.0 28000.0 28000.0
mean 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000646 0.000287 0.000110 0.000044 0.000026 0.0 0.0 0.0 0.0 0.0
std 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.021464 0.014184 0.007112 0.004726 0.003167 0.0 0.0 0.0 0.0 0.0
min 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0
25% 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0
50% 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0
75% 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0
max 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.992157 0.996078 0.756863 0.733333 0.466667 0.0 0.0 0.0 0.0 0.0

8 rows × 784 columns

test = test.values.reshape(-1,28,28,1)
test

array([[[[0.],
[0.],
[0.],
...,
[0.],
[0.],
[0.]],

Training the CNN model

import tensorflow as tf
tf.__version__

'2.6.4'

cnn = tf.keras.models.Sequential()

2022-08-01 05:41:16.816392: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 15403 MB memory: -> device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0

#Convolution
cnn.add(tf.keras.layers.Conv2D(filters=256,kernel_size=(5,5),activation='relu',input_shape=(28,28,1)))
#Max Pooling
cnn.add(tf.keras.layers.MaxPool2D(pool_size=(3,3),strides=3))
cnn.add(tf.keras.layers.BatchNormalization())
cnn.add(tf.keras.layers.Conv2D(filters=128,kernel_size=(4,4),activation='relu'))
cnn.add(tf.keras.layers.MaxPool2D(pool_size=(2,2),strides=2))
#Flattening
cnn.add(tf.keras.layers.Flatten())
#Full connection 
cnn.add(tf.keras.layers.Dense(units=256,activation='relu'))
#Output Layer
cnn.add(tf.keras.layers.Dense(units=10,activation='softmax'))
#Compile cnn
cnn.compile(optimizer='adam',loss='categorical_crossentropy')
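
To follow how the image shrinks as it passes through the layers defined above, the spatial size can be computed by hand. This sketch assumes the Keras default `padding='valid'` for both `Conv2D` and `MaxPool2D`, which is what the code uses since no padding is specified. The helper name `conv_out` is hypothetical.

```python
def conv_out(size, kernel, stride=1):
    """Output length of a 'valid' (no padding) convolution or pooling window."""
    return (size - kernel) // stride + 1

size = 28
size = conv_out(size, kernel=5)            # Conv2D 5x5        -> 24
size = conv_out(size, kernel=3, stride=3)  # MaxPool2D 3x3, s3 -> 8
# BatchNormalization leaves the shape unchanged.
size = conv_out(size, kernel=4)            # Conv2D 4x4        -> 5
size = conv_out(size, kernel=2, stride=2)  # MaxPool2D 2x2, s2 -> 2

flat_features = size * size * 128  # 128 filters in the second Conv2D
print(size, flat_features)  # 2 512
```

Under these assumptions the `Flatten` layer passes 512 features to the first `Dense` layer. Calling `cnn.summary()` prints the shapes that Keras actually computes.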
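# Epoch:
# One epoch is one complete pass of the entire dataset through the neural network, forward and back (that is, every training sample has gone through one forward pass and one backward pass).
# Put more simply, one epoch is the process of training on all training samples once.
# However, the samples in one epoch (that is, all training samples) may be too numerous for the computer to process at once, so they are split into smaller chunks, called batches, for training.

# Batch:
# The full set of training samples is divided into several batches.

# Batch size:
# The number of samples in each batch.

# Iteration:
# Training on one batch is one iteration (the idea is loosely similar to iteration in a programming language).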

cnn.fit(X_train,y_train,batch_size=32,epochs=50)
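
With 42000 training rows and `batch_size=32`, the number of iterations per epoch follows directly from the definitions above, and it matches the 1313 steps that Keras reports in the training log. A small sketch of the arithmetic:

```python
import math

num_samples = 42000  # rows in train.csv, from train.info()
batch_size = 32      # value passed to cnn.fit
epochs = 50

steps_per_epoch = math.ceil(num_samples / batch_size)
print(steps_per_epoch)           # 1313
print(steps_per_epoch * epochs)  # 65650 iterations in total

# The last batch of each epoch is smaller than the others.
last_batch = num_samples - (steps_per_epoch - 1) * batch_size
print(last_batch)                # 16
```

Since 42000 is not a multiple of 32, the final batch of each epoch holds only 16 samples.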

2022-08-01 05:41:18.154328: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)

Epoch 1/50

2022-08-01 05:41:19.541340: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8005

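1313/1313 [==============================] - 13s 5ms/step - loss: 0.1159
Epoch 2/50
1313/1313 [==============================] - 5s 4ms/step - loss: 0.0496
Epoch 3/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0367
Epoch 4/50
1313/1313 [==============================] - 5s 4ms/step - loss: 0.0289
Epoch 5/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0256
Epoch 6/50
1313/1313 [==============================] - 5s 4ms/step - loss: 0.0220
Epoch 7/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0192
Epoch 8/50
1313/1313 [==============================] - 5s 4ms/step - loss: 0.0167
Epoch 9/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0146
Epoch 10/50
1313/1313 [==============================] - 5s 4ms/step - loss: 0.0121
Epoch 11/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0133
Epoch 12/50
1313/1313 [==============================] - 5s 4ms/step - loss: 0.0142
Epoch 13/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0119
Epoch 14/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0125
Epoch 15/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0103
Epoch 16/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0103
Epoch 17/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0130
Epoch 18/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0118
Epoch 19/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0093
Epoch 20/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0075
Epoch 21/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0075
Epoch 22/50
1313/1313 [==============================] - 6s 5ms/step - loss: 0.0129
Epoch 23/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0105
Epoch 24/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0087
Epoch 25/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0097
Epoch 26/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0117
Epoch 27/50
1313/1313 [==============================] - 5s 4ms/step - loss: 0.0051
Epoch 28/50
1313/1313 [==============================] - 6s 5ms/step - loss: 0.0086
Epoch 29/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0100
Epoch 30/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0087
Epoch 31/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0096
Epoch 32/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0065
Epoch 33/50
1313/1313 [==============================] - 5s 4ms/step - loss: 0.0082
Epoch 34/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0110
Epoch 35/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0063
Epoch 36/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0107
Epoch 37/50
1313/1313 [==============================] - 5s 4ms/step - loss: 0.0048
Epoch 38/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0076
Epoch 39/50
1313/1313 [==============================] - 5s 4ms/step - loss: 0.0154
Epoch 40/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0095
Epoch 41/50
1313/1313 [==============================] - 5s 4ms/step - loss: 0.0052
Epoch 42/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0057
Epoch 43/50
1313/1313 [==============================] - 5s 4ms/step - loss: 0.0080
Epoch 44/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0085
Epoch 45/50
1313/1313 [==============================] - 5s 4ms/step - loss: 0.0108
Epoch 46/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0062
Epoch 47/50
1313/1313 [==============================] - 5s 4ms/step - loss: 0.0118
Epoch 48/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0078
Epoch 49/50
1313/1313 [==============================] - 5s 4ms/step - loss: 0.0083
Epoch 50/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0044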

<keras.callbacks.History at 0x7f35f40ac710>

pred = cnn.predict(test)
pred = np.argmax(pred,axis=1)
pred

array([2, 0, 9, ..., 3, 9, 2])
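
`cnn.predict` returns ten softmax probabilities per image, and `np.argmax(pred, axis=1)` picks the index of the largest one in each row. The same step for a single image, written in plain Python as a sketch (the helper `argmax` and the probability values are made up for illustration):

```python
def argmax(values):
    """Index of the largest element (the first one if there are ties)."""
    return max(range(len(values)), key=values.__getitem__)

# Made-up softmax output for one image: ten probabilities that sum to 1.
probs = [0.01, 0.02, 0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]
print(argmax(probs))  # 2
```

Because the output positions correspond to the digits 0 through 9, the index of the largest probability is the predicted digit.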

pred = pd.DataFrame(pred,columns=['Label'])
test_id = list(range(1,len(test)+1,1))
test_id = pd.DataFrame(test_id,columns=['ImageId'])
submission = pd.concat([test_id,pred],axis=1)
submission.describe()
ImageId Label
count 28000.000000 28000.000000
mean 14000.500000 4.453036
std 8083.048105 2.896665
min 1.000000 0.000000
25% 7000.750000 2.000000
50% 14000.500000 4.000000
75% 21000.250000 7.000000
max 28000.000000 9.000000
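
The notebook builds the `submission` DataFrame but does not show it being written to disk. As a hedged sketch of the required file format, the snippet below writes `ImageId,Label` rows using only the standard library. The function `write_submission` is hypothetical, and the three labels are the example values from the format description, not model output.

```python
import csv
import io

def write_submission(labels, out):
    """Write ImageId,Label rows to a file-like object; ImageId starts at 1."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["ImageId", "Label"])
    for image_id, label in enumerate(labels, start=1):
        writer.writerow([image_id, label])

# Example predictions from the format description; not real model output.
buffer = io.StringIO()
write_submission([3, 7, 8], buffer)
print(buffer.getvalue())
```

With the pandas DataFrame built above, the equivalent would presumably be `submission.to_csv('submission.csv', index=False)`, where the file name is an assumption and `index=False` keeps the row index out of the file.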

The final accuracy of this model is 0.98857.

Original author: 孤飛 (cnblogs)
Original article: https://www.cnblogs.com/ranxi169/p/16540166.html

View or download the code as a Jupyter notebook: https://www.kaggle.com/code/ranxi169/digit-recognizer-with-cnn-for-beginner/notebook