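TensorFlow study notes: building a deep learning network to recognize the different object classes in the CIFAR-10 dataset (16)

In the previous sections we introduced the CIFAR-10 dataset and TensorFlow's efficient way of loading data. In this section we put that knowledge into practice: we build a deep learning network by hand and train it on CIFAR-10, with the goal of recognizing the different objects in the dataset. The network was built with reference to the official example.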

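Overall approach

From the earlier sections we know that convolutional networks are particularly well suited to image recognition, so this section uses one as well. Each image first goes through two rounds of convolution and pooling (see section 4 for both operations); the result is flattened into a one-dimensional array and passed through fully connected layers, which produce the final 10-class output. Before any of that, of course, the CIFAR-10 dataset has to be read into memory. The steps are:

Wrap the data-reading code from the previous section in a function so it can be reused
Augment the dataset (mirroring the images, adjusting brightness and contrast, and so on; none of this changes the object class, but it yields more images)
Feed the images to TensorFlow in batches
Pass each batch through two rounds of convolution and pooling
Pass the result through fully connected layers to produce the 10-class output
Compute the loss with TensorFlow's sparse_softmax_cross_entropy_with_logits method
Train the network with TensorFlow's AdamOptimizer

The official example is fairly large, but the principle behind it is not complicated. I therefore decided to first build and train a slimmer network myself, test its performance, and finally compare it with the official example. This exercises my own hands-on skills and also lets me learn from the official code.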

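Python code in practice

1. Define the constants first

The images have a depth of 3 and a height and width of 32. CIFAR-10 contains 10 classes of objects, with 50000 training images and 10000 test images. Each training step feeds TensorFlow a batch of 50 images.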

IMAGE_HEIGHT = 32
IMAGE_WIDTH = 32
IMAGE_DEPTH = 3

NUM_CLASSES = 10
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 50000
NUM_EXAMPLES_PER_EPOCH_FOR_EVAL = 10000

BATCH_SIZE = 50
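
As a quick sanity check on these constants, the sketch below works out how many batches make up one pass over each split. The names steps_per_train_epoch and steps_per_eval_epoch are introduced here for illustration only and do not appear in the original code.

```python
# Constants as defined above
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 50000
NUM_EXAMPLES_PER_EPOCH_FOR_EVAL = 10000
BATCH_SIZE = 50

# Number of batches needed to see every image once (illustrative names)
steps_per_train_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN // BATCH_SIZE
steps_per_eval_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_EVAL // BATCH_SIZE

print(steps_per_train_epoch)  # 1000
print(steps_per_eval_epoch)   # 200
```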

2. Read in the dataset

This code was explained in detail in the previous section; here it is simply wrapped in a function so it can be reused later.

def ReadImages(filenames):
    # Turn the list of file names into a queue with TensorFlow
    filename_queue = tf.train.string_input_producer(filenames)
    # The label occupies one byte
    label_bytes = 1
    # Image size is 32x32x3
    height = IMAGE_HEIGHT
    width = IMAGE_WIDTH
    depth = IMAGE_DEPTH
    # Number of bytes in one image
    image_bytes = height * width * depth
    # One record consists of one label byte followed by image_bytes image bytes
    record_bytes = label_bytes + image_bytes
    # Create a reader for fixed-length records
    reader = tf.FixedLengthRecordReader(record_bytes=record_bytes)
    key, value = reader.read(filename_queue)
    # The value read is a string; convert it to a vector of uint8
    record_bytes = tf.decode_raw(value, tf.uint8)
    # The first byte is the label; cast it from uint8 to int32
    label = tf.cast(tf.strided_slice(record_bytes, [0], [label_bytes]), tf.int32)
    # The rest is the image data; reshape it from [depth * height * width]
    # to [depth, height, width]
    depth_major = tf.reshape(tf.strided_slice(record_bytes, [label_bytes],
                                              [label_bytes + image_bytes]),
                             [depth, height, width])
    # Transpose the image from [depth, height, width] to [height, width, depth]
    uint8image = tf.transpose(depth_major, [1, 2, 0])

    return label, uint8image

The function returns one label and the corresponding uint8 image.
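
To make the record layout concrete, here is a minimal pure-Python sketch of the same decoding applied to an in-memory record, without TensorFlow. The function name parse_record and the synthetic record are assumptions made for illustration; the layout (1 label byte followed by 3072 image bytes in depth-major order) is the one described above.

```python
IMAGE_HEIGHT = 32
IMAGE_WIDTH = 32
IMAGE_DEPTH = 3
LABEL_BYTES = 1
IMAGE_BYTES = IMAGE_HEIGHT * IMAGE_WIDTH * IMAGE_DEPTH


def parse_record(record):
    """Split one record into (label, image[h][w][d]). Hypothetical helper."""
    data = bytearray(record)
    assert len(data) == LABEL_BYTES + IMAGE_BYTES
    label = data[0]
    pixels = data[LABEL_BYTES:]
    # Stored order is [depth, height, width]; reorder to [height, width, depth]
    image = [[[pixels[d * IMAGE_HEIGHT * IMAGE_WIDTH + h * IMAGE_WIDTH + w]
               for d in range(IMAGE_DEPTH)]
              for w in range(IMAGE_WIDTH)]
             for h in range(IMAGE_HEIGHT)]
    return label, image


# Synthetic record: label 7, first plane all 10, second all 20, third all 30
plane = IMAGE_HEIGHT * IMAGE_WIDTH
record = bytes(bytearray([7] + [10] * plane + [20] * plane + [30] * plane))
label, image = parse_record(record)
print(label)
print(image[0][0])
```

Each pixel of the result holds its three channel values together, which is the [height, width, depth] layout that the later image operations expect.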

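3. Apply some processing to the images

This increases the diversity of the dataset and reduces overfitting to some extent. The code comments explain each step.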

# Side length of the cropped images
IMAGE_SIZE = 24

def Distorted_inputs(filenames, batch_size):
    # Read the data
    label, uint8image = ReadImages(filenames)
    reshaped_image = tf.cast(uint8image, tf.float32)

    height = IMAGE_SIZE
    width = IMAGE_SIZE
    # Randomly crop the image to IMAGE_SIZE x IMAGE_SIZE
    distorted_image = tf.random_crop(reshaped_image, [height, width, 3])
    # Randomly flip the image horizontally
    distorted_image = tf.image.random_flip_left_right(distorted_image)
    # Randomly adjust the brightness and the contrast of the image
    distorted_image = tf.image.random_brightness(distorted_image, max_delta=63)
    distorted_image = tf.image.random_contrast(distorted_image,
                                               lower=0.2, upper=1.8)
    # Standardize the image to zero mean and unit variance
    float_image = tf.image.per_image_standardization(distorted_image)
    # Set the static shapes
    float_image.set_shape([height, width, 3])
    label.set_shape([1])

    min_fraction_of_examples_in_queue = 0.4
    min_queue_examples = int(NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN *
                             min_fraction_of_examples_in_queue)
    # Produce a batch of images
    num_preprocess_threads = 16
    images, label_batch = tf.train.batch(
        [float_image, label],
        batch_size=batch_size,
        num_threads=num_preprocess_threads,
        capacity=min_queue_examples + 3 * batch_size)

    # Return the batch of processed images and the batch of labels
    return images, tf.reshape(label_batch, [batch_size])
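
The standardization step rescales each image to zero mean and unit variance. The sketch below reproduces that arithmetic in plain Python on a flat list of pixel values. standardize is a hypothetical helper, and the lower bound 1/sqrt(N) on the standard deviation follows the documented behaviour of tf.image.per_image_standardization, which guards against division by zero on uniform images.

```python
import math


def standardize(pixels):
    """Return (x - mean) / adjusted_stddev for a flat list of pixel values."""
    n = len(pixels)
    mean = sum(pixels) / float(n)
    variance = sum((x - mean) ** 2 for x in pixels) / float(n)
    # Lower-bound the stddev so that a uniform image does not divide by zero
    adjusted_stddev = max(math.sqrt(variance), 1.0 / math.sqrt(n))
    return [(x - mean) / adjusted_stddev for x in pixels]


result = standardize([0.0, 50.0, 100.0, 150.0])
print(result)

# A uniform image maps to all zeros instead of failing
uniform = standardize([128.0] * 4)
print(uniform)
```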

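4. Build the deep learning network

The images first pass through two convolutional layers and then three fully connected layers, which output the 10 classes.

The first convolutional layer uses 5x5x3 kernels and produces 64 output channels. The second layer must therefore use 5x5x64 kernels, and again produces 64 output channels. Both layers are followed by max pooling.
After the convolutions, all the data is flattened into a one-dimensional array, and three fully connected layers reduce it to 10 classes.

The code is a little long, but it is simple, and should be clear in light of the previous sections.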

def NetworkRes(images):
    # First convolutional layer
    kernel = tf.Variable(tf.zeros([5,5,3,64]))
    biases = tf.Variable(tf.random_normal([64]))
    conv = tf.nn.conv2d(images, kernel, [1,1,1,1], padding='SAME')
    preAc = tf.nn.bias_add(conv, biases)
    conv1 = tf.nn.relu(preAc)
    pool1 = tf.nn.max_pool(conv1, ksize=[1,3,3,1], strides=[1,2,2,1], padding='SAME')
    # Second convolutional layer, with its own bias variable
    kernel = tf.Variable(tf.zeros([5,5,64,64]))
    biases = tf.Variable(tf.random_normal([64]))
    conv = tf.nn.conv2d(pool1, kernel, [1,1,1,1], padding='SAME')
    preAc = tf.nn.bias_add(conv, biases)
    conv2 = tf.nn.relu(preAc)
    # Pool the output of the second convolution (conv2, not conv1)
    pool2 = tf.nn.max_pool(conv2, ksize=[1,3,3,1], strides=[1,2,2,1], padding='SAME')
    # First fully connected layer: flatten each example to one dimension
    reshape = tf.reshape(pool2, [BATCH_SIZE, -1])
    dim = reshape.get_shape()[1].value
    weights = tf.Variable(tf.zeros([dim, 384]))
    biases = tf.Variable(tf.random_normal([384]))
    res1 = tf.nn.relu(tf.matmul(reshape, weights) + biases)
    # Second fully connected layer
    weights = tf.Variable(tf.zeros([384, 192]))
    biases = tf.Variable(tf.random_normal([192]))
    res2 = tf.nn.relu(tf.matmul(res1, weights) + biases)
    # Third fully connected layer: outputs one logit per class
    weights = tf.Variable(tf.zeros([192, NUM_CLASSES]))
    biases = tf.Variable(tf.random_normal([NUM_CLASSES]))
    logits = tf.add(tf.matmul(res2, weights), biases)
    return logits
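
To see what size the first fully connected layer receives, the sketch below traces the spatial size of a 24x24 cropped image through the layers. With 'SAME' padding the output size is ceil(input / stride), so the stride-1 convolutions keep the size and each stride-2 pooling halves it. The helper same_padding_output is a name chosen here for illustration, and the trace assumes that the second pooling is applied to the output of the second convolution.

```python
import math


def same_padding_output(size, stride):
    """Spatial output size of a conv/pool op with padding='SAME'."""
    return int(math.ceil(size / float(stride)))


size = 24                                   # cropped input: 24x24x3
size = same_padding_output(size, 1)         # conv1, stride 1 -> 24
size = same_padding_output(size, 2)         # pool1, stride 2 -> 12
size = same_padding_output(size, 1)         # conv2, stride 1 -> 12
size = same_padding_output(size, 2)         # pool2, stride 2 -> 6
channels = 64

# Number of values per example after flattening
flattened_dim = size * size * channels
print(flattened_dim)  # 2304
```

Under these assumptions the first fully connected weight matrix has shape [2304, 384].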

5. Define the loss

The loss definition follows the official example and uses TensorFlow's sparse_softmax_cross_entropy_with_logits method. Its prototype is:

sparse_softmax_cross_entropy_with_logits(_sentinel=None,  labels=None, logits=None, name=None)

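_sentinel: effectively unused; it only exists to force the other arguments to be passed by keyword, so leave it out
labels: the label values, given as integer class indices of shape [batch_size] (int32 or int64)
logits: shape [batch_size, num_classes], type float32 or float64
name: the name of the operation, optional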

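Its functionality is similar to that of

tf.nn.softmax_cross_entropy_with_logits(logits, labels, name=None)

The first argument, logits, is the output of the last layer of the network. With a batch its shape is [batch_size, num_classes]; for a single example it is [num_classes].
The second argument, labels, holds the actual labels with the same shape as logits (for example one-hot vectors), unlike the sparse version, which takes one class index per example.

The function first computes the softmax of the logits and then the cross entropy between that softmax and the labels. The return value is a vector with one entry per example. To get the total cross entropy, apply tf.reduce_sum, which sums all elements of the vector. To get the loss, apply tf.reduce_mean, which averages them.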
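
The computation can be sketched in plain Python for a small batch. This is an illustrative re-implementation, not TensorFlow's code; sparse_softmax_cross_entropy is a hypothetical helper that takes one integer class index per example, as the sparse variant does.

```python
import math


def sparse_softmax_cross_entropy(logits, labels):
    """Per-example cross entropy; labels are integer class indices."""
    losses = []
    for row, label in zip(logits, labels):
        # Subtract the row maximum for numerical stability
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        # -log(softmax(row)[label]) equals log_sum_exp - row[label]
        losses.append(log_sum_exp - row[label])
    return losses


logits = [[2.0, 1.0, 0.1],
          [0.0, 0.0, 0.0]]
labels = [0, 2]

per_example = sparse_softmax_cross_entropy(logits, labels)
total = sum(per_example)                  # analogous to tf.reduce_sum
mean = total / len(per_example)           # analogous to tf.reduce_mean
print(per_example)
print(mean)
```

For all-equal logits the per-example loss is ln(num_classes). With 10 classes that is about 2.303, which is a useful reference point when reading a training log.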

The final loss code is as follows:

def Loss(logits, labels):
    # Compute the mean cross-entropy loss over the batch
    labels = tf.cast(labels, tf.int64)
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits, name='cross_entropy_per_example')
    cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
    return cross_entropy_mean

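6. Define the training procedure

This step is fairly simple: it calls the functions defined above, creates a session, and starts training inside it. Be sure not to forget tf.train.start_queue_runners(sess=sess); without it the network blocks, because no data is ever read in.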

def Train():
    # Build the list of all five training files
    data_dir = 'cifar10_data/cifar-10-batches-bin'
    filenames = [os.path.join(data_dir, 'data_batch_%d.bin' % i)
                 for i in range(1, 6)]

    images, label_batch = Distorted_inputs(filenames, BATCH_SIZE)
    logits = NetworkRes(images)
    loss = Loss(logits, label_batch)

    train_step = tf.train.AdamOptimizer(5e-4).minimize(loss)

    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        # Start the queue threads; without them the input pipeline blocks
        threads = tf.train.start_queue_runners(sess=sess)
        for i in range(30001):
            # Run one training step and fetch the loss of the same batch
            _, loss_value = sess.run([train_step, loss])
            if i % 10 == 0:
                print('loss: %s' % loss_value)
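
Two details of the training setup are easy to check in isolation: the file list should contain all five training files, and 30001 steps at a batch size of 50 correspond to roughly 30 passes over the 50000 training images. The sketch below assumes the directory layout used above; data_dir, num_steps and epochs are names introduced for illustration.

```python
import os

data_dir = 'cifar10_data/cifar-10-batches-bin'
# Collect all five training files in one list
filenames = [os.path.join(data_dir, 'data_batch_%d.bin' % i)
             for i in range(1, 6)]
for name in filenames:
    print(name)

# How many passes over the training set do 30001 steps make?
BATCH_SIZE = 50
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 50000
num_steps = 30001
epochs = num_steps * BATCH_SIZE / float(NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN)
print(epochs)
```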

With everything defined, training can begin. The complete code is:

#encoding=utf8
import tensorflow as tf
import os

IMAGE_HEIGHT = 32
IMAGE_WIDTH = 32
IMAGE_DEPTH = 3

NUM_CLASSES = 10
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 50000
NUM_EXAMPLES_PER_EPOCH_FOR_EVAL = 10000

BATCH_SIZE = 50

def ReadImages(filenames):
    # Turn the list of file names into a queue with TensorFlow
    filename_queue = tf.train.string_input_producer(filenames)
    # The label occupies one byte
    label_bytes = 1
    # Image size is 32x32x3
    height = IMAGE_HEIGHT
    width = IMAGE_WIDTH
    depth = IMAGE_DEPTH
    # Number of bytes in one image
    image_bytes = height * width * depth
    # One record consists of one label byte followed by image_bytes image bytes
    record_bytes = label_bytes + image_bytes
    # Create a reader for fixed-length records
    reader = tf.FixedLengthRecordReader(record_bytes=record_bytes)
    key, value = reader.read(filename_queue)
    # The value read is a string; convert it to a vector of uint8
    record_bytes = tf.decode_raw(value, tf.uint8)
    # The first byte is the label; cast it from uint8 to int32
    label = tf.cast(tf.strided_slice(record_bytes, [0], [label_bytes]), tf.int32)
    # The rest is the image data; reshape it from [depth * height * width]
    # to [depth, height, width]
    depth_major = tf.reshape(tf.strided_slice(record_bytes, [label_bytes],
                                              [label_bytes + image_bytes]),
                             [depth, height, width])
    # Transpose the image from [depth, height, width] to [height, width, depth]
    uint8image = tf.transpose(depth_major, [1, 2, 0])

    return label, uint8image

# Side length of the cropped images
IMAGE_SIZE = 24

def Distorted_inputs(filenames, batch_size):
    # Read the data
    label, uint8image = ReadImages(filenames)
    reshaped_image = tf.cast(uint8image, tf.float32)

    height = IMAGE_SIZE
    width = IMAGE_SIZE
    # Randomly crop the image to IMAGE_SIZE x IMAGE_SIZE
    distorted_image = tf.random_crop(reshaped_image, [height, width, 3])
    # Randomly flip the image horizontally
    distorted_image = tf.image.random_flip_left_right(distorted_image)
    # Randomly adjust the brightness and the contrast of the image
    distorted_image = tf.image.random_brightness(distorted_image, max_delta=63)
    distorted_image = tf.image.random_contrast(distorted_image,
                                               lower=0.2, upper=1.8)
    # Standardize the image to zero mean and unit variance
    float_image = tf.image.per_image_standardization(distorted_image)
    # Set the static shapes
    float_image.set_shape([height, width, 3])
    label.set_shape([1])

    # Size the example queue as a fraction of one epoch
    min_fraction_of_examples_in_queue = 0.4
    min_queue_examples = int(NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN *
                             min_fraction_of_examples_in_queue)
    # Produce a batch of images
    num_preprocess_threads = 16
    images, label_batch = tf.train.batch(
        [float_image, label],
        batch_size=batch_size,
        num_threads=num_preprocess_threads,
        capacity=min_queue_examples + 3 * batch_size)

    # Return the batch of processed images and the batch of labels
    return images, tf.reshape(label_batch, [batch_size])

def NetworkRes(images):
    # First convolutional layer
    kernel = tf.Variable(tf.zeros([5,5,3,64]))
    biases = tf.Variable(tf.random_normal([64]))
    conv = tf.nn.conv2d(images, kernel, [1,1,1,1], padding='SAME')
    preAc = tf.nn.bias_add(conv, biases)
    conv1 = tf.nn.relu(preAc)
    pool1 = tf.nn.max_pool(conv1, ksize=[1,3,3,1], strides=[1,2,2,1], padding='SAME')
    # Second convolutional layer, with its own bias variable
    kernel = tf.Variable(tf.zeros([5,5,64,64]))
    biases = tf.Variable(tf.random_normal([64]))
    conv = tf.nn.conv2d(pool1, kernel, [1,1,1,1], padding='SAME')
    preAc = tf.nn.bias_add(conv, biases)
    conv2 = tf.nn.relu(preAc)
    # Pool the output of the second convolution (conv2, not conv1)
    pool2 = tf.nn.max_pool(conv2, ksize=[1,3,3,1], strides=[1,2,2,1], padding='SAME')
    # First fully connected layer: flatten each example to one dimension
    reshape = tf.reshape(pool2, [BATCH_SIZE, -1])
    dim = reshape.get_shape()[1].value
    weights = tf.Variable(tf.zeros([dim, 384]))
    biases = tf.Variable(tf.random_normal([384]))
    res1 = tf.nn.relu(tf.matmul(reshape, weights) + biases)
    # Second fully connected layer
    weights = tf.Variable(tf.zeros([384, 192]))
    biases = tf.Variable(tf.random_normal([192]))
    res2 = tf.nn.relu(tf.matmul(res1, weights) + biases)
    # Third fully connected layer: outputs one logit per class
    weights = tf.Variable(tf.zeros([192, NUM_CLASSES]))
    biases = tf.Variable(tf.random_normal([NUM_CLASSES]))
    logits = tf.add(tf.matmul(res2, weights), biases)
    return logits

def Loss(logits, labels):
    # Compute the mean cross-entropy loss over the batch
    labels = tf.cast(labels, tf.int64)
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits, name='cross_entropy_per_example')
    cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
    return cross_entropy_mean

def Train():
    # Build the list of all five training files
    data_dir = 'cifar10_data/cifar-10-batches-bin'
    filenames = [os.path.join(data_dir, 'data_batch_%d.bin' % i)
                 for i in range(1, 6)]

    images, label_batch = Distorted_inputs(filenames, BATCH_SIZE)
    logits = NetworkRes(images)
    loss = Loss(logits, label_batch)

    train_step = tf.train.AdamOptimizer(5e-4).minimize(loss)

    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        # Start the queue threads; without them the input pipeline blocks
        threads = tf.train.start_queue_runners(sess=sess)
        for i in range(30001):
            # Run one training step and fetch the loss of the same batch
            _, loss_value = sess.run([train_step, loss])
            if i % 10 == 0:
                print('loss: %s' % loss_value)

if __name__ == "__main__":
    Train()

Training runs for 30001 steps and prints the loss every 10 steps. Once training starts, the loss keeps decreasing:

$ python train.py 
loss: 139862.4
loss: 75934.01
loss: 73080.2
loss: 65201.547
loss: 57756.45
loss: 53158.58
loss: 44673.254
loss: 40073.754
loss: 39292.81
loss: 42916.02
loss: 41723.44
loss: 35896.89
loss: 38079.15
loss: 33945.703
loss: 32580.35
loss: 29202.9
loss: 23343.062
loss: 24682.982
loss: 36213.305
loss: 29628.717
loss: 24032.852
loss: 22293.295
...

The loss is decreasing overall. After some time it is smaller still:

...
loss: 95.31898
loss: 70.991165
loss: 89.883095
loss: 77.25716
loss: 78.50725
loss: 52.714283
loss: 85.293945
loss: 88.35736
loss: 66.40239
loss: 72.92768
loss: 52.961704
loss: 61.261597
loss: 56.042004
loss: 64.03586
loss: 81.6912
loss: 94.63177
loss: 63.111412
loss: 78.860794
loss: 88.37257
loss: 61.51449
loss: 57.879826
loss: 55.705624
loss: 81.577095
loss: 82.05835
loss: 50.56248
loss: 55.78168
loss: 64.80161
loss: 44.10519
...
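
Because each printed value comes from a single batch, the log is noisy. One way to see the trend is to average the values over a sliding window. The sketch below does this for the first ten values of the training log; moving_average and the window size of 5 are choices made here for illustration.

```python
def moving_average(values, window):
    """Mean of each run of `window` consecutive values."""
    return [sum(values[i:i + window]) / float(window)
            for i in range(len(values) - window + 1)]


# First ten loss values from the training log
losses = [139862.4, 75934.01, 73080.2, 65201.547, 57756.45,
          53158.58, 44673.254, 40073.754, 39292.81, 42916.02]

smoothed = moving_average(losses, 5)
for value in smoothed:
    print('%.1f' % value)
```

The raw values rise at the last step (39292.81 to 42916.02), while the smoothed series still falls at every step.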

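A steadily decreasing loss is a necessary condition for the network to be working properly. In the next section we will test the trained network on the CIFAR-10 test set; I am very curious to see how well our self-built network performs.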