tensorflow-neural-networks

TensorFlow Neural Networks


Build and train neural networks using TensorFlow's high-level Keras API and low-level custom implementations. This skill covers everything from simple sequential models to complex custom architectures with multiple outputs, custom layers, and advanced training techniques.

Sequential Models with Keras


The Sequential API provides the simplest way to build neural networks by stacking layers linearly.

Basic Image Classification


python
import tensorflow as tf
from tensorflow import keras
import numpy as np

# Load MNIST dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Preprocess data
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
x_train = x_train.reshape(-1, 28 * 28)
x_test = x_test.reshape(-1, 28 * 28)

# Build Sequential model
model = keras.Sequential([
    keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(10, activation='softmax')
])

# Compile model
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Display model architecture
model.summary()

# Train model
history = model.fit(
    x_train, y_train,
    batch_size=32,
    epochs=5,
    validation_split=0.2,
    verbose=1
)

# Evaluate model
test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)
print(f"Test accuracy: {test_accuracy:.4f}")

# Make predictions
predictions = model.predict(x_test[:5])
predicted_classes = np.argmax(predictions, axis=1)
print(f"Predicted classes: {predicted_classes}")
print(f"True classes: {y_test[:5]}")

# Save model
model.save('mnist_model.h5')

# Load model
loaded_model = keras.models.load_model('mnist_model.h5')

Convolutional Neural Network


python
def create_cnn_model(input_shape=(224, 224, 3), num_classes=1000):
    """Create CNN model for image classification."""
    model = tf.keras.Sequential([
        # Block 1
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same',
                               input_shape=input_shape),
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.BatchNormalization(),

        # Block 2
        tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.BatchNormalization(),

        # Block 3
        tf.keras.layers.Conv2D(256, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.Conv2D(256, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.BatchNormalization(),

        # Classification head
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(512, activation='relu'),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(num_classes, activation='softmax')
    ])
    return model
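To see how each block transforms the feature map, a single block from the architecture above can be run on a dummy batch; the 32×32 input size below is illustrative, not part of the original example.

```python
import tensorflow as tf

# One conv block from the architecture above, applied to a dummy batch.
# The 32x32x3 input size is illustrative.
block = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same',
                           input_shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.BatchNormalization(),
])
y = block(tf.zeros((1, 32, 32, 3)))
print(y.shape)  # (1, 16, 16, 64): 'same' padding preserves H/W, pooling halves them
```

Because every block preserves spatial size before pooling, stacking three such blocks on a 32×32 input leaves a 4×4 map for the GlobalAveragePooling2D head.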

CIFAR-10 CNN Architecture


python
def generate_model():
    return tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(32, (3, 3), padding='same', input_shape=x_train.shape[1:]),
        tf.keras.layers.Activation('relu'),
        tf.keras.layers.Conv2D(32, (3, 3)),
        tf.keras.layers.Activation('relu'),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Dropout(0.25),

        tf.keras.layers.Conv2D(64, (3, 3), padding='same'),
        tf.keras.layers.Activation('relu'),
        tf.keras.layers.Conv2D(64, (3, 3)),
        tf.keras.layers.Activation('relu'),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Dropout(0.25),

        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512),
        tf.keras.layers.Activation('relu'),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(10),
        tf.keras.layers.Activation('softmax')
    ])

model = generate_model()
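As a quick smoke test of this pattern, the model can be compiled and fitted on a tiny random batch. The stand-in model below is a reduced sketch of the architecture above, and the data is synthetic, so numbers here are illustrative only.

```python
import tensorflow as tf

# Synthetic stand-in for CIFAR-10 data: 8 random 32x32 RGB images.
x_train = tf.random.uniform((8, 32, 32, 3))
y_train = tf.random.uniform((8,), maxval=10, dtype=tf.int64)

# A reduced version of the architecture above, for a fast shape check.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), padding='same', activation='relu',
                           input_shape=x_train.shape[1:]),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=1, verbose=0)
preds = model.predict(x_train, verbose=0)
print(preds.shape)  # (8, 10)
```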

Custom Layers


Create reusable custom layers by subclassing tf.keras.layers.Layer.

Custom Dense Layer


python
import tensorflow as tf

class CustomDense(tf.keras.layers.Layer):
    def __init__(self, units=32, activation=None, **kwargs):
        # Forward **kwargs so Keras arguments like input_shape and name work.
        super(CustomDense, self).__init__(**kwargs)
        self.units = units
        self.activation = tf.keras.activations.get(activation)

    def build(self, input_shape):
        """Create layer weights."""
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer='glorot_uniform',
            trainable=True,
            name='kernel'
        )
        self.b = self.add_weight(
            shape=(self.units,),
            initializer='zeros',
            trainable=True,
            name='bias'
        )

    def call(self, inputs):
        """Forward pass."""
        output = tf.matmul(inputs, self.w) + self.b
        if self.activation is not None:
            output = self.activation(output)
        return output

    def get_config(self):
        """Enable serialization."""
        config = super().get_config()
        config.update({
            'units': self.units,
            'activation': tf.keras.activations.serialize(self.activation)
        })
        return config

Use custom components

custom_model = tf.keras.Sequential([
    CustomDense(64, activation='relu', input_shape=(10,)),
    CustomDense(32, activation='relu'),
    CustomDense(1, activation='sigmoid')
])
custom_model.compile(optimizer='adam', loss='binary_crossentropy',
                     metrics=['accuracy'])

Residual Block


python
import tensorflow as tf

class ResidualBlock(tf.keras.layers.Layer):
    def __init__(self, filters, kernel_size=3):
        super(ResidualBlock, self).__init__()
        self.conv1 = tf.keras.layers.Conv2D(filters, kernel_size, padding='same')
        self.bn1 = tf.keras.layers.BatchNormalization()
        self.conv2 = tf.keras.layers.Conv2D(filters, kernel_size, padding='same')
        self.bn2 = tf.keras.layers.BatchNormalization()
        self.activation = tf.keras.layers.Activation('relu')
        self.add = tf.keras.layers.Add()

    def call(self, inputs, training=False):
        x = self.conv1(inputs)
        x = self.bn1(x, training=training)
        x = self.activation(x)
        x = self.conv2(x)
        x = self.bn2(x, training=training)
        x = self.add([x, inputs])  # Residual connection
        x = self.activation(x)
        return x
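The Add at the end of the block only works because both convolutions use padding='same' and a filter count equal to the input's channel count, so the transformed tensor and the input have identical shapes. The same constraint can be sketched with built-in layers and the functional API (sizes illustrative):

```python
import tensorflow as tf

# Residual connection sketch: the Add succeeds only because 'same'
# padding and a matching filter count (16) preserve the input shape.
inputs = tf.keras.Input(shape=(8, 8, 16))
x = tf.keras.layers.Conv2D(16, 3, padding='same')(inputs)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Activation('relu')(x)
x = tf.keras.layers.Conv2D(16, 3, padding='same')(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Add()([x, inputs])  # shapes must agree here
outputs = tf.keras.layers.Activation('relu')(x)
model = tf.keras.Model(inputs, outputs)
print(model.output_shape)  # (None, 8, 8, 16)
```

Changing the filter count mid-block requires a 1x1 convolution on the shortcut path to match shapes before the Add.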

Custom Projection Layer with TF NumPy


python
import tensorflow as tf
import tensorflow.experimental.numpy as tnp

class ProjectionLayer(tf.keras.layers.Layer):
    """Linear projection layer using TF NumPy."""

    def __init__(self, units):
        super(ProjectionLayer, self).__init__()
        self._units = units

    def build(self, input_shape):
        stddev = tnp.sqrt(self._units).astype(tnp.float32)
        initial_value = tnp.random.randn(input_shape[1], self._units).astype(
            tnp.float32) / stddev
        # Note that TF NumPy can interoperate with tf.Variable.
        self.w = tf.Variable(initial_value, trainable=True)

    def call(self, inputs):
        return tnp.matmul(inputs, self.w)

Call with ndarray inputs

layer = ProjectionLayer(2)
tnp_inputs = tnp.random.randn(2, 4).astype(tnp.float32)
print("output:", layer(tnp_inputs))

Call with tf.Tensor inputs

tf_inputs = tf.random.uniform([2, 4])
print("\noutput: ", layer(tf_inputs))

Custom Models


Build complex architectures by subclassing tf.keras.Model.

Multi-Task Model


python
import tensorflow as tf

class MultiTaskModel(tf.keras.Model):
    def __init__(self, num_classes_task1=10, num_classes_task2=5):
        super(MultiTaskModel, self).__init__()
        # Shared layers
        self.conv1 = tf.keras.layers.Conv2D(32, 3, activation='relu')
        self.pool = tf.keras.layers.MaxPooling2D()
        self.flatten = tf.keras.layers.Flatten()
        self.shared_dense = tf.keras.layers.Dense(128, activation='relu')

        # Task-specific layers
        self.task1_dense = tf.keras.layers.Dense(64, activation='relu')
        self.task1_output = tf.keras.layers.Dense(num_classes_task1,
                                                   activation='softmax', name='task1')

        self.task2_dense = tf.keras.layers.Dense(64, activation='relu')
        self.task2_output = tf.keras.layers.Dense(num_classes_task2,
                                                   activation='softmax', name='task2')

    def call(self, inputs, training=False):
        # Shared feature extraction
        x = self.conv1(inputs)
        x = self.pool(x)
        x = self.flatten(x)
        x = self.shared_dense(x)

        # Task 1 branch
        task1 = self.task1_dense(x)
        task1_output = self.task1_output(task1)

        # Task 2 branch
        task2 = self.task2_dense(x)
        task2_output = self.task2_output(task2)

        return task1_output, task2_output
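A model that returns two outputs is compiled with one loss per task, which Keras combines (optionally weighted) into the training objective. A minimal functional-API stand-in is sketched below; the layer sizes and the 0.5 weight are illustrative, not part of the original example.

```python
import tensorflow as tf

# Two-output model: one loss per task, combined with optional weights.
inputs = tf.keras.Input(shape=(16,))
shared = tf.keras.layers.Dense(32, activation='relu')(inputs)
task1 = tf.keras.layers.Dense(10, activation='softmax', name='task1')(shared)
task2 = tf.keras.layers.Dense(5, activation='softmax', name='task2')(shared)
model = tf.keras.Model(inputs, [task1, task2])
model.compile(
    optimizer='adam',
    loss={'task1': 'sparse_categorical_crossentropy',
          'task2': 'sparse_categorical_crossentropy'},
    loss_weights={'task1': 1.0, 'task2': 0.5},  # illustrative weighting
)
preds = model(tf.zeros((2, 16)))
print([tuple(p.shape) for p in preds])  # [(2, 10), (2, 5)]
```

The dict keys must match the output layer names, which is why the task layers above are given explicit name= arguments.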

Three-Layer Neural Network Module


python
class Model(tf.Module):
    """A three-layer neural network."""

    def __init__(self):
        # Uses the custom Dense class defined below (ReLU on hidden layers,
        # no activation on the output layer).
        self.layer1 = Dense(128, activation=tf.nn.relu)
        self.layer2 = Dense(32, activation=tf.nn.relu)
        self.layer3 = Dense(NUM_CLASSES)

    def __call__(self, inputs):
        x = self.layer1(inputs)
        x = self.layer2(x)
        return self.layer3(x)

    @property
    def params(self):
        return self.layer1.weights + self.layer2.weights + self.layer3.weights

Recurrent Neural Networks


Custom GRU Cell


python
import tensorflow as tf
import tensorflow.experimental.numpy as tnp

class GRUCell:
    """Builds a traditional GRU cell with dense internal transformations.

    Gated Recurrent Unit paper: https://arxiv.org/abs/1412.3555
    """

    def __init__(self, n_units, forget_bias=0.0):
        self._n_units = n_units
        self._forget_bias = forget_bias
        self._built = False

    def __call__(self, inputs):
        if not self._built:
            self.build(inputs)
        x, gru_state = inputs
        # Dense layer on the concatenation of x and h.
        y = tnp.dot(tnp.concatenate([x, gru_state], axis=-1), self.w1) + self.b1
        # Update and reset gates.
        u, r = tnp.split(tf.sigmoid(y), 2, axis=-1)
        # Candidate.
        c = tnp.dot(tnp.concatenate([x, r * gru_state], axis=-1), self.w2) + self.b2
        new_gru_state = u * gru_state + (1 - u) * tnp.tanh(c)
        return new_gru_state

    def build(self, inputs):
        # State last dimension must be n_units.
        assert inputs[1].shape[-1] == self._n_units
        # The dense layer input is the input concatenated with the GRU state.
        dense_shape = inputs[0].shape[-1] + self._n_units
        self.w1 = tf.Variable(tnp.random.uniform(
            -0.01, 0.01, (dense_shape, 2 * self._n_units)).astype(tnp.float32))
        self.b1 = tf.Variable((tnp.random.randn(2 * self._n_units) * 1e-6 + self._forget_bias
                   ).astype(tnp.float32))
        self.w2 = tf.Variable(tnp.random.uniform(
            -0.01, 0.01, (dense_shape, self._n_units)).astype(tnp.float32))
        self.b2 = tf.Variable((tnp.random.randn(self._n_units) * 1e-6).astype(tnp.float32))
        self._built = True

    @property
    def weights(self):
        return (self.w1, self.b1, self.w2, self.b2)
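The cell above maps (x, state) to a new state; processing a sequence means applying it once per time step and feeding each new state into the next call. The unrolling pattern can be sketched with the built-in tf.keras.layers.GRUCell, which follows the same call convention (sizes illustrative):

```python
import tensorflow as tf

# Unrolling a GRU cell over time steps; sizes are illustrative.
cell = tf.keras.layers.GRUCell(8)
batch, time_steps, features = 2, 5, 4
xs = tf.random.uniform((batch, time_steps, features))
state = [tf.zeros((batch, 8))]
for t in range(time_steps):
    out, state = cell(xs[:, t, :], state)  # new state feeds the next step
print(out.shape)  # (2, 8)
```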

Custom Dense Layer Implementation


python
import tensorflow as tf
import tensorflow.experimental.numpy as tnp

class Dense:
    def __init__(self, n_units, activation=None):
        self._n_units = n_units
        self._activation = activation
        self._built = False

    def __call__(self, inputs):
        if not self._built:
            self.build(inputs)
        y = tnp.dot(inputs, self.w) + self.b
        if self._activation is not None:
            y = self._activation(y)
        return y

    def build(self, inputs):
        shape_w = (inputs.shape[-1], self._n_units)
        lim = tnp.sqrt(6.0 / (shape_w[0] + shape_w[1]))
        self.w = tf.Variable(tnp.random.uniform(-lim, lim, shape_w).astype(tnp.float32))
        self.b = tf.Variable((tnp.random.randn(self._n_units) * 1e-6).astype(tnp.float32))
        self._built = True

    @property
    def weights(self):
        return (self.w, self.b)
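Because TF NumPy ops interoperate with tf.Variable and tf.GradientTape, hand-written layers like the one above can be trained with standard autodiff. A minimal sketch:

```python
import tensorflow as tf
import tensorflow.experimental.numpy as tnp

tnp.experimental_enable_numpy_behavior()

# Gradients flow through TF NumPy ops just like regular tf ops.
w = tf.Variable(tnp.ones((3, 2), dtype=tnp.float32))
x = tnp.ones((1, 3), dtype=tnp.float32)
with tf.GradientTape() as tape:
    y = tnp.sum(tnp.dot(x, w))
grad = tape.gradient(y, w)
print(grad.shape)  # (3, 2)
```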

Sequential RNN Model


python
class Model:
    def __init__(self, vocab_size, embedding_dim, rnn_units, forget_bias=0.0, stateful=False, activation=None):
        self._embedding = Embedding(vocab_size, embedding_dim)
        self._gru = GRU(rnn_units, forget_bias=forget_bias, stateful=stateful)
        self._dense = Dense(vocab_size, activation=activation)
        self._layers = [self._embedding, self._gru, self._dense]
        self._built = False

    def __call__(self, inputs):
        if not self._built:
            self.build(inputs)
        xs = inputs
        for layer in self._layers:
            xs = layer(xs)
        return xs

    def build(self, inputs):
        self._embedding.build(inputs)
        self._gru.build(tf.TensorSpec(inputs.shape + (self._embedding._embedding_dim,), tf.float32))
        self._dense.build(tf.TensorSpec(inputs.shape + (self._gru._cell._n_units,), tf.float32))
        self._built = True

    @property
    def weights(self):
        return [layer.weights for layer in self._layers]

    @property
    def state(self):
        return self._gru.state

    def create_state(self, *args):
        self._gru.create_state(*args)

    def reset_state(self, *args):
        self._gru.reset_state(*args)

Training Configuration


Model Parameters


python
# Length of the vocabulary in chars
vocab_size = len(vocab)

# The embedding dimension
embedding_dim = 256

# Number of RNN units
rnn_units = 1024

# Batch size
BATCH_SIZE = 64

# Buffer size to shuffle the dataset
BUFFER_SIZE = 10000
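These constants are typically consumed by a tf.data input pipeline. A hedged sketch, using synthetic integer ids in place of an encoded text corpus (the sequence length of 100 is illustrative):

```python
import tensorflow as tf

BATCH_SIZE = 64
BUFFER_SIZE = 10000

# Synthetic stand-in for an encoded character corpus.
ids = tf.range(10000, dtype=tf.int64) % 65
dataset = (tf.data.Dataset.from_tensor_slices(ids)
           .batch(101, drop_remainder=True)       # sequences of seq_len + 1
           .map(lambda seq: (seq[:-1], seq[1:]))  # (input, shifted target)
           .shuffle(BUFFER_SIZE)
           .batch(BATCH_SIZE, drop_remainder=True))
for x, y in dataset.take(1):
    print(x.shape, y.shape)  # (64, 100) (64, 100)
```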

Training Constants for MNIST


python
# Size of each input image, 28 x 28 pixels
IMAGE_SIZE = 28 * 28

# Number of distinct number labels, [0..9]
NUM_CLASSES = 10

# Number of examples in each training batch (step)
TRAIN_BATCH_SIZE = 100

# Number of training steps to run
TRAIN_STEPS = 1000

# Load MNIST dataset.
train, test = tf.keras.datasets.mnist.load_data()
train_ds = tf.data.Dataset.from_tensor_slices(train).batch(TRAIN_BATCH_SIZE).repeat()

# Cast from raw data to the required datatypes.
def cast(images, labels):
    images = tf.cast(tf.reshape(images, [-1, IMAGE_SIZE]), tf.float32)
    labels = tf.cast(labels, tf.int64)
    return (images, labels)
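With a dataset pipeline like the one above, a low-level training loop drives the model with tf.GradientTape directly. A hedged sketch of one training step follows; the model, optimizer, and random data here are illustrative stand-ins, not the document's exact setup.

```python
import tensorflow as tf

IMAGE_SIZE = 28 * 28
NUM_CLASSES = 10

# Illustrative stand-in model for the training-step sketch.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(IMAGE_SIZE,)),
    tf.keras.layers.Dense(NUM_CLASSES),
])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

@tf.function
def train_step(images, labels):
    with tf.GradientTape() as tape:
        logits = model(images, training=True)
        loss = loss_fn(labels, logits)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# One step on a synthetic batch of TRAIN_BATCH_SIZE examples.
images = tf.random.uniform((100, IMAGE_SIZE))
labels = tf.random.uniform((100,), maxval=NUM_CLASSES, dtype=tf.int64)
loss_value = train_step(images, labels)
print(float(loss_value) >= 0.0)  # True
```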

Post-Training Quantization


python
# Load MNIST dataset
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Normalize the input images so that each pixel value is between 0 and 1.
train_images = train_images / 255.0
test_images = test_images / 255.0

# Define the model architecture
model = keras.Sequential([
    keras.layers.InputLayer(input_shape=(28, 28)),
    keras.layers.Reshape(target_shape=(28, 28, 1)),
    keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation=tf.nn.relu),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(10)
])

# Train the digit classification model
model.compile(optimizer='adam',
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(
    train_images,
    train_labels,
    epochs=1,
    validation_data=(test_images, test_labels)
)
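The code above only trains the model; the quantization step itself runs through the TFLite converter. A minimal sketch with a tiny stand-in model (the real flow would pass the trained model above):

```python
import tensorflow as tf

# Dynamic-range post-training quantization via the TFLite converter.
# The tiny stand-in model here is illustrative.
model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()  # serialized FlatBuffer bytes
print(type(tflite_quant_model))
```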

When to Use This Skill


Use the tensorflow-neural-networks skill when you need to:
  • Build image classification models with CNNs
  • Create text processing models with RNNs or transformers
  • Implement custom layer architectures for specific use cases
  • Design multi-task learning models with shared representations
  • Train sequential models for tabular data
  • Implement residual connections or skip connections
  • Create embedding layers for discrete inputs
  • Build autoencoders or generative models
  • Fine-tune pre-trained models with custom heads
  • Implement attention mechanisms in custom architectures
  • Create time-series prediction models
  • Design reinforcement learning policy networks
  • Build siamese networks for similarity learning
  • Implement custom gradient computation in layers
  • Create models with dynamic architectures based on input

Best Practices


  1. Use Keras Sequential for simple architectures - Start with Sequential API for linear layer stacks before moving to functional or subclassing APIs
  2. Leverage pre-built layers - Use tf.keras.layers built-in implementations before creating custom layers
  3. Initialize weights properly - Use appropriate initializers (glorot_uniform, he_normal) based on activation functions
  4. Add batch normalization - Place BatchNormalization layers after Conv2D/Dense layers for training stability
  5. Use dropout for regularization - Apply Dropout layers (0.2-0.5) to prevent overfitting in fully connected layers
  6. Compile before training - Always call model.compile() with optimizer, loss, and metrics before fit()
  7. Monitor validation metrics - Use validation_split or validation_data to track overfitting during training
  8. Save model checkpoints - Implement ModelCheckpoint callback to save best models during training
  9. Use model.summary() - Verify architecture and parameter counts before training
  10. Implement early stopping - Add EarlyStopping callback to prevent unnecessary training iterations
  11. Normalize input data - Scale pixel values to [0,1] or standardize features to mean=0, std=1
  12. Use appropriate activation functions - ReLU for hidden layers, softmax for multi-class, sigmoid for binary
  13. Set proper loss functions - sparse_categorical_crossentropy for integer labels, categorical_crossentropy for one-hot
  14. Implement custom get_config() - Override get_config() in custom layers for model serialization
  15. Use training parameter in call() - Pass training flag to enable/disable dropout and batch norm behavior
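Several of the practices above (checkpointing, early stopping, validation monitoring) are usually wired together as a callback list passed to fit. A minimal sketch; the filename and patience value are illustrative:

```python
import tensorflow as tf

# Checkpoint the best model and stop when validation loss plateaus.
callbacks = [
    tf.keras.callbacks.ModelCheckpoint(
        'best_model.keras', monitor='val_loss', save_best_only=True),
    tf.keras.callbacks.EarlyStopping(
        monitor='val_loss', patience=3, restore_best_weights=True),
]
# Passed to training as:
# model.fit(x_train, y_train, validation_split=0.2, callbacks=callbacks)
print(len(callbacks))  # 2
```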

Common Pitfalls


  1. Forgetting to normalize data - Unnormalized inputs cause slow convergence and poor performance
  2. Wrong loss function for labels - Using categorical_crossentropy with integer labels causes errors
  3. Missing input_shape - First layer needs input_shape parameter for model building
  4. Overfitting on small datasets - Add dropout, augmentation, or reduce model capacity
  5. Learning rate too high - Causes unstable training and loss divergence
  6. Not shuffling training data - Leads to biased batch statistics and poor generalization
  7. Batch size too small - Causes noisy gradients and slow training on large datasets
  8. Too many parameters - Large models overfit and train slowly on limited data
  9. Vanishing gradients in deep networks - Use residual connections or batch normalization
  10. Not using validation data - Cannot detect overfitting or tune hyperparameters properly
  11. Forgetting to set training=False - Dropout/BatchNorm behave incorrectly during inference
  12. Incompatible layer dimensions - Output shape of one layer must match input of next
  13. Not calling build() before weights - Custom layers need proper initialization before accessing weights
  14. Using the wrong optimizer - Adam works well in general, but SGD with momentum can perform better on some tasks
  15. Ignoring class imbalance - Implement class weights or resampling for imbalanced datasets

Resources
