
Out of memory with TensorFlow GradientTape, but only when I append to a list


摇曳的蔷薇 2022-11-24 15:25:19

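I have been using a CNN on a dataset of shape (1000, 3253). I compute the gradients with a gradient tape, but the script keeps running out of memory. However, if I remove the line that appends the gradient result to a list, the script runs through all epochs. I'm not entirely sure why this happens, and I'm also new to TensorFlow and to gradient tapes. Any suggestions or input would be greatly appreciated.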


```python
    # create a batch loop
    for x, y_true in train_dataset:
        # create a tape to record actions
        with tf.GradientTape(watch_accessed_variables=False) as tape:
            x_var = tf.Variable(x)
            tape.watch([model.trainable_variables, x_var])
            y_pred = model(x_var, training=True)
            tape.stop_recording()
            loss = los_func(y_true, y_pred)
        epoch_loss_avg.update_state(loss)
        epoch_accuracy.update_state(y_true, y_pred)
        # pdb.set_trace()
        gradients, something = tape.gradient(loss, (model.trainable_variables, x_var))
        # sa_input.append(tape.gradient(loss, x_var))
        del tape
        # apply gradients
        sa_input.append(something)
        opti_func.apply_gradients(zip(gradients, model.trainable_variables))
    train_loss_results.append(epoch_loss_avg.result())
    train_accuracy_results.append(epoch_accuracy.result())
```
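A minimal sketch of the same batch loop that stores host-side copies of the input gradients, assuming random stand-in data and a hypothetical small Conv1D model (the question shows neither its data pipeline nor its model). It watches the input tensor directly with `tape.watch(x)` instead of creating a new `tf.Variable` per batch, and appends `input_grad.numpy()`, so the list holds NumPy arrays rather than device tensors. Whether this alone resolves the OOM in the question is an assumption; if the stored gradients are still too large, keeping a reduced summary per batch is the usual next step.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in data: 64 samples x 32 features x 1 channel
# (the question's real dataset is (1000, 3253)).
x_all = np.random.rand(64, 32, 1).astype("float32")
y_all = np.random.randint(0, 2, size=(64,))
train_dataset = tf.data.Dataset.from_tensor_slices((x_all, y_all)).batch(16)

# Hypothetical small 1D CNN; the question does not show its model.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 1)),
    tf.keras.layers.Conv1D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(2),
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

sa_input = []  # per-batch input gradients, kept as host-side NumPy arrays
for x, y_true in train_dataset:
    with tf.GradientTape() as tape:
        tape.watch(x)  # plain tensors (unlike variables) must be watched
        y_pred = model(x, training=True)
        loss = loss_fn(y_true, y_pred)
    # One call returns gradients for the weights and for the input batch.
    grads, input_grad = tape.gradient(loss, (model.trainable_variables, x))
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    # .numpy() copies the gradient to host memory, so the list does not
    # keep device tensors alive across batches and epochs.
    sa_input.append(input_grad.numpy())

saliency = np.concatenate(sa_input, axis=0)  # shape (64, 32, 1)
```

Concatenating once at the end gives one array of per-sample input gradients for the epoch.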



1 Answer

呼唤远方


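Since you are new to TF2, I recommend reading this guide. It covers training, evaluation, and prediction (inference) with models in TensorFlow 2.0 in two broad situations:

  1. When using the built-in APIs for training and validation (such as model.fit(), model.evaluate(), model.predict()). This is covered in the section "Using built-in training and evaluation loops".

  2. When writing custom loops from scratch using eager execution and the GradientTape object. This is covered in the section "Writing your own training and evaluation loops from scratch".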
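For the second case, a custom loop usually follows the pattern below. This is a minimal sketch, not code from the guide: the toy data, the two-layer model, and the name `train_step` are hypothetical. A fresh non-persistent tape is created for every batch; TensorFlow releases its recorded resources as soon as `tape.gradient()` is called, and `tf.function` traces the step into a graph.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data and model, only to make the loop runnable.
x_train = np.random.rand(128, 4).astype("float32")
y_train = np.random.randint(0, 3, size=(128,))
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).shuffle(128).batch(32)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(3),
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-2)


@tf.function  # traced into a graph once per input signature
def train_step(x, y):
    with tf.GradientTape() as tape:
        logits = model(x, training=True)
        loss = loss_fn(y, logits)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss


epoch_losses = []
for epoch in range(3):
    # float() turns each scalar loss tensor into a plain Python number.
    batch_losses = [float(train_step(x, y)) for x, y in dataset]
    epoch_losses.append(sum(batch_losses) / len(batch_losses))
```

Keeping only Python floats per epoch means nothing tied to the device accumulates across epochs.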

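Below is a program in which I compute the gradients after each epoch and append them to a list. At the end of the program, for simplicity, I convert the list to an array.

Code - This program throws an OOM error if I use a deep network with multiple layers and larger filter sizes.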

```python
# Importing dependency
# %tensorflow_version 2.x  (Colab-only magic; omit outside Colab)
from tensorflow import keras
from tensorflow.keras import backend as K
from tensorflow.keras import datasets
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout, Flatten, Conv2D, MaxPooling2D
from tensorflow.keras.layers import BatchNormalization
import numpy as np
import tensorflow as tf

# Import Data
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

# Build Model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(10))

# Model Summary
model.summary()

# Model Compile
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# List that collects one set of gradients per epoch
epoch_gradient = []
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Define the Gradient Function
# Note: this runs one forward/backward pass over all 50,000 training images
# at once, which is likely why larger networks run out of memory.
@tf.function
def get_gradient_func(model):
    with tf.GradientTape() as tape:
        logits = model(train_images, training=True)
        loss = loss_fn(train_labels, logits)
    grad = tape.gradient(loss, model.trainable_weights)
    # Note: this also applies one extra optimizer step at the end of each epoch.
    model.optimizer.apply_gradients(zip(grad, model.trainable_variables))
    return grad

# Define the Required Callback Function
class GradientCalcCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        grad = get_gradient_func(model)
        # Store NumPy copies so the list holds host memory, not device tensors
        epoch_gradient.append([g.numpy() for g in grad])

epoch = 4

print(train_images.shape, train_labels.shape)

model.fit(train_images, train_labels, epochs=epoch, validation_data=(test_images, test_labels), callbacks=[GradientCalcCallback()])

# Convert to a 2-dimensional object array of shape (epochs, variables);
# each element is one variable's gradient array
gradient = np.asarray(epoch_gradient, dtype=object)
print("Total number of epochs run:", epoch)
```

Output:

```
Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_12 (Conv2D)           (None, 30, 30, 32)        896
_________________________________________________________________
max_pooling2d_8 (MaxPooling2 (None, 15, 15, 32)        0
_________________________________________________________________
conv2d_13 (Conv2D)           (None, 13, 13, 64)        18496
_________________________________________________________________
max_pooling2d_9 (MaxPooling2 (None, 6, 6, 64)          0
_________________________________________________________________
conv2d_14 (Conv2D)           (None, 4, 4, 64)          36928
_________________________________________________________________
flatten_4 (Flatten)          (None, 1024)              0
_________________________________________________________________
dense_11 (Dense)             (None, 64)                65600
_________________________________________________________________
dense_12 (Dense)             (None, 10)                650
=================================================================
Total params: 122,570
Trainable params: 122,570
Non-trainable params: 0
_________________________________________________________________
(50000, 32, 32, 3) (50000, 1)
Epoch 1/4
1563/1563 [==============================] - 109s 70ms/step - loss: 1.7026 - accuracy: 0.4081 - val_loss: 1.4490 - val_accuracy: 0.4861
Epoch 2/4
1563/1563 [==============================] - 145s 93ms/step - loss: 1.2657 - accuracy: 0.5506 - val_loss: 1.2076 - val_accuracy: 0.5752
Epoch 3/4
1563/1563 [==============================] - 151s 96ms/step - loss: 1.1103 - accuracy: 0.6097 - val_loss: 1.1122 - val_accuracy: 0.6127
Epoch 4/4
1563/1563 [==============================] - 152s 97ms/step - loss: 1.0075 - accuracy: 0.6475 - val_loss: 1.0508 - val_accuracy: 0.6371
Total number of epochs run: 4
```
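The gradient function in the program above runs a single forward and backward pass over all 50,000 training images, so its activation memory grows with both the dataset size and the network depth. A minimal sketch of a lower-memory variant follows, assuming random arrays in place of CIFAR-10 and a hypothetical tiny CNN; the name `full_data_gradient` and the batch sizes are illustrative choices. It accumulates per-batch gradients, applies no optimizer step, and returns NumPy copies. Because the Keras loss is a mean over each batch, weighting each batch gradient by its size and dividing by the sample count reproduces the full-dataset mean gradient, up to floating-point error.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-ins: random arrays instead of CIFAR-10, and a tiny CNN.
train_images = np.random.rand(100, 32, 32, 3).astype("float32")
train_labels = np.random.randint(0, 10, size=(100, 1))
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(8, (3, 3), activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)


def full_data_gradient(model, images, labels, batch_size=256):
    """Mean-loss gradient over the whole dataset, computed batch by batch.

    Only one batch of activations is held at a time and no optimizer step
    is applied, so the model's weights are not changed.
    """
    sums = [tf.zeros(v.shape, dtype=v.dtype) for v in model.trainable_variables]
    n = 0
    for start in range(0, len(images), batch_size):
        x = images[start:start + batch_size]
        y = labels[start:start + batch_size]
        with tf.GradientTape() as tape:
            # training=False: inference behaviour for dropout/batch norm
            loss = loss_fn(y, model(x, training=False))
        grads = tape.gradient(loss, model.trainable_variables)
        # The loss is a per-batch mean, so weight each batch by its size.
        sums = [s + g * len(x) for s, g in zip(sums, grads)]
        n += len(x)
    # Return NumPy copies so callers keep host memory, not device tensors.
    return [(s / n).numpy() for s in sums]


# Usage: call once per epoch (e.g. from on_epoch_end), then stack per variable.
epoch_gradient = [full_data_gradient(model, train_images, train_labels, batch_size=32)]
per_variable = [np.stack(g) for g in zip(*epoch_gradient)]  # (epochs, *var_shape)
```

Stacking per variable gives one regular array per weight tensor, which avoids building a ragged NumPy array from gradients of different shapes.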

Hope this answers your question. Happy learning.

