Why does Keras model.fit use so much memory despite allow_growth=True?

Python

largeQ 2023-07-05 16:39:24

I recently noticed that, even though I use set_session with allow_growth=True, calling model.fit still allocates all of the GPU memory, and I can no longer use it for the rest of my program. This happens even after the function returns, when the model, being a local variable, should no longer hold any memory. Here is some sample code that demonstrates it:

    from numpy import array
    from keras import Input, Model
    from keras.layers import Conv2D, Dense, Flatten
    from keras.optimizers import SGD

    # stops keras/tensorflow from allocating all the GPU's memory immediately
    from tensorflow.compat.v1.keras.backend import set_session
    from tensorflow.compat.v1 import Session, ConfigProto, GPUOptions
    tf_config = ConfigProto(gpu_options=GPUOptions(allow_growth=True))
    session = Session(config=tf_config)
    set_session(session)


    # makes the neural network
    def make_net():
        input = Input((2, 3, 3))
        conv = Conv2D(256, (1, 1))(input)
        flattened_input = Flatten()(conv)
        output = Dense(1)(flattened_input)
        model = Model(inputs=input, outputs=output)
        sgd = SGD(0.2, 0.9)
        model.compile(sgd, 'mean_squared_error')
        model.summary()
        return model


    def make_data(input_data, target_output):
        input_data.append([[[0 for i in range(3)] for j in range(3)] for k in range(2)])
        target_output.append(0)


    def main():
        data_amount = 4096
        input_data = []
        target_output = []
        model = make_net()
        for i in range(data_amount):
            make_data(input_data, target_output)
        model.fit(array(input_data), array(target_output), batch_size=len(input_data))
        return


    while True:
        main()

When I run this code under the PyCharm debugger, GPU memory usage stays at about 0.1 GB until model.fit runs for the first time, at which point usage jumps to 3.2 GB of my 4 GB of GPU RAM. I also noticed that memory usage does not increase any further after the first model.fit call, and that if I remove the convolutional layer from the network, memory usage does not increase at all.

Can someone shed some light on my problem?

Update: Setting per_process_gpu_memory_fraction to 0.1 in GPUOptions helps limit the effect in the code included here, but not in my actual program. A better solution would still be helpful.


2 Answers

FFIVE


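I faced this problem once. I found a solution from someone I can no longer locate, and I am pasting it below. In fact, I found that if you set allow_growth=True, TensorFlow still seems to use all of your memory. So you should simply set your own maximum limit instead.

Try this: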

    import tensorflow as tf

    gpus = tf.config.experimental.list_physical_devices("GPU")
    if gpus:
        try:
            for gpu in gpus:
                tf.config.experimental.set_memory_growth(gpu, False)
                tf.config.experimental.set_virtual_device_configuration(
                    gpu,
                    [
                        tf.config.experimental.VirtualDeviceConfiguration(
                            memory_limit=12288  # set your limit (in MB)
                        )
                    ],
                )
            # Restrict TensorFlow to only use the first GPU
            tf.config.experimental.set_visible_devices(gpus[0], "GPU")
            logical_gpus = tf.config.experimental.list_logical_devices("GPU")
            print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPU")
        except RuntimeError as e:
            # Visible devices must be set before GPUs have been initialized
            print(e)
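For readers on a recent TensorFlow 2.x release, the same hard cap can be sketched with the non-experimental `tf.config` names. This is only a minimal sketch, not the answerer's code: the helper name `limit_gpu_memory` and the 1024 MB value are hypothetical and chosen for illustration, and it assumes the call runs before anything else has initialized the GPUs.

```python
import tensorflow as tf


def limit_gpu_memory(limit_mb):
    """Cap TensorFlow's allocation on every visible GPU to `limit_mb` megabytes.

    Returns the number of logical GPUs, or 0 if no GPU is present or the
    runtime was already initialized.
    """
    gpus = tf.config.list_physical_devices("GPU")
    try:
        for gpu in gpus:
            # One logical device per physical GPU, with a fixed memory ceiling.
            tf.config.set_logical_device_configuration(
                gpu,
                [tf.config.LogicalDeviceConfiguration(memory_limit=limit_mb)],
            )
        return len(tf.config.list_logical_devices("GPU"))
    except RuntimeError as err:
        # Logical devices must be configured before the GPUs are initialized.
        print(err)
        return 0


# Hypothetical limit: 1024 MB, a quarter of the asker's 4 GB card.
n_logical = limit_gpu_memory(1024)
print(n_logical, "logical GPU(s)")
```

On a machine without a GPU the loop body never runs and the helper simply reports zero logical GPUs, so the snippet is safe to try on CPU-only setups.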

Answered 2023-07-05

繁华开满天机


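Training with SGD while passing the entire training set as a single batch can be very memory-intensive, depending on your input data. Try reducing batch_size to a smaller value (for example 8, 16, or 32).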
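To get a rough feel for how batch size drives memory, the arithmetic for the question's Conv2D layer can be sketched as below. The assumptions are mine, not the poster's: Keras reads Input((2, 3, 3)) as channels-last (height 2, width 3), the 1x1 convolution keeps the spatial size, and values are float32. The helper `conv_activation_bytes` is hypothetical, and it counts only that one layer's forward output.

```python
def conv_activation_bytes(batch_size, height=2, width=3, filters=256,
                          bytes_per_value=4):
    """Bytes needed to hold the Conv2D output for one batch (float32 assumed)."""
    return batch_size * height * width * filters * bytes_per_value


# Compare the small batch sizes suggested above with the full 4096-sample batch.
for batch_size in (8, 32, 4096):
    mib = conv_activation_bytes(batch_size) / 2**20
    print(f"batch_size={batch_size}: {mib:.4f} MiB")
```

The figure grows linearly with batch_size. Under these assumptions, the full 4096-sample batch needs 24 MiB for this layer's output alone, and gradients, optimizer state, and any workspace the GPU libraries reserve come on top of that, so treat it as a lower bound and not as a measurement.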

Answered 2023-07-05
