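Using a DB helps because the DB library stores the data on the hard disk rather than in memory. If you look at the documentation for the library that the linked answer suggests, you will see that the first argument is a file name, which shows that the hard disk is being used. https://docs.python.org/2/library/bsddb.html#bsddb.hashopen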
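To make the disk-backed idea concrete, here is a minimal sketch. It uses the standard-library `shelve` module as a stand-in, not the `bsddb` module from the linked answer, since `bsddb` is only available in Python 2. The path `img_index` and the file names are hypothetical and only serve as an illustration.

```python
import os
import shelve
import tempfile

# Hypothetical location for the on-disk store; the dbm backend that
# shelve picks may add its own suffix to this path.
db_path = os.path.join(tempfile.mkdtemp(), "img_index")

# Each value is pickled and written to the file as it is assigned,
# so the full mapping does not have to be held in a Python dict.
with shelve.open(db_path) as db:
    for img_id in (3, 1, 2):
        # shelve keys must be strings; the file names are made up
        db[str(img_id)] = "img_{}_x.jpg".format(img_id)

# Reopen the store later and read the entries back in key order.
with shelve.open(db_path) as db:
    sorted_ids = sorted(db.keys(), key=int)
    names = [db[k] for k in sorted_ids]
```

As with `hashopen`, the first argument is a file name, which is what tells you the data lives on disk.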
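However, the linked question is about sorting by **value**, not by key. Sorting by key is much less memory intensive, although you may still run into memory problems when training the model. I would suggest trying something like this: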

    import os

    import numpy as np
    # Assumed import, based on the use of image.img_to_array below
    from keras.preprocessing import image

    # Get the list of file names
    imgs = os.listdir('images')

    # Create a mapping of ID to file name
    # This will allow us to sort the IDs then load the files in order
    # (assumes the ID is the second underscore-separated field of the name)
    img_ids = {int(img.split('_')[1]): img for img in imgs}

    # Get the list of file names sorted by ID
    sorted_imgs = [v for k, v in sorted(img_ids.items(), key=lambda x: x[0])]

    # Define a function for loading a named img
    # (named load_img_array so it does not shadow Keras's load_img)
    def load_img_array(img):
        loadimg = image.load_img(os.path.join('images', img))
        return image.img_to_array(loadimg)

    # Iterate through the sorted file names and stack the results
    data_dict = np.stack([load_img_array(img) for img in sorted_imgs])
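To see why the IDs are converted to `int` before sorting, the sketch below compares a numeric sort with a plain string sort. The file names are hypothetical and assume the same `prefix_ID_rest` pattern as the code above. The helper `img_id` is introduced only for this illustration.

```python
# Hypothetical file names, in the arbitrary order os.listdir might return them
imgs = ["img_10_a.jpg", "img_2_b.jpg", "img_1_c.jpg"]

def img_id(name):
    """Return the numeric ID from a name of the form prefix_ID_rest."""
    return int(name.split("_")[1])

# Sorting on the integer ID gives the intended order: 1, 2, 10
numeric_order = sorted(imgs, key=img_id)

# Sorting the raw strings compares character by character, so 10 lands first
lexical_order = sorted(imgs)
```

Only the list of file names is sorted here, so the memory cost stays small no matter how large the images are. The images themselves are loaded afterwards, one at a time, in the sorted order.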