First, let me reproduce your situation. I will use a very simple model:
Code:
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(42)

# Some dummy data:
X = torch.randn(100, 5, requires_grad=True, dtype=torch.float)
Y = torch.randn(100, 5, requires_grad=True, dtype=torch.float)


class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(5, 5, bias=False)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(5, 5, bias=False)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x


def train(model, x, y, loss_fn, optimizer, n_epochs=1000, print_loss=True):
    weights = []
    for i in range(n_epochs):
        y_hat = model(x)
        loss = loss_fn(y_hat, y)
        optimizer.zero_grad()
        loss.backward()
        if print_loss:
            print(f'| {i+1} | Loss: {loss.item():.4f}')
        optimizer.step()
        print('W:\n', model.fc2.weight.data)
        weights.append(model.fc2.weight.data)
    return weights


torch.manual_seed(42)
model = Model()
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
n_epochs = 2

weights = train(model=model,
                x=X,
                y=Y,
                loss_fn=loss_fn,
                optimizer=optimizer,
                n_epochs=n_epochs,
                print_loss=True)
Output:
| 1 | Loss: 1.0285
W:
tensor([[-0.2052, -0.1257, -0.2684, 0.0425, -0.4413],
[ 0.4034, -0.3797, 0.3448, 0.0741, -0.1450],
[ 0.2759, 0.0695, 0.3608, 0.0487, -0.1411],
[ 0.1201, -0.1213, 0.1881, 0.3990, 0.2583],
[-0.1956, 0.2581, 0.0798, 0.2270, -0.2725]])
| 2 | Loss: 1.0279
W:
tensor([[-0.2041, -0.1251, -0.2679, 0.0428, -0.4410],
[ 0.4030, -0.3795, 0.3444, 0.0738, -0.1447],
[ 0.2755, 0.0693, 0.3603, 0.0484, -0.1411],
[ 0.1200, -0.1213, 0.1879, 0.3987, 0.2580],
[-0.1958, 0.2580, 0.0796, 0.2269, -0.2725]])
OK, it works fine. Now let's take a look at weights:
Code:
print(*weights, sep='\n')
Output:
tensor([[-0.2041, -0.1251, -0.2679, 0.0428, -0.4410],
[ 0.4030, -0.3795, 0.3444, 0.0738, -0.1447],
[ 0.2755, 0.0693, 0.3603, 0.0484, -0.1411],
[ 0.1200, -0.1213, 0.1879, 0.3987, 0.2580],
[-0.1958, 0.2580, 0.0796, 0.2269, -0.2725]])
tensor([[-0.2041, -0.1251, -0.2679, 0.0428, -0.4410],
[ 0.4030, -0.3795, 0.3444, 0.0738, -0.1447],
[ 0.2755, 0.0693, 0.3603, 0.0484, -0.1411],
[ 0.1200, -0.1213, 0.1879, 0.3987, 0.2580],
[-0.1958, 0.2580, 0.0796, 0.2269, -0.2725]])
Well, this is not what we wanted, but it is actually the expected behavior. If you look again, you will see that the values in the list correspond to the weight values from the second epoch. That means we are not appending new tensors; we are appending references to the storage of the actual weights, which is why both entries show the same final result.
In other words, with a plain append you get identical values because the gradients still propagate to the original weight tensor: the appended "weight tensor" points to the very same tensor inside the model that is modified during backprop.
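You can verify this aliasing directly after running the snippet above (a minimal check; weights and model are the objects defined there):
Code:
# Every entry in `weights` shares storage with the live weight tensor,
# so the in-place update done by optimizer.step() changes all of them at once.
print(weights[0].data_ptr() == weights[1].data_ptr())        # True: same storage
print(weights[0].data_ptr() == model.fc2.weight.data_ptr())  # True: it is the model's weight
print(torch.equal(weights[0], weights[1]))                    # True: identical values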
That is why you need clone to create a new tensor. However, it is recommended to use tensor.clone().detach(), because clone on its own is recorded in the computation graph, which means that if you backpropagate through the cloned tensor, the gradients will also propagate to the original tensor (see the clone documentation).
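To see the difference between clone and clone().detach() in isolation, here is a small standalone sketch (independent of the model above):
Code:
import torch

a = torch.ones(3, requires_grad=True)

# clone() alone stays in the computation graph: gradients flow back to `a`.
b = a.clone()
b.sum().backward()
print(a.grad)            # tensor([1., 1., 1.])

# clone().detach() is cut off from the graph: it no longer tracks gradients,
# so nothing can flow back from it to `a`.
a.grad = None
c = a.clone().detach()
print(c.requires_grad)   # False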
So, if you want to safely append your weights, use:
weights.append(model.fc2.weight.data.clone().detach())
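If you re-run the training above with that fixed append line (assuming the same setup with nothing else changed), the list entries no longer alias the live weight, and the two epochs differ as expected:
Code:
# The snapshots now have their own storage ...
print(weights[0].data_ptr() == model.fc2.weight.data_ptr())  # False
# ... and the epoch-1 and epoch-2 weights are no longer identical.
print(torch.equal(weights[0], weights[1]))                    # False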