I'm by no means a PyTorch expert, but that snippet looks suspicious:
```python
# Put the embedded inputs into the GRU.
output, hidden = self.gru(embedded, hidden)
# Matrix manipulation magic.
batch_size, sequence_len, hidden_size = output.shape
# Technically, linear layer takes a 2-D matrix as input, so more manipulation...
output = output.contiguous().view(batch_size * sequence_len, hidden_size)
```
Unless you constructed the GRU with `batch_first=True`, its output has shape `(seq_len, batch, num_directions * hidden_size)`, i.e. the first dimension is `seq_len`, not `batch_size`. The unpacking `batch_size, sequence_len, hidden_size = output.shape` therefore swaps the two names, and `view(batch_size * sequence_len, hidden_size)` only runs without complaint because the element count is the same either way (with a batch of 32, you get `32*seq_len` rows regardless of the order); the variable names just don't mean what they say.
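To make the ordering concrete, here is a minimal shape check (the sizes are made up for illustration):

```python
import torch
import torch.nn as nn

# Default GRU: batch_first=False, so input and output are (seq_len, batch, features).
gru = nn.GRU(input_size=10, hidden_size=20)
x = torch.randn(7, 32, 10)   # 7 time steps, batch of 32
output, hidden = gru(x)
print(output.shape)          # torch.Size([7, 32, 20]) -> (seq_len, batch, hidden_size)
print(hidden.shape)          # torch.Size([1, 32, 20]) -> (num_layers*num_directions, batch, hidden_size)
```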
Something like this should work instead:
```python
# Put the embedded inputs into the GRU.
output, hidden = self.gru(embedded, hidden)
# Not needed, just to show the true output shape order.
seq_len, batch_size, hidden_size = output.shape
# Given the shape of output, this is the last step.
output = output[-1]
# output.shape = (batch_size, hidden_size) <-- What you want
```
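As a sanity check: for a single-layer, unidirectional GRU, the last time step of `output` is exactly the final hidden state, so you can verify the slice does what you expect:

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=10, hidden_size=20)   # single layer, unidirectional
x = torch.randn(7, 32, 10)                    # (seq_len, batch, input_size)
output, hidden = gru(x)

last_step = output[-1]                        # (batch_size, hidden_size)
print(last_step.shape)                        # torch.Size([32, 20])
print(torch.allclose(last_step, hidden[-1]))  # True: same values as the final hidden state
```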
A personal word of caution about two things:

1. `view()` never reorders dimensions; it only reinterprets memory. If `output` has shape `(seq_len, batch_size, hidden_size)` and you want `(batch_size, seq_len*hidden_size)`, calling `view(batch_size, -1)` directly will silently interleave data from different examples. You have to `transpose(1,0)` first so the tensor is `(batch_size, seq_len, hidden_size)` before flattening.
2. `transpose()` returns a non-contiguous view of the same storage, so you need `.contiguous()` before the subsequent `view()` call.
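Both caveats are easy to demonstrate on a tiny tensor (sizes made up):

```python
import torch

x = torch.arange(2 * 3 * 4).view(2, 3, 4)   # pretend (seq_len=2, batch_size=3, hidden_size=4)

# Caveat 1: view() keeps the original memory order, so this is NOT a
# per-example flattening; it mixes time steps from different examples.
wrong = x.view(3, -1)

# Reorder dimensions first, then flatten.
right = x.transpose(1, 0).contiguous().view(3, -1)
print(torch.equal(wrong, right))   # False: the two layouts differ

# Caveat 2: transpose() returns a non-contiguous view, so view() without
# .contiguous() raises a RuntimeError.
try:
    x.transpose(1, 0).view(3, -1)
except RuntimeError as err:
    print('need .contiguous() first:', err)
```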
In case it helps, this is the `forward` method of a GRU classifier network of mine:
```python
def forward(self, batch, method='last_step'):
    embeds = self.word_embeddings(batch)
    # GRU expects (seq_len, batch, features), so swap the first two dims.
    x = torch.transpose(embeds, 0, 1)
    x, self.hidden = self.gru(x, self.hidden)
    if method == 'last_step':
        x = x[-1]
    elif method == 'average_pooling':
        x = torch.sum(x, dim=0) / len(batch[0])
    elif method == 'max_pooling':
        x, _ = torch.max(x, dim=0)
    else:
        raise Exception('Unknown method.')
    # A series of Linear layers with ReLU and Dropout.
    for l in self.linears:
        x = l(x)
    log_probs = F.log_softmax(x, dim=1)
    return log_probs
```
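For completeness, here is a hypothetical skeleton showing where `word_embeddings`, `gru`, `linears`, and `hidden` could come from and how the method is called. The sizes and the single `Linear` output layer are assumptions for illustration, not my actual network:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GRUClassifier(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=50, hidden_dim=64, num_classes=5):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim)           # batch_first=False (default)
        self.linears = nn.ModuleList([nn.Linear(hidden_dim, num_classes)])
        self.hidden = None                                 # None -> zero initial state

    def forward(self, batch, method='last_step'):
        embeds = self.word_embeddings(batch)
        x = torch.transpose(embeds, 0, 1)                  # -> (seq_len, batch, embed_dim)
        x, self.hidden = self.gru(x, self.hidden)
        if method == 'last_step':
            x = x[-1]
        elif method == 'average_pooling':
            x = torch.sum(x, dim=0) / len(batch[0])
        elif method == 'max_pooling':
            x, _ = torch.max(x, dim=0)
        else:
            raise Exception('Unknown method.')
        for l in self.linears:
            x = l(x)
        return F.log_softmax(x, dim=1)

model = GRUClassifier()
batch = torch.randint(0, 1000, (8, 15))   # (batch_size, seq_len) of token ids
log_probs = model(batch, method='max_pooling')
print(log_probs.shape)                    # torch.Size([8, 5])
```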
Back in your `train()` function, that line should be:

```python
output, hidden = model(x, use_softmax=False)
```

With `use_softmax` disabled during training, the model should train properly, and the training CE loss will drop to near zero.
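The reason: `nn.CrossEntropyLoss` applies `log_softmax` internally, so it has to receive raw logits; feeding it already-softmaxed probabilities squashes the gradients and the loss stalls. Here is a hypothetical minimal model with such a flag (the names mirror the call above; everything else is made up):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyLM(nn.Module):
    def __init__(self, vocab_size=100, hidden_dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, hidden_dim)
        self.gru = nn.GRU(hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, hidden=None, use_softmax=True):
        output, hidden = self.gru(self.emb(x), hidden)
        logits = self.out(output[-1])                 # (batch, vocab_size)
        if use_softmax:
            # Probabilities: fine for inspection/inference, wrong for CE loss.
            return F.softmax(logits, dim=-1), hidden
        return logits, hidden                         # raw logits for training

model = TinyLM()
criterion = nn.CrossEntropyLoss()                     # applies log_softmax itself
x = torch.randint(0, 100, (7, 16))                    # (seq_len, batch) of token ids
y = torch.randint(0, 100, (16,))                      # targets
output, hidden = model(x, use_softmax=False)
loss = criterion(output, y)                           # behaves as expected
```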
See https://www.kaggle.com/alvations/gru-language-model for a full example.