I'm by no means a PyTorch expert, but that snippet looks suspicious:
```python
# Put the embedded inputs into the GRU.
output, hidden = self.gru(embedded, hidden)
# Matrix manipulation magic.
batch_size, sequence_len, hidden_size = output.shape
# Technically, linear layer takes a 2-D matrix as input, so more manipulation...
output = output.contiguous().view(batch_size * sequence_len, hidden_size)
```
Unless you constructed the GRU with `batch_first=True`, its output has shape `(seq_len, batch, num_directions * hidden_size)`, i.e. the first dimension is `seq_len`, not `batch_size`. The unpacking `batch_size, sequence_len, hidden_size = output.shape` therefore swaps the two names, and `view(batch_size * sequence_len, hidden_size)` only runs without complaint because the element count is the same either way (with a batch of 32, you get `32*seq_len` rows regardless of the order); the variable names just don't mean what they say.
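To make the ordering concrete, here is a minimal shape check (the sizes are made up for illustration):

```python
import torch
import torch.nn as nn

# Default GRU: batch_first=False, so input and output are (seq_len, batch, features).
gru = nn.GRU(input_size=10, hidden_size=20)
x = torch.randn(7, 32, 10)   # 7 time steps, batch of 32
output, hidden = gru(x)
print(output.shape)          # torch.Size([7, 32, 20]) -> (seq_len, batch, hidden_size)
print(hidden.shape)          # torch.Size([1, 32, 20]) -> (num_layers*num_directions, batch, hidden_size)
```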
Something like this should work instead:
```python
# Put the embedded inputs into the GRU.
output, hidden = self.gru(embedded, hidden)
# Not needed, just to show the true output shape order.
seq_len, batch_size, hidden_size = output.shape
# Given the shape of output, this is the last step.
output = output[-1]
# output.shape = (batch_size, hidden_size) <-- What you want
```
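As a sanity check: for a single-layer, unidirectional GRU, the last time step of `output` is exactly the final hidden state, so you can verify the slice does what you expect:

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=10, hidden_size=20)   # single layer, unidirectional
x = torch.randn(7, 32, 10)                    # (seq_len, batch, input_size)
output, hidden = gru(x)

last_step = output[-1]                        # (batch_size, hidden_size)
print(last_step.shape)                        # torch.Size([32, 20])
print(torch.allclose(last_step, hidden[-1]))  # True: same values as the final hidden state
```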
A personal word of caution about two things:

1. `view()` never reorders dimensions; it only reinterprets memory. If `output` has shape `(seq_len, batch_size, hidden_size)` and you want `(batch_size, seq_len*hidden_size)`, calling `view(batch_size, -1)` directly will silently interleave data from different examples. You have to `transpose(1,0)` first so the tensor is `(batch_size, seq_len, hidden_size)` before flattening.
2. `transpose()` returns a non-contiguous view of the same storage, so you need `.contiguous()` before the subsequent `view()` call.
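Both caveats are easy to demonstrate on a tiny tensor (sizes made up):

```python
import torch

x = torch.arange(2 * 3 * 4).view(2, 3, 4)   # pretend (seq_len=2, batch_size=3, hidden_size=4)

# Caveat 1: view() keeps the original memory order, so this is NOT a
# per-example flattening; it mixes time steps from different examples.
wrong = x.view(3, -1)

# Reorder dimensions first, then flatten.
right = x.transpose(1, 0).contiguous().view(3, -1)
print(torch.equal(wrong, right))   # False: the two layouts differ

# Caveat 2: transpose() returns a non-contiguous view, so view() without
# .contiguous() raises a RuntimeError.
try:
    x.transpose(1, 0).view(3, -1)
except RuntimeError as err:
    print('need .contiguous() first:', err)
```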
In case it helps, this is the `forward` method of a GRU classifier network of mine:
```python
def forward(self, batch, method='last_step'):
    embeds = self.word_embeddings(batch)
    # GRU expects (seq_len, batch, features), so swap the first two dims.
    x = torch.transpose(embeds, 0, 1)
    x, self.hidden = self.gru(x, self.hidden)
    if method == 'last_step':
        x = x[-1]
    elif method == 'average_pooling':
        x = torch.sum(x, dim=0) / len(batch[0])
    elif method == 'max_pooling':
        x, _ = torch.max(x, dim=0)
    else:
        raise Exception('Unknown method.')
    # A series of Linear layers with ReLU and Dropout.
    for l in self.linears:
        x = l(x)
    log_probs = F.log_softmax(x, dim=1)
    return log_probs
```
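For completeness, here is a hypothetical skeleton showing where `word_embeddings`, `gru`, `linears`, and `hidden` could come from and how the method is called. The sizes and the single `Linear` output layer are assumptions for illustration, not my actual network:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GRUClassifier(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=50, hidden_dim=64, num_classes=5):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim)           # batch_first=False (default)
        self.linears = nn.ModuleList([nn.Linear(hidden_dim, num_classes)])
        self.hidden = None                                 # None -> zero initial state

    def forward(self, batch, method='last_step'):
        embeds = self.word_embeddings(batch)
        x = torch.transpose(embeds, 0, 1)                  # -> (seq_len, batch, embed_dim)
        x, self.hidden = self.gru(x, self.hidden)
        if method == 'last_step':
            x = x[-1]
        elif method == 'average_pooling':
            x = torch.sum(x, dim=0) / len(batch[0])
        elif method == 'max_pooling':
            x, _ = torch.max(x, dim=0)
        else:
            raise Exception('Unknown method.')
        for l in self.linears:
            x = l(x)
        return F.log_softmax(x, dim=1)

model = GRUClassifier()
batch = torch.randint(0, 1000, (8, 15))   # (batch_size, seq_len) of token ids
log_probs = model(batch, method='max_pooling')
print(log_probs.shape)                    # torch.Size([8, 5])
```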
Back in your `train()` function, that line should be:

```python
output, hidden = model(x, use_softmax=False)
```

With `use_softmax` disabled during training, the model should train properly, and the training CE loss will drop to near zero.
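The reason: `nn.CrossEntropyLoss` applies `log_softmax` internally, so it has to receive raw logits; feeding it already-softmaxed probabilities squashes the gradients and the loss stalls. Here is a hypothetical minimal model with such a flag (the names mirror the call above; everything else is made up):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyLM(nn.Module):
    def __init__(self, vocab_size=100, hidden_dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, hidden_dim)
        self.gru = nn.GRU(hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, hidden=None, use_softmax=True):
        output, hidden = self.gru(self.emb(x), hidden)
        logits = self.out(output[-1])                 # (batch, vocab_size)
        if use_softmax:
            # Probabilities: fine for inspection/inference, wrong for CE loss.
            return F.softmax(logits, dim=-1), hidden
        return logits, hidden                         # raw logits for training

model = TinyLM()
criterion = nn.CrossEntropyLoss()                     # applies log_softmax itself
x = torch.randint(0, 100, (7, 16))                    # (seq_len, batch) of token ids
y = torch.randint(0, 100, (16,))                      # targets
output, hidden = model(x, use_softmax=False)
loss = criterion(output, y)                           # behaves as expected
```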
See https://www.kaggle.com/alvations/gru-language-model for a full example.