Matrix Calculus via Differentials, Matrix Derivative, Matrix Differentiation Methods
Matrix Calculus
In this page, we introduce a differential-based method for vector and matrix derivatives (matrix calculus), which needs only a few simple rules to derive most matrix derivatives. This method is well established in mathematics; however, few documents describe it clearly and in detail. This page therefore aims to give a comprehensive introduction to matrix calculus via differentials.
* If you only want results, there is an awesome online tool, Matrix Calculus. If you want to know how, let's get started.
To derive a matrix derivative, we repeatedly apply the identities in group 1 (the process is effectively a chain rule), assisted by the identities in group 2.
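Any derivative obtained this way can be sanity-checked numerically. The sketch below is my own illustration (not from the original text): it takes the well-known gradient of the quadratic form x^T A x, namely (A + A^T)x, as an assumed example and compares it against central finite differences with NumPy.

```python
import numpy as np

def numeric_grad(f, x, eps=1e-6):
    """Central finite differences of a scalar function f at vector x."""
    g = np.zeros_like(x)
    it = np.nditer(x, flags=["multi_index"])
    for _ in it:
        idx = it.multi_index
        xp, xm = x.copy(), x.copy()
        xp[idx] += eps
        xm[idx] -= eps
        g[idx] = (f(xp) - f(xm)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
x = rng.standard_normal(4)

# Illustrative identity: d(x^T A x) = x^T (A + A^T) dx,
# so the gradient w.r.t. x is (A + A^T) x.
analytic = (A + A.T) @ x
numeric = numeric_grad(lambda v: v @ A @ v, x)
print(np.allclose(analytic, numeric, atol=1e-5))  # expect True
```

The same harness works for any scalar-valued function of a vector, so it can double-check every vector-derivative example below.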
finally from eq. (2), we get .
finally from eq. (3), we get .
finally from eq. (1), we get .
finally from eq. (5), we get .
E.g. 1,
finally from eq. (2), we get .
finally from eq. (3), we get . From lines 3 to 4, we use the conclusion of
, that is to say, we can derive more complicated matrix derivatives by properly reusing the ones we have already established. From lines 6 to 7, we use
to introduce the
in order to use eq. (3) later, which is common in scalar-by-matrix derivatives.
E.g. 3,
finally from eq. (3), we get .
finally from eq. (3), we get .
E.g. 5 - two-layer neural network, ,
is a loss function such as softmax cross-entropy or MSE,
is an element-wise activation function such as sigmoid or ReLU.
For ,
finally from eq. (3), we get .
For ,
finally from eq. (3), we get .
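As one concrete instantiation of the two-layer-network gradients (my own sketch; the specific choices of MSE loss, sigmoid activation, and omitted biases are assumptions, not necessarily those of the example above), the loss l = ½‖W2 σ(W1 x) − y‖² has ∂l/∂W2 = r hᵀ and ∂l/∂W1 = ((W2ᵀ r) ⊙ h ⊙ (1 − h)) xᵀ, where h = σ(W1 x) and r = W2 h − y:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
x = rng.standard_normal(3)        # input
y = rng.standard_normal(2)        # target
W1 = rng.standard_normal((4, 3))  # first-layer weights
W2 = rng.standard_normal((2, 4))  # second-layer weights

# Forward pass: l = 0.5 * || W2 sigmoid(W1 x) - y ||^2
z = W1 @ x
h = sigmoid(z)
r = W2 @ h - y                    # residual

# Backward pass via the differential / chain rule:
# dl/dW2 = r h^T,   dl/dW1 = ((W2^T r) * h * (1 - h)) x^T
gW2 = np.outer(r, h)
gW1 = np.outer((W2.T @ r) * h * (1 - h), x)

def loss(W1_, W2_):
    return 0.5 * np.sum((W2_ @ sigmoid(W1_ @ x) - y) ** 2)

# Finite-difference check on one entry of W1
eps = 1e-6
E = np.zeros_like(W1)
E[1, 2] = eps
num_gW1 = (loss(W1 + E, W2) - loss(W1 - E, W2)) / (2 * eps)
print(np.isclose(gW1[1, 2], num_gW1, atol=1e-5))  # expect True
```

Note that sigmoid is the only part that changes if another element-wise activation is used: replace `h * (1 - h)` with that activation's element-wise derivative.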
E.g. 6, prove
Since
then
therefore
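A closely related scalar-by-matrix result that is often requested (a standard identity offered here as an illustration; the exact statement of E.g. 6 is not reproduced above) is d log|det X| = tr(X⁻¹ dX), i.e. ∂ log|det X|/∂X = (X⁻¹)ᵀ. A quick numerical check:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((4, 4))  # invertible with probability 1

# d log|det X| = tr(X^{-1} dX)  =>  d(log|det X|)/dX = (X^{-1})^T
analytic = np.linalg.inv(X).T

def logabsdet(M):
    # slogdet returns (sign, log|det|); we only need the magnitude part
    return np.linalg.slogdet(M)[1]

eps = 1e-6
numeric = np.zeros_like(X)
for i in range(4):
    for j in range(4):
        Xp, Xm = X.copy(), X.copy()
        Xp[i, j] += eps
        Xm[i, j] -= eps
        numeric[i, j] = (logabsdet(Xp) - logabsdet(Xm)) / (2 * eps)
print(np.allclose(analytic, numeric, atol=1e-4))
```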
* See examples.md for more examples.
Now, if we fully understand the core ideas behind the above examples, I believe we can derive most of the matrix derivatives in Wiki - Matrix Calculus on our own. Please correct me if there is any mistake, and feel free to raise an issue to request the detailed steps for any matrix derivative you are interested in.