1. Base Principle
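$$l = f(A), \quad f: \mathbb{R}^{m \times n} \to \mathbb{R}, \quad f \in C^1$$
$$\frac{\partial f}{\partial A} = \begin{pmatrix}
\frac{\partial f}{\partial A_{1,1}} & \frac{\partial f}{\partial A_{1,2}} & \cdots & \frac{\partial f}{\partial A_{1,n}} \\
\frac{\partial f}{\partial A_{2,1}} & \frac{\partial f}{\partial A_{2,2}} & \cdots & \frac{\partial f}{\partial A_{2,n}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f}{\partial A_{m,1}} & \frac{\partial f}{\partial A_{m,2}} & \cdots & \frac{\partial f}{\partial A_{m,n}}
\end{pmatrix}$$
$$\mathrm{d}l = \operatorname{tr}(\mathrm{d}l) = \operatorname{tr}(\mathrm{d}f(A)) = \operatorname{tr}\left(\sum_{i,j} \frac{\partial f}{\partial A_{i,j}} \, \mathrm{d}A_{i,j}\right) = \operatorname{tr}\left(\left(\frac{\partial f}{\partial A}\right)^T \mathrm{d}A\right)$$
$$\forall\, \mathrm{d}A, \quad \mathrm{d}l = \operatorname{tr}(B^T \mathrm{d}A) \implies \frac{\partial f}{\partial A} = B$$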
So all we need to do is find a $B$ that satisfies $\mathrm{d}l = \operatorname{tr}(B^T \mathrm{d}A)$ for all $\mathrm{d}A$.
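As a small worked illustration of this recipe, take the bilinear form $f(A) = a^T A b$, where $a \in \mathbb{R}^m$ and $b \in \mathbb{R}^n$ are fixed vectors. This example is not from the notes above; it is chosen only because it is short. It uses the fact that a scalar equals its own trace, together with the cyclic property of the trace:

$$\begin{aligned}
\mathrm{d}l &= a^T \, \mathrm{d}A \, b = \operatorname{tr}(a^T \, \mathrm{d}A \, b) = \operatorname{tr}(b \, a^T \, \mathrm{d}A) = \operatorname{tr}\left((a b^T)^T \mathrm{d}A\right) \\
\frac{\partial f}{\partial A} &= a b^T
\end{aligned}$$

The result is an $m \times n$ matrix, the same shape as $A$, as the definition of $\frac{\partial f}{\partial A}$ requires.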
2. Useful Rules
2.1. Differential
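- Addition rule: $\mathrm{d}(X \pm Y) = \mathrm{d}X \pm \mathrm{d}Y$
- Product rule: $\mathrm{d}(XY) = (\mathrm{d}X)\,Y + X\,\mathrm{d}Y$
- Inverse: $\mathrm{d}(X^{-1}) = -X^{-1} \, \mathrm{d}X \, X^{-1}$
- Transpose: $\mathrm{d}(X^T) = (\mathrm{d}X)^T$
- Trace: $\mathrm{d}\operatorname{tr}(X) = \operatorname{tr}(\mathrm{d}X)$
- Determinant: $\mathrm{d}|X| = \operatorname{tr}(X^{\#} \, \mathrm{d}X)$, where $X^{\#}$ is the adjugate matrix
- Hadamard product: $\mathrm{d}(X \odot Y) = \mathrm{d}X \odot Y + X \odot \mathrm{d}Y$
- Component-wise (element-wise) function: $\mathrm{d}\sigma(X) = \sigma'(X) \odot \mathrm{d}X$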
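The rules above can be sanity-checked numerically by replacing $\mathrm{d}X$ with a small finite perturbation. The following is a minimal sketch using NumPy, covering only the inverse and determinant rules; the matrix `X`, the perturbation scale, and the seed are arbitrary choices made for illustration. It relies on the identity $X^{\#} = |X|\,X^{-1}$, which holds for invertible $X$.

```python
import numpy as np

# A strictly diagonally dominant (hence invertible) matrix, chosen arbitrarily
X = np.array([[4.0, 1.0, 0.0],
              [2.0, 5.0, 1.0],
              [0.0, 1.0, 3.0]])

# Small random perturbation standing in for the differential dX
rng = np.random.default_rng(0)
dX = 1e-6 * rng.standard_normal((3, 3))

# Inverse rule: d(X^{-1}) = -X^{-1} dX X^{-1}
X_inv = np.linalg.inv(X)
inv_actual = np.linalg.inv(X + dX) - X_inv
inv_predicted = -X_inv @ dX @ X_inv

# Determinant rule: d|X| = tr(X^# dX), with X^# = |X| X^{-1} for invertible X
adjugate = np.linalg.det(X) * X_inv
det_actual = np.linalg.det(X + dX) - np.linalg.det(X)
det_predicted = np.trace(adjugate @ dX)

# Both discrepancies should be second order in the size of dX
inv_error = np.max(np.abs(inv_actual - inv_predicted))
det_error = abs(det_actual - det_predicted)
print(inv_error, det_error)
```

The differences are not exactly zero because the rules are first-order statements; shrinking the perturbation should shrink the discrepancy quadratically until rounding error dominates.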
2.2. Trace
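- Scalar trace: $a = \operatorname{tr}(a)$
- Transpose: $\operatorname{tr}(A^T) = \operatorname{tr}(A)$
- Linearity: $\operatorname{tr}(aA + bB) = a\operatorname{tr}(A) + b\operatorname{tr}(B)$
- Cyclic property: $\operatorname{tr}(AB) = \operatorname{tr}(BA)$, where $A$ and $B^T$ have the same dimensions. Both sides equal $\sum_{i,j} A_{ij} B_{ji}$
- Cyclic property with Hadamard product: $\operatorname{tr}(A^T (B \odot C)) = \operatorname{tr}((A \odot B)^T C)$, where $A, B, C$ have the same dimensions. Both sides equal $\sum_{i,j} A_{ij} B_{ij} C_{ij}$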
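The two cyclic properties are easy to confirm on random matrices. This is a small NumPy sketch; the shapes and seed are arbitrary. Because the Hadamard identity needs three matrices of the same shape while the plain cyclic identity needs $B^T$ to match $A$, the sketch uses separate matrices `C` and `D` for the Hadamard case.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 3))   # B^T has the same shape as A
C = rng.standard_normal((3, 4))
D = rng.standard_normal((3, 4))

# Cyclic property: tr(AB) = tr(BA) = sum_ij A_ij B_ji
cyclic = np.array([np.trace(A @ B), np.trace(B @ A), np.sum(A * B.T)])

# Hadamard version: tr(A^T (C ⊙ D)) = tr((A ⊙ C)^T D) = sum_ij A_ij C_ij D_ij
hadamard = np.array([np.trace(A.T @ (C * D)),
                     np.trace((A * C).T @ D),
                     np.sum(A * C * D)])

print(cyclic)
print(hadamard)
```

Each printed array should contain three values that agree up to floating-point rounding.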
3. Common differential calculations
- $n$: dimension of the output
- $\hat{y}$: predicted value
- $y$: target value
3.1. MSE
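$$\hat{y} = Wx + b$$
$$l = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = (\hat{y} - y)^T (\hat{y} - y)$$
Let $t = \hat{y} - y$. Then
$$\mathrm{d}l = \operatorname{tr}(\mathrm{d}l) = \operatorname{tr}(t^T \mathrm{d}t + \mathrm{d}t^T t) = \operatorname{tr}(t^T \mathrm{d}t) + \operatorname{tr}(\mathrm{d}t^T t) = \operatorname{tr}((2t)^T \mathrm{d}t)$$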
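Thus,
$$\frac{\partial l}{\partial t} = 2t$$
Since the target $y$ is constant,
$$\mathrm{d}t = \mathrm{d}(\hat{y} - y) = \mathrm{d}\hat{y}$$
$$\frac{\partial l}{\partial \hat{y}} = 2t$$
Treating $x$ and $b$ as constants and differentiating with respect to $W$ only,
$$\mathrm{d}\hat{y} = \mathrm{d}(Wx + b) = \mathrm{d}W \, x$$
$$\mathrm{d}l = \operatorname{tr}\left(\left(\frac{\partial l}{\partial \hat{y}}\right)^T \mathrm{d}W \, x\right) = \operatorname{tr}\left(x \left(\frac{\partial l}{\partial \hat{y}}\right)^T \mathrm{d}W\right) = \operatorname{tr}\left(\left(\frac{\partial l}{\partial \hat{y}} \, x^T\right)^T \mathrm{d}W\right)$$
$$\frac{\partial l}{\partial W} = \frac{\partial l}{\partial \hat{y}} \, x^T = 2 t x^T$$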
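The closed form for the gradient with respect to $W$ can be compared against a central finite-difference approximation. The sketch below uses NumPy; the dimensions `n` and `k`, the random data, and the step size `eps` are assumptions made only for this check.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 3, 4                       # output and input dimensions (arbitrary)
W = rng.standard_normal((n, k))
b = rng.standard_normal(n)
x = rng.standard_normal(k)
y = rng.standard_normal(n)

def loss(W):
    """Sum of squared errors l = (y_hat - y)^T (y_hat - y) with y_hat = W x + b."""
    t = W @ x + b - y
    return t @ t

# Analytic gradient from the derivation: dl/dW = 2 t x^T
t = W @ x + b - y
grad_analytic = 2.0 * np.outer(t, x)

# Central finite differences, one entry of W at a time
eps = 1e-6
grad_numeric = np.zeros_like(W)
for i in range(n):
    for j in range(k):
        E = np.zeros_like(W)
        E[i, j] = eps
        grad_numeric[i, j] = (loss(W + E) - loss(W - E)) / (2 * eps)

max_error = np.max(np.abs(grad_analytic - grad_numeric))
print(max_error)
```

Because the loss is quadratic in $W$, the central difference has no truncation error, so the remaining discrepancy is due to floating-point rounding alone.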
3.2. Eigenvalue and Eigenvector
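Suppose $A \in \mathrm{Sym}_n(\mathbb{R})$. Then $A$ can be decomposed as
$$A = Q \Lambda Q^T$$
$$\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \cdots, \lambda_n), \quad \lambda_i \le \lambda_{i+1}$$
$$Q = (q_1, q_2, \cdots, q_n) \in O_n(\mathbb{R}), \quad \text{where } O_n(\mathbb{R}) = \{ M \in \mathbb{R}^{n \times n} \mid M^T M = I \}$$
where $\lambda_i$ are the eigenvalues of $A$ and $Q$ is the matrix whose columns $q_i$ are the corresponding eigenvectors.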
3.2.1. Eigenvalue
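$$\begin{aligned}
\Lambda &= Q^T A Q \\
\mathrm{d}\Lambda &= Q^T \mathrm{d}A \, Q \\
\mathrm{d}l &= \operatorname{tr}(\mathrm{d}l) = \operatorname{tr}\left(\left(\frac{\partial l}{\partial \Lambda}\right)^T \mathrm{d}\Lambda\right) = \operatorname{tr}\left(\left(\frac{\partial l}{\partial \Lambda}\right)^T Q^T \mathrm{d}A \, Q\right) \\
&= \operatorname{tr}\left(Q \left(\frac{\partial l}{\partial \Lambda}\right)^T Q^T \mathrm{d}A\right) = \operatorname{tr}\left(\left(Q \frac{\partial l}{\partial \Lambda} Q^T\right)^T \mathrm{d}A\right)
\end{aligned}$$
Here $\mathrm{d}\Lambda = Q^T \mathrm{d}A \, Q$ is to be read on the diagonal only. The off-diagonal entries of $Q^T \mathrm{d}A \, Q$ are generally nonzero, but they do not contribute to the trace because $\frac{\partial l}{\partial \Lambda}$ is diagonal.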
Thus,
$$\frac{\partial l}{\partial A} = Q \frac{\partial l}{\partial \Lambda} Q^T$$
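To see the eigenvalue result in action, consider a loss that is a weighted sum of eigenvalues, $l = \sum_i c_i \lambda_i$, so that $\frac{\partial l}{\partial \Lambda} = \operatorname{diag}(c)$. The weights `c`, the test matrix `A`, and the perturbation direction `S` below are hypothetical choices for illustration. The sketch compares the directional derivative predicted by $\operatorname{tr}\left(\left(\frac{\partial l}{\partial A}\right)^T S\right)$ with a central finite difference along a symmetric direction, since $A$ must stay symmetric.

```python
import numpy as np

# Symmetric tridiagonal test matrix with distinct eigenvalues (arbitrary choice)
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 5.0]])
c = np.array([1.0, -2.0, 0.5])    # hypothetical weights: dl/dLambda = diag(c)

def loss(M):
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
    return c @ np.linalg.eigvalsh(M)

# Analytic gradient: dl/dA = Q (dl/dLambda) Q^T
lam, Q = np.linalg.eigh(A)
grad_A = Q @ np.diag(c) @ Q.T

# Random symmetric direction for the directional derivative
rng = np.random.default_rng(0)
S = rng.standard_normal((3, 3))
S = (S + S.T) / 2

eps = 1e-6
numeric = (loss(A + eps * S) - loss(A - eps * S)) / (2 * eps)
analytic = np.trace(grad_A.T @ S)
print(numeric, analytic)
```

The check perturbs only along symmetric directions, so it confirms the gradient as it acts on $\mathrm{Sym}_n(\mathbb{R})$ rather than on arbitrary matrices.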
3.2.2. Eigenvector
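$Q^T \mathrm{d}Q + \mathrm{d}Q^T Q = 0$, because $Q^T Q = I$.
Let $H = Q^T \mathrm{d}Q$. Then $H + H^T = 0$, so $H$ is skew-symmetric and $\mathrm{d}Q^T Q = H^T = -H$.
$$\begin{aligned}
\mathrm{d}A &= \mathrm{d}Q \, \Lambda Q^T + Q \, \mathrm{d}\Lambda \, Q^T + Q \Lambda \, \mathrm{d}Q^T \\
Q^T \mathrm{d}A \, Q &= (Q^T \mathrm{d}Q) \Lambda + \mathrm{d}\Lambda + \Lambda (\mathrm{d}Q^T Q) = H \Lambda + \mathrm{d}\Lambda + \Lambda H^T = H \Lambda + \mathrm{d}\Lambda - \Lambda H
\end{aligned}$$
Reading this entry by entry, and assuming the eigenvalues are distinct so that the division is defined:
$$\begin{aligned}
\forall i, \quad (Q^T \mathrm{d}A \, Q)_{ii} &= \mathrm{d}\Lambda_{ii} \\
\forall i \neq j, \quad (Q^T \mathrm{d}A \, Q)_{ij} &= H_{ij} (\lambda_j - \lambda_i) \\
\forall i \neq j, \quad H_{ij} &= \frac{1}{\lambda_j - \lambda_i} (Q^T \mathrm{d}A \, Q)_{ij} \\
\forall i, \quad H_{ii} &= 0
\end{aligned}$$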
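Let
$$F_{ij} = \begin{cases} \dfrac{1}{\lambda_j - \lambda_i} & \text{if } i \neq j \\ 0 & \text{if } i = j \end{cases}$$
$$H = F \odot (Q^T \mathrm{d}A \, Q)$$
$$\begin{aligned}
\operatorname{tr}(\mathrm{d}l) &= \operatorname{tr}\left(\left(\frac{\partial l}{\partial Q}\right)^T \mathrm{d}Q\right) = \operatorname{tr}\left(\left(\frac{\partial l}{\partial Q}\right)^T Q H\right) = \operatorname{tr}\left(\left(Q^T \frac{\partial l}{\partial Q}\right)^T \left(F \odot (Q^T \mathrm{d}A \, Q)\right)\right) \\
&= \operatorname{tr}\left(\left(\left(Q^T \frac{\partial l}{\partial Q}\right) \odot F\right)^T (Q^T \mathrm{d}A \, Q)\right) = \operatorname{tr}\left(Q \left(\left(Q^T \frac{\partial l}{\partial Q}\right) \odot F\right)^T Q^T \mathrm{d}A\right)
\end{aligned}$$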
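The key intermediate result in this subsection, $H = Q^T \mathrm{d}Q = F \odot (Q^T \mathrm{d}A \, Q)$, can be checked numerically by perturbing $A$ and recomputing its eigenvectors. The sketch below is one way to do this with NumPy, under the assumptions that the eigenvalues are distinct and that the perturbation is small and symmetric. The test matrix, seed, and perturbation scale are arbitrary. Eigenvectors are only defined up to sign, so the sketch aligns the signs of the perturbed eigenvectors with the originals before differencing.

```python
import numpy as np

# Symmetric tridiagonal test matrix with distinct eigenvalues (arbitrary choice)
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 5.0]])

# Small symmetric perturbation standing in for dA
rng = np.random.default_rng(0)
dA = rng.standard_normal((3, 3))
dA = 1e-6 * (dA + dA.T) / 2

lam, Q = np.linalg.eigh(A)
_, Q_new = np.linalg.eigh(A + dA)

# Flip columns of Q_new so each one points the same way as the matching column of Q
Q_new = Q_new * np.sign(np.sum(Q * Q_new, axis=0))
dQ = Q_new - Q

# F_ij = 1 / (lambda_j - lambda_i) off the diagonal, 0 on the diagonal
gap = lam[None, :] - lam[:, None]          # gap[i, j] = lambda_j - lambda_i
off_diagonal = ~np.eye(3, dtype=bool)
F = np.zeros_like(gap)
F[off_diagonal] = 1.0 / gap[off_diagonal]

H_predicted = F * (Q.T @ dA @ Q)
H_actual = Q.T @ dQ

h_error = np.max(np.abs(H_predicted - H_actual))
print(h_error)
```

The entries of `H_predicted` are of the order of the perturbation, while the discrepancy should be second order in it. When two eigenvalues are close, the entries of `F` become large and the first-order formula becomes less reliable.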