1. {(64+64+1+129*3)*64}*2=66,048
output_t = activation(Uo*state_t + Wo*input_t + Vo*C_t + bo).
Here, Uo and Wo each have 64 parameters per unit, plus 1 bias for bo. Then, what about C_t?
i_t-1 = activation(Ui*state_t-1 + Wi*input_t-1 + bi)
f_t-1 = activation(Uf*state_t-1 + Wf*input_t-1 + bf)
k_t-1 = activation(Uk*state_t-1 + Wk*input_t-1 + bk)
Each of these 3 has 64+64+1=129 parameters per unit, so C_t, which is a transformation of i_t-1, f_t-1 and k_t-1, accounts for 129*3.
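Also, (64+64+1+129*3) should be multiplied by 64 because LSTM(64) has 64 units and each unit has its own set of these parameters. Note that 64 is the number of units, not the number of time steps: the same weights are reused at every time step, so the sequence length does not change the count.
And everything should be doubled because the layer is ‘bi’directional: it holds one forward LSTM and one backward LSTM.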
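The arithmetic above can be checked with a few lines of Python. This is only a sketch of the counting rule described here, not Keras code; the function name `bilstm_param_count` and its arguments are made up for illustration, and it assumes a standard LSTM with no peephole weights.

```python
def bilstm_param_count(input_dim: int, units: int) -> int:
    """Trainable parameters of a bidirectional LSTM (hypothetical helper)."""
    # One row of W (input_dim), one row of U (units) and one bias per unit.
    per_transform_per_unit = input_dim + units + 1
    # Four transformations: the output one, plus i, f and k.
    transforms = 4
    one_direction = transforms * per_transform_per_unit * units
    # Forward LSTM + backward LSTM.
    return 2 * one_direction


count = bilstm_param_count(input_dim=64, units=64)
print(count)  # 66048
```

The sequence length never appears as an argument, which is the point: only the input size and the number of units matter.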
2. {(128+32+1+161*3)*32}*2=41,216
Wo and Uo have 128 and 32 parameters per unit respectively (Wo multiplies the 128-dimensional input, Uo multiplies the 32-dimensional state), plus 1 bias for bo.
C_t will have (128+32+1)*3=161*3 parameters.
Also, (128+32+1+161*3) should be multiplied by 32 because LSTM(32) has 32 units (again, this is the number of units, not the number of time steps).
And everything should be doubled because the layer is ‘bi’directional.
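Another way to see the same number is through the weight shapes. The sketch below assumes the usual Keras layout, where each direction stores a `kernel`, a `recurrent_kernel` and a `bias` with the four transformations concatenated along the last axis; the helper `lstm_weight_shapes` is hypothetical and only builds the shape tuples, it does not touch Keras.

```python
from math import prod


def lstm_weight_shapes(input_dim: int, units: int) -> dict:
    """Weight shapes of one LSTM direction (assumed Keras-style layout)."""
    return {
        "kernel": (input_dim, 4 * units),        # the W matrices
        "recurrent_kernel": (units, 4 * units),  # the U matrices
        "bias": (4 * units,),                    # the b vectors
    }


shapes = lstm_weight_shapes(input_dim=128, units=32)
one_direction = sum(prod(shape) for shape in shapes.values())
total = 2 * one_direction  # forward + backward

print(one_direction, total)  # 20608 41216
```

None of the shapes has a time dimension, so feeding longer or shorter sequences leaves the 41,216 unchanged.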