In your paper, the moment loss is the regularization term applied to the moment matrix. But in your network, if I use order 1 or 2, the moment matrix is exactly zero, so it seems the momentloss term in the loss never changes. Also, if I have not misunderstood the code, it uses the moment matrix to generate the kernel matrix. Is that correct?