A continuation of #1376
We added a ct-ct matmul kernel, with modifications from the Tricycle paper. The same paper describes a more optimized kernel for ct-pt matmul (6.1). That kernel should go alongside (or replace?) our existing ct-pt matmul kernel, which just "stacks" the Halevi-Shoup kernel.
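
For context, here is a minimal plaintext sketch of what "stacking" the Halevi-Shoup kernel could look like. It assumes "stacking" means running the diagonal-method matrix-vector kernel once per column of the encrypted operand; that reading, the square-matrix restriction, and every function name below are hypothetical and not taken from our codebase or from the Tricycle paper. Plain Python lists stand in for ciphertext slots, and `rotate` stands in for a homomorphic rotation.

```python
def rotate(vec, k):
    """Cyclic left rotation by k slots (stand-in for a ciphertext rotation)."""
    k %= len(vec)
    return vec[k:] + vec[:k]


def generalized_diagonal(matrix, i):
    """i-th generalized diagonal of a square matrix: entry j is matrix[j][(j + i) % n]."""
    n = len(matrix)
    return [matrix[j][(j + i) % n] for j in range(n)]


def halevi_shoup_matvec(pt_matrix, ct_vec):
    """Diagonal-method matvec: sum over i of diag_i(M) * rotate(v, i), slot-wise."""
    n = len(pt_matrix)
    acc = [0] * n
    for i in range(n):
        diag = generalized_diagonal(pt_matrix, i)
        rotated = rotate(ct_vec, i)
        acc = [a + d * r for a, d, r in zip(acc, diag, rotated)]
    return acc


def stacked_matmul(pt_matrix, ct_columns):
    """Assumed 'stacked' ct-pt matmul: one matvec per column of the encrypted operand.

    ct_columns holds the columns of the encrypted matrix; the result holds
    the columns of the product pt_matrix @ ct_matrix.
    """
    return [halevi_shoup_matvec(pt_matrix, col) for col in ct_columns]


# Example: A = [[1, 2], [3, 4]], B = [[5, 6], [7, 8]] given by its columns.
A = [[1, 2], [3, 4]]
B_columns = [[5, 7], [6, 8]]
product_columns = stacked_matmul(A, B_columns)
```

Under this reading, an n×n product costs n full matvec kernels, with rotations repeated for each column, which is the kind of overhead a dedicated ct-pt matmul kernel would presumably aim to reduce.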