use a direct SAT encoding of fused multiply-add (fma)
this may speed things up because the resulting formulas are smaller. Example: binary encoding with overflow bit, total width 3 (so the values are 0, 1, 2, 3, large):
- times: 7 NAND nodes with 24 edges
- plus: 7 NAND nodes with 20 edges
- fma: 12 NAND nodes with 43 edges (versus 14 nodes and 44 edges for times followed by plus)
plan:
- introduce method with default implementation
  class Semiring s where
    -- fused multiply-add: fma x y z = x * y + z
    fma :: s -> s -> s -> s
    fma x y z = plus (times x y) z
- use that method in matrix multiplication
- override the default implementation for selected types
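
A minimal sketch of the plan above, not the actual code of this project. Assumptions: `zero`, `plus` and `times` are taken to be existing `Semiring` methods; matrices are plain lists of rows; `Capped` is a hypothetical stand-in for a width-3 number type (values 0 to 3, with 4 meaning "large") and only mimics the arithmetic, it does not build any SAT formula.

```haskell
import Data.List (transpose)

-- Assumed shape of the class; only fma and its default come from the plan.
class Semiring s where
  zero  :: s
  plus  :: s -> s -> s
  times :: s -> s -> s
  -- fused multiply-add with a default implementation
  fma :: s -> s -> s -> s
  fma x y z = plus (times x y) z

-- Integer keeps the default fma.
instance Semiring Integer where
  zero  = 0
  plus  = (+)
  times = (*)

-- Hypothetical saturating type: 0..3 are exact, 4 stands for "large".
newtype Capped = Capped Int deriving (Eq, Show)

cap :: Int -> Capped
cap n = Capped (min 4 n)

instance Semiring Capped where
  zero = Capped 0
  plus  (Capped a) (Capped b) = cap (a + b)
  times (Capped a) (Capped b) = cap (a * b)
  -- overridden: one fused step instead of times followed by plus
  fma (Capped a) (Capped b) (Capped c) = cap (a * b + c)

type Matrix s = [[s]]

-- Dot product written so that every step is a single fma call.
dot :: Semiring s => [s] -> [s] -> s
dot xs ys = foldr (\(x, y) acc -> fma x y acc) zero (zip xs ys)

-- Matrix multiplication that goes through fma only.
mmul :: Semiring s => Matrix s -> Matrix s -> Matrix s
mmul a b = [ [ dot row col | col <- transpose b ] | row <- a ]
```

Because `mmul` only calls `fma`, a type that overrides the method is picked up automatically, while all other instances fall back to `plus (times x y) z`. Any override has to agree with that default on every input.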