Let's keep this closed; PIC is too slow to be widely used, and we have a non-SSE fallback.
One way to make this work would be to have the weave function keep track of the temporary (T1, T2) register usage, but that would be quite a lot of effort.
Comment #5 by github-bugzilla — 2017-01-15T01:54:29Z