It currently allocates a matrix for the computation.
The same work could easily be done in a temporary buffer: on the stack up to a certain size threshold, then falling back to malloc.
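A rough sketch of the stack-then-malloc idea, for illustration only. The name `levDistance`, the threshold of 256 and the restriction to `char` code units are all assumptions, not the existing Phobos code. It also keeps a single row rather than the whole matrix, which is enough when only the distance is needed.

```d
import core.stdc.stdlib : free, malloc;

/// Hypothetical helper (not the Phobos implementation): edit distance
/// computed in one scratch row that lives on the stack when small enough.
/// Compares code units, not decoded code points.
size_t levDistance(const(char)[] s, const(char)[] t) @trusted nothrow @nogc
{
    enum stackLimit = 256;               // assumed threshold, in elements
    size_t[stackLimit] stackBuf = void;  // uninitialized stack storage
    size_t n = t.length + 1;

    size_t* heap = null;
    size_t[] row;
    if (n <= stackLimit)
        row = stackBuf[0 .. n];
    else
    {
        heap = cast(size_t*) malloc(n * size_t.sizeof);
        assert(heap !is null, "out of memory");
        row = heap[0 .. n];
    }
    scope (exit) free(heap);             // free(null) is a no-op

    // Row 0: distance from the empty prefix of s to each prefix of t.
    foreach (j; 0 .. n)
        row[j] = j;

    foreach (i; 1 .. s.length + 1)
    {
        size_t diag = row[0];            // previous row, column j - 1
        row[0] = i;
        foreach (j; 1 .. n)
        {
            size_t up = row[j];          // previous row, column j
            size_t best = diag + (s[i - 1] == t[j - 1] ? 0 : 1);
            if (up + 1 < best) best = up + 1;                 // remove
            if (row[j - 1] + 1 < best) best = row[j - 1] + 1; // insert
            row[j] = best;
            diag = up;
        }
    }
    return row[n - 1];
}
```

The path variant needs the full matrix for the traceback, but the same buffer strategy would apply with `rows * cols` elements instead of one row.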
The levenshteinDistancePath function should take an output range for the edit ops.
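One possible shape for such an interface, as a sketch only. The name `editPathInto` and its signature are hypothetical. It fills the table over suffixes rather than prefixes so that the ops can be emitted front to back, since an output range cannot be reversed afterwards. The table is GC-allocated here purely to keep the example short.

```d
import std.algorithm.comparison : EditOp, min;
import std.range.primitives : isOutputRange, put;

/// Hypothetical signature: writes the edit ops turning s into t to `sink`
/// and returns the distance. Compares code units, not code points.
size_t editPathInto(Sink)(const(char)[] s, const(char)[] t, ref Sink sink)
if (isOutputRange!(Sink, EditOp))
{
    size_t rows = s.length + 1;
    size_t cols = t.length + 1;
    auto d = new size_t[rows * cols];    // GC allocation, for brevity only

    // d[i * cols + j] = distance between the suffixes s[i .. $] and t[j .. $]
    foreach_reverse (i; 0 .. rows)
        foreach_reverse (j; 0 .. cols)
        {
            size_t v;
            if (i == s.length)
                v = t.length - j;        // only insertions remain
            else if (j == t.length)
                v = s.length - i;        // only removals remain
            else
                v = min(d[(i + 1) * cols + j + 1] + (s[i] != t[j] ? 1 : 0),
                        d[(i + 1) * cols + j] + 1,
                        d[i * cols + j + 1] + 1);
            d[i * cols + j] = v;
        }

    // Walk forward from (0, 0), emitting one op per step.
    size_t i = 0, j = 0;
    while (i < s.length || j < t.length)
    {
        size_t cur = d[i * cols + j];
        if (i < s.length && j < t.length
            && cur == d[(i + 1) * cols + j + 1] + (s[i] != t[j] ? 1 : 0))
        {
            put(sink, s[i] == t[j] ? EditOp.none : EditOp.substitute);
            ++i; ++j;
        }
        else if (i < s.length && cur == d[(i + 1) * cols + j] + 1)
        {
            put(sink, EditOp.remove);    // drop s[i]
            ++i;
        }
        else
        {
            put(sink, EditOp.insert);    // take t[j]
            ++j;
        }
    }
    return d[0];
}
```

A caller could then choose where the ops go, for example an `appender!(EditOp[])()` from `std.array`, or a fixed-size buffer of its own when allocation has to be avoided.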