The word tokenizer for WordDiff and WordWithSpaceDiff uses \b in its regular expression. In JavaScript, \b treats only [a-zA-Z0-9_] as word characters, so it breaks on any letter outside 7-bit ASCII.
For example, the German phrase "wir üben" splits to:
'wir üben'.split(/\b/);
-> ["wir", " ü", "ben"]
Replacing the tokenizer with value.split(/(\s+)/) is sufficient for my use case, but I don't have newlines in my text, so I think this needs some further testing.
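A minimal sketch of that whitespace-based replacement, to make the behavior concrete. The function name `tokenizeOnWhitespace` is hypothetical and not part of the library, and the `filter` step is my own addition to drop the empty strings that `split` emits when the text starts or ends with whitespace.

```javascript
// Hypothetical helper (not the library's API): split on runs of whitespace.
// The capturing group makes split() keep each whitespace run as its own token.
function tokenizeOnWhitespace(value) {
  // Assumption: empty tokens are not wanted, so they are filtered out.
  return value.split(/(\s+)/).filter(function (token) {
    return token !== '';
  });
}

console.log(tokenizeOnWhitespace('wir üben'));
// → ["wir", " ", "üben"]

console.log(tokenizeOnWhitespace('wir\nüben jetzt'));
// → ["wir", "\n", "üben", " ", "jetzt"]
```

Note that \s also matches newlines, so a line break ends up inside a whitespace token, and punctuation stays attached to the neighboring word ("üben," is a single token). Whether either of those is acceptable for the diff output is exactly the part that still needs testing.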
Further reading:
http://stackoverflow.com/questions/10590098/javascript-regexp-word-boundaries-unicode-characters/10590620#10590620