Prevented accented letters splitting words in client-side chatPostprocessor

rainbow napkin 2025-12-26 12:54:41 -05:00
parent 7b054b235d
commit d669ed4783
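
The effect of the change can be sketched in isolation. The block below is a minimal illustration, not the project's code: it copies only the "end of word" alternative from the split pattern in the diff and leaves out the other alternatives (word starts, whitespace, the file-separator placeholder). The names `END_OF_WORD_BEFORE`, `END_OF_WORD_AFTER`, and `splitAtWordEnds` are hypothetical.

```javascript
// Hypothetical, simplified patterns: only the "end of word" alternative
// from the diff, before and after the accented-character exclusion.
const END_OF_WORD_BEFORE = /(?!-)(?<=\w)\b/g;
const END_OF_WORD_AFTER = /(?!-|[\u00C0-\u017F])(?<=\w)\b/g;

// Hypothetical helper (not part of chatPostprocessor): split a message
// at every position the given pattern matches.
function splitAtWordEnds(msg, pattern) {
    return msg.split(pattern);
}

// Escapes are used so the accented letters are precomposed code points.
const cafe = "caf\u00E9";   // "café"
const naive = "na\u00EFve"; // "naïve"

console.log(splitAtWordEnds(cafe, END_OF_WORD_BEFORE));  // [ 'caf', 'é' ]
console.log(splitAtWordEnds(cafe, END_OF_WORD_AFTER));   // [ 'café' ]
console.log(splitAtWordEnds(naive, END_OF_WORD_BEFORE)); // [ 'na', 'ïve' ]
console.log(splitAtWordEnds(naive, END_OF_WORD_AFTER));  // [ 'naïve' ]
// The dash exclusion is unchanged, so usernames stay whole either way.
console.log(splitAtWordEnds("user-name", END_OF_WORD_AFTER)); // [ 'user-name' ]
```

Without the `u` flag, `\w` covers only ASCII letters, digits, and underscore, so `\b` sees a boundary between `f` and `é`. The added lookahead rejects that boundary when the next character falls in U+00C0 to U+017F. This sketch exercises the end-of-word alternative alone; a word that begins with an accented letter, or has one mid-word, can still meet the start-of-word alternative of the full pattern.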


@@ -140,10 +140,11 @@ class chatPostprocessor{
this.messageArray = [];
//Unescape any sanitized char codes as we use .textContent for double-safety, and to prevent splitting of char codes
//Split string by word boundaries on words and non-word boundaries around whitespace, with negative lookaheads to exclude file separators so we don't split link placeholders, and dashes so we don't split usernames and other things
//Split string by word boundaries on words and non-word boundaries around whitespace,
//with negative lookaheads to exclude file separators so we don't split link placeholders, dashes so we don't split usernames and other things, and accented characters to keep those from splitting at boundaries too
//Also split by any invisible whitespace as a crutch to handle mushed links/emotes
//If we can one day figure out how to split non-repeating special chars instead of special chars with whitespace, that would be perf, unfortunately my brain hasn't rotted enough to understand regex like that just yet.
const splitString = utils.unescapeEntities(this.rawData.msg).split(/(?<!-)(?<!␜)(?=\w)\b|(?!-)(?<=\w)\b|(?=\s)\B|(?<=\s)\B|/g);
const splitString = utils.unescapeEntities(this.rawData.msg).split(/(?<!-)(?<!␜)(?=\w)\b|(?!-|[\u00C0-\u017F])(?<=\w)\b|(?=\s)\B|(?<=\s)\B|/g);
//for each word in the splitstring
splitString.forEach((string) => {