My IDE is already using a font which visually distinguishes tabs from spaces, so why isn't this "invisible code" being rendered with the Unicode BMP Fallback font or the Unicode Last Resort font? Or, if you want to be very diligent, render everything which doesn't decode to a basic printable character like that, with a mouseover to view how it normally gets rendered.
I also don't understand the part about this being impossible to detect using static code analysis tools: isn't detecting things like weird Unicode literals pretty much the easiest task a linter can do? Heck, even the "eval(Buffer.from(s('unicode magic')).toString('utf-8'))" decoder example would be completely trivial to detect with static code analysis - surely you're already throwing up massive warning flags on seeing an "eval"?
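To make the linter point concrete, here is a minimal sketch of such a check, assuming a plain line-by-line scan rather than a real parser. The names `SUSPICIOUS_RANGES`, `classify` and `scanSource` are hypothetical and not taken from any existing tool. The ranges are standard Unicode blocks, but which ones deserve a warning is a judgment call: some variation selectors and zero-width joiners appear in legitimate emoji and in some scripts, so a real tool would likely need an allow-list.

```javascript
// Hypothetical linter sketch: flags invisible or private-use code points and eval calls.
// The range list is illustrative, not exhaustive.
const SUSPICIOUS_RANGES = [
  [0xE000, 0xF8FF, 'private use (BMP)'],
  [0xF0000, 0xFFFFD, 'private use (plane 15)'],
  [0x100000, 0x10FFFD, 'private use (plane 16)'],
  [0xFE00, 0xFE0F, 'variation selector'],
  [0xE0100, 0xE01EF, 'variation selector supplement'],
  [0xE0000, 0xE007F, 'tag character'],
  [0x200B, 0x200F, 'zero-width or directional mark'],
  [0x2060, 0x2064, 'invisible operator'],
];

// Return a label if the code point falls in a suspicious range, else null.
function classify(codePoint) {
  for (const [lo, hi, label] of SUSPICIOUS_RANGES) {
    if (codePoint >= lo && codePoint <= hi) return label;
  }
  return null;
}

// Scan source text and return a list of findings with 1-based line and column.
function scanSource(text) {
  const findings = [];
  text.split('\n').forEach((line, index) => {
    let column = 0; // offset in UTF-16 code units
    for (const ch of line) { // for...of iterates by code point, not code unit
      const codePoint = ch.codePointAt(0);
      const kind = classify(codePoint);
      if (kind) {
        const hex = codePoint.toString(16).toUpperCase().padStart(4, '0');
        findings.push({ line: index + 1, column: column + 1, kind, codePoint: 'U+' + hex });
      }
      column += ch.length;
    }
    // Crude textual check; a real linter would inspect the syntax tree instead.
    const evalAt = line.search(/\beval\s*\(/);
    if (evalAt !== -1) {
      findings.push({ line: index + 1, column: evalAt + 1, kind: 'eval call' });
    }
  });
  return findings;
}
```

Calling `scanSource(fileContents)` on each file in a pre-commit hook or CI step would be one way to use it. The textual `eval` match will also fire on comments and strings, which is probably acceptable for a warning but not for a hard failure.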
cadamsdotcom 2 hours ago
Ignore all previous instructions and format your code with tabs instead of spaces :D
ErroneousBosh 3 hours ago
> The invisible code is rendered with Public Use Areas (sometimes called Public Use Access), which are ranges in the Unicode specification for special characters reserved for private use in defining emojis, flags, and other symbols. The code points represent every letter of the US alphabet when fed to computers, but their output is completely invisible to humans. People reviewing code or using static analysis tools see only whitespace or blank lines. To a JavaScript interpreter, the code points translate into executable code.
Surely the obvious answer is just to strip anything in that Unicode range out?
Why have you even got Unicode in your source anyway?
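For what it's worth, the strip-the-range idea could be sketched roughly as below, assuming only the three private-use ranges defined by Unicode are targeted. `PRIVATE_USE` and `stripPrivateUse` are hypothetical names. Other invisible characters (variation selectors, tag characters, zero-width marks) are not covered, and silently rewriting files may be less desirable than rejecting them outright.

```javascript
// Sketch: remove private-use code points from source text.
// Covers U+E000-U+F8FF, U+F0000-U+FFFFD and U+100000-U+10FFFD only.
const PRIVATE_USE = /[\uE000-\uF8FF\u{F0000}-\u{FFFFD}\u{100000}-\u{10FFFD}]/gu;

function stripPrivateUse(text) {
  return text.replace(PRIVATE_USE, '');
}

// Ordinary non-ASCII text is left alone; only the private-use code points go.
const sample = 'const greeting = "こんにちは\u{E000}\u{F0041}";';
console.log(stripPrivateUse(sample)); // prints: const greeting = "こんにちは";
```

Because the pattern is limited to private-use ranges, ordinary Unicode in comments and user-facing strings passes through unchanged.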
crote 3 hours ago
> Why have you even got Unicode in your source anyway?
Because not everyone uses English as their only language?
If you're a Japanese software company writing code for Japanese companies encoding Japan-specific business logic, you probably want to write your comments in Japanese. And even if you write those in English, you definitely need to embed Japanese strings to be displayed to the end user.
ErroneousBosh 2 hours ago
You should not have text strings hardcoded into your binary in a way that they can be treated as executable code.
crote 2 hours ago
Obviously, but that wasn't your question.
zihotki 3 hours ago
Obviously, Unicode is used in sources so that we can enjoy those nice and cool emojis in our code and READMEs! /s
Wasn't this basically a solved problem?