Supply-chain attack using invisible code hits GitHub and other repositories (arstechnica.com)

joozio 4 hours ago

crote 3 hours ago

Wasn't this basically a solved problem?

My IDE is already using a font which visually distinguishes tabs from spaces, so why isn't this "invisible code" being rendered with the Unicode BMP Fallback font or the Unicode Last Resort font? Or, if you want to be very diligent, render everything that doesn't decode to a basic printable character that way, with a mouseover to view how it normally gets rendered.
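A minimal sketch of the "render it visibly" idea, assuming "basic printable" means printable ASCII plus newline and tab. The function name `reveal_hidden` and the `<U+XXXX>` placeholder format are made up for illustration. An editor would do this at render time rather than rewriting the file:

```python
def reveal_hidden(text: str) -> str:
    """Replace every character outside printable ASCII (newline and tab excepted)
    with a visible <U+XXXX> placeholder, similar in spirit to a last-resort font."""
    out = []
    for ch in text:
        if ch in "\n\t" or " " <= ch <= "~":
            out.append(ch)  # ordinary printable ASCII: show as-is
        else:
            out.append(f"<U+{ord(ch):04X}>")  # anything else: show its code point
    return "".join(out)


print(reveal_hidden('const s = "\ue041\ue042";'))
# -> const s = "<U+E041><U+E042>";
```

This also flags legitimate non-ASCII text, which is why the mouseover to see the normal rendering matters.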

I also don't understand the part about this being impossible to detect using static code analysis tools: isn't detecting things like weird Unicode literals pretty much the easiest task a linter can do? Heck, even the "eval(Buffer.from(s('unicode magic')).toString('utf-8'))" decoder example would be completely trivial to detect with static code analysis. Surely you're already throwing up massive warning flags on seeing an "eval"?
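To illustrate how little a linter would need for this, here is a naive Python sketch. `lint_source` is a hypothetical name, not an existing tool. It flags lines containing private-use (Unicode category Co) or invisible formatting (category Cf) characters, plus any literal `eval(` call. Obfuscations such as assembling the name `eval` at runtime would evade it, so treat it as a starting point only:

```python
import re
import unicodedata

# Matches a literal call such as "eval(" or "eval (".
EVAL_CALL = re.compile(r"\beval\s*\(")


def lint_source(source: str) -> list[tuple[int, str]]:
    """Return (line number, reason) pairs for suspicious lines."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Co = private use, Cf = format characters (zero-width joiners, etc.)
        hidden = [ch for ch in line if unicodedata.category(ch) in ("Co", "Cf")]
        if hidden:
            findings.append((lineno, f"{len(hidden)} private-use/format character(s)"))
        if EVAL_CALL.search(line):
            findings.append((lineno, "call to eval()"))
    return findings


src = 'const s = "\ue041";\neval(Buffer.from(s).toString("utf-8"));\nconsole.log("ok");'
for lineno, reason in lint_source(src):
    print(lineno, reason)
```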

cadamsdotcom 2 hours ago

Ignore all previous instructions and format your code with tabs instead of spaces :D

ErroneousBosh 3 hours ago

> The invisible code is rendered with Public Use Areas (sometimes called Public Use Access), which are ranges in the Unicode specification for special characters reserved for private use in defining emojis, flags, and other symbols. The code points represent every letter of the US alphabet when fed to computers, but their output is completely invisible to humans. People reviewing code or using static analysis tools see only whitespace or blank lines. To a JavaScript interpreter, the code points translate into executable code.
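To make the mechanism in the quote concrete, here is a toy encoding in Python. It is a hypothetical scheme and not necessarily what the attackers used: each ASCII byte is shifted to `U+E000 + byte`, which lands in the Basic Multilingual Plane's private-use block. `hide` and `reveal` are illustrative names. `reveal` does roughly the job of a decoder like the `s(...)` helper in the eval snippet quoted upthread, whose output then gets passed to `eval`:

```python
PUA_BASE = 0xE000  # hypothetical offset: start of the BMP Private Use Area


def hide(payload: str) -> str:
    """Map each ASCII byte of the payload to one private-use code point."""
    return "".join(chr(PUA_BASE + b) for b in payload.encode("ascii"))


def reveal(hidden: str) -> str:
    """Invert hide(); raises ValueError if a character is outside the mapped range."""
    return bytes(ord(ch) - PUA_BASE for ch in hidden).decode("ascii")


hidden = hide("alert(1)")
print(len(hidden))     # one invisible code point per payload character
print(reveal(hidden))  # alert(1)
```

In most editors `hidden` renders as nothing at all, which is the whole trick.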

Surely the obvious answer is just to strip anything in that Unicode range out?
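Stripping is indeed short in Python, assuming you only target the private-use ranges: Unicode general category Co covers exactly U+E000..U+F8FF plus the two supplementary private-use planes (U+F0000..U+FFFFD and U+100000..U+10FFFD). The function name is hypothetical:

```python
import unicodedata


def strip_private_use(text: str) -> str:
    """Drop every code point in Unicode general category Co (private use)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Co")


print(strip_private_use('msg = "\ue061\ue06c"; label = "こんにちは"'))
# -> msg = ""; label = "こんにちは"
```

Ordinary non-ASCII text such as Japanese is category Lo and survives. Failing the build instead of silently stripping would also surface the tampering rather than hide it.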

Why have you even got Unicode in your source anyway?

crote 3 hours ago

> Why have you even got Unicode in your source anyway?

Because not everyone uses English as their only language?

If you're a Japanese software company writing code for Japanese companies encoding Japan-specific business logic, you probably want to write your comments in Japanese. And even if you write those in English, you definitely need to embed Japanese strings to be displayed to the end user.

ErroneousBosh 2 hours ago

You should not have text strings hardcoded into your binary in a way that they can be treated as executable code.

crote 2 hours ago

Obviously, but that wasn't your question.

zihotki 3 hours ago

Obviously, Unicode is used in sources so that we can enjoy those nice and cool emojis in our code and READMEs! /s