
consider ways to better obfuscate function bodies #462

Open
mvdan opened this issue Jan 16, 2022 · 23 comments
Labels
enhancement New feature or request


@mvdan
Member

mvdan commented Jan 16, 2022

Right now we strip some information away from the compiled code in function bodies, such as position information, variable names, and the names of funcs and types being used. However, the compiled code looks otherwise extremely similar to its non-obfuscated counterpart, especially in its structure.

For example, if I perform two obfuscated builds of the same program with different seeds, all the func/type/var names will be different, but one could deobfuscate a function body and quickly spot the pairs of corresponding obfuscated names in the two builds, as the structure of the function will be very similar. This means that if I manage to figure out what an obfuscated name in one build stands for, I can reuse that knowledge rather easily in the other build.

One can also imagine deobfuscating the Go code and trying to spot common patterns in "idiomatic" Go code, such as if err != nil { handle(err) }. Being able to quickly spot these patterns, even if the names are obfuscated, could lead to an easier understanding of what the code is doing.

We should investigate ways to improve this situation. In general terms, what we want is to deterministically "shuffle" the code around using the seed, akin to what we already do with literal obfuscation or when reordering declarations.
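To make the "deterministic shuffle" idea concrete, here is a minimal Go sketch, assuming each decision is derived from a hash of the build seed plus a stable identifier such as a function name. The helper names (rngFor, shuffledOrder) and the choice of SHA-256 feeding math/rand are illustrative assumptions, not garble's actual implementation. The same seed and name always produce the same permutation, so builds stay reproducible, while a different seed generally produces a different order.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math/rand"
)

// rngFor derives a deterministic random source from the build seed and a
// stable identifier (here a hypothetical function name). The same inputs
// always yield the same sequence of random numbers.
func rngFor(seed []byte, name string) *rand.Rand {
	h := sha256.New()
	h.Write(seed)
	h.Write([]byte(name))
	sum := h.Sum(nil)
	return rand.New(rand.NewSource(int64(binary.LittleEndian.Uint64(sum[:8]))))
}

// shuffledOrder returns a seed-dependent permutation of n items, for example
// the order in which to emit the case clauses of a switch statement.
func shuffledOrder(seed []byte, name string, n int) []int {
	return rngFor(seed, name).Perm(n)
}

func main() {
	a := shuffledOrder([]byte("seed-1"), "main.handle", 5)
	b := shuffledOrder([]byte("seed-1"), "main.handle", 5) // identical to a
	c := shuffledOrder([]byte("seed-2"), "main.handle", 5) // usually differs
	fmt.Println(a, b, c)
}
```

Hashing the identifier together with the seed keeps each function's decisions independent of how many other functions were processed before it.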

Doing this at the machine code level definitely seems like a bad idea; we'd need to explicitly support each GOARCH target. It would also require being able to modify object files in-place, further increasing the required complexity.

Doing it at the Go syntax level via go/ast is probably the most obvious option we have. We already do something like it when obfuscating literals, and it seems to work well. I think it could become more feasible if we implemented a "reduction" of the AST first, as per #459.

Doing this at the compiler's SSA IR level could also be interesting. Advantages:

  • As a heavily simplified form of the code, we could apply "rewrites" of the SSA program more easily. The go/ast is significantly more complex than go/ssa, as there are multiple ways to write the same piece of logic.
  • We could perhaps perform heavier obfuscation this way, as SSA is further down the compiler pipeline when compared to the Go syntax.

Disadvantages:

  • The complexity in design and execution; the SSA is not exposed via -toolexec, and is only kept in memory. This would likely mean having to build and use a modified version of cmd/compile.
  • Unlike go/ast and the Go syntax, Go's SSA representation is internal and may change in backwards-incompatible ways over the course of Go releases.
  • Harder to contribute: while SSA isn't a particularly hard concept in the field of compilers, the average Go developer will probably not be familiar with it.
  • Unhelpful for source code obfuscation; #369 (add an "export" command for libraries) wouldn't benefit from it at all.

Thus, my initial thoughts are that we should aim for obfuscating func bodies via go/ast rather than the compiler's internal SSA. Happy to hear opinions, counter-points, or other potential ways to solve this.

@mvdan mvdan added the enhancement (New feature or request) and help wanted (Extra attention is needed) labels on Jan 16, 2022
@mvdan
Member Author

mvdan commented Jan 16, 2022

I should note that using SSA could have the advantage of supporting more languages or syntax forms, but since we only aim to obfuscate Go code, that's not really a benefit.

@pagran
Member

pagran commented Jan 17, 2022

I vote for AST obfuscation.

Obfuscating at the source level would let us implement some classical methods without much difficulty.

  1. Conditional obfuscation, for example (there are many ways):
if err != nil { }

->

var a int
if err == nil {
	a = 0x1337
} else {
	a = 0x7331
}

if a^0x975 == 0x???? { code }
  2. Proxy calls (replace call <function> with call ptr [rax+14]):
someFunc(1, 2, 3)

->

var G = newProxyObj() // global var

G.a(1, 2, 3)
  3. Junk code injection

I think methods above are already enough to hide patterns.

More complex methods (like control flow flattening) are also possible, but will require much harder manipulation of the code (i.e. a data-flow analyzer is needed).
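For reference, this is roughly what control flow flattening looks like when done by hand in Go: the function's basic blocks become cases of a switch inside a loop, and a state variable decides which block runs next. The example function and the state numbers are made up for illustration; doing this automatically would need the analysis mentioned above, and a real implementation would likely shuffle or encode the state values.

```go
package main

import "fmt"

// sumEvens is the original, structured function.
func sumEvens(xs []int) int {
	total := 0
	for _, x := range xs {
		if x%2 == 0 {
			total += x
		}
	}
	return total
}

// sumEvensFlat computes the same result with flattened control flow:
// every basic block is a case, and `state` selects the next block.
func sumEvensFlat(xs []int) int {
	total, i := 0, 0
	state := 0
	for {
		switch state {
		case 0: // loop header: any elements left?
			if i < len(xs) {
				state = 1
			} else {
				state = 4
			}
		case 1: // condition: is the current element even?
			if xs[i]%2 == 0 {
				state = 2
			} else {
				state = 3
			}
		case 2: // loop body
			total += xs[i]
			state = 3
		case 3: // increment, then back to the header
			i++
			state = 0
		case 4: // exit block
			return total
		}
	}
}

func main() {
	xs := []int{1, 2, 3, 4, 5, 6}
	fmt.Println(sumEvens(xs), sumEvensFlat(xs)) // prints 12 12
}
```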

@pagran
Member

pagran commented Feb 26, 2022

I implemented a PoC of junk code injection: https://github.com/pagran/garble/tree/flow-obfuscation
Output example: https://gist.github.com/pagran/bca8e94be277b90b9d78185ab64e208e

The final build size increases depending on the obfuscator settings (check here), but performance drops only very slightly. Does it make sense to finish it?

@mvdan
Member Author

mvdan commented Feb 26, 2022

Can you explain how the junk injection works? I can see that the code might not be removed by the compiler as redundant, but you're still essentially ending up with dead code. I imagine one could deobfuscate the code and fairly easily trim the bits of junk that can be proven to be unreachable.

This is not to say that your idea to insert junk code isn't right, but I also think the approach of using a global array and juggling integers with switches is pretty easy to untangle.

If we were to support inserting junk code, I think ideally it should:

  • Avoid being too consistent; it's rather easy to spot similar switches using global integer arrays, for instance. This would likely require a good design to randomly generate different kinds of Go code.
  • Look like reasonably realistic Go code; for example, if the package already imports os, it could randomly inject calls to its APIs like os.Getenv or os.Open, with similarly randomly generated parameters. One can imagine similar things for other packages like time, net/http, and so on.
  • Practically never run, to not affect real programs, while still being near impossible to easily discard as dead code. I admit I'm not sure how to solve this, but if code can be statically proven to be unreachable, then the entire feature is very easy to undo.
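One classic technique for code that "practically never runs" yet is not a literal dead branch is an opaque predicate: a condition whose value is fixed but not obvious. The sketch below relies on x*(x+1) always being even (wrapping modulo 2^64 preserves parity). The junk function and its os calls are hypothetical filler in the spirit of the bullets above. Well-known predicates like this one are also known to deobfuscation tools and might be folded by a sufficiently smart compiler, so this only illustrates the technique; it does not settle the concern about provably unreachable code.

```go
package main

import (
	"fmt"
	"os"
)

// opaqueTrue always returns true: x*(x+1) is a product of consecutive
// integers, so it is even, and wrapping modulo 2^64 keeps it even.
func opaqueTrue(x uint64) bool {
	return x*(x+1)%2 == 0
}

func realWork() string { return "real" }

// junk is hypothetical bogus code: plausible-looking calls that never run.
func junk() string {
	if f, err := os.Open(os.Getenv("HOME") + "/.config"); err == nil {
		f.Close()
	}
	return "junk"
}

func run() string {
	// len(os.Args) is only known at run time, so the predicate's input is
	// not a literal constant in the source.
	if opaqueTrue(uint64(len(os.Args))) {
		return realWork()
	}
	return junk()
}

func main() {
	fmt.Println(run()) // prints real
}
```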

My personal and honest opinion is that we should instead look at incremental alterations to existing code, rather than injecting code. This is what I meant by "shuffle" in my original post. For example:

  • Randomly swap the order of an if with its else
  • Randomly reorder switch statement cases
  • Randomly swap syntax nodes with equivalent versions; for instance, an if with a for that loops zero or one times
  • Randomly add syntax that doesn't actually do anything, like [:] on slice expressions, adding redundant type conversions, adding redundant parentheses or blocks, or adding labels to for and switch statements and their break/continue statements
  • Randomly split up or join statements and expressions

I realise these are just draft ideas and likely pretty hard to implement. I also realise that some of them might not matter in terms of compiled binaries, but we still care about source code obfuscation, I think.

I think it's also worth investigating what other obfuscators have done before we spend tens or hundreds of hours on this feature. I personally have no experience in this kind of obfuscation, so while I can give you reasons why some approaches aren't likely to be successful, I can't say with confidence what a good approach looks like either :)

@mvdan
Member Author

mvdan commented Feb 26, 2022

Currently giving https://arxiv.org/abs/1809.11037 a read; it carefully examines the theory and practice of Java control flow obfuscation as of 2018.

@mvdan
Member Author

mvdan commented Feb 26, 2022

Thinking aloud: Go does have goto, so that could be pretty useful in terms of flattening control flow.
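A sketch of what that could look like: the loop in countdown is rewritten in countdownGoto as labeled blocks connected by goto, mirroring the basic blocks a compiler would produce. The function and label names are hypothetical; an obfuscator could randomize the labels and reorder the blocks, as long as it respects Go's rules that a goto may not jump into a block or skip a variable declaration into its scope.

```go
package main

import "fmt"

// countdown is the structured original.
func countdown(n int) []int {
	var out []int
	for i := n; i > 0; i-- {
		out = append(out, i)
	}
	return out
}

// countdownGoto is the same loop lowered to labeled blocks and goto.
// Go rejects a goto that would skip a variable declaration into its scope,
// so all variables are declared before the first jump.
func countdownGoto(n int) []int {
	var out []int
	i := n
	goto cond
body:
	out = append(out, i)
	i--
cond:
	if i > 0 {
		goto body
	}
	return out
}

func main() {
	fmt.Println(countdown(3), countdownGoto(3)) // prints [3 2 1] [3 2 1]
}
```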

@awgh

awgh commented Feb 27, 2022

@mvdan The definitive paper for control flow obfuscation is this one: http://ac.inf.elte.hu/Vol_030_2009/003.pdf

There are several basic techniques described in that paper that should be the initial focus for control flow obfuscation. Flattening, Bogus Instructions, and Substitution. That's a good place to start, and some of these have been mentioned already. The paper's authors have a bit more info in their repository here: https://github.com/obfuscator-llvm/obfuscator

There are many implementations of this paper already, including the original fork of LLVM and this one based on GCC that actually has a better explanation of some of the methods: https://github.com/meme/hellscape

Basic control flow obfuscation will require AST manipulation, or at least different code generation from the AST, as has been mentioned above. Once the basic stuff is implemented, you could start trying to invent something new and additional, but... the basic methods in the paper are a good mixture of simple to implement and hard to reverse.

Another project worth mentioning is MovFuscator, which compiles everything to MOVs: https://github.com/xoreaxeaxeax/movfuscator

Just wanted to point out that the things being discussed above are largely solved problems with multiple implementations in other compilers... INCLUDING an implementation that works for gccgo already (which means it can't target Windows tho, so not so useful).

Java obfuscation is a different art form entirely, because it leans heavily on reflection and the differences between the VM instructions and the source code. Some of the control flow stuff might translate, but you'll probably have better luck looking at the LLVM-opcode level obfuscation and GCC port of it linked above.

@pagran
Member

pagran commented Feb 27, 2022

The main idea of junk code is simple: to "blur" the original code. This is by no means control flow obfuscation (which requires good control flow analysis). The junk code should protect against bindiff, later on confuse references to external functions (file handling, networking, etc.), and make pattern searching and initial manual analysis in disassemblers harder.

On the source side it is not that hard to remove, because the switch construct is clearly visible; on the binary side it is not so easy, because the junk code actively refers to variables that the execution flow depends on, so a simple "reference search" will not help.

About the code plausibility, I agree, but the current version is nothing more than a prototype that implements only the minimum needed for a demonstration.

@mvdan
Member Author

mvdan commented Feb 28, 2022

Thanks, @awgh, those links are a good starting point.

There are several basic techniques described in that paper that should be the initial focus for control flow obfuscation. Flattening, Bogus Instructions, and Substitution.

I see that the paper does talk about flattening with some detail, but it only mentions bogus instructions (aka @pagran's "junk code injection") and substitutions, without really giving much detail. Do you have more links for those? I'm particularly wondering how one would go about inserting bogus instructions without having them be rather easy to recognize.

@pagran I understand yours is just a prototype for now, and I really appreciate the help - my opinion is just that we should carefully design our first steps before we start writing code. For that reason, I really want to understand what other good control flow obfuscators do, and then find a relatively straightforward way to implement one of their basic techniques to begin with.

For example, when it comes to inserting bogus or junk code, I'm really not sure that teaching garble how to insert global arrays and switch statements will be the best long-term strategy. We might well end up at a point where we have to maintain thousands of lines of code for the sake of teaching garble specific kinds of junk code it can inject. It would be much better if, say, we could procedurally generate valid Go code that never executes.

@mvdan
Member Author

mvdan commented Feb 28, 2022

I also have the feeling that, when it comes to Go, AST substitutions will be the easier first step when compared to flattening and bogus instructions, because they could be written as small "rewrite rules" that would get applied to each existing AST node.

@pagran
Member

pagran commented Feb 28, 2022

Okay, then I propose creating a separate issue with a list of possible "transformations", for example "if -> state machine".

Once that's approved, I will start writing code.

@awgh

awgh commented Feb 28, 2022

@mvdan OK, for substitutions, it can be a bit more complicated because you need to be somewhat aware of generated assembly and how that assembly will be disassembled. That said, there are some old classic tricks to check out.

Here's a recent summary of an old trick: https://tmpout.sh/1/6.html
And the original paper was Silvio's obviously: http://www.ouah.org/linux-anti-debugging.txt

Rather than inserting garbage go-level source, you should be thinking about making transforms to the generated output that will not change the actual behavior of the running program at all, like the above examples.

"Bogus Instructions" literally means just inserting a bunch of dead code that will never get hit. This is easy to insert for PIC, because you can just insert blobs of garbage and JMP over them. @pagran's random switch statement idea is pretty good, just make a rat's nest of JMPs that never get hit... and you don't really have to track whether or not bogus code really works, because it shouldn't actually be getting hit.

I don't think there's any point in inserting data like arrays, because that won't mess with someone trying to disassemble it at all.

Substitutions == This code will run, but you can make it hard to disassemble.
Bogus Instructions == This code will NOT run, it can be utter trash.
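To illustrate the "substitutions" idea at the Go level, the functions below replace plain +, ^, & and | with equivalent expressions, in the style of the instruction substitution pass in Obfuscator-LLVM (linked above), plus a common mixed boolean-arithmetic identity for addition. These are hand-written examples, not an existing garble feature, and the Go compiler's own rewrite rules may fold some of them back into a single instruction, so whether they survive in the binary would need checking.

```go
package main

import "fmt"

// addSub computes b + c as b - (-c) (unsigned negation wraps in Go).
func addSub(b, c uint32) uint32 { return b - (-c) }

// addMBA computes b + c as (b ^ c) + 2*(b & c): the XOR is the sum without
// carries, and the AND marks the bit positions that produce a carry.
func addMBA(b, c uint32) uint32 { return (b ^ c) + 2*(b&c) }

// xorSub computes b ^ c as (^b & c) | (b & ^c).
func xorSub(b, c uint32) uint32 { return (^b & c) | (b & ^c) }

// andSub computes b & c as (b ^ ^c) & b.
func andSub(b, c uint32) uint32 { return (b ^ ^c) & b }

// orSub computes b | c as (b & c) | (b ^ c).
func orSub(b, c uint32) uint32 { return (b & c) | (b ^ c) }

func main() {
	fmt.Println(addSub(40, 2), addMBA(40, 2), xorSub(6, 3), andSub(6, 3), orSub(6, 3)) // prints 42 42 5 2 7
}
```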

@awgh

awgh commented Feb 28, 2022

Flattening is the only one of these things that requires AST manipulation, the other two are maybe better performed in the code generation phase.

If you have to manipulate the source code to manipulate the generated code... you're going to have a hard time. This is more or less the wrong turn that your predecessors took with gobfuscate.

edit: Although you could make a truly funny bogus code generator with direct AST manipulations, you don't really need to do it this way... could also do it only in code generation.

@mvdan
Member Author

mvdan commented Feb 28, 2022

I think we've touched on "code generation" before; because we're not forking the Go toolchain itself, we don't have access to its IR, which is a form of SSA. We similarly don't have access to the "lowered" SSA, which uses target-specific instructions; that's the closest stage to generated code that the compiler gets to before it actually produces object files.

That said, you've given me an idea: turn the AST to SSA via https://pkg.go.dev/golang.org/x/tools/go/ssa, and then spit out the simplest form of Go code that implements the SSA. That may already do a significant chunk of the flattening for us, and it should be easier to obfuscate the function body in SSA form. Generating Go code equivalent to some SSA is probably relatively complex, but it might be our best long-term bet; the Go compiler only accepts Go code as input.

@mvdan
Member Author

mvdan commented Mar 2, 2022

^ I intend to do a bit of research into that SSA idea, and will likely coordinate with @awgh, @pagran and others as I have any updates. @pagran are you OK with giving me a couple of weeks to look into this? Assuming that the experiment will succeed, it will be a promising approach long-term, but also radically different to altering the go/ast directly :)

@pagran
Member

pagran commented Mar 8, 2022

Sure :)

p.s. Why didn't github notify me about this message? -_-

@rodjunger

You've probably already seen this, since it's the first Google result for "golang ast obfuscation", but there's some previous work here: https://github.com/q6r/gomambojambo

Not everything is useful since garble already has string/function/package name obfuscation but it also has examples for dead code insertion and a simple kind of control flow obfuscation (for loops converted to gotos).

There's also this https://github.com/meme/hellscape which I've tested and works but it hooks into gccgo so I guess it's useless for us. A mix of garble and hellscape would be insane tho.

@mvdan
Member Author

mvdan commented May 14, 2022

I intend to do a bit of research into that SSA idea

Blocked by golang/go#48525 at the moment; starting to use the SSA package today would mean breaking the obfuscation of some generic programs.

@pagran
Member

pagran commented Aug 19, 2022

It's a bit sudden, but it's possible to make a fully "reflection-friendly" obfuscation mode.

If we disable all name obfuscation, move method bodies to separate functions, and hide the calls, we get an acceptable level of obfuscation: the original names remain in the binary, but it becomes very hard to associate them with the code that implements them.

Example:

// main.go
package main

type X struct {
	secret string
}

func (f X) test() string {
	return f.secret + "xxxx"
}

func main() {
	x := X{secret: "hi"}
	println(x.test())
}

to:

// main.go
package main

type X struct {
	secret string
}

func (f X) test() string {
	return C._obfName(f)
}

func main() {
	x := X{secret: "hi"}
	println(x.test())
}

// main_body.go
package main

var C = _controller{}

func init() {
	// some obf code with unsafe.Pointer
	C._obfName = _someObfName
}

type _controller struct {
	_obfName func(f X) string
}

func _someObfName(f X) string {
	return f.secret + "xxxx"
}

@lu4p
Member

lu4p commented Jun 14, 2023

@pagran is working on #752

@mvdan
Member Author

mvdan commented Jul 15, 2023

That PR is now merged, but the feature is not enabled for all functions by default - it is experimental and opt-in for now. I'd keep this issue open until it's on by default.

@mvdan
Member Author

mvdan commented Nov 19, 2023

This landed in experimental form in master via https://github.com/burrowers/garble/blob/master/docs/CONTROLFLOW.md. I think we should keep this issue open until the feature is mature enough, and either promoted to a flag like -literals or enabled by default in some form.

6 participants
@awgh @mvdan @lu4p @rodjunger @pagran and others