Possibility of discussion of ld.so/dyld/etc. behavior #41

Open
ZPedro opened this issue Aug 15, 2023 · 1 comment

ZPedro commented Aug 15, 2023

I believe the end of chapter 4 deserves a quick discussion of how control gets from the entry point of the dynamic linker to the entry point of the executable. Including these fun tidbits (for some values of fun, anyway):

  • The dynamic linker needs to manage data structures, allocate memory, and, in particular, perform an awful lot of string operations. So it needs access to libc functionality. But it can't use the shared libc everyone else uses: it requires that functionality before it is able to load any dynamic library itself! As a result, the dynamic linker has its own copy of (a subset of) the libc statically linked into it: its only dependency is, understandably, the kernel. This is one of the reasons why on Linux the dynamic linker is actually provided by the folks who provide the libc. And this is the reason all static linkers still need to support building fully self-contained, statically linked binaries, where even system libraries are statically linked (which is discouraged for almost all code): they need it in order to build the dynamic linker itself.
  • While the kernel is responsible for interpreting the ELF program headers of the executable and of the dynamic linker (if applicable), it is not in charge of interpreting the dynamic libraries themselves: the only visibility it has into these is the mmap() calls, performed by the dynamic linker, that specify (a subrange of) them as backing, which allows that memory to be shared across processes. This means the dynamic linker has to have its own ELF parser, independent of the kernel's: everything else with regard to loading dynamic libraries into memory is its responsibility.
  • That a process is provided its own address space for exclusive use enables code in the main executable to be compiled in a position-dependent fashion. At least, in theory: security considerations such as ASLR mean most executables are position-independent these days. But dynamic libraries have no such choice and must consist of position-independent code because, even if there are systems for preferentially loading them at a certain address, there is no guarantee that this virtual address range will be available by the time they are loaded: another dynamic library might have been loaded there first for instance. In which case the bumped dynamic library will need to be loaded at a non-preferred virtual address and work anyway.
  • .init and .fini sections
  • for bonus points, the GOT, the PLT, and relocation entries.
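
To make the "own ELF parser" point concrete, here is a minimal sketch of the first thing such a parser has to do: read the ELF header, walk the program headers, and pick out the entry point and the requested interpreter (the PT_INTERP segment, which is how an executable names its dynamic linker). It assumes a 64-bit little-endian ELF image; the function name `describe_elf` and the returned dictionary are made up for this illustration, and none of this is taken from any real dynamic linker's source.

```python
import struct
import sys

ELF_MAGIC = b"\x7fELF"
PT_INTERP = 3  # program header type naming the dynamic linker

# Field layouts of Elf64_Ehdr and Elf64_Phdr, little-endian.
EHDR64 = struct.Struct("<16sHHIQQQIHHHHHH")  # 64 bytes
PHDR64 = struct.Struct("<IIQQQQQQ")          # 56 bytes


def describe_elf(image: bytes) -> dict:
    """Return the entry point and interpreter path of an ELF64 LE image.

    Hypothetical helper for illustration only; it ignores sections,
    relocations, and everything else a real loader must handle.
    """
    if image[:4] != ELF_MAGIC:
        raise ValueError("not an ELF image")
    if image[4] != 2 or image[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")

    (_ident, _type, _machine, _version, e_entry, e_phoff, _shoff, _flags,
     _ehsize, e_phentsize, e_phnum, _shentsize, _shnum, _shstrndx) = \
        EHDR64.unpack_from(image, 0)

    interp = None
    for index in range(e_phnum):
        offset = e_phoff + index * e_phentsize
        (p_type, _p_flags, p_offset, _vaddr, _paddr,
         p_filesz, _memsz, _align) = PHDR64.unpack_from(image, offset)
        if p_type == PT_INTERP:
            # The segment content is a NUL-terminated path string.
            raw = image[p_offset:p_offset + p_filesz]
            interp = raw.rstrip(b"\0").decode()

    return {"entry": e_entry, "interp": interp}


if __name__ == "__main__":
    # Try it on the running Python binary; skip quietly where that is not
    # a 64-bit little-endian ELF file (macOS, Windows, ...).
    try:
        with open(sys.executable, "rb") as handle:
            print(describe_elf(handle.read()))
    except (OSError, ValueError, struct.error) as exc:
        print("skipped:", exc)
```

For a dynamically linked executable, `interp` is the path the kernel maps and jumps to first, while `entry` is the address the dynamic linker eventually transfers control to once it has finished its own work. A statically linked executable has no PT_INTERP segment, so `interp` stays `None`.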

ZPedro commented Sep 19, 2023

As an illustration of why the control path from the dynamic linker to the executable matters: the first versions of both "parental controls" and code signing in Mac OS X (as it was then known) were in fact off to a good start, as both were enforced as part of execve(). There is no way to load an executable other than through execve(), right?

But as Thomas Ptacek pointed out at the time, you could still coerce the dynamic linker into loading any dynamic library you liked (through DYLD_INSERT_LIBRARIES; Markdown garbled the environment variable in the original post). As discussed above, that does not go through execve(), and the kernel is only involved to the extent that these files are memory-mapped with the executable bit set, for which there was no gate check.
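
As a small sketch of how little is needed to request such an injection: it is just an environment variable handed to the new process, which the dynamic linker reads at startup. The helper below builds such an environment. `preload_env` and `PRELOAD_VARS` are names invented for this example; LD_PRELOAD, the glibc counterpart of DYLD_INSERT_LIBRARIES, is not mentioned in the post and is included only for comparison.

```python
import os
from typing import Dict, Optional

# Variable consulted by the dynamic linker on each platform.
PRELOAD_VARS = {
    "darwin": "DYLD_INSERT_LIBRARIES",  # dyld (macOS)
    "linux": "LD_PRELOAD",              # ld.so (glibc)
}


def preload_env(library_path: str, platform: str,
                base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) asking the dynamic
    linker to load `library_path` ahead of the program's own libraries.

    Hypothetical helper: it only builds the environment, it launches nothing.
    """
    if platform not in PRELOAD_VARS:
        raise ValueError(f"no preload variable known for {platform!r}")
    variable = PRELOAD_VARS[platform]
    env = dict(os.environ if base is None else base)
    existing = env.get(variable)
    # Both variables accept a colon-separated list of paths.
    env[variable] = f"{library_path}:{existing}" if existing else library_path
    return env
```

Passing the result as `env=` to `subprocess.run` would start the child with the extra library requested. Nothing in that request is visible to execve() as anything other than one more environment string.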

Well, what possible harm could one insane, mutant dynamic library do? It's just a library; it's not going to be in control until code in the executable (or code in another library invoked by the executable) calls it, right? Except that, for initialization purposes, all dynamic linkers support code in a specifically designated section of the dynamic library image (.init, in the case of ELF), which the dynamic linker jumps to as part of setting up the dynamic library (this is what Mr. Ptacek refers to when he mentions gcc constructor functions); in other words, before main() gets called. You're not supposed to do anything scary in there, but there is no mechanism to prevent that code from never returning and ending up in control of the newly reset process, in effect diverting control away from main().
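
The ordering problem can be shown with a toy model of process startup. This is a simulation, not real linker code: `Image`, `run_process`, and `ProcessHijacked` are invented for the illustration, initializers are plain Python callables, and "never returns" is modelled by raising an exception. Real dynamic linkers also order initializers by dependency, which this sketch ignores.

```python
from typing import Callable, List


class ProcessHijacked(Exception):
    """Models an initializer that never hands control back to the linker."""


class Image:
    """Hypothetical stand-in for a loaded library and its .init code."""

    def __init__(self, name: str, initializers: List[Callable[[], None]]):
        self.name = name
        self.initializers = initializers


def run_process(images: List[Image], main: Callable[[], None]) -> List[str]:
    """Simulate startup and return a log of what got to run."""
    log: List[str] = []
    try:
        # Phase 1: the dynamic linker runs every image's initializers
        # (simplified here to the order the images are given in).
        for image in images:
            for init in image.initializers:
                log.append(f"init:{image.name}")
                init()
    except ProcessHijacked:
        # The initializer kept control: main() is never reached.
        log.append("hijacked")
        return log
    # Phase 2: only now does control reach the executable's main().
    log.append("main")
    main()
    return log


def benign_init() -> None:
    pass


def hostile_init() -> None:
    raise ProcessHijacked()


normal = run_process([Image("libc", [benign_init])], lambda: None)
hijacked = run_process(
    [Image("injected", [hostile_init]), Image("libc", [benign_init])],
    lambda: None,
)
print(normal)
print(hijacked)
```

In the second run the log ends before "main" ever appears, which is the whole point: any check that only guards the executable has already been bypassed by the time its code would have started.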
