Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

memory range interval tree multiset #323

Open
m4b opened this issue Aug 18, 2017 · 4 comments
Open

memory range interval tree multiset #323

m4b opened this issue Aug 18, 2017 · 4 comments

Comments

@m4b
Copy link
Collaborator

m4b commented Aug 18, 2017

I believe it is necessary to have a new data structure in project/program which maps virtual memory ranges to values.

Getting the right design down will be crucial, but I think it will premeditate many new features.

Design

I propose a new object, as generic as possible, which:

  1. Uses an interval tree for the backing data structure.
  2. Uses some kind of mechanism which allows defined, but arbitrary values (perhaps some kind of extendable tagged union, or a key-value store which is initialized on interval tree creation) to be inserted into the backing store.

Here are some use cases:

  • I want to return all known sectors/intervals for an address a, and the corresponding data.
  • I want to check if an address in a load is in the .ro data section in an elf file
  • I want to check if an address is in the text section
  • I want to check how much "coverage" I have of a binary with a given set of intervals; e.g., what information they point to
  • I want to query what data a given range has, e.g., whether its "known" or not, or whatever the associated values are for that range

Values

A key-value list could be a nice interesting approach; but it will likely be complex implementation, and I'd have to think about it.

E.g., an extremely rough sketch with basic enum which we use seems fastest approach:

pub enum Permission {
  Read,
  ReadWrite,
  Execute,
  .. bla bla
}

pub enum MemoryInfo {
  Sector { name: String, permissions: Permission },
}
let itree = IntervalTree::new::<MemoryInfo>();
for section in elf.section {
  itree.insert(Interval::new(section.start, section.end),  MemoryInfo::Sector { name: section.name, section.permissions() });
}

Unfortunately this approach, besides perhaps the interval tree data structure itself, won't be as extensible. imho. Or rather it will be specific to panopticon itself.

While this is fine, I think we can do better (just not sure how yet ;))

The reason I'd not want it specific to panopticon is that I don't think it's necessary, but for example, I want to add interval based section/segment/symbol coloring to bingrep. I've actively been working on this; it would use goblin to fill up the ranges with metadata; but this is exactly what panopticon needs too.

I added this functionality to rdr via something I called bytecoverage; basically its a stupid name for tagging ranges with information (super revolutionary, I know). https://github.com/m4b/rdr/blob/master/lib/utils/ByteCoverage.ml

The problem is the backing data structures were just so badly implemented, the intervals never worked quite right, sometimes trampled other intervals, etc., so I'd like to really nail the data structures down before serious work on the values.

The added benefit of having it be a separate crate is that I could implement the byte ranges as say, goblin_coverage, which given an interval tree, inserts into it interval data specific to binary segments, symbols, etc.

Then, when panopticon uses the crate goblin_coverage or bingrep, or whoever, they'd all benefit from the "single source of truth", which means more testing, more bug reports, and better all around service.

I'd really love some feedback/brainstorming about how to get a good data structure that:

  1. is an interval tree
  2. the implementation is as efficient as we can get it, algorithmically speaking
  3. the values of the tree are generic
  4. our shared implementation uses a defined generic value, but which is extendable perhaps statically before construction, but which could benefit from another crate inserting this "generic" value into the tree also

So it sounds like 2 crates:

  1. an interval tree crate (we may be able to find a good working one)
  2. a crate which exposes a generic value, and as a basic service, can initialize its reified tree given say a goblin object, with the basic segment memory mappings for that object
@flanfly
Copy link
Member

flanfly commented Aug 19, 2017

Have you seen this: https://github.com/theban/interval-tree? I don't know if @eqv still maintains it, but I think it's a good starting point.

@flanfly
Copy link
Member

flanfly commented Aug 19, 2017

The memory section feature you describe overlaps a bit w/ the Layer/Region idea that's already implemented. I'm not happy with it's design, tho. So, the interval tree may subsume them somehow.

@m4b
Copy link
Collaborator Author

m4b commented Aug 19, 2017

I'll check out layer/region; I'm mostly concerned about how to extensibly add values to the memory range data structure; probably enum is best. Specifically, the approach that BAP takes is very interesting: http://binaryanalysisplatform.github.io/memory

theban interval tree i've seen in the past, it looks good, has contains and range which are crucial:

    for (i,pair) in t.range(34, 36).enumerate() {
        //[...]
    }

It doesn't appear to build in the multiset ability, might have to write a thin wrapper; which would probably go in line with having a datatype the wrapper exposes for the insertable values?

@eqv
Copy link

eqv commented Aug 31, 2017

Hey, I wrote theban/interval-tree exactly for this purpose. If you want to use it, feel free! If there are any problems, I would gladly help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants