How to parse inside a bundle? #537

Open
dynamic-modeller opened this issue Jan 25, 2022 · 6 comments

@dynamic-modeller

dynamic-modeller commented Jan 25, 2022

Hi,

Sorry for the dumb question, but I want to use the stix2 library to parse bundles. Following your excellent documentation, it is easy to import the JSON string, parse the bundle into an object, and print it.

But how do I get inside the bundle and find out which objects are in there? This is the main challenge: finding all of the objects inside any bundle through parsing.

[Image: screenshot of the parsed bundle object]

So you can see from the above that I can get the bundle object. What should I be doing to get inside and find out all the other objects that are in there? Sorry if it is a dumb question, but I want to be able to parse arbitrary STIX 2.1 bundles.

Once I can do that, I am going to try to build a custom datastore and sink using the Vaticle TypeDB reasoning system, so I am only at the start of the journey. Any help is much appreciated.

@rpiazza
Contributor

rpiazza commented Jan 25, 2022

STIX 2.1 objects in Python can be accessed using either the dict syntax or the object (attribute) syntax.

The bundle has an attribute "objects", which is a list.

Try this:

ta = obj.objects[0]
print(ta.name)

If you prefer dict syntax:

ta = obj["objects"][0]

Maybe I'm not sure what you are asking...

@dynamic-modeller
Author

Hi Rich,

Thanks for your response. To be honest, I am not sure; I am exploring how far I can take your library. We actually have our own loading code for our prototype, but as a more robust method I was investigating integrating with your existing toolset. While it seems to strongly support the creation of STIX objects (e.g. deserialising out of TypeDB), I was investigating the parsing side, which seems more shallow.

For parsing, I ideally want to drop down to Python objects/functions, each of which has its own insert statements, which then all get added together to insert the bundle. But this then requires a massive if-then-elif structure, or equivalent (dict approach, class approach, or v3.10 match-case), to pull all of the pieces apart. Since this is common to everyone, and would be required for your filesystem store, I wondered if you had this already set up so I could modify it, or whether I should build it myself.

In short, I am trying to decide the best strategy for reading in data and importing it into TypeDB, so I can reuse as much of your code as possible and be as compatible as possible with your current library. Can you advise please? Thanks a lot, Brett

@chisholm
Contributor

chisholm commented Jan 26, 2022

You can look at the filesystem store code; it doesn't have a massive if-then-elif structure. I'm not sure what that structure would be for in your case, maybe one if-clause per object type? STIX objects do have some commonalities, e.g. all have an ID and a type. So some generic code is possible. The filesystem store mostly deals in the commonalities (e.g. its directory structure is based on type and ID), so it can remain relatively simple.

I am not familiar with TypeDB. If you want to include the unique aspects of each STIX object type in unique ways in your data modeling, then you may have to write unique bits of code for each object type. I think their data migration tutorial essentially does that. It has one "template function" per data type, from the looks of things. You might need to do something similar.
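
As a rough illustration of the "one function per object type" idea without a long if-then-elif chain, a dispatch table keyed on the STIX type is one option. This is only a sketch using plain dicts: the handler names and the strings they return are hypothetical placeholders, not anything from stix2 or TypeDB, and real handlers would build whatever insert statements the data model needs.

```python
# Hypothetical per-type handlers; each one returns a placeholder string
# standing in for the real per-type insert logic.
def handle_threat_actor(obj):
    return f"threat actor: {obj['name']}"


def handle_indicator(obj):
    return f"indicator: {obj['pattern']}"


def handle_unknown(obj):
    # Simple fallback for types we have no handler for (e.g. custom types).
    return f"skipped unsupported type: {obj['type']}"


# Dispatch table: STIX type name -> handler function.
HANDLERS = {
    "threat-actor": handle_threat_actor,
    "indicator": handle_indicator,
}


def process_bundle(bundle):
    """Run the matching handler on every object in the bundle."""
    return [
        HANDLERS.get(obj["type"], handle_unknown)(obj)
        for obj in bundle["objects"]
    ]


example_bundle = {
    "type": "bundle",
    "objects": [
        {"type": "threat-actor", "name": "Example Actor"},
        {"type": "indicator", "pattern": "[ipv4-addr:value = '198.51.100.1']"},
        {"type": "x-made-up-type"},
    ],
}

results = process_bundle(example_bundle)
for line in results:
    print(line)
```

Adding support for another object type then means writing one more handler and registering it in the table, while the loop itself stays unchanged.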

The stix2 library might help with some STIX validation and usage conveniences, e.g. using datetime instances for versioning timestamps. If you have JSON data you are certain is valid, you could also just treat it as plain JSON if you wanted, and skip using the library.

Processing the objects in a bundle via a stix2 Bundle object could be done as:

for obj in bundle.objects:
    # Do something with obj, e.g. print its ID
    print(obj.id)

or with a more dict-like usage:

for obj in bundle["objects"]:
    # Do something with obj, e.g. print its ID
    print(obj["id"])

Of course, the latter style also makes the code agnostic to whether bundle is a dict or a stix2 object. To be more robust, software should be designed to gracefully handle custom content, including custom properties, object types, and extensions. ("Custom" properties and object types are deprecated now, but you might still encounter content which uses them. Extensions are a different animal and are not treated in the same way.) That may mean rejecting all content you can't handle (even though it may be spec-compliant), or having some kind of simple fallback behavior. The stix2 library has the ability to detect (deprecated) custom content, and lets you choose to reject it if you wish.
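
To make the "agnostic" point concrete, here is a small sketch that uses only dict-style access and the standard library. The helper name count_types is hypothetical. It is shown on a plain dict loaded with json; given the dict-style access described above, the same function should also accept a stix2 Bundle object.

```python
import json
from collections import Counter


def count_types(bundle):
    """Count the objects in a bundle per STIX type, using dict-style access only."""
    return Counter(obj["type"] for obj in bundle["objects"])


# Made-up bundle content, treated as plain JSON without the stix2 library.
raw = json.loads("""
{
    "type": "bundle",
    "objects": [
        {"type": "threat-actor", "name": "Example Actor"},
        {"type": "indicator", "name": "Example Indicator"},
        {"type": "indicator", "name": "Another Indicator"}
    ]
}
""")

counts = count_types(raw)
print(counts)
```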

@dynamic-modeller
Author

Hi Chisholm,

Great answer, thanks. Yep, one if-then-else for each object type was what I was thinking, and I will check out the filesystem store code.

Ideally, we can write an extension to stix2 that is totally compatible with everything you have done, but also enables people to use an open source TypeDB data source/sink. My aim is to integrate it as much as possible, so we are 100% compatible with your documentation, with some small additional documentation just for the TypeDB data-store/sink bits.

Ideally, everyone that uses stix2 will find it as easy to use TypeDB as a local store/source as to use memory/filesystem. That is my aim anyway. Thanks for your help; let me dig into the code a bit and check it out. Cheers, Brett

@dynamic-modeller
Author

dynamic-modeller commented Jan 27, 2022

Hi Chisholm,
I can see you discard the bundle in the filesystem sink. Funnily enough, the prototype code I've been handed does that as well. Is this normal, or is it desirable to store the bundle as well? I assume it's better if one can include the bundle structure as well.

Also, I can see that your filesystem if-then-elif structure is a lot simpler than mine is going to be, as you just recursively add, which is pretty cool. I notice you use isinstance a lot, instead of the util functions like is_sdo, is_sco, etc. Was this intentional, or was it that you just didn't need those util functions?

Getting there on the add function. Thanks, Brett

@chisholm
Contributor

The spec says of bundles:

A Bundle does not have any semantic meaning and the objects contained within the Bundle are not considered related by virtue of being in the same Bundle.

and

A Bundle is transient, and implementations SHOULD NOT assume that other implementations will treat it as a persistent object or keep any custom properties found on the bundle itself.

Bundles are just treated as generic containers. There is no need to preserve them, since without meaning, there isn't really anything to preserve. For a "grouping with meaning" there is the grouping SDO.

As far as the is_* functions, those utilities actually didn't exist when the filesystem datastore was written. They are a more recent addition, which grew out of work on the STIX generator. But I thought they would be of more general interest, and made sense for this library. More use could probably be made of them within this library's codebase, but it wasn't a priority to go back and retroactively find uses for them. So within this library, they are used only lightly.

Note that the is_* utility functions are only checking STIX types, and they will work with dicts, STIX IDs, and the type name as a string directly. An isinstance check against _STIXBase is checking the Python type, and the intent is really to determine whether I need to convert the value into a stix2 object, i.e. an instance of one of the registered classes. The STIX type or type category of that thing doesn't matter (at that point, anyway). The isinstance usages where it checks for lists, strings, etc are of course not checking for general STIX object category membership, or for a specific STIX type. For the bundle check, "bundle" isn't a category, just a single object type. It's a simpler case.
