JSON Dict Comparison Improvement #22

TheDr1ver · 2021-09-07T14:15:48Z

Consider revisiting how dates are stripped from data. Shodan HTTP data (80_data or 443_data) appears to be the biggest offender at the moment. This also affects the 443_hash value.

Other targets for removal:

Censys - any key containing __encoding, where value = DISPLAY_UTF8 or value = DISPLAY_HEX
Shodan - 443_opts_heartbleed - contains a date, which will change every time
Shodan - _location_latitude, _location_longitude, _location_city - might change too frequently
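
A rough sketch of key-based removal for the targets above, using glob patterns from the standard library's fnmatch. The pattern list and the function name drop_noisy_keys are made up for illustration, and the flattened key names in the usage lines are guesses at what the records look like. It matches on the key only and does not check that an __encoding value is DISPLAY_UTF8 or DISPLAY_HEX.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, Iterable

# Hypothetical pattern list built from the removal targets named in this issue.
DELETE_PATTERNS = (
    "*__encoding*",          # Censys encoding markers
    "*_opts_heartbleed",     # Shodan: contains a date
    "*_location_latitude",   # Shodan: may change too frequently
    "*_location_longitude",
    "*_location_city",
)


def drop_noisy_keys(
    record: Dict[str, Any], patterns: Iterable[str] = DELETE_PATTERNS
) -> Dict[str, Any]:
    """Return a copy of a flat record without keys matching any glob pattern."""
    patterns = tuple(patterns)
    return {
        key: value
        for key, value in record.items()
        if not any(fnmatchcase(key, pattern) for pattern in patterns)
    }


# Usage (key names are assumed, not taken from real API output):
example = {
    "443_opts_heartbleed": "2021/09/07 14:15:48 ... SAFE",
    "shodan_location_city": "Somewhere",
    "services_http__encoding_body": "DISPLAY_UTF8",
    "443_data": "HTTP/1.1 200 OK",
}
cleaned = drop_noisy_keys(example)
```

Only 443_data survives in cleaned, since the other three keys match a pattern.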

NOTE - This scrubbing should only happen after the diff comes back with a positive result. That way we're not looking at every single character in every JSON blob that comes our way, and it'll be easier to find "true scrubs" rather than accidentally deleting pieces of data that some plugin decides look "date-like".
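
The ordering in the note could look something like the sketch below: compare the raw records first, then scrub only the keys that came back different and compare again. The names changed_keys and true_changes, and the scrub(key, value) callback signature, are assumptions for illustration and not the project's actual API. Missing keys are treated as None, which is good enough for a sketch but conflates "absent" with "null".

```python
from typing import Any, Callable, Dict, Set

Record = Dict[str, Any]


def changed_keys(old: Record, new: Record) -> Set[str]:
    """Keys whose raw values differ, including keys present on only one side."""
    return {key for key in old.keys() | new.keys() if old.get(key) != new.get(key)}


def true_changes(
    old: Record, new: Record, scrub: Callable[[str, Any], Any]
) -> Set[str]:
    """Run the cheap raw diff first; scrub only the keys that differ.

    A key is reported only if it still differs after scrubbing, so identical
    blobs are never scanned character by character.
    """
    return {
        key
        for key in changed_keys(old, new)
        if scrub(key, old.get(key)) != scrub(key, new.get(key))
    }
```

If the raw diff is empty, scrub is never called at all, which is the point of doing it in this order.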

Subset for diffing inside bodies should be implemented

If you get a diff between HTML-specific fields like *_http_response_body, then that HTML should be parsed and diffed separately if at all possible. That may quickly get so complicated as to turn into a project of its own, though.
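
As a very small starting point for diffing HTML separately, here is a sketch that uses only the standard library: html.parser pulls out the text nodes and difflib.SequenceMatcher compares the two lists. The function names visible_text and html_text_diff are hypothetical. Attribute values (such as a hidden token input's value) are ignored entirely, which may or may not be what we want, and inline script or style content still counts as text.

```python
import difflib
from html.parser import HTMLParser
from typing import List, Tuple


class _TextExtractor(HTMLParser):
    """Collects non-empty text nodes; tags and attributes are ignored."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        text = data.strip()
        if text:
            self.chunks.append(text)


def visible_text(html: str) -> List[str]:
    """Return the stripped text nodes of an HTML document, in order."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


def html_text_diff(old_html: str, new_html: str) -> Tuple[List[str], List[str]]:
    """Return (removed, added) text nodes between two HTML bodies."""
    old, new = visible_text(old_html), visible_text(new_html)
    removed: List[str] = []
    added: List[str] = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed.extend(old[i1:i2])
            added.extend(new[j1:j2])
    return removed, added
```

With this, two bodies that differ only in a token attribute produce an empty diff, while a changed heading shows up as one removed and one added text node.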


TheDr1ver · 2021-09-23T14:54:51Z

Censys

Delete:

*__encoding_*

^^ Addressed in #42

Scrub:

*_banner
    cookies:
        Set-Cookie.*?=(.*?);
        (e.g. sessionid=<base64>; csrftoken=<base64>; expires=<date>)

*_http_response_body
    <input.*?(?=token).*?value(.*?)>
    # Or could be double-rex process. 
        # One rex to find <input> with 'token' inside it:
            <input[^>]*?(?=token).*?>
        # Then another to scrub the value inside of the result
            s/value=\".*?\"/value=\"\"/g
    ^^^ Note that this is overly simplified. We should have a better way of scrubbing HTML in general.

Scrub-reliant deletes:
(if any of the related fields get scrubbed in the previous function, delete these fields entirely from the result)

*_banner:
    *banner_hex
    *http_response_headers_Set_Cookie_*

*_http_response_body:
    *_http_response_body_hash
    *_http_response_body_size
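
Putting the Censys scrub and scrub-reliant delete rules together might look roughly like this. It is a sketch under assumptions: records are flat dicts, the function names and the two lookup tables are hypothetical, and the regexes are tightened variants of the ones above (the cookie scrub blanks every name=value pair on a Set-Cookie line, and the token scrub uses the two-pass approach). Dependent fields are matched across the whole record rather than per port prefix.

```python
import re
from fnmatch import fnmatchcase
from typing import Any, Dict, List

SET_COOKIE_LINE = re.compile(r"^Set-Cookie:.*$", re.IGNORECASE | re.MULTILINE)
PAIR_VALUE = re.compile(r"=[^;\r\n]*")
TOKEN_INPUT = re.compile(r"<input\b[^>]*token[^>]*>", re.IGNORECASE)
VALUE_ATTR = re.compile(r'value="[^"]*"', re.IGNORECASE)


def scrub_banner(banner: str) -> str:
    """Blank the value of every name=value pair on Set-Cookie lines."""
    return SET_COOKIE_LINE.sub(lambda m: PAIR_VALUE.sub("=", m.group(0)), banner)


def scrub_body(body: str) -> str:
    """Pass 1 finds <input> tags mentioning 'token'; pass 2 blanks their value."""
    return TOKEN_INPUT.sub(lambda m: VALUE_ATTR.sub('value=""', m.group(0)), body)


# Hypothetical tables: which scrubber applies to which key pattern, and which
# dependent fields stop being meaningful once that scrubber changed something.
SCRUBBERS = {"*_banner": scrub_banner, "*_http_response_body": scrub_body}
RELIANT_DELETES = {
    "*_banner": ["*banner_hex", "*http_response_headers_Set_Cookie_*"],
    "*_http_response_body": ["*_http_response_body_hash", "*_http_response_body_size"],
}


def scrub_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Scrub matching string fields, then drop fields that relied on them."""
    out = dict(record)
    doomed: List[str] = []
    for key, value in record.items():
        if not isinstance(value, str):
            continue
        for pattern, scrubber in SCRUBBERS.items():
            if fnmatchcase(key, pattern):
                cleaned = scrubber(value)
                if cleaned != value:
                    out[key] = cleaned
                    doomed.extend(RELIANT_DELETES[pattern])
    return {
        key: value
        for key, value in out.items()
        if not any(fnmatchcase(key, pattern) for pattern in doomed)
    }
```

The dependent fields are only dropped when a scrub actually changed something, so an untouched banner keeps its banner_hex.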

TheDr1ver · 2021-09-23T15:02:45Z

Shodan

Delete:

*_asn
*_isp
*_location_*
*_opts_*

^^ Addressed in #42

Scrub:

*_data
    \nDate:(.*?)\n

Scrub-reliant deletes:
(if any of the related fields get scrubbed in the previous function, delete these fields entirely from the result)

*_data:
    *_hash
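
For the Shodan side, a minimal sketch of the Date scrub plus the reliant *_hash delete. It assumes flat keys like 443_data with a sibling 443_hash (as mentioned at the top of this issue); the function name scrub_shodan is hypothetical. The regex is a slightly stricter take on the one above: it keeps the "Date:" label and blanks only the rest of that line.

```python
import re
from typing import Any, Dict

# Matches everything after "\nDate:" up to the end of that line.
DATE_VALUE = re.compile(r"(?<=\nDate:)[^\r\n]*")


def scrub_shodan(record: Dict[str, Any]) -> Dict[str, Any]:
    """Blank Date header values in *_data; drop the sibling *_hash if changed."""
    out = dict(record)
    for key, value in record.items():
        if not (key.endswith("_data") and isinstance(value, str)):
            continue
        cleaned = DATE_VALUE.sub("", value)
        if cleaned != value:
            out[key] = cleaned
            # The hash no longer describes the scrubbed data, so remove it.
            out.pop(key[: -len("_data")] + "_hash", None)
    return out
```

Like the original pattern, this only catches a Date header that follows a newline, so a banner that starts with "Date:" on its first line is left alone.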
