Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Analytics incorrectly capturing url data for domains other than rerun.io #6288

Closed
jleibs opened this issue May 10, 2024 · 0 comments
Closed
Assignees
Labels
📊 analytics telemetry analytics 🪳 bug Something isn't working 🕸️ web regarding running the viewer in a browser

Comments

@jleibs
Copy link
Member

jleibs commented May 10, 2024

Our intent for the url analytics property has always been to only collect data from rerun.io.

See the comment:

    /// The URL on which the web viewer is running.
    ///
    /// We _only_ collect this on `rerun.io` domains.
    pub url: Option<String>,

This filtering did not end up getting implemented correctly and so we have leaked a small number of events containing non-rerun urls. These events can contain data such as the user internal domain-names that are less anonymized than we would like given the spirit of our policy.

At time of writing this data is made up of 475 analytics-events across 25 unique non-rerun non-localhost urls.

Proposed mitigation

Prior to 0.16 release

  • Rename event url -> rerun_url to make the intent more clear.
  • Fix the filtering logic to only include urls from the rerun.io domain.

After the 0.16 release

  • Configure the posthog property filter to mask future incoming properties named url so that we drop data from older versions of the app.
  • Manually delete the 475 historical events that contain inappropriate url data.
@jleibs jleibs added 🪳 bug Something isn't working 🕸️ web regarding running the viewer in a browser 📊 analytics telemetry analytics labels May 10, 2024
@jleibs jleibs self-assigned this May 14, 2024
jleibs added a commit that referenced this issue May 14, 2024
### What
Resolves:
- #6274
- #6288

I added the processing at the last stage of the analytics output so we
don't depend on every code path possibily populating the analytics event
to do its own domain evaluation and/or hashing.

Validated events look correct in posthog:

![image](https://github.com/rerun-io/rerun/assets/3312232/17ddf585-5135-4ad1-8132-f64661b9e730)


### Checklist
* [x] I have read and agree to [Contributor
Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and
the [Code of
Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested the web demo (if applicable):
* Using examples from latest `main` build:
[rerun.io/viewer](https://rerun.io/viewer/pr/6322?manifest_url=https://app.rerun.io/version/main/examples_manifest.json)
* Using full set of examples from `nightly` build:
[rerun.io/viewer](https://rerun.io/viewer/pr/6322?manifest_url=https://app.rerun.io/version/nightly/examples_manifest.json)
* [x] The PR title and labels are set such as to maximize their
usefulness for the next release's CHANGELOG
* [x] If applicable, add a new check to the [release
checklist](https://github.com/rerun-io/rerun/blob/main/tests/python/release_checklist)!

- [PR Build Summary](https://build.rerun.io/pr/6322)
- [Recent benchmark results](https://build.rerun.io/graphs/crates.html)
- [Wasm size tracking](https://build.rerun.io/graphs/sizes.html)

To run all checks from `main`, comment on the PR with `@rerun-bot
full-check`.
@jleibs jleibs closed this as completed May 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
📊 analytics telemetry analytics 🪳 bug Something isn't working 🕸️ web regarding running the viewer in a browser
Projects
None yet
Development

No branches or pull requests

1 participant