Missing features and known differences #4

Open
8 of 13 tasks
xvello opened this issue Sep 15, 2023 · 1 comment

xvello commented Sep 15, 2023

Missing features before rolling out to team2

  • kafka message key in the {token}:{distinct_id} format, for partition locality
  • implement UTF-16 escaping, see the safe_clickhouse_string invocations in capture.py. The plugin-server equivalent is easier to read. [slack thread] -> serde rejects such payloads, see this documentation
  • fill in data, use the django_compat test for confirmation -> maybe skip some of the data massaging if the tests pass without it, and list what was skipped in the "other SDKs compat" section
  • CORS support, see django code for it
  • tuning via envvars, easy to add new options
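
A minimal sketch of the message key item above, to show why keying on {token}:{distinct_id} gives locality. The function names and the CRC32-based partitioner are assumptions for illustration only; the real partitioner is whatever the Kafka client is configured with.

```python
import zlib


def message_key(token: str, distinct_id: str) -> bytes:
    """Build the Kafka message key in the {token}:{distinct_id} format."""
    return f"{token}:{distinct_id}".encode("utf-8")


def partition_for(key: bytes, num_partitions: int) -> int:
    # Hypothetical stand-in for a key-hashing partitioner (assumption):
    # any deterministic hash of the key gives the same locality property.
    return zlib.crc32(key) % num_partitions


key = message_key("phc_example", "user-42")
# Every event from this (token, distinct_id) pair maps to the same partition
first = partition_for(key, 8)
second = partition_for(message_key("phc_example", "user-42"), 8)
```

Events sharing a token and distinct_id always produce the same key bytes, so a key-hashing partitioner routes them to one partition and keeps their relative order.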

Missing features before rolling out to customers - posthog-js only

  • Overflow detection (local for now, reuse same algorithm) and LIKELY_ANONYMOUS_IDS
  • how do we handle custom proxies that might not forward /i/?
  • Billing limits (needs a redis client, update the values out-of-band, fail open)
  • Kafka write timeouts and error handling, maybe implement limited retries? -> rdkafka handles retries for us, up to 5 minutes by default. We'll time out at the nginx level for now, and keep the messages in the rdkafka produce queue
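
A rough sketch of the "fail open" behaviour wanted for billing limits. Everything here is hypothetical: `BillingLimiter`, the `fetch_limited` callable (standing in for a redis read of values that another process updates out-of-band) and the TTL are illustration names, not existing code.

```python
import time
from typing import Callable, Optional, Set


class BillingLimiter:
    """Caches the set of over-quota tokens and fails open on errors."""

    def __init__(self, fetch_limited: Callable[[], Set[str]], ttl_seconds: float = 5.0):
        self._fetch = fetch_limited  # hypothetical: would read a set from redis
        self._ttl = ttl_seconds
        self._limited: Set[str] = set()
        self._fetched_at = float("-inf")

    def _refresh(self, now: float) -> None:
        if now - self._fetched_at < self._ttl:
            return  # cached values are still fresh
        self._fetched_at = now  # also spaces out retries after a failed fetch
        try:
            self._limited = set(self._fetch())
        except Exception:
            # Fail open: if the store is unreachable, limit nobody
            self._limited = set()

    def is_limited(self, token: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        self._refresh(now)
        return token in self._limited


def broken_store() -> Set[str]:
    raise ConnectionError("store unreachable")


healthy = BillingLimiter(lambda: {"team_a"})
degraded = BillingLimiter(broken_store)
```

With the healthy store only `team_a` is limited; with the broken store `is_limited` returns False for every token, so an outage of the limits store never drops events.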

Missing features for compat with other SDKs

For all these (and the ones we'll add), let's instrument the django code path to check whether it's actually active, and how many teams would be impacted

  • Source sent_at from the event body if present (used by some sdks + custom clients)
  • Source sent_at from the body on x-www-form-urlencoded requests: old posthog-js versions?
  • Source events from the top-level batch field if present
  • Check whether we indeed silently drop events with missing fields as documented, instead of returning an error -> if we keep dropping, let's implement an ingestion warning for this!
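
To make the sent_at and batch items concrete, here is a sketch of how a decoded request body could be normalized. The accepted payload shapes (bare array, object with a `batch` array, single event object) and the `extract_events` name are assumptions for illustration; the django code path stays the reference.

```python
from typing import Any, List, Tuple


def extract_events(payload: Any) -> Tuple[List[Any], Any]:
    """Return (events, sent_at) from a decoded request body."""
    if isinstance(payload, list):
        # A bare array of events carries no body-level sent_at
        return payload, None
    if not isinstance(payload, dict):
        raise ValueError("payload must be an object or an array")
    sent_at = payload.get("sent_at")  # None when the SDK did not send it
    batch = payload.get("batch")
    if isinstance(batch, list):
        # Events live in the top-level batch field
        return batch, sent_at
    # Otherwise treat the object itself as a single event
    return [payload], sent_at


events, sent_at = extract_events(
    {"sent_at": "2023-09-15T12:00:00Z", "batch": [{"event": "a"}, {"event": "b"}]}
)
```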

Known differences with django capture

These won't be fixed unless we aim at being compatible with the long tail of posthog-js versions:

  • the raw kafka message does not hold a site_url anymore, it looks unused now -> confirm it's the case
  • no support for lz64 compression, was removed from posthog-js
  • no support for the /engage endpoint, we can leave it routed to django
  • events bigger than the max kafka message size trigger an INVALID_REQUEST status instead of an INTERNAL_ERROR
  • dates written to kafka are in RFC3339 format, a subset of ISO8601 that plugin-server should accept OK. Let's make sure ClickHouse does too (partition_stats consumer)
  • sent_at timestamps in seconds are not supported. They will be ignored, and the event timestamp used without correction
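
For the RFC3339 item, a small sketch of what such a timestamp looks like and of an ISO8601 parser accepting it. `to_rfc3339` is a hypothetical helper; the exact variant written to kafka (`Z` vs `+00:00`, fractional seconds) is not specified in this issue, so treat the output below as one valid form only.

```python
from datetime import datetime, timezone


def to_rfc3339(ts: datetime) -> str:
    """Format an aware datetime as an RFC 3339 UTC timestamp."""
    if ts.tzinfo is None:
        raise ValueError("naive datetimes are ambiguous, attach a timezone first")
    return ts.astimezone(timezone.utc).isoformat()


stamp = to_rfc3339(datetime(2023, 9, 15, 12, 30, 0, tzinfo=timezone.utc))
# An ISO 8601 parser reads the value back unchanged
parsed = datetime.fromisoformat(stamp)
```

Consumers that only accept a narrower date format are the ones worth checking, which is the concern raised for the partition_stats consumer.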

xvello changed the title from "Missing features and known changes" to "Missing features and known differences" on Sep 15, 2023

xvello commented Nov 13, 2023

Scratchpad

  • Overflow
  • Alerting - https://github.com/PostHog/charts/pull/514
  • Tracing deploy on prod
  • Rollout to free tier
  • More trace spans
  • Re-organize code to prepare for replay endpoint
  • Get list of custom proxy domains to confirm /i/ is properly forwarded
  • Delete EventSink::send (always use send_batch)
