[Prototype] Object tracking and observability #451

kirs · 2020-04-06T14:59:05Z

I'm not sure we need this yet, but I wanted to share a prototype I've been playing with, in case it's useful to anyone else who ends up working in this area.

As mentioned in caveats, large blobs and too many embedded associations are a major problem.

At the same time, we don't know how often each embedded association is accessed, or whether some of them do more harm than good.

This PR brings "object tracking" to IDC. Here's how it works:

1. At the beginning of the request, allocate an array for all objects that will be loaded by IDC.
2. Track every read of an embedded association in that array.
3. At the end of the request, iterate over the objects that were loaded from IDC and check which embedded associations were accessed.
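The three steps above can be sketched in plain Ruby. This is only an illustration under stated assumptions: `TrackingSketch`, `Record`, `load` and `fetch_association` are hypothetical names, and the hash of embedded data stands in for a real cache fetch. It is not the code in this PR.

```ruby
require "set"

# Hypothetical sketch; none of these names come from IdentityCache itself.
module TrackingSketch
  KEY = :tracking_sketch_records

  class Record
    attr_reader :id, :accessed_associations

    def initialize(id, embedded)
      @id = id
      @embedded = embedded
      @accessed_associations = Set.new
    end

    # Step 2: every read of an embedded association is recorded.
    def fetch_association(name)
      @accessed_associations << name
      @embedded.fetch(name)
    end
  end

  # Stand-in for a cache fetch: registers the record when tracking is on.
  def self.load(id, embedded)
    record = Record.new(id, embedded)
    tracked = Thread.current[KEY]
    tracked << record if tracked
    record
  end

  # Step 1 allocates the array, step 3 walks it once the block has finished.
  def self.with_object_tracking
    previous = Thread.current[KEY]
    tracked = Thread.current[KEY] = []
    yield
    tracked.map do |record|
      { id: record.id, accessed_associations: record.accessed_associations.to_a }
    end
  ensure
    # Restore the previous state so the records can be garbage collected.
    Thread.current[KEY] = previous
  end
end

report = TrackingSketch.with_object_tracking do
  product = TrackingSketch.load(1, { variants: [:small, :large], images: [] })
  product.fetch_association(:variants)
  TrackingSketch.load(2, { variants: [], images: [] })
end
```

Here `report` lists each loaded record together with the associations that were read, which is the data an instrumentation event would carry.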

From the memory/GC point of view, it means those records have to be kept around for longer, until the request is finished. That's not great, but I believe it is tolerable for a percentage of requests.

By subscribing to AS::N events, you could either observe it locally or trace it in production:

ActiveSupport::Notifications.subscribe('object_track.identity_cache') do |_event, _, _, _, payload|
  # for local visibility
  # puts "--- #{payload[:object].class.to_s}(id=#{payload[:object].id}): #{payload[:accessed_associations].to_a.join(", ")}"
  # payload[:caller].each do |line|
  #   puts " #{line}"
  # end

  # in production, assuming that Tracing.trace is an abstract way to emit spans
  Tracing.trace('idc.object_tracking', tags: payload) { }
end

If your tracing environment is set up to emit data to something like BigQuery (which we use at Shopify), this could be the basis for a data-driven decision about removing some of the embedded associations.

Or maybe I'm overthinking this and there's an easier way?

cc @dylanahsmith @hkdsun @edward @Scalvando @floriecai

lib/identity_cache/tracking.rb

dylanahsmith

This tracking seems quite global, yet also has very fine grained data such as the stack trace for the location of the fetch. It seems like we might get the worst of both worlds with this mixing of use cases.

For instance, in the global use case we probably want to figure out the percentage of times that a fetched embedded association is actually used and probably don't actually need the stack trace, so its overhead might just be an anti-feature.

On the other hand, if we are trying to use this locally to find unused embedded associations, tracking all fetches in a block could capture much more than we care about, adding a lot of noise to the logs. In that case, we are also more likely to want to track only a specific cache fetch, and the API is awkward for that purpose: we don't want to capture all the other fetches that happen while the record we are interested in could still be used.

Perhaps we should try to provide a simpler and more flexible primitive that is more like accessed_fields.

It looks like there are two parts to the solution we need:

a way to selectively choose which cache fetched records to track
a way to collect information on which associations were actually used

Choosing which cache fetched records to track seems like it would often consist of wrapping the cache fetch in order to collect the records to track. This could be done outside of this library using monkey patching for the global use case. However, if we were trying to track specific cache fetches, we might instead want to explicitly call the method to track the object, which wouldn't need any monkey patching. Either way, we don't really need to build this into the library.

Collecting the information on which associations were actually used on a record at the end of the record's lifetime is the part where this library needs to provide a primitive. For this, I think we just need a method like accessed_cached_associations or maybe a method per association like accessed_cached_association?(name). That way we can iterate over the tracked records and find which cache embedded associations were fetched and not accessed.
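A rough sketch of what such a primitive could look like follows. The method names `accessed_cached_associations` and `accessed_cached_association?` are the ones proposed above and do not exist in the library today; the mixin, the `SketchProduct` class and its `fetch_*` methods are hypothetical stand-ins for illustration only.

```ruby
require "set"

# Hypothetical mixin illustrating the proposed primitive.
module AccessedCachedAssociationsSketch
  def accessed_cached_associations
    @accessed_cached_associations ||= Set.new
  end

  def accessed_cached_association?(name)
    accessed_cached_associations.include?(name)
  end

  private

  # Called by each association reader when it is actually used.
  def record_cached_association_access(name)
    accessed_cached_associations << name
  end
end

# Stand-in for a model with two cache embedded associations.
class SketchProduct
  include AccessedCachedAssociationsSketch

  EMBEDDED = [:variants, :images].freeze

  def fetch_variants
    record_cached_association_access(:variants)
    []
  end

  def fetch_images
    record_cached_association_access(:images)
    []
  end
end

# The application decides which records to track, for example by
# collecting them explicitly after a specific cache fetch.
tracked = [SketchProduct.new, SketchProduct.new]
tracked.first.fetch_variants

# At the end of the records' lifetime, find what was embedded but never read.
unused = tracked.map do |record|
  SketchProduct::EMBEDDED.reject { |name| record.accessed_cached_association?(name) }
end
```

With this shape, choosing which records to track stays entirely in application code, and the library only answers the question of what was accessed.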

Having something more general like this would give the application control over what is collected when starting to track an object (e.g. for grouping) and would allow us to find more than just unused cache embedded associations. For example, we might want to use the same instrumentation to find id embedded associations that are included in a cache fetch but not used. We could also make the instrumentation more generic in order to help find preloaded associations that aren't used (e.g. by checking for the @proxy instance variable on the association object) or any other type of overfetching.

dylanahsmith · 2020-04-06T16:57:38Z

lib/identity_cache/tracking.rb

+      begin
+        orig = object_tracking_enabled


If object_tracking_enabled fails, then we don't want the ensure block to be executed

Suggested change

-      begin
-        orig = object_tracking_enabled
+      orig = object_tracking_enabled
+      begin
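The difference between the two orderings can be seen in a small standalone example. `FlagSketch`, `read_flag` and `restore` are hypothetical stand-ins for the tracking flag accessors, not code from this PR; `read_flag` always raises to simulate the failure case.

```ruby
# Hypothetical stand-in used only to show when `ensure` runs.
class FlagSketch
  attr_reader :restored, :restored_value

  def initialize
    @restored = false
    @restored_value = :untouched
  end

  # Simulates the flag read failing.
  def read_flag
    raise "flag unavailable"
  end

  # Read inside `begin`: ensure still runs, with `orig` left as nil.
  def read_inside_begin
    begin
      orig = read_flag
      yield
    ensure
      restore(orig)
    end
  end

  # Read before `begin`: a failed read never reaches the ensure clause.
  def read_before_begin
    orig = read_flag
    begin
      yield
    ensure
      restore(orig)
    end
  end

  private

  def restore(value)
    @restored = true
    @restored_value = value
  end
end

inside = FlagSketch.new
begin
  inside.read_inside_begin { }
rescue RuntimeError
  # The error still propagates, but restore(nil) has already run.
end

before = FlagSketch.new
begin
  before.read_before_begin { }
rescue RuntimeError
  # The error propagates and restore was never called.
end
```

In the first form the flag would be overwritten with `nil`; in the second it is left alone, which is the behaviour the suggestion is after.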

dylanahsmith · 2020-04-06T16:58:34Z

lib/identity_cache/tracking.rb

+    def with_object_tracking_and_instrumentation
+      begin
+        with_object_tracking { yield }
+      ensure
+        instrument_and_reset_tracked_objects
+      end
+    end


Suggested change

-    def with_object_tracking_and_instrumentation
-      begin
-        with_object_tracking { yield }
-      ensure
-        instrument_and_reset_tracked_objects
-      end
-    end
+    def with_object_tracking_and_instrumentation
+      with_object_tracking { yield }
+    ensure
+      instrument_and_reset_tracked_objects
+    end

dylanahsmith · 2020-04-06T17:01:32Z

lib/identity_cache/tracking.rb

+      begin
+        with_object_tracking { yield }
+      ensure
+        instrument_and_reset_tracked_objects


If an exception prevents the block from using some associations, that doesn't mean the block shouldn't load them. So we might want to call only reset_tracked_objects here, extract that call from instrument_and_reset_tracked_objects, and call the instrumentation at the end of the begin block.
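One possible shape for that split is sketched below. `UsageTrackerSketch`, `track` and `instrument_tracked_objects` are hypothetical names, and the instrumentation simply appends to an array in place of emitting an event.

```ruby
# Hypothetical sketch: instrument only when the block succeeds,
# but always reset the tracked objects.
class UsageTrackerSketch
  attr_reader :tracked, :reports

  def initialize
    @tracked = []
    @reports = []
  end

  def track(object)
    @tracked << object
  end

  def with_object_tracking_and_instrumentation
    result = yield
    instrument_tracked_objects # skipped when the block raises
    result
  ensure
    reset_tracked_objects # runs on success and on failure
  end

  private

  # Stand-in for emitting an instrumentation event per tracked object.
  def instrument_tracked_objects
    @reports.concat(@tracked)
  end

  def reset_tracked_objects
    @tracked = []
  end
end

tracker = UsageTrackerSketch.new
tracker.with_object_tracking_and_instrumentation { tracker.track(:ok) }

begin
  tracker.with_object_tracking_and_instrumentation do
    tracker.track(:interrupted)
    raise "boom"
  end
rescue RuntimeError
  # The interrupted run is discarded rather than reported as unused.
end
```

Only the successful run ends up in `reports`, so an interrupted request does not skew the usage data.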

WIP - usage tracking

ae0bae0


casperisfine force-pushed the master branch from 97719a0 to daaba73 Compare May 5, 2020 10:53
