
DM-39485: Use redis for work dispatch #62

Open · wants to merge 140 commits into base: main
Conversation

mfisherlevine (Contributor)

No description provided.

@mfisherlevine mfisherlevine force-pushed the tickets/DM-39485 branch 2 times, most recently from 4bd5e81 to f2c9680 Compare August 10, 2023 17:20
@mfisherlevine mfisherlevine force-pushed the tickets/DM-39485 branch 2 times, most recently from 6b571fc to 41ed889 Compare August 18, 2023 11:08
@mfisherlevine mfisherlevine force-pushed the tickets/DM-39485 branch 2 times, most recently from 98f7fd0 to bbea315 Compare August 29, 2023 11:45
@mfisherlevine mfisherlevine force-pushed the tickets/DM-39485 branch 6 times, most recently from 80f6757 to 913a85e Compare October 13, 2023 15:01
@mfisherlevine mfisherlevine force-pushed the tickets/DM-39485 branch 2 times, most recently from b9b4fec to b2ab28b Compare February 18, 2024 12:43
@mfisherlevine mfisherlevine force-pushed the tickets/DM-39485 branch 3 times, most recently from 62693e1 to 91900ef Compare March 20, 2024 18:26
@mfisherlevine mfisherlevine force-pushed the tickets/DM-39485 branch 11 times, most recently from 92c2f53 to 261fa4e Compare March 31, 2024 13:46
@mfisherlevine mfisherlevine marked this pull request as ready for review May 3, 2024 17:03
@TallJimbo left a comment

Lots of comments, some minor, some about the future, but really not that many given the PR size.

I did not look too closely at:

  • logic where I didn't have enough context/background to understand it (e.g. decoding redis objects, non-redis channels)
  • scripts (that's all machine-generated, right, for gitops, except scripts/meta?)
  • places where existing "TODO" comments suggested you already intended to rewrite that code substantially.

@@ -73,3 +73,11 @@ comCamMetadataShardPath: '/sdf/home/m/mfl/u/rubintv/LSSTComCam/sidecar_metadata/
# summit-like configs
tmaMetadataPath: '/sdf/home/m/mfl/u/rubintv/tma/sidecar_metadata'
tmaMetadataShardPath: '/sdf/home/m/mfl/u/rubintv/tma/sidecar_metadata/shards'

# Redis work distribution configs


Is this comment supposed to be here? Looks like a pipeline yaml file to me, not a Redis config (same in other similar files).

# Butler paths - use full paths not aliases here as path existence is checked
butlerPath: '/sdf/group/rubin/repo/embargo/butler.yaml'

# # data paths


Are these doubled #s significant?

channelName,
watcherType,
doRaise,
queueName=None, # only needed for redis watcher. Not the neatest but will do for now


Args here are now out of sync with docs.

"""
Convert a pipelineGraph to bytes.

Upstream this to pipe_base after OR3.


Given how things have gone since we first added this, we might want to evaluate whether we can drop the stuff that uses it here instead. I may be wrong, but it seemed like being able to change the pipeline without restarting a pod was looking less important.
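If the pipeline-graph-to-bytes helper does stay, the core of it could be as simple as a round-trip through a binary serializer. This is a generic sketch using pickle, not the actual pipe_base API (which may well provide its own versioned serialization that should be preferred; `pipeline_graph_to_bytes`/`pipeline_graph_from_bytes` are hypothetical names):

```python
import pickle


def pipeline_graph_to_bytes(graph) -> bytes:
    """Serialize a pipeline-graph-like object to bytes.

    Hypothetical sketch: a real implementation should prefer any
    dedicated, versioned serialization that pipe_base itself provides.
    """
    return pickle.dumps(graph)


def pipeline_graph_from_bytes(data: bytes):
    """Inverse of pipeline_graph_to_bytes."""
    return pickle.loads(data)
```

The appeal of a dedicated format over pickle is forward compatibility across software versions, which matters if graphs are stored in redis across pod restarts.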

"""
A dataclass representing a payload.

These go in minimal, but come out full, by using the butler.


I generally very much recommend using pydantic for structs that get converted to/from JSON, but this expansion-by-butler at least makes that less of an obvious win.
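To make the "minimal in, full out" pattern concrete, here is a stdlib-only sketch of the shape being discussed. All names (`Payload`, `dataId`, `run`, `expander`) are illustrative, not the actual class from this PR; the `expander` stands in for the butler lookup that fills in the full records:

```python
import json
from dataclasses import dataclass, field


@dataclass
class Payload:
    """Sketch of a payload stored minimally and expanded on read.

    Field names are hypothetical; the butler-backed expansion is
    modeled as a plain callable for illustration.
    """

    dataId: dict
    run: str
    expanded: dict = field(default_factory=dict)  # filled in on read

    def to_json(self) -> str:
        # Only the minimal fields are serialized into redis.
        return json.dumps({"dataId": self.dataId, "run": self.run})

    @classmethod
    def from_json(cls, data: str, expander=None) -> "Payload":
        raw = json.loads(data)
        payload = cls(dataId=raw["dataId"], run=raw["run"])
        if expander is not None:
            # e.g. a butler query that attaches full dimension records
            payload.expanded = expander(payload.dataId)
        return payload
```

A pydantic model would add validation on `from_json`, but as the comment notes, the butler-driven expansion step sits outside what pydantic validates either way.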

except redis.exceptions.ConnectionError as e:
raise RuntimeError("Could not connect to redis - is it running?") from e
except Exception as e:
raise RuntimeError(f"Unexpected error connecting to redis: {e}")


Suggested change:
- raise RuntimeError(f"Unexpected error connecting to redis: {e}")
+ raise RuntimeError(f"Unexpected error connecting to redis: {e}") from e


Or maybe just let this raise...I'm not sure your RuntimeError is adding anything useful here.
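For reference, the practical difference the `from e` makes: it records the original exception as `__cause__`, so the traceback shows "The above exception was the direct cause of" rather than the weaker "During handling of the above exception, another exception occurred". A minimal illustration (the `connect` callable is a stand-in, not the PR's code):

```python
def connect_or_raise(connect):
    """Wrap a connection attempt, chaining the original error.

    Using ``raise ... from e`` preserves the underlying exception as
    ``__cause__`` for debugging, instead of only the implicit
    ``__context__``.
    """
    try:
        return connect()
    except Exception as e:
        raise RuntimeError("Unexpected error connecting to redis") from e
```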

The name of the task that has finished processing.
processingId : `int`
Either the exposureId or visitId of the payload that has finished
being processed for the specified task.


I'd seen processingId in a few places before but didn't see what it meant until I got here. I think it's going to need some kind of abstraction to work with the AOS pipelines, from what I've seen of them.

This is just one example of what I think is the biggest weakness of this system in its current state, so I'm going to use this thread as a place to start that conversation: different kinds of data IDs (like visits and exposures) are all over the place, and in some cases they're instrument-specific. Instead, I think the redis, control-layer, and worker code should recognize just three kinds of processing:

  • per-detector (these layers do need to know about detectors because that's the worker-affinity dimension);
  • gather-of-detectors (per exposure, per visit, per intra/extra-focal pair-of-visits, maybe per-group someday);
  • cumulative nightly.

I think we can introspect a PipelineGraph to split its tasks up into a sequence of those (or die if that can't be done). The ambitious version of this goal gets rid of all mentions of expRecord from anything other than user-display messages (e.g. logs, plots) and things that actually need exposure metadata (I'm thinking it'd be replaced by some ABC), and it'd get rid of all mention of "steps". But that ambitious version may not be doable until older Rapid Analysis channel types are migrated, and it may never be worth doing at all. The argument for doing it is that it prepares us for the next time somebody comes up with a new problem for Rapid Analysis to solve (i.e. a new pipeline for it to run), by separating the orchestration logic further from the assumptions about the kinds of pipelines it runs.
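The classification proposed above could start from each task's dimensions. A hypothetical sketch (real PipelineGraph introspection would read each task node's dimensions; the function name and the exact dimension sets here are assumptions):

```python
def classify_task(dimensions: set[str]) -> str:
    """Hypothetically classify a task into one of the three kinds of
    processing proposed above, based on its dimension set.

    Per-detector wins first because detector is the worker-affinity
    dimension; anything keyed by an exposure/visit-like dimension but
    not detector is a gather; everything else is cumulative nightly.
    """
    if "detector" in dimensions:
        return "per-detector"
    if dimensions & {"exposure", "visit", "group"}:
        return "gather-of-detectors"
    return "cumulative-nightly"
```

The "or die" branch would then be: raise if the resulting sequence interleaves these kinds in a way the orchestration can't schedule.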


This is because, when a butler watcher restarts, it will always find
the most recent exposure record in the repo. We don't want to always
issue these for processing, so we keep a list of what's been seen.


Is the redis content persistent, then, so if the redis DB goes down it remembers this sort of thing when it comes back up?
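Whether the seen-exposure list survives a restart depends on how the redis server's persistence is configured (AOF, RDB snapshots, or neither). A small helper could answer that at startup; this sketch takes any client exposing redis-py's `config_get()` (the helper name is hypothetical):

```python
def persistence_mode(client) -> str:
    """Report how (or whether) a redis server persists data.

    ``client`` is any object with redis-py's ``config_get()``. Returns
    "aof" if append-only-file is on, "rdb" if snapshotting is
    configured, "none" if a restart would lose everything - including
    the seen-exposure list discussed above.
    """
    if client.config_get("appendonly").get("appendonly") == "yes":
        return "aof"
    save = client.config_get("save").get("save", "")
    return "rdb" if save else "none"
```

With a real server this would be called as `persistence_mode(redis.Redis(...))`; returning "none" at startup would be a good moment to warn loudly.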

writeDataIdFile,
)

__all__ = ("FileWatcher", "RedisWatcher", "ButlerWatcher")


We usually put __all__ before all of the imports (except __future__ imports).

return f"INCOMING-{instrument}-raw"


class RedisHelper:


Some documentation of the Redis schema would be really useful: e.g. keys, types, and what they mean. Absent that, I can't really claim I understood how various states for workers are represented and hence how different methods interact.
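One lightweight way to get that schema documentation: centralize key construction in documented helper functions, so the key namespace is enumerable in one place. The first pattern below is taken from the diff above; the second key name is an assumption, purely to show the shape:

```python
def incoming_key(instrument: str) -> str:
    """Redis key for raw exposures awaiting dispatch for ``instrument``.

    This pattern appears in the diff above; documenting its type
    (e.g. list vs set) and lifecycle here is the point.
    """
    return f"INCOMING-{instrument}-raw"


def worker_queue_key(instrument: str, detector: int) -> str:
    """Hypothetical per-detector worker queue (key name is an
    assumption, not from this PR)."""
    return f"WORKER-{instrument}-{detector}"
```

Each helper's docstring can then state the redis type, who writes it, who consumes it, and when it is deleted, which would answer the worker-state questions raised here.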
