Remove MongoJack and consolidate MongoDB utils #837

trevorgerhardt · 2022-11-08T09:20:44Z

A process that started many years ago, this removes the MongoJack dependency and completes the incremental replacement of the Persistence module and its collections with the AnalysisDB and AnalysisCollection types.

In light of a possible database switch in the near future, it seemed more straightforward to continue to use String-based IDs everywhere, instead of switching to a MongoDB-specific ObjectID type.

Most of our collections already use String IDs, but this will require a migration for aggregation areas, data sources, and data groups.

Other notes:

- Switched from arrays to Lists in MongoDB model classes.
- Added a DbResultWriter that sets completed: true on regional analyses when they are complete.
- Made two OSMCache methods static (cleanId and getKey) so that the BundleController no longer requires it as a component dependency.
- Updated the MongoDB driver from v3.11.0 to v4.7.2. MongoDB currently provides documentation for v4.3 to v4.8.
- Removed unused HTTP endpoints (and their handlers) in the AggregationAreaController, BundleController, DataSourceController, and RegionalAnalysisController.
- Refactored BundleController's create method, breaking it up into smaller steps.
- Refactored RegionalAnalysisController's create method. Some benefits:
  - The FileStorage and AnalysisDB components no longer need to be injected through the Broker to the MultiOriginAssembler. Result writers are created in the controller with what they need.
  - Moved the RegionalTask creation into the AnalysisRequest object, where much of the logic already lived. More refactoring can certainly be done here, but it is a good first step.
  - Removed any usage of the RegionalTask stored on a RegionalAnalysis. This means we only need to serialize it into MongoDB; we don't need to handle deserializing it.

This will resolve #758.

A process that started many years ago, this removes the MongoJack dependency and migrates the `Persistence` module and its collections to the `AnalysisDB` and `AnalysisCollection` types that were set to replace it. In light of a possible database switch in the near future, it seemed more straightforward to continue to use `String`-based IDs everywhere, instead of switching to a MongoDB-specific `ObjectID` type. Most of our collections already use `String` IDs, but this will require a migration for aggregation areas, data sources, and data groups.

- Prefer `Set` vs `EnumSet` (BSON parser can handle `Set` by default)
- Prefer `List`s over arrays.
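As a rough illustration of the `Set` vs `EnumSet` note, the sketch below shows one way code that still wants an `EnumSet` internally could convert from a plain `Set` at the boundary. The `Mode` enum and `ModeSetSketch` class are hypothetical stand-ins, not classes from this PR, and the sketch uses only the Java standard library.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical enum standing in for a mode type; not the project's actual class.
enum Mode { WALK, BICYCLE, CAR }

final class ModeSetSketch {
    // Converts a plain Set (as a model class might store it) to an EnumSet for internal use.
    static EnumSet<Mode> toEnumSet(Set<Mode> modes) {
        // EnumSet.copyOf(Collection) throws IllegalArgumentException when given an
        // empty collection that is not itself an EnumSet, so handle that case explicitly.
        return modes.isEmpty() ? EnumSet.noneOf(Mode.class) : EnumSet.copyOf(modes);
    }

    public static void main(String[] args) {
        Set<Mode> stored = Set.of(Mode.CAR, Mode.WALK);
        // EnumSet iterates in declaration order, so this prints [WALK, CAR].
        System.out.println(toEnumSet(stored));
    }
}
```

Because `EnumSet` implements `Set`, the conversion in the other direction needs no helper: an `EnumSet<Mode>` can be assigned to a `Set<Mode>` field directly.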

ansoncfit

It's great to consolidate database interactions to a single, better supported MongoDB driver (farewell, MongoJack cursors). For context, I believe R5 historically used MongoJack, and analysis-server used the MongoDB driver, and they coexisted in this repo after combining the two projects. Edit: maybe both used MongoJack, and our intention was to consolidate but we left it at only the first step until now.

I directly edited the initial comment in this PR to clarify a couple points (let me know if I mischaracterized anything). In-line comments/suggestions are below. Other comments based on a review and our discussion:

Do the changes of arrays to Lists, Strings to Sets, etc. require any migrations (e.g. of presets)? Edit: yes, this would be a breaking change: stored requests will need migration, and API users will need to update their requests. Note, we have a general principle of preferring one-time migrations to added conditionals/complexity in code.

It looks like this PR removes the custom (de)serializer for ModeSet but keeps them for Transit and Leg. Is it worth ~~keeping the former~~ writing a new AnalysisRequest custom (de)serializer to avoid the need to switch EnumSets to Sets, or more generally using translation classes between pure JSON representation from the database and our internal uses? It feels a bit strange to me to have a PR nominally about database drivers touching core routing code (NetworkPreloader, TravelTimeComputer, PerTargetPropagater etc.)

This PR makes wider use of BsonDiscriminator: when deserializing from Mongo to a base class, this annotation indicates how to tell which subclass to use. A discussion suggested we previously implemented this approach for decay functions, but it looks like this PR adds annotations to the decay functions too, and changes codec registration for decay functions?
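To make the discriminator idea concrete, here is a plain-Java sketch of the general dispatch pattern: a key stored in the document names the subclass to instantiate. This is not the MongoDB driver's API or implementation, and it is not the code in this PR; `Shape`, `Circle`, `Square`, the `type` key, and the registry are all hypothetical names chosen for illustration.

```java
import java.util.Map;
import java.util.function.Function;

final class DiscriminatorSketch {
    // Hypothetical base type with two subclasses.
    interface Shape { double area(); }

    static final class Circle implements Shape {
        final double radius;
        Circle(double radius) { this.radius = radius; }
        public double area() { return Math.PI * radius * radius; }
    }

    static final class Square implements Shape {
        final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    // Assumed name of the document field that identifies the subclass.
    static final String KEY = "type";

    // Maps each discriminator value to a factory that builds the matching subclass.
    static final Map<String, Function<Map<String, Object>, Shape>> REGISTRY = Map.of(
        "circle", doc -> new Circle(((Number) doc.get("radius")).doubleValue()),
        "square", doc -> new Square(((Number) doc.get("side")).doubleValue())
    );

    // Decodes a document to the base type by looking up its discriminator value.
    static Shape decode(Map<String, Object> doc) {
        Object tag = doc.get(KEY);
        if (tag == null) {
            throw new IllegalArgumentException("Document has no '" + KEY + "' field");
        }
        Function<Map<String, Object>, Shape> factory = REGISTRY.get(tag);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown discriminator: " + tag);
        }
        return factory.apply(doc);
    }

    public static void main(String[] args) {
        Shape shape = decode(Map.of("type", "square", "side", 2));
        System.out.println(shape.getClass().getSimpleName() + " " + shape.area());
    }
}
```

An annotation-based approach moves the registry out of hand-written code and onto the classes themselves, which is presumably what makes the separate codec registration unnecessary.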

If part of the motivation for this PR is enabling an eventual shift away from MongoDB, should we avoid adding MongoDB-specific classes such as

- DeleteResult
- GeoJSON Position?

ansoncfit · 2022-12-06T00:04:06Z

build.gradle

-    // Legacy system for storing Java objects, this functionality is now provided by the MongoDB driver itself.
-    implementation 'org.mongojack:mongojack:2.10.1'


ansoncfit · 2022-12-06T00:13:17Z

src/main/java/com/conveyal/analysis/components/BackendComponents.java

+                new GtfsController(database, gtfsCache),
+                new BundleController(database, fileStorage, gtfsCache, taskScheduler),
                new OpportunityDatasetController(fileStorage, taskScheduler, censusExtractor, database),
-                new RegionalAnalysisController(broker, fileStorage),
+                new RegionalAnalysisController(broker, database, fileStorage),
                new AggregationAreaController(fileStorage, database, taskScheduler),


Should we use a convention for consistent ordering here?

When there is no obvious order, I usually just go for alphabetical.

Any ideas for another convention?

ansoncfit · 2022-12-06T00:15:41Z

src/main/java/com/conveyal/analysis/components/broker/Broker.java

-        if (findJob(templateTask.jobId) != null) {
-            LOG.error("Someone tried to enqueue job {} but it already exists.", templateTask.jobId);
-            throw new RuntimeException("Enqueued duplicate job " + templateTask.jobId);
+    public synchronized void enqueueTasksForRegionalJob(Job job, MultiOriginAssembler assembler) {


Any synchronization concerns we should keep in mind?

Not that I can think of here. This method was already synchronized.

ansoncfit · 2022-12-06T00:22:53Z

src/main/java/com/conveyal/analysis/components/broker/Broker.java

-     * TODO Why is all this detail added after the Persistence call?
-     *      We don't want to store all the details added below in Mongo?
-     */
-    private RegionalTask templateTaskFromRegionalAnalysis (RegionalAnalysis regionalAnalysis) {


Replaced by Job.templateTaskFromRegionalTask (https://github.com/conveyal/r5/pull/837/files#diff-cb2f7bd76fde9cde3a8702c32be5ae84d08f7b789a0da852588e6f374da86895R160)

Note that scenario.json is now created and saved to S3 at RegionalAnalysisController.storeScenarioJson (https://github.com/conveyal/r5/pull/837/files#diff-675769f1c1f75dae94398599a44ff5db4cb90ab443f717d1c2d81a30caa23c24R425)

ansoncfit · 2022-12-06T00:51:37Z

src/main/java/com/conveyal/analysis/components/broker/RedeliveryTest.java

+        var job = new Job(templateTask, WorkerTags.fromRegionalAnalysis(regionalAnalysis));
+        var assembler = new MultiOriginAssembler(job, new ArrayList<>());


Should we update our style guide re: use of var keyword?

Yes. Although I haven't looked at our style guide in ages. My short opinion / addition to the style guide is: prefer using var for all non-primitive types except in cases where we want to explicitly show a type.

In the examples above, writing the types instead of var is redundant information on the same line: Job, and MultiOriginAssembler are already there.

A case where we might want to be more explicit is when a method returns a value and we want to distinguish that type in relation to neighboring types.
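A small sketch of how that proposed guideline might look in practice, using only standard library types. The class and method names here (`VarStyleSketch`, `loadCounts`, `total`) are hypothetical and exist only to illustrate the three cases: `var` where the type is already on the line, an explicit type where a method's return type is not obvious, and explicit primitives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class VarStyleSketch {
    // Hypothetical helper whose return type is not visible at the call site.
    static Map<String, List<Integer>> loadCounts() {
        return Map.of("a", List.of(1, 2));
    }

    static int total() {
        // The type is already spelled out on the right-hand side, so var avoids repeating it.
        // Note: with var, a bare diamond (new ArrayList<>()) would infer ArrayList<Object>,
        // so the type argument is written explicitly.
        var names = new ArrayList<String>();
        names.add("a");

        // The return type is not visible on this line, so it is declared explicitly.
        Map<String, List<Integer>> counts = loadCounts();

        // Primitives stay explicit under the proposed rule.
        int sum = 0;
        for (var name : names) {
            for (int n : counts.getOrDefault(name, List.of())) {
                sum += n;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(total());
    }
}
```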


trevorgerhardt added the cleanup, dependencies (Pull requests that update a dependency file), and t0 (Time level 0: think hours) labels Nov 8, 2022

trevorgerhardt changed the title ~~Remove MongoJack and consolidate utils~~ Remove MongoJack and consolidate MongoDB utils Nov 10, 2022

trevorgerhardt marked this pull request as ready for review November 10, 2022 02:34

trevorgerhardt requested review from abyrd and ansoncfit November 10, 2022 02:35

trevorgerhardt enabled auto-merge (squash) November 10, 2022 02:35

trevorgerhardt marked this pull request as draft November 11, 2022 22:46

auto-merge was automatically disabled November 11, 2022 22:46
Pull request was converted to draft

trevorgerhardt added 16 commits November 19, 2022 10:27

Remove unnecessary toString()s

bd77ee5

Retrieve each modification type manually

a929020

Prefer Lists over arrays for MongoDB parsing

8f4a40c

Refactor to handle new MongoDB changes

efb4ec5

- Prefer `Set` vs `EnumSet` (BSON parser can handle `Set` by default)
- Prefer `List`s over arrays.

Update MongoDB driver

f5c632f

Prefer var

de55f9f

Compare to first grid's zoom

6b3fd5e

Use native MongoDB Geometry type

44e2d47

Simplify delete source set

325eea9

Additional GridResultWriter clean up

06e7cb4

Create a utility method to gzip a File

ea12ed3

Add zero argument constructor back

341f317

Clean up result writers

b29a750

Pass components directly to BundleController

8d215b7

Use BsonDiscriminators for modifications

7546f20

Remove unnecessary Codecs

71eedc0

ansoncfit reviewed Dec 7, 2022

trevorgerhardt and others added 2 commits December 7, 2022 04:10

Update src/main/java/com/conveyal/analysis/datasource/derivation/Aggr…

c4b9ca9

…egationAreaDerivation.java Co-authored-by: Anson Stewart <astewart@conveyal.com>

Update src/main/java/com/conveyal/analysis/models/BaseModel.java

486abb7

Co-authored-by: Anson Stewart <astewart@conveyal.com>

trevorgerhardt and others added 16 commits December 7, 2022 04:21

Update src/main/java/com/conveyal/analysis/controllers/RegionalAnalys…

8e1a0d1

…isController.java Co-authored-by: Anson Stewart <astewart@conveyal.com>

Update src/main/java/com/conveyal/analysis/controllers/RegionalAnalys…

7d9a45e

…isController.java Co-authored-by: Anson Stewart <astewart@conveyal.com>

Update src/main/java/com/conveyal/analysis/controllers/RegionalAnalys…

d885ba6

…isController.java Co-authored-by: Anson Stewart <astewart@conveyal.com>

Remove now unnecessary FIXME comment

3f4bb12

Merge branch 'remove-mongojack' of https://github.com/conveyal/r5 int…

5a3d49d

…o remove-mongojack

Add accidentally removed @Override

9ac1659

Use the actual return type, instead of the generic Object type

1dfc172

Add BsonDiscriminators to DecayFunctions

432d961

Simplify the creation of MongoDB codecs by using annotations instead.

Add synchronized back to terminate method

b66a4be

Revert comment auto-formatting

5187485

Revert comment auto-formatting

c2ae60b

Revert comment auto-formatting

423f05e

Convert toStreetModeSet to take a single Set

29ed72b

Fixes an Intellij warning about unchecked generics.

Switch AnalysisRequest back to taking strings as modes

de9ed5f

Update getScenarioJsonUrl

dd3e49a

Include the bundle ID and scenario ID so that the request object does not need to be fetched and deserialized.

Clean up EnumSet changes

1c4e3a6

trevorgerhardt requested a review from ansoncfit December 9, 2022 09:40

trevorgerhardt marked this pull request as ready for review December 9, 2022 09:40

trevorgerhardt marked this pull request as draft December 15, 2022 13:45

trevorgerhardt added 5 commits December 26, 2022 16:19

Remove gtfsCache component dependency

7a0b1bf

Make `GTFSCache.getFileKey` static to enable `BundleController` to operate without depending on the `GTFSCache` component.

Implement TODO added in last commit

7865c74

Note: this endpoint is currently unused.

Merge branch 'dev' into remove-mongojack

99e848d

Throw a "not found error" when grid does not exist

552b3d7

Fix tests

b7eedd9

trevorgerhardt marked this pull request as ready for review December 30, 2022 11:16

trevorgerhardt enabled auto-merge (squash) January 21, 2023 06:45

trevorgerhardt added 3 commits January 25, 2023 06:53

Merge branch 'dev' into remove-mongojack

12d76ef

Remove unused imports

378e8a1

Merge branch 'dev' into remove-mongojack

02f9319

abyrd mentioned this pull request Nov 9, 2023

Update MongoDB driver (match with MongoJack and Jackson) #858

Draft
