
Engaging Crowds: Subject set completeness #2016

Merged — 10 commits merged into master from subject-set-completeness on Mar 19, 2021

Conversation

@eatyourgreens (Contributor) commented Feb 3, 2021

  • Adds a query which gets available subject numbers, for a given workflow, from the Cellect API.
  • Removes included subject sets from the workflows API query (see #678). Subject sets are fetched separately, by ID, from the Panoptes API. This increases the number of API requests required to build each page, but allows us to limit subject set requests to grouped workflows.
  • Updates the API mocks, in data-fetching tests, to include grouped/non-grouped cases and to add the /subject_sets endpoint.
  • Wraps the data-fetching requests in try/catch, so that we can log API errors to Sentry.
HMS NHS home page, showing one subject set as 13% complete, and the other as 36% complete, for the first workflow.

Package:
app-project

Closes #1956.

Review Checklist

General

  • Are the tests passing locally and on Travis?
  • Is the documentation up to date?

Components

Apps

  • Does it work in all major browsers: Firefox, Chrome, Edge, Safari?
  • Does it work on mobile?
  • Can you run `yarn panic && yarn bootstrap` or `docker-compose up --build`, and does the app work as expected?

Publishing

  • Is the changelog updated?
  • Are the dependencies updated for apps and libraries that are using the newly published library?

Post-merging

@beckyrother left a comment

Looks great! Just looks like there's a bit too much space between the last line and the bottom of the box – should just be 30px (the same as on the sides, which look right). Thanks!

Screenshot of the subject set card, showing the extra space below the last line.

@eatyourgreens force-pushed the subject-set-completeness branch 2 times, most recently from a28708d to 95cf2bb, February 18, 2021 15:02
@eatyourgreens (Contributor, Author):

@beckyrother Should be fixed now. This screenshot's from Firefox, but should be the same in other browsers.
Subject set cards in Firefox, with smaller padding at the bottom of each card.

@beckyrother self-requested a review February 18, 2021 20:18
@beckyrother left a comment

Looks great! Thanks Jim :)

@github-actions bot added the "approved" label (This PR is approved for merging), Feb 18, 2021
@shaunanoordin (Member) left a comment

PR Review

This PR adds a "subject set completeness" reading to the Subject Set Selection menu (where applicable).

  • The "subject set completeness" readout is a basic new UI element
  • The code connects to Cellect to get the completeness stats (see Dev Notes below) ❕

Testing looks good on HMS NHS, but I ran into WTFs in other scenarios.

Dev Notes

The question of Cellect ⚠️

Observation:

  • This PR adds a request to the Cellect service (cellect.zooniverse.org) to fetch stats for Subject Sets.
  • This fetch happens on the main "classify" logic route, i.e. it runs as soon as the user loads the page, and isn't optional.

Thoughts:

  • Concern: cellect.zooniverse.org seems to be a supplementary external service (i.e. not core to the classifier's main purpose) but it's tied to the main logic route.
  • Opinion: If Cellect has issues for any reason, we shouldn't allow it to either crash or slow down the main classifier experience for the user.

Recommendations(?):

  • Pop a try-catch on the fetch-from-Cellect action...
  • ...and drop the fetch timeout to, like, 5 seconds? If possible?

Level of concern, from 1 to 10 (somebody just ran rm -fr * on the server root): 6? 7?

Testing

Tested with macOS 10 + Chrome 88

Scenario 1: HMS NHS (production)

This is the primary target project, and has 2 workflows with 2 subject sets each.

URLs:

Testing Steps:

  • Go to the project page and click Classify, OR go to the root Classify URL directly.
  • You'll see the Workflow Selection screen. Pick any workflow.
  • You'll see the Subject Set Selection screen.
  • Observe that each Subject Set "card" has an additional "50% complete" etc. readout

This part LGTM! 👍

However...

Scenario 2: a bunch of other projects (staging and production)

URLs:

Results:

  • Unable to view any of those pages!
  • Website fully crashes with the error message:
    Server Error
    Error: Error serializing `.workflows[0]` returned from `getServerSideProps` in "/projects/[owner]/[project]".
    Reason: `undefined` cannot be serialized as JSON. Please use `null` or omit this value all together.
    

Screenshot of the Server Error page.

🤷‍♂️ ❓ ❗

Scenario 3: more other projects

URLs:

This part... looks good??

Status

I'm popping a CR for this review:

  1. The question of Cellect is something I'd like answered. If you think Cellect ain't something big to be worried about, I won't be too hard to convince.
  2. The bugs with testing scenario 2 are something that I'm still trying to figure out. Why are only specific projects borking? I chose a mix of projects with multiple WFs and single WFs.

Current status is 🤷 ❓ ❗

@@ -50,6 +50,7 @@
"morgan": "^1.10.0",
"newrelic": "~7.1.0",
"next": "~9.5.5",
"node-fetch": "~2.6.1",
@shaunanoordin (Member):

Note to self: this dependency adds the equivalent of "window.fetch" to Node, which doesn't have fetch functionality by default (methinks).

Comment on lines 32 to 72
async function fetchWorkflowCellectStatus(workflow) {
const workflowURL = `https://cellect.zooniverse.org/workflows/${workflow.id}/status`
const response = await fetch(workflowURL)
const body = await response.json()
const { groups } = body ?? {}
return groups
}
@shaunanoordin (Member):

⚠️ Question: would it be a good idea to wrap this in a try-catch block?

I'm wondering what'd happen if cellect.zooniverse.org takes an unexpected nap - not sure if the fetchWorkflowCellectStatus() would return groups = {}, groups = undefined, or just throw new Error(). But in any case, I wouldn't want to see the Classifier completely bork as a result of a secondary system failure.
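A minimal sketch of what that guard could look like, assuming we're happy to treat a Cellect failure as "no completeness data" rather than an error (the empty-object fallback is an assumption, not necessarily what this PR does):

```js
async function fetchWorkflowCellectStatus(workflow) {
  const workflowURL = `https://cellect.zooniverse.org/workflows/${workflow.id}/status`
  try {
    const response = await fetch(workflowURL)
    const body = await response.json()
    const { groups } = body ?? {}
    // Always hand back an object, so callers can iterate over groups safely.
    return groups ?? {}
  } catch (error) {
    // Cellect is down, timed out, or returned invalid JSON: log it and degrade gracefully.
    console.error(error)
    return {}
  }
}
```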

@shaunanoordin (Member):

Additional note to self: expanding on error handling, if Cellect wakes up drunk one day and has a response time of like 30 seconds, would that delay also be felt by users on the front end?

Bonus: figure out if fetch()/node-fetch can be configured with an explicit timeout
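For what it's worth, node-fetch 2.x accepts a non-standard timeout option (in milliseconds), so a hard cutoff could be a one-liner. A sketch, with the 5-second value taken from the recommendation above rather than anything in this PR:

```js
const fetch = require('node-fetch')

// Rejects with a FetchError if the service takes longer than 5 seconds to respond.
function fetchWithTimeout(url, ms = 5000) {
  return fetch(url, { timeout: ms })
}
```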

@eatyourgreens (Contributor, Author):

Good point. If getServerSideProps or getStaticProps errors, then the server will send back a 500 response. The same thing will happen if any of the Panoptes requests fail.

Ideally, I think we'd like to use getStaticProps and static caching to reduce our dependency on API requests. I believe getServerSideProps runs on every page transition, so we're hitting the API a lot at the moment.
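As a rough illustration of that direction (a sketch only: the props shape and 60-second revalidation window are assumptions, and a dynamic route would also need getStaticPaths):

```js
export async function getStaticProps({ params }) {
  // Hypothetical: build the workflow data at build/regeneration time, not on every request.
  const workflows = await fetchWorkflowsHelper(params)
  return {
    props: { workflows },
    // Incremental static regeneration (Next 9.5+): re-build the page in the background
    // at most once a minute, instead of hitting Panoptes/Cellect on every page view.
    revalidate: 60
  }
}
```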

Comment on lines 56 to 68
const subjectSet = subjectSets.find(subjectSet => subjectSet.id === subjectSetID)
subjectSet.availableSubjects = subjectSetCounts[subjectSetID]
@shaunanoordin (Member):

[minor] Is there any real possibility that subjectSet might end up as undefined here? I mean, logically, you really wouldn't call buildWorkflow() without a workflow and matching subjectSets. The only way this could bork is if the Subject Sets retrieved from fetchWorkflowData() contain more than what's listed in the workflow's links, and if that happens it's usually a back end bug that we'd need to know about.
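If we did want a belt-and-braces guard here, it could be as small as this (a sketch; the variable names come from the snippet above, and the warning message is illustrative):

```js
const subjectSet = subjectSets.find(subjectSet => subjectSet.id === subjectSetID)
if (subjectSet) {
  subjectSet.availableSubjects = subjectSetCounts[subjectSetID]
} else {
  // Cellect returned a subject set ID that isn't linked to this workflow:
  // probably a back end inconsistency worth logging rather than crashing on.
  console.warn(`Subject set ${subjectSetID} is not linked to this workflow.`)
}
```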

@eatyourgreens (Contributor, Author):

Subject sets sent back from fetchWorkflowData are all subject sets from all workflows, so should include every linked set for each individual workflow.

It would be useful to test this for a realistic data set. The test project only has two workflows and two sets. What happens when you want to link 100 volumes of transcription material to your workflow?

@eatyourgreens (Contributor, Author):

It turns out this is undefined for AnnoTate (but then, so are a bunch of other values that we would expect to exist, like workflow display name.)
http://local.zooniverse.org:3000/projects/drrogg/annotate?env=production

I suspect what's going on for AnnoTate is that completed sets aren't present in the Cellect response.

I'm now playing with the code to see what needs to be hacked to make AnnoTate work.
Screenshot of the subject picker showing completed subject sets for AnnoTate.

@eatyourgreens (Contributor, Author):

AnnoTate's workflow has 1198 linked subject sets, returned in pages of 20. I've completely rewritten the data-fetching code to start dealing with larger lists of subject sets. There's now a separate API request for the subject set data, using the subject set IDs that are returned by Cellect.

Screenshot of a page of subject set listings from AnnoTate, with completeness shown for each.
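As a rough sketch of that fetch-by-ID pattern (illustrative only: the batch size, query parameters and response shape are my assumptions about the Panoptes subject_sets endpoint, not code from this PR):

```js
const fetch = require('node-fetch')

// Hypothetical helper: fetch subject set records in batches, given IDs from the Cellect response.
async function fetchSubjectSetsByID(subjectSetIDs, pageSize = 20) {
  const subjectSets = []
  for (let i = 0; i < subjectSetIDs.length; i += pageSize) {
    const ids = subjectSetIDs.slice(i, i + pageSize).join(',')
    const response = await fetch(
      `https://panoptes.zooniverse.org/api/subject_sets?id=${ids}&page_size=${pageSize}`,
      { headers: { Accept: 'application/vnd.api+json; version=1' } }
    )
    const body = await response.json()
    subjectSets.push(...(body.subject_sets ?? []))
  }
  return subjectSets
}
```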

const [subject] = subjects
const { publicRuntimeConfig = {} } = getConfig() || {}
const assetPrefix = publicRuntimeConfig.assetPrefix || ''
const placeholderUrl = `${assetPrefix}/subject-placeholder.png`
const subjectURLs = subject ? subject.locations.map(location => Object.values(location)[0]) : []
const alt = subject ? `Subject ${subject.id}` : 'Loading'
const completeness = 1 - (availableSubjects / set_member_subjects_count)
@shaunanoordin (Member):

[minor] Will set_member_subjects_count ever be 0? I mean, it's no biggie, it won't crash, it'll just set completeness to negative infinity, which admittedly may make the workflow seem daunting to all but the most optimistic volunteer.
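If we did want to guard it, something like this would do (a sketch using the names from the snippet above; treating an empty set as 0% complete is an assumption about the desired behaviour):

```js
// Avoid dividing by zero for empty subject sets.
const completeness = set_member_subjects_count > 0
  ? 1 - (availableSubjects / set_member_subjects_count)
  : 0
```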

@eatyourgreens (Contributor, Author):

Good point. Does Panoptes let you create sets with no subjects in them?

@shaunanoordin (Member):

It's definitely possible to create a Workflow with an empty Subject Set. I made a quick test, and the result is that the Classifier just shrugs and says "I guess there's no subject to load" (with a bunch of errors in the dev console) and displays a blank Task Area and blank Subject Viewer.

If you're curious: http://localhost:3000/projects/darkeshard/transformers/classify/workflow/3457

This is still in the "minor worries" category though.

@eatyourgreens (Contributor, Author):

The errors from standard projects should be fixed now. I'd forgotten to check for workflow.grouped before making the request to Cellect. I've also beefed up the tests to include responses for grouped and non-grouped workflows in the mock API.

@eatyourgreens (Contributor, Author):

If Cellect is down, I guess we'll lose sequential subject selection for Engaging Crowds. I think the fallback then might be the default Panoptes selector. Random selection maybe? So Cellect being unavailable might not just break the API requests here, it might break the subject queue too.

@shaunanoordin (Member) commented Feb 19, 2021

PR Review (Update)

First off, a big 👍 for making sure there's some branching logic that checks for (!workflow.grouped) - this certainly goes a long way in addressing unnecessary calls to Cellect.

Second, I ran tests through the URLs and...

So far I've traced the issue of Prototype 2015 to the following:

  • It's production Project 1292
  • It has one active workflow, WF900
  • For some reason, fetchWorkflowsHelper returns 900's displayName as undefined
  • I'm still trying to figure out why
  • Side note, the fact that the app gets very upset when displayName is undefined and just crashes seems to indicate some very brittle code somewhere down the line. That's a separate investigation.

I'm filing this as a Monday problem. Yoinks!

@shaunanoordin (Member) commented:

Here's what I found out:

Pinging https://panoptes.zooniverse.org/api/translations/?translated_id=7792&translated_type=workflow results in...

{
    "translations": [
        {
            "id": "6709",
            "translated_id": 7792,
            "translated_type": "Workflow",
            "language": "en",
            "strings": {
                "display_name": "Alternate identification",
                "tasks.T0.help": "",
                "tasks.T0.question": "Good or Bad?",
                "tasks.T0.answers.0.label": "Good",
                "tasks.T0.answers.1.label": "Bad",
                "tasks.T0.answers.2.label": "Morality is subjective and you cannot objectively judge robots based solely on your own moral viewpoint "
            },
            "string_versions": {
                "display_name": 6403569,
                "tasks.T0.help": 6403569,
                "tasks.T0.question": 6403569,
                "tasks.T0.answers.0.label": 6403569,
                "tasks.T0.answers.1.label": 6403569,
                "tasks.T0.answers.2.label": 6403569
            },
            "href": "/translations/6709",
            "created_at": "2018-09-26T12:12:31.622Z",
            "updated_at": "2019-02-22T11:39:51.587Z",
            "links": {
                "published_version": null,
                "workflow": "7792"
            }
        }
    ],
    "links": { ... }
}

BUT compare that to https://panoptes.zooniverse.org/api/translations/?translated_id=900&translated_type=workflow which results in...

{
    "translations": [],
    "links": { ... }
}

So Workflow 7792 has translations, but WF 900 doesn't? (They're both WFs in the same Project 1292)

I think the key takeaway here is that we can't 100% rely on /translations to always return resources??

@eatyourgreens (Contributor, Author):

When was workflow 900 last updated? Anything updated in the last two years or so should have translations, I think.

@eatyourgreens (Contributor, Author):

I've added workflow.display_name as a fallback for the workflow name, but I'm not sure we should do a lot of work to accommodate workflows that are more than 2 or 3 years old. Those workflows would error with our deployed production code.

@eatyourgreens (Contributor, Author) commented Feb 22, 2021

The tests expect fetchWorkflowsHelper to pass errors on to consumers, so I'm re-throwing caught errors for workflow requests.

EDIT: should the page error or render if subject sets aren't available? I've got the subject sets query set up to catch API errors and render with an empty subject sets array, but that doesn't make sense if subject sets are required in order to classify.
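In outline, the catch, log to Sentry, and re-throw pattern for the workflow requests looks something like this (a sketch only, assuming @sentry/node is initialised elsewhere; panoptes.get stands in for whichever client call the helper actually uses):

```js
const Sentry = require('@sentry/node')

async function fetchWorkflows(query) {
  try {
    return await panoptes.get('/workflows', query)
  } catch (error) {
    // Report the failure to Sentry, then re-throw so consumers (and the tests)
    // still see the error and the page can decide how to respond.
    Sentry.captureException(error)
    throw error
  }
}
```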

@srallen (Contributor) left a comment

This is just a conceptual question out of scope of this PR (which I'm happy to make a discussion issue for), but it seems like there's a lot of async code going on in this helper file, much more than its name fetchWorkflowsHelper would imply, and I'm just not clear why there isn't a workflow store for all this.

@eatyourgreens (Contributor, Author) commented Feb 22, 2021

I think that's historical, going back to #1077 when fetchWorkflowsHelper was added to fetch data for the project Hero component. It is doing a lot more work now, across more than one component. We could definitely add the workflow data to the initialState page prop, then hydrate a workflows store in the browser. That said, I think workflow state is static: once we've fetched workflows and subject sets from the API, they're never updated in the browser.

There's an ongoing discussion to add static props to _app.js, for cases like this. It doesn't make a lot of sense to be constantly fetching all this data from Panoptes when we could cache it across requests. Thanks to that discussion, I did discover next-plugin-preval, which might be useful here.

@eatyourgreens force-pushed the subject-set-completeness branch 2 times, most recently from a1233b3 to 2b7446c, March 19, 2021 14:38
@eatyourgreens (Contributor, Author):

I've updated this to include the latest changes to our Sentry setup. I've also split the Node data-fetching code into helpers/fetchWorkflowsHelper and helpers/fetchSubjectSets.

Commits:

  • Fetch subject set statuses from Cellect, after fetching workflows from Panoptes. Add `subjectSet.availableSubjects` to each workflow subject set.
  • Only use the Cellect API for grouped workflows. Return a default set of workflow data otherwise.
  • Remove included subject sets from the workflow request and request them separately, based on the Cellect response.
  • Update tests to test for grouped and non-grouped workflows.
  • Update mocks to represent Cellect responses without the grouped attribute.
  • Wrap workflow API calls in try/catch and log Node errors to Sentry.
  • Split out the subject set data fetching into `helpers/fetchSubjectSets`.
@shaunanoordin (Member) commented:

Following up from my previous PR review, the functionality checks now all look good! I'm going to give the code a quick read to see what updates have happened since I last went through this. brb!

@shaunanoordin (Member) left a comment

PR Review (Update)

Follows on from my earlier PR review.

Functionality tests look good, and code read looks good!

Testing

Same tests as before, and everything's passing OK now.

Scenario 1: HMS NHS (production)

Results: 🆗 Completeness values for each WF and SS are displaying properly on their selection screens.

Scenario 2: a bunch of other projects (staging and production)

Results: 🆗 These workflows load normally. I take it the "can't find translations" issue has been resolved, by using the workflow.display_name as a fallback? (i.e.: const displayName = displayNames[workflow.id] || workflow.display_name ) Lookin' good!

Scenario 3: more other projects

Results: 🆗

Status

LGTM 👍 I genuinely can't find anything of issue with this PR in its current state. Let's gooo!

@eatyourgreens merged commit cdf9e49 into master Mar 19, 2021
Engaging Crowds automation moved this from In progress to Finished Mar 19, 2021
@eatyourgreens deleted the subject-set-completeness branch March 19, 2021 18:39
Labels: approved (This PR is approved for merging)
Projects: Engaging Crowds — Finished
Development: Successfully merging this pull request may close these issues: Engaging Crowds: add completion counts to the subject set picker
4 participants