Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add custom review tags #1527

Merged
merged 29 commits into from
Jan 6, 2024
Merged

Conversation

voismager
Copy link
Contributor

@voismager voismager commented Sep 17, 2023

Please see #1505 for more details.

List of changes:

  • A new table was added to project setup / edit pages that allows users to set up custom tags divided by categories (see screenshot)
  • A checklist table with specified tags was added to Review page (see screenshot)
  • Selected tags for each record are exported as a binary matrix (1 if tag was selected, 0 otherwise) along with labels
  • I tested main happy paths, everything seems to be working correctly with our changes 🙂
  • Changes are backwards-compatible with current AR projects. I had to add a new column to results table, so I bumped state_version to 2 and made it so schema will automatically migrate to the new version if version is 1 (see asreview/state/compatibility.py)

Please let us know what you think, we're looking forward to your feedback 🙂

Screenshot 2023-09-17 at 23 42 24 Screenshot 2023-09-17 at 23 42 56

@voismager voismager changed the title Custom review tags Add custom review tags Sep 17, 2023
@voismager voismager marked this pull request as ready for review September 17, 2023 22:07
@Rensvandeschoot
Copy link
Member

This is great work!! I got it to work and I love this feature, because it has often been discussed: #1391, #1505, #1291, #1451

I am happy to review this PR!

@Rensvandeschoot
Copy link
Member

The functionality can also be used to filter records you want to discuss in a later stage (see this discussion)
WhatsApp Image 2023-09-18 at 10 19 41

@voismager
Copy link
Contributor Author

voismager commented Sep 18, 2023

@Rensvandeschoot Thank you! Let me know if you have any questions or concerns.
I see some checks failed, should I fix them in the meantime?

@terrymyc
Copy link
Member

@voismager Thank you for the PR. This looks promising! To my understanding, tags are used to categorize records. Currently, the relevant/irrelevant labels are the only tags we have. Before we move on to the next step and fix the unit tests, I have some questions and comments:

  • Is it necessary to have two levels of tags? I understand the need to categorize records, but the similar feature in a reference management software like Zotero does not have two levels of tagging. Also, we try to avoid duplicating the functionality of reference management software, which can better meet the need for sophisticated organization of records. I would suggest we start with implementing a single-level tag feature.
  • Since the custom tags are associated with records, it is ideal to add custom tags on-the-fly when reviewing a record rather than setting up tags when setting up a project. It is confusing to set/edit tags on the project setup and Details screens, as tags are attributes of the record, not the project. It seems to me that setting up tags ahead of time is using ASReview as a reference manager. I would recommend leaving the project setup and Details screens as they are (also meaning no tagging when adding prior knowledge). I would suggest adding/editing tags of a record on the Review page and editing tags of records on the History page.

I would like to hear feedback from others on these matters. Thanks again for your contribution!

@Rensvandeschoot
Copy link
Member

Thank you again for the PR. This is one of the most promising PRs we have seen. I appreciate the insights shared by everyone, including the potential overlap with reference management software functionalities.

I tested the functionality, and everything seems to work correctly. Based on my experiences and discussion with @J535D165 I would like to share the following ideas/suggestions regarding the user experience and interface with you:

  1. In the following screenshot, I have created three categories of tags that relate to the labeling decision. These categories specify the type of condition the labeling decision applies to (assuming there are three), whether the label needs to be discussed with a senior or other screener, and the inclusion or exclusion criteria used. In my opinion, these tags should be predetermined, as they should remain consistent throughout the entire process and among different screeners, and thus, should be included in the project information, I believe. Perhaps it would be a good idea to allow the user to make the labeling decision first, followed by a pop-up where the tags need to be labeled. This way, the labeling decision, along with the other tags, becomes an integral part of the record.

image


2. When configuring the tags, would it be feasible to facilitate the creation of the entire category along with the tags in a single step, without necessitating a separate pop-up for the category name followed by another step for setting up the tags and tag names? This approach would not only streamline the process by reducing the number of clicks but also prevent potential confusion. Also, it's possible to have a category without any tags, which serves no use-case, I think.

image

  1. Additionally, I'd like to propose adding a small note under the header to clarify that tags are not used for training the machine learning model.

  2. Furthermore, to maintain a cleaner interface, could we consider removing the Tag ID? It appears to be utilized exclusively in the backend, so it might be more user-friendly not to display this information on the user interface.

  3. Would it be possible to export all tag categories, also if a tag is never used. Currently, a never-used tag is omitted from the export.

image

Again, thank you all for your efforts and contributions, it is highly appreciated!!! I foresee this work paves the way for numerous promising features in the near future. I am happy to test a new version, and do let us know if you want to discuss the implementation. Also, we have some junior programmers whom I can ask to help with the documentation if you'd like.

@PeterLombaers
Copy link
Member

Hi @voismager! Nice pull request, this will open up many possibilities and is a useful feature for many people. I took a look at the part directly related to the state file, since that is where I did the most work myself. Looks good! Two things that I was wondering:

  • You use custom_metadata_json or custom_metadata in some places. This is often a dictionary containing a single key 'tags'. Maybe it's simpler to just use tags instead of custom_metadata_json? That seems a more intuitive name, and you could cut out the first key 'tags' out of the dictionary. Or did you have a good reason for it?
  • I like that it automatically check for the state version and then keeps on upgrading it until you reach the current version. Right now your conversion scripts take an sqlite3.Connection as argument. Does it make more sense to let them have the state object itself as argument? That way, if we for example change something to the settings_metadata.json file, we can upgrade it in the same flow.

@j-5chneider
Copy link

This feature is amazing thank you for contributing. It's the only feature missing that holds me back from switching to ASReview.

@smthomas1704
Copy link
Contributor

  1. When configuring the tags, would it be feasible to facilitate the creation of the entire category along with the tags in a single step, without necessitating a separate pop-up for the category name followed by another step for setting up the tags and tag names? This approach would not only streamline the process by reducing the number of clicks but also prevent potential confusion. Also, it's possible to have a category without any tags, which serves no use-case, I think.

We do plan to support adding tags on the fly as a fast follow to this PR. Do you suppose its a blocker for this work?

@yifeimichelle
Copy link
Contributor

Thanks for the amazing, detailed feedback on our PR! We are blown away by the support from this community. Just to let you know we're discussing some of the comments internally (and some team members have been out due to illness and travel) but we hope to have more responses coming up very soon.

To respond to the comments from @terrymyc

  • Is it necessary to have two levels of tags? I understand the need to categorize records, but the similar feature in a reference management software like Zotero does not have two levels of tagging. Also, we try to avoid duplicating the functionality of reference management software, which can better meet the need for sophisticated organization of records. I would suggest we start with implementing a single-level tag feature.
  • Since the custom tags are associated with records, it is ideal to add custom tags on-the-fly when reviewing a record rather than setting up tags when setting up a project. It is confusing to set/edit tags on the project setup and Details screens, as tags are attributes of the record, not the project. It seems to me that setting up tags ahead of time is using ASReview as a reference manager. I would recommend leaving the project setup and Details screens as they are (also meaning no tagging when adding prior knowledge). I would suggest adding/editing tags of a record on the Review page and editing tags of records on the History page.

I see how it seems like we are duplicating the feature of a reference manager, but our main reason for adding tags at this stage is so that it doesn't have to be added in the reference manager, since that would require duplicating work compared to capturing this data at the time of initial review. Some feedback from our academic partners is that reference managers are also not that easy to work with when using tags at a large scale, either adding or doing analysis on the basis of tags. They often end up using external workflows, such as R or Python or a spreadsheet, sometimes bypassing the reference manager altogether.

The two-tier tags was also a request from these academic users (more info on the project here), because that is how they would like to codify data. If there are users that do not want two-tiered tags maybe we can provide the option to just have simple tags.

@Rensvandeschoot
Copy link
Member

We do plan to support adding tags on the fly as a fast follow to this PR. Do you suppose its a blocker for this work?

No, I don't think it is a blocker. The only thing we need to determine is the difference between the 'tag' functionality and the 'notes' option. As long as the tags relate to the labeling decision there is no overlap between the two. If tags are added on the fly, what would you do with those records that have already received a labeling decision?

@voismager voismager closed this Oct 9, 2023
@voismager voismager reopened this Oct 9, 2023
@voismager
Copy link
Contributor Author

voismager commented Oct 9, 2023

Thank you again for the PR. This is one of the most promising PRs we have seen. I appreciate the insights shared by everyone, including the potential overlap with reference management software functionalities.

I tested the functionality, and everything seems to work correctly. Based on my experiences and discussion with @J535D165 I would like to share the following ideas/suggestions regarding the user experience and interface with you:

  1. In the following screenshot, I have created three categories of tags that relate to the labeling decision. These categories specify the type of condition the labeling decision applies to (assuming there are three), whether the label needs to be discussed with a senior or other screener, and the inclusion or exclusion criteria used. In my opinion, these tags should be predetermined, as they should remain consistent throughout the entire process and among different screeners, and thus, should be included in the project information, I believe. Perhaps it would be a good idea to allow the user to make the labeling decision first, followed by a pop-up where the tags need to be labeled. This way, the labeling decision, along with the other tags, becomes an integral part of the record.

image

  2. When configuring the tags, would it be feasible to facilitate the creation of the entire category along with the tags in a single step, without necessitating a separate pop-up for the category name followed by another step for setting up the tags and tag names? This approach would not only streamline the process by reducing the number of clicks but also prevent potential confusion. Also, it's possible to have a category without any tags, which serves no use-case, I think.

image

  1. Additionally, I'd like to propose adding a small note under the header to clarify that tags are not used for training the machine learning model.
  2. Furthermore, to maintain a cleaner interface, could we consider removing the Tag ID? It appears to be utilized exclusively in the backend, so it might be more user-friendly not to display this information on the user interface.
  3. Would it be possible to export all tag categories, also if a tag is never used. Currently, a never-used tag is omitted from the export.

image

Again, thank you all for your efforts and contributions, it is highly appreciated!!! I foresee this work paves the way for numerous promising features in the near future. I am happy to test a new version, and do let us know if you want to discuss the implementation. Also, we have some junior programmers whom I can ask to help with the documentation if you'd like.

Hello, sorry for delayed response, I was in the hospital last couple of weeks. Also I hit close button by accident, please disregard the notification 😄

  1. Regarding tags consistency. I think I made it so that tags are exported / imported along with other project details. This way researchers can share them easily with each other. Thus I don't think there's a need to use predetermined tags, at least for now. Regarding the pop-up, it was one of our first design options but I quickly dismissed it as I thought not having everything on your screen slows down the pace of reviewing. I may be wrong on this one though, so I'm open to discuss this further.
  2. Agree, a category without any tags serves no purpose, it's a bug. I also like your proposal for a single-step setup. @lgaud was working on this part and I don't know if she'll be available to change that in the near future. Both of these issues seem minor to me, so I would move on for now.
  3. I can add this note, could you point out where would you like to see it exactly?
  4. No, I think tag id is useful outside back-end. It's a unique identifier that researcher chooses once and can't change later, unlike tag display name. We use tag ids as column names in the exported matrix. Think of the case when a user changes tag name on fly because they want to clarify something, or when there's a multilingual team that would like to localise tag names for each member. In both cases the resulting matrix will contain the same columns thanks to tag ids.
  5. Yes, I can do that.

@voismager
Copy link
Contributor Author

voismager commented Oct 9, 2023

Hi @voismager! Nice pull request, this will open up many possibilities and is a useful feature for many people. I took a look at the part directly related to the state file, since that is where I did the most work myself. Looks good! Two things that I was wondering:

  • You use custom_metadata_json or custom_metadata in some places. This is often a dictionary containing a single key 'tags'. Maybe it's simpler to just use tags instead of custom_metadata_json? That seems a more intuitive name, and you could cut out the first key 'tags' out of the dictionary. Or did you have a good reason for it?
  • I like that it automatically check for the state version and then keeps on upgrading it until you reach the current version. Right now your conversion scripts take an sqlite3.Connection as argument. Does it make more sense to let them have the state object itself as argument? That way, if we for example change something to the settings_metadata.json file, we can upgrade it in the same flow.

Hi @PeterLombaers,

  1. I added this column In case we'd need to add additional data to db in the succeeding PRs. Instead of changing db schema each time there's a new column, we can add a new field to json inside custom_metadata dictionary, and check if it's there for backward compatibility. I can remove it and add a separate 'tags' column if you prefer.
  2. Sure, I think I can do that.

@voismager
Copy link
Contributor Author

voismager commented Oct 9, 2023

If tags are added on the fly, what would you do with those records that have already received a labeling decision?

@Rensvandeschoot, I can answer that 😄

In our current implementation, we don't come back to old records in any case. If tags were added later, old records would have an empty list of selected tags, same as if new records were reviewed and none of tags were selected.

@voismager
Copy link
Contributor Author

I would suggest adding/editing tags of a record on the Review page and editing tags of records on the History page.

@terrymyc, we were actually planning on doing that, but in the follow up PRs.

I hope @yifeimichelle have answered the rest of your points.

@Rensvandeschoot
Copy link
Member

Hello, sorry for delayed response, I was in the hospital last couple of weeks. Also I hit close button by accident, please disregard the notification 😄

Oe... I hope you are recovered! all the best!

@Rensvandeschoot
Copy link
Member

  1. Regarding tags consistency. I think I made it so that tags are exported / imported along with other project details. This way researchers can share them easily with each other. Thus I don't think there's a need to use predetermined tags, at least for now. Regarding the pop-up, it was one of our first design options but I quickly dismissed it as I thought not having everything on your screen slows down the pace of reviewing. I may be wrong on this one though, so I'm open to discuss this further.

Sorry for the confusion, I agree that default, pre-determined tags are not needed and screeners should have the flexibility to set tags. However, we should discuss/clarify the distinction between 'notes' and 'tags.' I think that Notes can be more free-form and spontaneous, while tags should be closely tied to the labeling decision or be an essential part of the record.

Additionally, the absence of a tag could be interpreted as a 'zero' in the output file. This could either mean that the tag is genuinely not applicable, or it could indicate missing information because the user overlooked the tag. To avoid ambiguity, I would suggest making the tagging process more explicit, for example, in a pop-up after having made the labelling decision. (I think...)

@Rensvandeschoot
Copy link
Member

  1. Agree, a category without any tags serves no purpose, it's a bug. I also like your proposal for a single-step setup. @lgaud was working on this part and I don't know if she'll be available to change that in the near future. Both of these issues seem minor to me, so I would move on for now.

Indeed very minor for the technical implementation, helpful in the end for the user experience :-)

@Rensvandeschoot
Copy link
Member

  1. I can add this note, could you point out where would you like to see it exactly?

Maybe at the same place as the text below Model? @terrymyc

image

@voismager
Copy link
Contributor Author

Hello @J535D165, our team made some changes to the PR according to our previous discussions:

  1. Bug with empty categories is fixed (hopefully we didn't introduce any new 😄)
  2. Tag / category id is now being autogenerated based on name as a prompt, but user can still choose a different id if they wish to
  3. Tag editor is now easier to use with tags being on the same level as category
  4. All configured tags are now exported, even if they were never selected
  5. Compatibility method now accepts sqlstate instead of sqlite3.Connection
  6. Some tests were added

Could you check our PR once more? 🙂

@Rensvandeschoot
Copy link
Member

Great work! I can very much appreciate your progress!! I tested the new version and could add tags and export these in a data file without issues!!!

I leave it up to @J535D165 to provide technical comments and merge this PR. We can always add more features at a later stage (like changing tags via the history panel).

For now, I have one suggestion to consider: For the records selected as prior knowledge, the label for all tags is 0, which is not necessarily true, but you can not provide a tag for these prior records. Should the tags for priors remain empty? Or do you think a different solution might be better?

@george-gca
Copy link

I don't know if it is better to do this in another PR, but could you also add support for tags as a new column in the data format? This way we could easily import/export data with tags in it.

@J535D165 J535D165 added this to the v2.0 milestone Jan 3, 2024
@J535D165 J535D165 merged commit 6ef67ab into asreview:master Jan 6, 2024
10 checks passed
@J535D165
Copy link
Member

J535D165 commented Jan 6, 2024

A big thanks to all!!! This PR is an excellent example of collaboration in science :)

I made some changes to the PR because other works started to conflict with this merge. I think the work in #1224 helps resolve some of the UX suggestions in this PR. This work will be part of the upcoming 2.0 release with many other exciting features.

Let us know if you or your team are willing to stay involved. We are welcoming all PRs, small and big ones (like this one). I will promise we will merge then fast(er than this one, sorry).

Best Jonathan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request front-end Issues related to front-end, UX and UI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet