cat2cat: Handling an Inconsistently Coded Categorical Variable in a Panel Dataset #562

Open
3 of 20 tasks
Polkas opened this issue Dec 3, 2022 · 12 comments

@Polkas

Polkas commented Dec 3, 2022

Submitting Author Name: Maciej Nasinski
Submitting Author Github Handle: @Polkas
Other Package Authors Github handles: (comma separated, delete if none)
Repository: https://github.com/Polkas/cat2cat
Submission type: Pre-submission
Language: en


  • Paste the full DESCRIPTION file inside a code block below:
```
Package: cat2cat
Title: Handling an Inconsistently Coded Categorical Variable in a Panel Dataset
Version: 0.4.5.9000
Authors@R: person("Maciej", "Nasinski", email = "nasinski.maciej@gmail.com", role = c("aut", "cre"))
Maintainer: Maciej Nasinski <nasinski.maciej@gmail.com>
Description:
  Unifies an inconsistently coded categorical variable between two different time points in accordance with a mapping table.
  The main rule is to replicate an observation if it could be assigned to a few categories,
  and then to use simple frequencies or modern statistical methods to approximate the probabilities of being assigned to each of them.
  This novel procedure was invented and implemented in the paper by Nasinski, Majchrowska and Broniatowska (2020) <doi:10.24425/cejeme.2020.134747>.
Depends: R (>= 3.6)
License: GPL (>= 2)
URL: https://github.com/Polkas/cat2cat, https://polkas.github.io/cat2cat/
BugReports: https://github.com/Polkas/cat2cat/issues
Encoding: UTF-8
Imports:
    MASS
Suggests:
    caret,
    randomForest,
    knitr,
    rmarkdown,
    pacman,
    testthat (>= 3.0.0),
    magrittr,
    dplyr
LazyData: true
VignetteBuilder: knitr
RoxygenNote: 7.2.1
Config/testthat/edition: 3
```
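
A minimal, language-agnostic sketch of the replicate-and-weight rule from the Description, written in Python for illustration. cat2cat itself is an R package, and this is not its API. The helper `replicate_and_weight`, the dict-based observation format, the toy codes and the equal-weight fallback are all hypothetical assumptions. Only the simple-frequency variant is shown; the "modern statistical methods" variant would replace the frequency ratios with model-predicted probabilities.

```python
from collections import Counter


def replicate_and_weight(old_obs, new_obs, mapping):
    """Hypothetical helper (not the cat2cat API).

    Replicate each old-period observation once per candidate new category
    listed in `mapping`, weighting each replicate by the relative frequency
    of that candidate among new-period observations. If none of the
    candidates occurs in the new period, equal weights are used instead
    (an assumption of this sketch).
    """
    counts = Counter(obs["code"] for obs in new_obs)  # missing keys count as 0
    replicated = []
    for obs in old_obs:
        candidates = mapping[obs["code"]]
        freqs = [counts[c] for c in candidates]
        total = sum(freqs)
        for cand, freq in zip(candidates, freqs):
            row = dict(obs)  # copy so the input observation is not mutated
            row["code_new"] = cand
            row["weight"] = freq / total if total > 0 else 1 / len(candidates)
            replicated.append(row)
    return replicated


# Toy example: old code "A" was split into "A1" and "A2"; "B" became "B1".
mapping = {"A": ["A1", "A2"], "B": ["B1"]}
old_obs = [{"id": 1, "code": "A"}, {"id": 2, "code": "B"}]
new_obs = [{"id": i, "code": c}
           for i, c in enumerate(["A1", "A1", "A1", "A2", "B1"], start=10)]

for row in replicate_and_weight(old_obs, new_obs, mapping):
    print(row["id"], row["code"], row["code_new"], row["weight"])
# 1 A A1 0.75
# 1 A A2 0.25
# 2 B B1 1.0
```

The weights of the replicates of each original observation sum to 1. As a result, the replicated dataset still represents the same number of original observations.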

Scope

  • Please indicate which category or categories from our package fit policies or statistical package categories this package falls under. (Please check an appropriate box below):

    Data Lifecycle Packages

    • data retrieval
    • data extraction
    • data munging
    • data deposition
    • data validation and testing
    • workflow automation
    • version control
    • citation management and bibliometrics
    • scientific software wrappers
    • field and lab reproducibility tools
    • database software bindings
    • geospatial data
    • text analysis

    Statistical Packages

    • Bayesian and Monte Carlo Routines
    • Dimensionality Reduction, Clustering, and Unsupervised Learning
    • Machine Learning
    • Regression and Supervised Learning
    • Exploratory Data Analysis (EDA) and Summary Statistics
    • Spatial Analyses
    • Time Series Analyses
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:

The main objective is to unify inconsistently coded categorical variables in a panel/longitudinal dataset.
Supervised learning methods can be used within the cat2cat procedure to approximate the probabilities of assignment to each category.
The output of the cat2cat function can be used, e.g., in a weighted linear regression or to assess counts over time.
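
To illustrate the "counts over time" use, the following Python sketch sums replicate weights per period and category. The row layout `(period, category, weight)`, the years and the codes are hypothetical toy values, not cat2cat output. Rows from the old-coded period carry fractional weights. Rows from the new-coded period were not replicated, so each has weight 1.

```python
from collections import defaultdict

# Hypothetical replicated panel: (period, unified category, weight).
rows = [
    (2010, "A1", 0.75), (2010, "A2", 0.25), (2010, "B1", 1.0),
    (2011, "A1", 1.0), (2011, "A1", 1.0), (2011, "A1", 1.0),
    (2011, "A2", 1.0), (2011, "B1", 1.0),
]


def weighted_counts(rows):
    """Sum the weights per (period, category) to estimate counts over time."""
    counts = defaultdict(float)
    for period, category, weight in rows:
        counts[(period, category)] += weight
    return dict(counts)


counts = weighted_counts(rows)
for key in sorted(counts):
    print(key, counts[key])
```

Because each original observation's weights sum to 1, the total weight per period equals the original number of observations: 2.0 for 2010 and 5.0 for 2011 in this toy data.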

I plan to apply it once I know whether I can submit the package.

  • Who is the target audience and what are scientific applications of this package?

Any scientific field in which panel/longitudinal datasets are used.
Examples of panel datasets with such inconsistently coded categorical variables are those linked with the International Standard Classification of Occupations (ISCO) and the International Classification of Diseases (ICD).

To the best of my knowledge, there is no alternative to my solution other than aggregating the datasets (with some simplifications) or removing the variable.

@Polkas Polkas changed the title Title: Handling an Inconsistently Coded Categorical Variable in a Panel Dataset cat2cat: Handling an Inconsistently Coded Categorical Variable in a Panel Dataset Dec 3, 2022
@annakrystalli
Contributor

Thanks for the pre-submission enquiry @Polkas !

The editorial team is discussing and we'll get back to you shortly.

@annakrystalli
Contributor

Dear @Polkas,

The editorial team has concluded that the package definitely fits in our "stats" scope.

Before proceeding and closing this pre-submission enquiry, we also need to clarify which category it would fit. The stats-devguide states that categories are appropriate where at least half of all standards can be applied. We suggest that you try to narrow it down to one category only.

We feel it does not fit the "Time Series" category best; initially, "Machine Learning" seems the most likely. We suggest you spend a little time reading through the standards and consider which one you think is most appropriate.

Following that, the best way to confirm would be to go through the formal process of documenting compliance with the stats standards, which needs to be done prior to submission anyway. You can call @ropensci-review-bot check srr in this issue to confirm that the documentation has been completed successfully. You can find more details in our documentation.

Just ping me here to confirm that's done and which category you have narrowed it down to, or if you need any help.

Thanks again for your enquiry!

@maurolepore
Member

Dear @Polkas,

Today my rotation as EiC starts, meaning that I have taken over the role of @annakrystalli. Did you have a chance to follow up on the comment above?

@maelle
Member

maelle commented May 2, 2023

👋 @Polkas! I'm now the current editor-in-chief. Any update? 😸

@maelle
Member

maelle commented May 23, 2023

@Polkas friendly reminder, did you get a chance to work on the comments from #562 (comment)?

@Polkas
Author

Polkas commented May 23, 2023

Hey, thank you for the update. I have already assessed which category and scope are possible for my package, and I found that the base requirements can be followed. However, I cannot decide on any updates to my package right now, as I have submitted my paper to the SoftwareX journal and am waiting for their decision and comments.

@Polkas Polkas closed this as completed May 23, 2023
@Polkas Polkas reopened this May 23, 2023
@maelle
Member

maelle commented Jul 21, 2023

@Polkas any update? 😸

@Polkas
Author

Polkas commented Sep 17, 2023

Hey, my paper was just published. I will start working on a new feature branch for a possible rOpenSci submission and will follow up here. Have a great day.

@jhollist
Member

@Polkas I am currently serving as the EIC and am checking in on some older submissions. First, congrats on the publication! You mentioned that you might pursue another submission to rOpenSci. Have you decided to move forward with that?

@Polkas
Author

Polkas commented Dec 17, 2023

Hey @jhollist, thank you for your response. I have put effort into aligning the package with the expected standards. However, it appears that the current focus of rOpenSci may have shifted away from packages similar to mine.

I understand that rOpenSci is now prioritizing support for packages that facilitate reproducible research and manage the data lifecycle for scientists. I have thoroughly reviewed the current package categories and, unfortunately, it seems my package may not align with any of these categories.

If my understanding is correct and my package indeed falls outside the scope of rOpenSci's current focus, please feel free to close this issue.

@jhollist
Member

@Polkas, your package is a better fit for our Statistical Software peer review. Based on the conversation above (#562 (comment)), take a close look at https://stats-devguide.ropensci.org/pkgdev.html#scope and see whether you think any of those categories fit. The prior conversations here and amongst the editors suggested that Machine Learning might be the best fit. If you would like to proceed, take a close look at the Stats devguide. If you have specific questions after that, you can ping me again here. Thanks!

@ldecicco-USGS

Hi @Polkas! I'm checking in on submissions that have been sitting for a while. It sounds like the feedback has been that this package would be better suited for the rOpenSci Statistical Software submission. The process is similar, but there are a few differences. I'll once again plug the statistical submission guide:

https://stats-devguide.ropensci.org/

Let me know if you have any questions.
