Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review edits #57

Open
6 of 12 tasks
jbousquin opened this issue Apr 2, 2024 · 1 comment
Open
6 of 12 tasks

Review edits #57

jbousquin opened this issue Apr 2, 2024 · 1 comment

Comments

@jbousquin
Copy link
Collaborator

jbousquin commented Apr 2, 2024

List to track progress on reviewer comments. Will add/edit as we go.

  • nitpick (domain.py): there is no need for a raw string for TADA_DATA_URL
  • discussion (convert.py): About the TODO - both points of view (regrouping constants in a single place, or having them defined near their place of use to avoid jumping around the code base) are valid. I am usually in favor of the former.
  • suggestion (domain.py): In harmonize_TADA_dict, we could use a groupby operation to avoid looping through the dataframe using python. TOCHECK
  • suggestion (domain.py): replace x in list(set(pandas_series)) pattern w/ .unique method
  • suggestion (domain.py, basic.py): replace one dict functions (e.g., out_col_lookup) w/ module-level dicts. Move their docstrings (e.g., sources) to module docstring.
  • suggestion (convert.py): We could add "references" sections in the docstrings so that the sources are present in the website and not only in the source code.
  • suggestion (basis.py, general): use pandas' methods, e.g., assign w/ np.where() for masked replace. Do compare on timeit, consistency and any other pros/cons.
  • todo (init.py): importlib.metadata was added in python 3.8, which is the minimal version supported by the package according to its pyproject.toml. The try .. except block should not be needed, even more so considering that importlib_metadata is not listed in the project requirements.
  • todo (basis.py): group the conditions branches in update_result_basis
  • todo (contributing.rst): Add a section describing how to setup development environment (e.g. installing the test and docs dependencies).
  • issue (domain.py): list sub-dependencies (e.g., requests)
  • issue (domain.py): specify exception expected in re_case.

Note: several items not on the list have been or are being resolved.

@jbousquin
Copy link
Collaborator Author

jbousquin commented Apr 2, 2024

These may not get resolved for future features adds:

nitpick 1: TADA_DATA_URL and the URL for domain tables are only used by one function each, but the plan is for additional resources on both to be able to be used by future feature adds. For example new crosswalks with the upcoming transition from WQX2.0 -> 3.0

discussion 1: these were moved around to avoid cyclic imports

todo 2 (basis.py): These additional basis columns haven't received much attention yet. It was coded this was to make it easy to come back to and write additional specific handling. combined weight/time, left particuleSize which has been notes specific to it's handling.

Notes:
Suggestion 1: probably, not sure where to implement yet, not a straightforward groupby
Suggestion 2: check other modules
Suggestion 3: made these dicts, but need to move docs to module level. domains.out_col_lookup(), domains.xy_datum(), domains.stations_rename(), domains.accepted_methods(), basis.basis_conversion(), basis.stp_dict()
Suggestion 4: conductivity_to_PSU has references, references for methods/model, if for code/checks it goes in source code.
Suggestion 5: numpy.where is great, but in the provided example existing columns values need to be left alone so this will require a if/else. numpy.where will coerce the other values (y) to the dtype which is problematic for nan. There are fixes, but simple is better than complex.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant