Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP - 230 create a draft canopy class #231

Draft
wants to merge 44 commits into
base: develop
Choose a base branch
from

Conversation

AmyOctoCat
Copy link
Collaborator

@AmyOctoCat AmyOctoCat commented May 9, 2024

Description

An initial implementation for a canopy class that calculates the canopy layer heights for a given community. See community_sketch.py for an example of the code in use.

Still to do:

  • light extinction profile
  • leaf area within each canopy layer -> in progress
  • would like to change the names of mass_fol and mass_swd for better readability.

A rough outline of how the canopy layers are calculated:

  1. Data is imported (Cohort and Community classes can be instantiated in code or imported from json files using the deserialisation package), in the form of a json array of ImportedCommunity objects, and a json array of PlantFunctionalType objects.
  2. The deserialiser then converts these into Community objects. The Community class is effectively a container for a list of cohorts, a cell_id and cell_area.
  3. When the Cohorts are initialised as part of the deserialisation, the T Model Geometry and Canopy Factors from Jaideep's extension to the T Model are calculated.
  4. A Canopy class can then be created from a Community object and a canopy gap fraction.
  5. This has a calculate_layer_heights method, which is called during initialisation of the Canopy class.

Notes:

  • I have used json_dataclasses for the import of data from json which provides automatic validation against a marshmallow schema. This makes it so that we can provide a json schema in the documentation for users to see what input they need to provide. All of this logic is in the deserialisation module.

Fixes # (issue)

Type of change

  • New feature (non-breaking change which adds functionality)
  • Optimization (back-end change that speeds up the code)
  • Bug fix (non-breaking change which fixes an issue)

Key checklist

  • Make sure you've run the pre-commit checks: $ pre-commit run -a
  • All tests pass: $ poetry run pytest

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works

@AmyOctoCat AmyOctoCat linked an issue May 9, 2024 that may be closed by this pull request
@codecov-commenter
Copy link

codecov-commenter commented May 9, 2024

Codecov Report

Attention: Patch coverage is 53.84615% with 48 lines in your changes are missing coverage. Please review.

❗ No coverage uploaded for pull request base (develop@c53eb92). Click here to learn what that means.

Files Patch % Lines
pyrealm/community_sketch.py 53.84% 48 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop     #231   +/-   ##
==========================================
  Coverage           ?   92.85%           
==========================================
  Files              ?       29           
  Lines              ?     1805           
  Branches           ?        0           
==========================================
  Hits               ?     1676           
  Misses             ?      129           
  Partials           ?        0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@MarionBWeinzierl MarionBWeinzierl marked this pull request as draft May 23, 2024 08:40
Copy link
Collaborator

@davidorme davidorme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good but there are some structural things it would be good to look at. A couple of overarching things:

  1. As noted below, I think I've given you a bad steer on the Community structure. I'm not convinced that actual Cohort objects are the way forward - the structure is very clear but it's likely to be less efficient. It is possible though that adding and deleting cohorts is much more efficient with lists of Cohort instances than subsetting and appending arrays?

  2. I think I'd rather have factory methods of classes than the intermediate import classes - it just seems cleaner.

Comment on lines +1 to +5
"""Utility classes for deserialising raw JSON data.

Utility classes for deserialising PFT and community data from files containing
JSON arrays.
"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems rather indirect? We have the ImportedCo... classes, which are only used as a route to the actual Cohort and Community classes. It seems odd to have a class with a single class method - it's basically a function? This may be because I'm missing something about the further development direction.

I can see this setup does isolate the deserialisation process for testing, but it seems more direct to have a from_json factory method as part of the actual end destination class? I've sketched something for Flora below - as that is a usefully simple test case. I think one of the other things about that is we could then support multiple factory methods (from_toml, from_csv).

raise ValueError(msg)

for name, pft in zip(pft_names, pfts):
self[name] = pft
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative factory method - more direct deserialisation.

Suggested change
self[name] = pft
self[name] = pft
@class_method
def from_json(cls, path: Path) -> Flora:
"""Factory method to create a Flora instance from JSON."""
with open(path) as file:
pfts_json = json.load(file)
pfts_from_json = PlantFunctionalType.schema().load(pfts_json, many=True) # type: ignore[attr-defined] # noqa: E501
return cls(pfts=pfts_from_json)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can really see the value in the dataclasses_json package - hadn't come across it before! The whole instance.schema().load() validation is really neat.

I don't much like JSON as an human readable format. These are files that are going to be prepared by hand and read by users, and the JSON format makes them more fiddly to work with. My hunch is that the PFTs in TOML and community in CSV is going to be the cleanest way for typical users.

In the short term, the input format isn't the main issue. I suspect also it would be worth internally converting TOML/CSV to JSON and then feeding it through the JSON validation? So we have an underlying _from_json_string method and then from_JSON, from_CSV and from_TOML are all just ways of reading the data and passing through to _from_json_string. Or maybe there is a from dictionary option to make it a little more direct?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - yes: instance.schema().load() accepts dictionaries, so we could have a from_dict class method and then feed a wide range of formats into it.

Comment on lines +14 to +20
class Cohort:
"""Contains cohort data.

Stores cohort data. During initialisation, retrieves the detailed plant functional
type data, calculates the T Model Geometry of a given cohort and the canopy factors
from Jaideep's extension to the T Model.
"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've a nasty feeling I gave a misleading answer on OO versus pandas here. Within a single Community, I think essentially of the calculations are going to be applicable across arrays of cohort data (as in the calculations of values in the canopy.md example). That should be much more efficient than an OO representation of individual cohorts (I'm asserting that, but I strongly suspect it's true).

I don't want to use pandas specifically here because I don't think we have any need for anything other than community attributes as np.arrays of cohort values. We don't need Series or DataFrame or any pandas methods, we can just use numpy. Added onto that, we can document attributes much more cleanly than we could just some big internal DataFrame.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just reviewing the PRs and we did discuss this on the meta-issue for demography (#227 (comment)), without really coming to any sort of resolution.

class CanopyFactors:
"""Canopy factors calculated from Jaideep's extension to the T Model."""

q_m: float
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be in PlantFunctionalType - see above.

Comment on lines +28 to +30
m: float
n: float

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can calculate qm here:

Suggested change
m: float
n: float
m: float
n: float
qm: float = field(init=False)
def __post_init__(self):
m = self.m
n = self.n
self.qm = (
m
* n
* ((n - 1) / (m * n - 1)) ** (1 - 1 / n)
* (((m - 1) * n) / (m * n - 1)) ** (m - 1)
)

Comment on lines +63 to +66
def get_trait_array(
self, trait_name: str, cohort_pfts: NDArray[np.str], flora: Flora
) -> NDArray:
"""Extract array of values across cohorts for a given trait.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually - this is a static method isn't it. It could be a standalone function (easier to use in other use cases), or we could make cohort_pfts and flora into attributes and simplify the signature.

Comment on lines +29 to +33
# Arrays of required PFT traits for cohorts
self.traits = CommunityTraits(self.cohort_pfts, self.flora)

# Calculate geometries from initial dbh
self.tmodel_geometries = TModelGeometries(self.cohort_dbh, self.traits)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could just have trait arrays and cohort geometries attributes just as part of Community, but it would be a long list and it seems cleaner and more easily tested if we encapsulate them.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks David, it's really good to see an illustration of what you meant. I'm a bit concerned that when adding and removing cohorts we'd have to be very careful to ensure that we removed the relevant entries from every set of traits, ie all the t model geometries, all the community traits, etc., since if we missed one, we'd end up with the data misaligned, which might make it a bit fragile. If we later added new attributes to a cohort, then we'd also need to be sure we were adding them into whatever function we use to remove/add them. The best case would be the arrays being different lengths and ending up with a runtime failure, but I'd also be worried about the program silently just attaching the wrong geometry to the wrong cohort etc (for example if we removed one cohort from one data set and a different one from another).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a clear advantage of the OO Cohort model - and also of using some kind of record array. Having said that, we could also write a fairly defensive remove_cohort(index) method to these classes.

@AmyOctoCat
Copy link
Collaborator Author

I think this is good but there are some structural things it would be good to look at. A couple of overarching things:

1. As noted below, I think I've given you a bad steer on the Community structure. I'm not convinced that actual `Cohort` objects are the way forward - the structure is very clear but it's likely to be less efficient. It is possible though that adding and deleting cohorts is much more efficient with lists of `Cohort` instances than subsetting and appending arrays?

2. I think I'd rather have factory methods of classes than the intermediate import classes - it just seems cleaner.

Hi David,

Thank you very much for the feedback, these were things I was going round in circles a bit myself, so it's great to have the input.

Regarding the first point I think I've circled round on the arrays vs objects quite a few times over and still don't have a strong feeling. I think going for the OO approach might make it easier to read and easier to reason about when it comes to adding and removing cohorts, but I can see the efficiency argument for using arrays.

Regarding the second point I also went in and out on this. JSON dataclasses did not play very nicely with passing the flora object into the nested cohorts within the community which was why I ended up with the separate import classes (I thought this was also a bit clearer from the point of view of easily seeing which fields were needed in the input data). Once there were separate import classes, the logic looked quite complicated and I thought it deserved it's own class (was going for the single responsibility principle from SOLID), but will have a review to see if there's a better way. Maybe adding extra functionality to the generated from_json method that json_dataclass creates in the cohort class is possible that would allow us to pass the flora in there.

Sorry that's all very inconclusive! Hopefully some more ideas might percolate over the space of the weekend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

Create a draft Canopy class
3 participants