WIP - 230 create a draft canopy class #231
base: develop
Conversation
…don/pyrealm into 230-create-a-draft-canopy-class
Codecov Report

Attention: Patch coverage is

```
@@           Coverage Diff            @@
##           develop     #231   +/-  ##
==========================================
  Coverage         ?   92.85%
==========================================
  Files            ?       29
  Lines            ?     1805
  Branches         ?        0
==========================================
  Hits             ?     1676
  Misses           ?      129
  Partials         ?        0
```

☔ View full report in Codecov by Sentry.
…ies and cells. The relationship is 1:1 so have updated accordingly
for more information, see https://pre-commit.ci
…CollegeLondon/pyrealm into 230-create-a-draft-canopy-class
I think this is good, but there are some structural things it would be good to look at. A couple of overarching points:

- As noted below, I think I've given you a bad steer on the Community structure. I'm not convinced that actual `Cohort` objects are the way forward - the structure is very clear, but it's likely to be less efficient. It is possible, though, that adding and deleting cohorts is much more efficient with lists of `Cohort` instances than with subsetting and appending arrays?
- I think I'd rather have factory methods on classes than the intermediate import classes - it just seems cleaner.
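To make that trade-off concrete, here is a minimal sketch (the `Cohort` fields and all values are invented for illustration, not taken from the PR) of removing a cohort from a list of objects versus from parallel numpy arrays:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Cohort:
    """Minimal stand-in for the real class (fields assumed)."""

    pft_name: str
    dbh: float


# Object list: removing a cohort is one operation and cannot misalign fields.
cohorts = [Cohort("oak", 0.2), Cohort("pine", 0.1)]
del cohorts[0]

# Parallel arrays: every per-cohort array must be subset with the same mask.
pft_names = np.array(["oak", "pine"])
dbh = np.array([0.2, 0.1])
keep = np.arange(dbh.size) != 0
pft_names, dbh = pft_names[keep], dbh[keep]
```

The list version is harder to get wrong but loops in Python; the array version vectorises downstream calculations but couples correctness to every array being subset identically.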
```python
"""Utility classes for deserialising raw JSON data.

Utility classes for deserialising PFT and community data from files containing
JSON arrays.
"""
```
This seems rather indirect? We have the `ImportedCo...` classes, which are only used as a route to the actual `Cohort` and `Community` classes. It seems odd to have a class with a single class method - it's basically a function? This may be because I'm missing something about the further development direction.

I can see this setup does isolate the deserialisation process for testing, but it seems more direct to have a `from_json` factory method as part of the actual end destination class? I've sketched something for `Flora` below, as that is a usefully simple test case. I think one of the other things about that is we could then support multiple factory methods (`from_toml`, `from_csv`).
```python
        raise ValueError(msg)

    for name, pft in zip(pft_names, pfts):
        self[name] = pft
```
Alternative factory method - more direct deserialisation:

```python
@classmethod
def from_json(cls, path: Path) -> Flora:
    """Factory method to create a Flora instance from JSON."""
    with open(path) as file:
        pfts_json = json.load(file)
    pfts_from_json = PlantFunctionalType.schema().load(pfts_json, many=True)  # type: ignore[attr-defined]
    return cls(pfts=pfts_from_json)
```
I can really see the value in the `dataclasses_json` package - hadn't come across it before! The whole `instance.schema().load()` validation is really neat.

I don't much like JSON as a human-readable format. These are files that are going to be prepared by hand and read by users, and the JSON format makes them more fiddly to work with. My hunch is that PFTs in TOML and the community in CSV is going to be the cleanest way for typical users.

In the short term, the input format isn't the main issue. I suspect also it would be worth internally converting TOML/CSV to JSON and then feeding it through the JSON validation? So we have an underlying `_from_json_string` method, and then `from_JSON`, `from_CSV` and `from_TOML` are all just ways of reading the data and passing it through to `_from_json_string`. Or maybe there is a from-dictionary option to make it a little more direct?
OK - yes: `instance.schema().load()` accepts dictionaries, so we could have a `from_dict` class method and then feed a wide range of formats into it.
```python
class Cohort:
    """Contains cohort data.

    Stores cohort data. During initialisation, retrieves the detailed plant
    functional type data, calculates the T Model geometry of a given cohort
    and the canopy factors from Jaideep's extension to the T Model.
    """
```
I've a nasty feeling I gave a misleading answer on OO versus `pandas` here. Within a single Community, I think essentially all of the calculations are going to be applicable across arrays of cohort data (as in the calculations of values in the `canopy.md` example). That should be much more efficient than an OO representation of individual cohorts (I'm asserting that, but I strongly suspect it's true).

I don't want to use `pandas` specifically here because I don't think we have any need for anything other than community attributes as `np.array`s of cohort values. We don't need `Series` or `DataFrame` or any `pandas` methods; we can just use `numpy`. Added onto that, we can document attributes much more cleanly than we could some big internal `DataFrame`.
Just reviewing the PRs and we did discuss this on the meta-issue for demography (#227 (comment)), without really coming to any sort of resolution.
```python
class CanopyFactors:
    """Canopy factors calculated from Jaideep's extension to the T Model."""

    q_m: float
```
I think this should be in `PlantFunctionalType` - see above.
```python
    m: float
    n: float
```
We can calculate `qm` here:

```python
m: float
n: float
qm: float = field(init=False)

def __post_init__(self) -> None:
    m = self.m
    n = self.n
    self.qm = (
        m
        * n
        * ((n - 1) / (m * n - 1)) ** (1 - 1 / n)
        * (((m - 1) * n) / (m * n - 1)) ** (m - 1)
    )
```
```python
    def get_trait_array(
        self, trait_name: str, cohort_pfts: NDArray[np.str_], flora: Flora
    ) -> NDArray:
        """Extract array of values across cohorts for a given trait."""
```
Actually - this is a static method, isn't it? It could be a standalone function (easier to use in other use cases), or we could make `cohort_pfts` and `flora` into attributes and simplify the signature.
```python
        # Arrays of required PFT traits for cohorts
        self.traits = CommunityTraits(self.cohort_pfts, self.flora)

        # Calculate geometries from initial dbh
        self.tmodel_geometries = TModelGeometries(self.cohort_dbh, self.traits)
```
We could have trait arrays and cohort geometry attributes just as part of Community, but it would be a long list, and it seems cleaner and more easily tested if we encapsulate them.
Thanks David, it's really good to see an illustration of what you meant. I'm a bit concerned that when adding and removing cohorts we'd have to be very careful to ensure that we removed the relevant entries from every set of traits, i.e. all the T Model geometries, all the community traits, etc., since if we missed one we'd end up with the data misaligned, which might make it a bit fragile. If we later added new attributes to a cohort, we'd also need to be sure we were adding them into whatever function we use to add/remove them. The best case would be the arrays being different lengths and ending up with a runtime failure, but I'd also be worried about the program silently attaching the wrong geometry to the wrong cohort (for example, if we removed one cohort from one data set and a different one from another).
That is a clear advantage of the OO `Cohort` model - and also of using some kind of record array. Having said that, we could also add a fairly defensive `remove_cohort(index)` method to these classes.
Hi David, thank you very much for the feedback - these were things I was going round in circles on a bit myself, so it's great to have the input.

Regarding the first point: I think I've circled round on the arrays vs objects question quite a few times and still don't have a strong feeling. I think going for the OO approach might make it easier to read and easier to reason about when it comes to adding and removing cohorts, but I can see the efficiency argument for using arrays.

Regarding the second point, I also went in and out on this. JSON dataclasses did not play very nicely with passing the flora object into the nested cohorts within the community, which was why I ended up with the separate import classes (I thought this was also a bit clearer from the point of view of easily seeing which fields were needed in the input data). Once there were separate import classes, the logic looked quite complicated and I thought it deserved its own class (I was going for the single responsibility principle from SOLID), but I will have a review to see if there's a better way. Maybe adding extra functionality to the generated from_json method that json_dataclass creates in the cohort class would allow us to pass the flora in there.

Sorry, that's all very inconclusive! Hopefully some more ideas will percolate over the weekend.
Description
An initial implementation for a canopy class that calculates the canopy layer heights for a given community. See `community_sketch.py` for an example of the code in use.

Still to do:

- `mass_fol` and `mass_swd` for better readability.

A rough outline of how the canopy layers are calculated:
Notes:
Fixes # (issue)
Type of change
Key checklist

- `pre-commit` checks: `$ pre-commit run -a`
- `$ poetry run pytest`

Further checks