WIP - 230 create a draft canopy class #231

AmyOctoCat · 2024-05-09T15:19:10Z

Description

An initial implementation for a canopy class that calculates the canopy layer heights for a given community. See community_sketch.py for an example of the code in use.

Still to do:

light extinction profile
leaf area within each canopy layer -> in progress
would like to change the names of mass_fol and mass_swd for better readability.

A rough outline of how the canopy layers are calculated:

Data is imported (Cohort and Community classes can be instantiated in code or imported from json files using the deserialisation package), in the form of a json array of ImportedCommunity objects, and a json array of PlantFunctionalType objects.
The deserialiser then converts these into Community objects. The Community class is effectively a container for a list of cohorts, a cell_id and cell_area.
When the Cohorts are initialised as part of the deserialisation, the T Model Geometry and Canopy Factors from Jaideep's extension to the T Model are calculated.
A Canopy class can then be created from a Community object and a canopy gap fraction.
This has a calculate_layer_heights method, which is called during initialisation of the Canopy class.

Notes:

I have used json_dataclasses for the import of data from json which provides automatic validation against a marshmallow schema. This makes it so that we can provide a json schema in the documentation for users to see what input they need to provide. All of this logic is in the deserialisation module.

Fixes # (issue)

Type of change

New feature (non-breaking change which adds functionality)
Optimization (back-end change that speeds up the code)
Bug fix (non-breaking change which fixes an issue)

Key checklist

Make sure you've run the pre-commit checks: $ pre-commit run -a
All tests pass: $ poetry run pytest

Further checks

Code is commented, particularly in hard-to-understand areas
Tests added that prove fix is effective or that feature works

…don/pyrealm into 230-create-a-draft-canopy-class

codecov-commenter · 2024-05-09T15:25:24Z

Codecov Report

Attention: Patch coverage is 53.84615% with 48 lines in your changes are missing coverage. Please review.

❗ No coverage uploaded for pull request base (develop@c53eb92). Click here to learn what that means.

Files	Patch %	Lines
pyrealm/community_sketch.py	53.84%	48 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             develop     #231   +/-   ##
==========================================
  Coverage           ?   92.85%           
==========================================
  Files              ?       29           
  Lines              ?     1805           
  Branches           ?        0           
==========================================
  Hits               ?     1676           
  Misses             ?      129           
  Partials           ?        0

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

…ies and cells. The relationship is 1:1 so have updated accordingly

…ility

for more information, see https://pre-commit.ci

…CollegeLondon/pyrealm into 230-create-a-draft-canopy-class

for more information, see https://pre-commit.ci

davidorme

I think this is good but there are some structural things it would be good to look at. A couple of overarching things:

As noted below, I think I've given you a bad steer on the Community structure. I'm not convinced that actual Cohort objects are the way forward - the structure is very clear but it's likely to be less efficient. It is possible though that adding and deleting cohorts is much more efficient with lists of Cohort instances than subsetting and appending arrays?
I think I'd rather have factory methods of classes than the intermediate import classes - it just seems cleaner.

davidorme · 2024-06-06T14:25:32Z

pyrealm/canopy_model/deserialisation/deserialisation.py

+"""Utility classes for deserialising raw JSON data.
+
+Utility classes for deserialising PFT and community data from files containing
+JSON arrays.
+"""


This seems rather indirect? We have the ImportedCo... classes, which are only used as a route to the actual Cohort and Community classes. It seems odd to have a class with a single class method - it's basically a function? This may be because I'm missing something about the further development direction.

I can see this setup does isolate the deserialisation process for testing, but it seems more direct to have a from_json factory method as part of the actual end destination class? I've sketched something for Flora below - as that is a usefully simple test case. I think one of the other things about that is we could then support multiple factory methods (from_toml, from_csv).

davidorme · 2024-06-06T14:34:28Z

pyrealm/canopy_model/model/flora.py

+            raise ValueError(msg)
+
+        for name, pft in zip(pft_names, pfts):
+            self[name] = pft


Alternative factory method - more direct deserialisation.

Suggested change

self[name] = pft

self[name] = pft

@class_method

def from_json(cls, path: Path) -> Flora:

"""Factory method to create a Flora instance from JSON."""

with open(path) as file:

pfts_json = json.load(file)

pfts_from_json = PlantFunctionalType.schema().load(pfts_json, many=True) # type: ignore[attr-defined] # noqa: E501

return cls(pfts=pfts_from_json)

davidorme · 2024-06-06T14:50:13Z

pyrealm_build_data/community/pfts.json

I can really see the value in the dataclasses_json package - hadn't come across it before! The whole instance.schema().load() validation is really neat.

I don't much like JSON as an human readable format. These are files that are going to be prepared by hand and read by users, and the JSON format makes them more fiddly to work with. My hunch is that the PFTs in TOML and community in CSV is going to be the cleanest way for typical users.

In the short term, the input format isn't the main issue. I suspect also it would be worth internally converting TOML/CSV to JSON and then feeding it through the JSON validation? So we have an underlying _from_json_string method and then from_JSON, from_CSV and from_TOML are all just ways of reading the data and passing through to _from_json_string. Or maybe there is a from dictionary option to make it a little more direct?

OK - yes: instance.schema().load() accepts dictionaries, so we could have a from_dict class method and then feed a wide range of formats into it.

davidorme · 2024-06-06T15:21:05Z

pyrealm/canopy_model/model/cohort.py

+class Cohort:
+    """Contains cohort data.
+
+    Stores cohort data. During initialisation, retrieves the detailed plant functional
+    type data, calculates the T Model Geometry of a given cohort and the canopy factors
+    from Jaideep's extension to the T Model.
+    """


I've a nasty feeling I gave a misleading answer on OO versus pandas here. Within a single Community, I think essentially of the calculations are going to be applicable across arrays of cohort data (as in the calculations of values in the canopy.md example). That should be much more efficient than an OO representation of individual cohorts (I'm asserting that, but I strongly suspect it's true).

I don't want to use pandas specifically here because I don't think we have any need for anything other than community attributes as np.arrays of cohort values. We don't need Series or DataFrame or any pandas methods, we can just use numpy. Added onto that, we can document attributes much more cleanly than we could just some big internal DataFrame.

Just reviewing the PRs and we did discuss this on the meta-issue for demography (#227 (comment)), without really coming to any sort of resolution.

davidorme · 2024-06-06T15:24:51Z

pyrealm/canopy_model/model/jaideep_t_model_extension.py

+class CanopyFactors:
+    """Canopy factors calculated from Jaideep's extension to the T Model."""
+
+    q_m: float


I think this should be in PlantFunctionalType - see above.

davidorme · 2024-06-06T15:28:27Z

pyrealm/canopy_model/model/flora.py

+    m: float
+    n: float
+


We can calculate qm here:

Suggested change

m: float

n: float

m: float

n: float

qm: float = field(init=False)

def __post_init__(self):

m = self.m

n = self.n

self.qm = (

m

* n

* ((n - 1) / (m * n - 1)) ** (1 - 1 / n)

* (((m - 1) * n) / (m * n - 1)) ** (m - 1)

)

davidorme · 2024-06-06T19:37:12Z

pyrealm/canopy_model/model/community_alt.py

+    def get_trait_array(
+        self, trait_name: str, cohort_pfts: NDArray[np.str], flora: Flora
+    ) -> NDArray:
+        """Extract array of values across cohorts for a given trait.


Actually - this is a static method isn't it. It could be a standalone function (easier to use in other use cases), or we could make cohort_pfts and flora into attributes and simplify the signature.

davidorme · 2024-06-06T19:38:41Z

pyrealm/canopy_model/model/community_alt.py

+        # Arrays of required PFT traits for cohorts
+        self.traits = CommunityTraits(self.cohort_pfts, self.flora)
+
+        # Calculate geometries from initial dbh
+        self.tmodel_geometries = TModelGeometries(self.cohort_dbh, self.traits)


We could just have trait arrays and cohort geometries attributes just as part of Community, but it would be a long list and it seems cleaner and more easily tested if we encapsulate them.

Thanks David, it's really good to see an illustration of what you meant. I'm a bit concerned that when adding and removing cohorts we'd have to be very careful to ensure that we removed the relevant entries from every set of traits, ie all the t model geometries, all the community traits, etc., since if we missed one, we'd end up with the data misaligned, which might make it a bit fragile. If we later added new attributes to a cohort, then we'd also need to be sure we were adding them into whatever function we use to remove/add them. The best case would be the arrays being different lengths and ending up with a runtime failure, but I'd also be worried about the program silently just attaching the wrong geometry to the wrong cohort etc (for example if we removed one cohort from one data set and a different one from another).

That is a clear advantage of the OO Cohort model - and also of using some kind of record array. Having said that, we could also write a fairly defensive remove_cohort(index) method to these classes.

AmyOctoCat · 2024-06-07T16:18:06Z

I think this is good but there are some structural things it would be good to look at. A couple of overarching things:

1. As noted below, I think I've given you a bad steer on the Community structure. I'm not convinced that actual `Cohort` objects are the way forward - the structure is very clear but it's likely to be less efficient. It is possible though that adding and deleting cohorts is much more efficient with lists of `Cohort` instances than subsetting and appending arrays?

2. I think I'd rather have factory methods of classes than the intermediate import classes - it just seems cleaner.

Hi David,

Thank you very much for the feedback, these were things I was going round in circles a bit myself, so it's great to have the input.

Regarding the first point I think I've circled round on the arrays vs objects quite a few times over and still don't have a strong feeling. I think going for the OO approach might make it easier to read and easier to reason about when it comes to adding and removing cohorts, but I can see the efficiency argument for using arrays.

Regarding the second point I also went in and out on this. JSON dataclasses did not play very nicely with passing the flora object into the nested cohorts within the community which was why I ended up with the separate import classes (I thought this was also a bit clearer from the point of view of easily seeing which fields were needed in the input data). Once there were separate import classes, the logic looked quite complicated and I thought it deserved it's own class (was going for the single responsibility principle from SOLID), but will have a review to see if there's a better way. Maybe adding extra functionality to the generated from_json method that json_dataclass creates in the cohort class is possible that would allow us to pass the flora in there.

Sorry that's all very inconclusive! Hopefully some more ideas might percolate over the space of the weekend.

davidorme and others added 5 commits April 26, 2024 15:00

Adding demography fragments - data, docs, code sketch

3f61f24

Fixing doc build

2666f40

Borked kernelspec name in canopy.md

5281f37

Minor fix to canopy markdown doc

1914668

Merge branch 'demography_components' of github.com:ImperialCollegeLon…

64f6b04

…don/pyrealm into 230-create-a-draft-canopy-class

AmyOctoCat linked an issue May 9, 2024 that may be closed by this pull request

Create a draft Canopy class #230

Open

AmyOctoCat and others added 14 commits May 9, 2024 17:53

partial very sketchy implementation for canopy model

b1467c2

simplify the CohortGeometry class not to contain nested cohort

3074aae

neaten up the CommunityGeometry type to use type aliasing

164fc0d

realised had conceptually misunderstood relationship between communit…

0bfcbf6

…ies and cells. The relationship is 1:1 so have updated accordingly

renaming mass_stm to mass_stem, making pronouncable to improve readab…

0f71f17

…ility

partial implementation for crown radius profile calculation

c1fba53

partial implementation for crown radius profile calculation

07b5a8f

partial implementation for crown area calculation

7370966

continuing implementation of canopy model

6496f07

continuing implementation of canopy model

2150da5

[pre-commit.ci] auto fixes from pre-commit.com hooks

63da57a

for more information, see https://pre-commit.ci

continuing implementation of canopy model

6125f4c

pull serialisation and deserialisation into separate file

0f21fef

[pre-commit.ci] auto fixes from pre-commit.com hooks

702e0e6

for more information, see https://pre-commit.ci

MarionBWeinzierl marked this pull request as draft May 23, 2024 08:40

AmyOctoCat added 8 commits May 23, 2024 14:02

fix bugs

b456b26

fix bugs

59790fe

debugging why the results are different to the pandas version

1bc845e

fix input data differences

98d28d3

fix input data

b675ac9

debugging diference in canopy layer heights

a7e9457

fix input data, more debugging

497020c

black changes

0bd8526

AmyOctoCat added 4 commits May 24, 2024 10:54

refactor data structures

2c78ae7

pull total community area calculation out into separate function

9a85d45

remove debugging statements

0f09bc5

pull classes out into separate modules

a466eb9

MarionBWeinzierl assigned AmyOctoCat May 30, 2024

AmyOctoCat and others added 13 commits June 4, 2024 17:17

refactor class structure

819d5d4

refactor class structure

7de3ec5

more class structure refactoring

f20d79b

more class structure refactoring

61d3a55

more class structure refactoring

91f8c0e

fix imports

90b8070

update docstring

61d5e1e

more consistant naming in t model class

18c4003

remove magic numbers

a3e5524

add calculation of total Acp for canopy layers

463ac11

Sketch of array-based community

268935b

Merge branch '230-create-a-draft-canopy-class' of github.com:Imperial…

44aea7f

…CollegeLondon/pyrealm into 230-create-a-draft-canopy-class

[pre-commit.ci] auto fixes from pre-commit.com hooks

480088d

for more information, see https://pre-commit.ci

davidorme requested changes Jun 6, 2024

View reviewed changes

davidorme reviewed Jun 6, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP - 230 create a draft canopy class #231

WIP - 230 create a draft canopy class #231

AmyOctoCat commented May 9, 2024 •

edited

codecov-commenter commented May 9, 2024 •

edited

davidorme left a comment

davidorme Jun 6, 2024

davidorme Jun 6, 2024

davidorme Jun 6, 2024

davidorme Jun 6, 2024

davidorme Jun 6, 2024

davidorme Jun 6, 2024

davidorme Jun 6, 2024

davidorme Jun 6, 2024

davidorme Jun 6, 2024

davidorme Jun 6, 2024

AmyOctoCat Jun 7, 2024

davidorme Jun 7, 2024

AmyOctoCat commented Jun 7, 2024

-            self[name] = pft
+            self[name] = pft
+       @class_method
+       def from_json(cls, path: Path) -> Flora:
+           """Factory method to create a Flora instance from JSON."""
+           with open(path) as file:
+               pfts_json = json.load(file)
+           pfts_from_json = PlantFunctionalType.schema().load(pfts_json, many=True)  # type: ignore[attr-defined] # noqa: E501
+           return cls(pfts=pfts_from_json)

-    m: float
-    n: float
+    m: float
+    n: float
+    qm: float = field(init=False)
+    def __post_init__(self):
+        m = self.m
+        n  = self.n
+        self.qm = (
+            m
+            * n
+            * ((n - 1) / (m * n - 1)) ** (1 - 1 / n)
+            * (((m - 1) * n) / (m * n - 1)) ** (m - 1)
+        )

WIP - 230 create a draft canopy class #231

Are you sure you want to change the base?

WIP - 230 create a draft canopy class #231

Conversation

AmyOctoCat commented May 9, 2024 • edited

Description

Type of change

Key checklist

Further checks

codecov-commenter commented May 9, 2024 • edited

Codecov Report

davidorme left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AmyOctoCat commented Jun 7, 2024

AmyOctoCat commented May 9, 2024 •

edited

codecov-commenter commented May 9, 2024 •

edited