No 'variable' in station '3.09' and min_depth/max_depth don't work #51

Open
xushanthu-2014 opened this issue Sep 11, 2022 · 14 comments

@xushanthu-2014

xushanthu-2014 commented Sep 11, 2022

I am trying to extract data for the variable 'soil_moisture' at station 3.09, for depths from 0.01 to 0.04. As I understand it, the normal way to do this is command 1:

min_depth, max_depth = 0.01, 0.04
ids = ismn_data.get_dataset_ids(variable='soil_moisture',
                                min_depth=min_depth,
                                max_depth=max_depth,
                                filter_meta_dict={'station': '3.09',
                                                  'lc_2000': [10, 11, 12, 20, 60, 130],
                                                  'lc_2005': [10, 11, 12, 20, 60, 130],
                                                  'lc_2010': [10, 11, 12, 20, 60, 130]})

But this returns no elements in ids. So I printed the metadata for 3.09 with ismn_data.read(1098, return_meta=True) and found that soil moisture is indeed in the metadata, but there seems to be no value for variable:

ismn_data.read(1098, return_meta=True)
Out[117]: 
(                     soil_moisture soil_moisture_flag soil_moisture_orig_flag
 date_time                                                                    
 2017-01-01 00:00:00          0.192                  G                       M
                           ...                ...                     ...
 2019-02-22 09:00:00          0.155                  G                       M
 
 [12745 rows x 3 columns],
 variable        key       
 clay_fraction   val                           5.2
                 depth_from                    0.0
                 depth_to                     0.05
 climate_KG      val                           Dfb
 climate_insitu  val                       unknown
 elevation       val                         104.0
 instrument      val                 Decagon-5TE-B
                 depth_from                    0.0
                 depth_to                     0.05
 latitude        val                       55.8609
 lc_2000         val                            10
 lc_2005         val                            10
 lc_2010         val                            10
 lc_insitu       val                          None
 longitude       val                        9.2945
 network         val                          HOBE
 organic_carbon  val                           0.5
                 depth_from                    0.0
                 depth_to                      0.3
 sand_fraction   val                          85.1
                 depth_from                    0.0
                 depth_to                     0.05
 saturation      val                          0.41
                 depth_from                    0.0
                 depth_to                      0.3
 silt_fraction   val                           5.7
                 depth_from                    0.0
                 depth_to                     0.05
 station         val                          3.09
 timerange_from  val           2017-01-01 00:00:00
 timerange_to    val           2019-02-22 09:00:00
 variable        val                 soil_moisture
                 depth_from                    0.0
                 depth_to                     0.05
 Name: data, dtype: object)

As you can see, right above clay_fraction, the key variable has no value. So I have to use command 2:

ids = ismn_data.get_dataset_ids(variable=None,
                                min_depth=min_depth,
                                max_depth=max_depth,
                                filter_meta_dict={'station': '3.09',
                                                  'variable': 'soil_moisture',
                                                  'lc_2000': [10, 11, 12, 20, 60, 130],
                                                  'lc_2005': [10, 11, 12, 20, 60, 130],
                                                  'lc_2010': [10, 11, 12, 20, 60, 130]})

but I still get nothing in ids. I found that this is because I set min_depth and max_depth. If I delete min_depth and max_depth in command 2, I get ids as [1098, 1104]. But I do want to extract values between 0.01 and 0.04. So is there anything wrong with the data for 3.09? And what is the difference between command 1 and command 2?

@xushanthu-2014
Author

Is there anybody who can help me?

@wpreimes
Member

Hi, sorry for the late reply. The problem is that at station 3.09, soil moisture sensors are operating between 0 and 5 cm, while your query looks for sensors between 1 and 4 cm (which do not exist).

@wpreimes
Member

wpreimes commented Sep 13, 2022

You can see this by selecting the station and listing all sensor names (the numbers in each name are the depths in meters). Please note that your dataset might look different, as I am using an older snapshot of ISMN here.

>> ismn_data['HOBE']['3.09']

Out[22]: Sensors at '3.09': ['Decagon-5TE-A_soil_moisture_0.000000_0.050000', 'Decagon-5TE-B_soil_moisture_0.000000_0.050000', 'Decagon-5TE-A_soil_moisture_0.200000_0.250000', 'Decagon-5TE-B_soil_moisture_0.200000_0.250000', 'Decagon-5TE_soil_moisture_0.500000_0.550000', 'Decagon-5TE-A_soil_temperature_0.000000_0.050000', 'Decagon-5TE-B_soil_temperature_0.000000_0.050000', 'Decagon-5TE-A_soil_temperature_0.200000_0.250000', 'Decagon-5TE-B_soil_temperature_0.200000_0.250000', 'Decagon-5TE_soil_temperature_0.500000_0.550000']
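
As a rough illustration of why the 1 to 4 cm query comes back empty, the sketch below parses the sensor names listed above and keeps those whose whole depth range lies inside a requested range. It is plain Python and does not call the ismn package. The helper names parse_sensor_name and sensors_within are hypothetical, and the "whole range must fit" rule is an assumption about how the depth filter behaves, based on the explanation in this thread.

```python
def parse_sensor_name(name):
    """Split '<instrument>_<variable>_<depth_from>_<depth_to>' into its parts."""
    prefix, depth_from, depth_to = name.rsplit("_", 2)
    # Assumption: only the two variables seen in the listing above occur.
    for variable in ("soil_moisture", "soil_temperature"):
        if prefix.endswith("_" + variable):
            instrument = prefix[: -len(variable) - 1]
            return instrument, variable, float(depth_from), float(depth_to)
    raise ValueError(f"unrecognised sensor name: {name}")


def sensors_within(names, variable, min_depth, max_depth):
    """Keep sensors of `variable` whose whole depth range lies in [min_depth, max_depth]."""
    selected = []
    for name in names:
        _, var, d_from, d_to = parse_sensor_name(name)
        if var == variable and min_depth <= d_from and d_to <= max_depth:
            selected.append(name)
    return selected


# Sensor names copied from the listing for station 3.09.
names = [
    "Decagon-5TE-A_soil_moisture_0.000000_0.050000",
    "Decagon-5TE-B_soil_moisture_0.000000_0.050000",
    "Decagon-5TE-A_soil_moisture_0.200000_0.250000",
    "Decagon-5TE-B_soil_moisture_0.200000_0.250000",
    "Decagon-5TE_soil_moisture_0.500000_0.550000",
    "Decagon-5TE-A_soil_temperature_0.000000_0.050000",
    "Decagon-5TE-B_soil_temperature_0.000000_0.050000",
    "Decagon-5TE-A_soil_temperature_0.200000_0.250000",
    "Decagon-5TE-B_soil_temperature_0.200000_0.250000",
    "Decagon-5TE_soil_temperature_0.500000_0.550000",
]

print(sensors_within(names, "soil_moisture", 0.01, 0.04))  # no sensor fits 1-4 cm
print(sensors_within(names, "soil_moisture", 0.0, 0.05))   # the two 0-5 cm sensors
```

With the names above, the first call prints an empty list and the second prints the two 0 to 5 cm soil moisture sensors.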

@wpreimes
Member

wpreimes commented Sep 13, 2022

My suggestion is to be less restrictive and allow sensors from e.g. 0 to 5 cm instead of 1 to 4 cm.

@xushanthu-2014
Author

xushanthu-2014 commented Sep 13, 2022

Thanks for your reply @wpreimes! But I want to loop over all European stations, so I cannot print all the sensors and then select them one by one. Besides, I am comparing against my model simulations of soil moisture at each layer ([0, 0.01, 0.04, 0.1, 0.2, 0.4, 0.6, 0.8, 1] meters), so I am looking for a way to match the depths of the ISMN stations to my model layers on the same lat/lon grid. For example, my model simulations from 0.01 to 0.04 m for the grid cell containing 3.09 would be matched to the observations from 0 to 5 cm at station 3.09. More generally, model simulations from depth Y_1 to Y_2 for the grid cell containing station X would be matched to observations from depth Z_1 to Z_2 at station X, where [Z_1, Z_2] contains [Y_1, Y_2] or [Y_1, Y_2] contains [Z_1, Z_2], as long as the observations 'match' the model layers.
The other problem is that the depth configurations differ across the European ISMN stations. For example, other stations might have depths like 0 to 8 cm, so I cannot just loop the code I use for 3.09 over the other stations. Is there any way to solve these problems? Thanks!

@wpreimes
Member

wpreimes commented Sep 13, 2022

Printing the names was only meant as an example to explain the problem for that specific station.
Matching the different layers between model and in situ data is not straightforward, as you noticed. Some trade-offs will be necessary, especially for sensors that cover a wide range of depths.

Here are some suggestions:

  • Use only the starting depth of a sensor to assign it to the layer it starts measuring in (get_dataset_ids has a keyword argument for that, called check_only_sensor_depth_from). You might want to manually exclude sensors that cover a wide range of depths afterwards (when looping over the extracted sensors, check e.g. the difference between the depth_to and depth_from metadata attributes).
  • Use sensors multiple times. If a sensor measures between 0 and 5 cm, I think it is fair to use it in the comparison for the first 3 layers.
  • Design your own solution to match the model and in situ layers, e.g. to compare only the "best matching" layer (the layer with the largest overlap). In that case you could extract all ids for soil_moisture without depth restrictions, loop over them to read the data, and use the available metadata / depth information to assign them to your model layers with your own code (e.g. for each model layer, check whether it is in the range of the ISMN sensor, and if it is, use the sensor for that layer). This function might also help: https://github.com/TUW-GEO/ismn/blob/master/src/ismn/meta.py#L144
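
The "largest overlap" idea from the last suggestion could look roughly like the sketch below. It is plain Python and independent of the ismn package. The function names are hypothetical, and it assumes that the list of model depths posted above gives the layer boundaries in meters. Point sensors (depth_from equal to depth_to) have zero overlap with every layer, so they are assigned to the first layer that contains their depth instead.

```python
# Assumption: these are the boundaries of the model layers, in meters.
MODEL_EDGES = [0, 0.01, 0.04, 0.1, 0.2, 0.4, 0.6, 0.8, 1]


def model_layers(edges):
    """Turn a list of boundaries into (top, bottom) pairs."""
    return list(zip(edges[:-1], edges[1:]))


def overlap(a_from, a_to, b_from, b_to):
    """Length of the intersection of two depth ranges (0 if they do not intersect)."""
    return max(0.0, min(a_to, b_to) - max(a_from, b_from))


def best_layer(depth_from, depth_to, layers):
    """Return the index of the best matching layer, or None if nothing matches."""
    if depth_from == depth_to:
        # Point sensor: pick the first layer that contains its depth.
        for i, (top, bottom) in enumerate(layers):
            if top <= depth_from <= bottom:
                return i
        return None
    overlaps = [overlap(depth_from, depth_to, top, bottom) for top, bottom in layers]
    best = max(range(len(layers)), key=overlaps.__getitem__)
    return best if overlaps[best] > 0 else None


layers = model_layers(MODEL_EDGES)
for sensor in [(0.0, 0.05), (0.2, 0.25), (0.05, 0.05)]:
    idx = best_layer(*sensor, layers)
    print(sensor, "->", layers[idx] if idx is not None else None)
```

In a real loop, depth_from and depth_to would come from the 'variable' entry of each sensor's metadata, as returned by read_metadata. Ties and very wide sensors still need a rule of your own, for example a maximum allowed difference between depth_to and depth_from.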

@wpreimes
Member

Also, I'm not sure whether there are any sensors in ISMN at all that measure SM between e.g. 1 and 4 cm depth. I mention this just to strengthen my point about making some compromises in your approach. @daberer might know.

@xushanthu-2014
Author

Thanks! @wpreimes, let me try your suggestions first

@daberer
Collaborator

daberer commented Sep 13, 2022

Hi, I think that for the majority of soil moisture sensors in ISMN the sensor orientation is horizontal (depth_from = depth_to). I checked: there are 271 soil moisture sensors within the 1-4 cm bracket if the boundary values (1 and 4 cm) are included, mostly from the networks HiWATER_EHWSN and SMN-SDR. Networks often have a similar composition at all locations (same sensors at the same depths), but overall the depths are quite diverse, as you noticed.

@xushanthu-2014
Author

> Hi, sorry for the late reply. The problem is that at station 3.09, soil moisture sensors are operating between 0 and 5 cm, while your query looks for sensors between 1 and 4 cm (which do not exist).

Hi @wpreimes, thanks for your comment, but there is another problem. If I try:

ids = ismn_data.get_dataset_ids(variable='soil_moisture',
                                        filter_meta_dict={'station': '3.09',
                                                          'lc_2000': [10,11,12,20,60,130],
                                                          'lc_2005': [10,11,12,20,60,130],
                                                          'lc_2010': [10,11,12,20,60,130],})

I get nothing. I think that is because, in the metadata of 3.09, the value of the key variable is None. So I have to use:

ids = ismn_data.get_dataset_ids(variable=None,
                                filter_meta_dict={'station': '3.09',
                                                  'variable': 'soil_moisture',
                                                  'lc_2000': [10, 11, 12, 20, 60, 130],
                                                  'lc_2005': [10, 11, 12, 20, 60, 130],
                                                  'lc_2010': [10, 11, 12, 20, 60, 130]})

I have to write 'variable': 'soil_moisture' in the filter_meta_dict. Is that normal? I can use the first command for other stations, just not for 3.09. Does that mean there is a bug in the metadata of 3.09? And is it OK to use the second command for other stations? For other details, please refer to the description of the issue at the top of this page. Thanks!

@wpreimes
Member

wpreimes commented Sep 16, 2022

Hi, I just downloaded the ISMN data for HOBE and tried the 2 function calls you posted, and I got the same IDs for both of them.
About the metadata, I don't understand what you mean by "metadata of 3.09, the value of key variable is None.". You posted the metadata table in your initial comment, and there you can see that the "variable" is "soil_moisture" for the selected sensor (see the last 4 lines; the first line contains only the index labels):

 variable        key       
 clay_fraction   val                           5.2
                 depth_from                    0.0
                 depth_to                     0.05
 climate_KG      val                           Dfb
 climate_insitu  val                       unknown
 elevation       val                         104.0
 instrument      val                 Decagon-5TE-B
                 depth_from                    0.0
                 depth_to                     0.05
 latitude        val                       55.8609
 lc_2000         val                            10
 lc_2005         val                            10
 lc_2010         val                            10
 lc_insitu       val                          None
 longitude       val                        9.2945
 network         val                          HOBE
 organic_carbon  val                           0.5
                 depth_from                    0.0
                 depth_to                      0.3
 sand_fraction   val                          85.1
                 depth_from                    0.0
                 depth_to                     0.05
 saturation      val                          0.41
                 depth_from                    0.0
                 depth_to                      0.3
 silt_fraction   val                           5.7
                 depth_from                    0.0
                 depth_to                     0.05
 station         val                          3.09
 timerange_from  val           2017-01-01 00:00:00
 timerange_to    val           2019-02-22 09:00:00
 variable        val                 soil_moisture
                 depth_from                    0.0
                 depth_to                     0.05
 Name: data, dtype: object)

and you can access it e.g. via

>> ismn_data.read_metadata(1098)['variable']

key
val           soil_moisture
depth_from              0.0
depth_to               0.05
Name: data, dtype: object

You may want to re-generate the python metadata if you feel that something is wrong there (removing or renaming the folder python_metadata in the ISMN data path should lead to the metadata being re-collected the next time you initialize the reader). Make sure you have the latest version of this package installed. If the data is erroneous, you can try to download the HOBE data separately again and replace the files in your collection with the new ones (make sure to re-collect the metadata whenever you change your local data collection).
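
For reference, renaming the folder can be scripted. The snippet below is only a sketch using the standard library. The helper name set_aside_metadata and the backup naming scheme are made up for illustration. It assumes the metadata lives in a folder called python_metadata directly inside the ISMN data path, as described above.

```python
from datetime import datetime
from pathlib import Path


def set_aside_metadata(data_path, folder_name="python_metadata"):
    """Rename <data_path>/<folder_name> to a timestamped backup.

    Returns the backup path, or None if the folder does not exist.
    """
    meta_dir = Path(data_path) / folder_name
    if not meta_dir.is_dir():
        return None
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    backup = meta_dir.with_name(f"{folder_name}_backup_{stamp}")
    meta_dir.rename(backup)
    return backup


# Hypothetical usage; replace the path with your own ISMN data path:
# set_aside_metadata("/path/to/ismn_data")
# ismn_data = ISMN_Interface("/path/to/ismn_data")  # should re-collect the metadata
```

Renaming keeps the old metadata around in case you want to compare it with the re-collected version.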

@xushanthu-2014
Author

Hi @wpreimes, thanks for your reply. ismn.__version__ tells me the version is '1.1.0'. Is that the latest one?
I also found that I can use variable='soil_moisture' outside the filter_meta_dict. By the way, if I try ismn_data['HOBE']['3.09'], there is 'Decagon-5TE-B_soil_moisture_0.200000_0.250000', which means soil moisture from 0.2 to 0.25 m. But ismn_data.read_metadata(1098)['variable'] alone doesn't show this.

@wpreimes
Member

v1.2.0 would be the latest. You can try pip install -U ismn to upgrade.
The commands ISMN_Interface.read_metadata() (and ISMN_Interface.read_ts) read data for a certain ID. The ID refers to a specific sensor (as indicated by the contents of the metadata). At a station such as HOBE 3.09 there can be multiple sensors. In your case, 1098 is the ID of the soil moisture sensor at this station at 0-5 cm depth, and your command reads the metadata for that sensor. The sensor at 0.2-0.25 m depth is a different one and therefore has a different ID.

@xushanthu-2014
Author

Thanks for your reply, @wpreimes! But when I tried v1.2.0, I got an error while reading the data in Data_separate_files_header_20170101_20211231_9078_Zd6I_20220911 (this is a dataset I downloaded from ISMN, covering latitudes from 36N to 58N and longitudes from 11.75W to 29.5E):

ismn_data = ISMN_Interface(data_path)
Files Processed: 100%|██████████| 321/321 [00:00<00:00, 4521.32it/s]
Processing metadata for all ismn stations into folder /Users/xushan/research/TUD/ISMN_westEurope/Data_separate_files_header_20170101_20211231_9078_Zd6I_20220911.
This may take a few minutes, but is only done once...
Hint: Use `parallel=True` to speed up metadata generation for large datasets
Metadata generation finished after 0 Seconds.
Metadata and Log stored in /Users/xushan/research/TUD/ISMN_westEurope/Data_separate_files_header_20170101_20211231_9078_Zd6I_20220911/python_metadata

Traceback (most recent call last):

  File "<ipython-input-23-84af3e3a7ed0>", line 1, in <module>
    ismn_data = ISMN_Interface(data_path)

  File "/Users/xushan/opt/anaconda3/lib/python3.7/site-packages/ismn/interface.py", line 135, in __init__
    self.activate_network(network=network, meta_path=meta_path, temp_root=temp_root)

  File "/Users/xushan/opt/anaconda3/lib/python3.7/site-packages/ismn/interface.py", line 166, in activate_network
    self.__file_collection.to_metadata_csv(meta_csv_file)

  File "/Users/xushan/opt/anaconda3/lib/python3.7/site-packages/ismn/filecollection.py", line 403, in to_metadata_csv
    dfs = pd.concat(dfs, axis=0, sort=True)

  File "/Users/xushan/opt/anaconda3/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)

  File "/Users/xushan/opt/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 304, in concat
    sort=sort,

  File "/Users/xushan/opt/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 351, in __init__
    raise ValueError("No objects to concatenate")

ValueError: No objects to concatenate

Can you please help me with this? Thanks!
