Implement a function to get data pertaining to more than 2 parameters #807
Comments
This will be an important addition to the STOQS code base! I think what we'd like to enhance is the
If I recall correctly, the existing UI code constructs raw SQL with self-joins to retrieve multiple Parameters for plotting in the Parameter-Parameter section of the UI. That code would be difficult to extend. Perhaps we can take a fresh approach to get the data into a format suitable for exploration and modeling with Machine Learning techniques.
Here's a start on a fresh approach, a Django query that gets the first 20 data values from dorado:
(venv-stoqs) [vagrant@localhost stoqsgit]$ stoqs/manage.py shell_plus
...
In [1]: mps = MeasuredParameter.objects.using('stoqs_september2013_o').filter(
   ...:     measurement__instantpoint__activity__platform__name='dorado')
   ...:
In [2]: for i, mp in enumerate(mps[:20]):
   ...:     if i == 0:
   ...:         print("time, depth, latitude, longitude, parameter__name, measuredparameter__datavalue")
   ...:     print(f"{mp.measurement.instantpoint.timevalue}, {mp.measurement.depth:.2f},"
   ...:           f" {mp.measurement.geom.y:.6f}, {mp.measurement.geom.x:.6f}"
   ...:           f" {mp.parameter.name}, {mp.datavalue}")
   ...:
time, depth, latitude, longitude, parameter__name, measuredparameter__datavalue
2013-09-17 18:42:20, -0.03, 36.734970, -122.128144 sigmat, 25.1383576072121
2013-09-17 18:42:20, -0.03, 36.734970, -122.128144 spice, 0.830712889765499
2013-09-17 18:42:20, -0.03, 36.734970, -122.128144 altitude, 1395.68956636994
2013-09-17 18:42:20, -0.03, 36.734970, -122.128144 temperature, 13.9910522171992
2013-09-17 18:42:20, -0.03, 36.734970, -122.128144 salinity, 33.6403972259011
2013-09-17 18:42:20, -0.03, 36.734970, -122.128144 oxygen, 5.670288605996
2013-09-17 18:42:20, -0.03, 36.734970, -122.128144 nitrate, 0.21
2013-09-17 18:42:20, -0.03, 36.734970, -122.128144 bbp420, 0.00231458255927606
2013-09-17 18:42:20, -0.03, 36.734970, -122.128144 bbp700, 0.00228426640768986
2013-09-17 18:42:20, -0.03, 36.734970, -122.128144 fl700_uncorr, 0.000823624706576738
2013-09-17 18:42:20, -0.03, 36.734970, -122.128144 biolume, 194666664.695293
2013-09-17 18:42:20, -0.03, 36.734970, -122.128144 roll, -4.08951048388392
2013-09-17 18:42:20, -0.03, 36.734970, -122.128144 pitch, -0.105888989907026
2013-09-17 18:42:20, -0.03, 36.734970, -122.128144 yaw, 175.513420572358
2013-09-17 18:42:20, -0.03, 36.734970, -122.128144 sepCountList, None
2013-09-17 18:42:20, -0.03, 36.734970, -122.128144 mepCountList, None
2013-09-17 18:42:18, -0.04, 36.734989, -122.128162 sigmat, 25.1403727711047
2013-09-17 18:42:18, -0.04, 36.734989, -122.128162 spice, 0.829269194464183
2013-09-17 18:42:18, -0.04, 36.734989, -122.128162 altitude, 1395.49904668803
2013-09-17 18:42:18, -0.04, 36.734989, -122.128162 temperature, 13.9828055034561
Maybe there's a way to pivot an output like this to get the data in a format amenable to analysis in Pandas?
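One way to picture the pivot asked about above is to group the long-format rows by their measurement coordinates, so that each measurement ends up with one entry per parameter. The sketch below uses only the standard library and three rows copied from the output above; `pivot_rows` is a hypothetical helper name, not part of STOQS.

```python
from collections import defaultdict

# Long-format rows copied from the output above:
# (time, depth, latitude, longitude, parameter name, data value)
rows = [
    ("2013-09-17 18:42:20", -0.03, 36.734970, -122.128144, "temperature", 13.9910522171992),
    ("2013-09-17 18:42:20", -0.03, 36.734970, -122.128144, "salinity", 33.6403972259011),
    ("2013-09-17 18:42:18", -0.04, 36.734989, -122.128162, "temperature", 13.9828055034561),
]


def pivot_rows(rows):
    """Group long-format rows into one dict of parameters per measurement.

    Hypothetical helper for illustration only; not part of STOQS.
    """
    wide = defaultdict(dict)
    for time, depth, lat, lon, name, value in rows:
        # The coordinates identify the measurement; each parameter becomes a key.
        wide[(time, depth, lat, lon)][name] = value
    return dict(wide)


wide = pivot_rows(rows)
for key, params in wide.items():
    print(key, params)
```

Each key is one measurement and each inner dict holds whatever parameters were recorded for it, which is the "one row per measurement, one column per parameter" shape that analysis tools generally expect.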
@MBARIMike that definitely looks like the direction we were trying to go in. Maybe "extension" of an existing function was the wrong way to word things given we would be starting fresh. Thank you for making that clarification.
Also, Pandas has a
In [1]: import pandas as pd
In [2]: mps = MeasuredParameter.objects.using('stoqs_september2013_o').filter(
   ...:     measurement__instantpoint__activity__platform__name='dorado')
   ...:
In [3]: df = pd.DataFrame.from_records(mps.values(
   ...:     'measurement__instantpoint__timevalue', 'measurement__depth',
   ...:     'measurement__geom', 'parameter__name', 'datavalue', 'id'
   ...: ))
...:
In [4]: df.head(20)
Out[4]:
datavalue id measurement__depth measurement__geom measurement__instantpoint__timevalue parameter__name
0 2.476802e+01 5664562 -0.055507 [-121.934897431052, 36.90470983771924] 2013-09-16 20:55:49 sigmat
1 1.262683e+00 5673227 -0.055507 [-121.934897431052, 36.90470983771924] 2013-09-16 20:55:49 spice
2 2.546787e+01 5690556 -0.055507 [-121.934897431052, 36.90470983771924] 2013-09-16 20:55:49 altitude
3 1.582349e+01 5577911 -0.055507 [-121.934897431052, 36.90470983771924] 2013-09-16 20:55:49 temperature
4 3.367453e+01 5629901 -0.055507 [-121.934897431052, 36.90470983771924] 2013-09-16 20:55:49 salinity
5 6.593205e+00 5586576 -0.055507 [-121.934897431052, 36.90470983771924] 2013-09-16 20:55:49 oxygen
6 5.360300e+02 5595241 -0.055507 [-121.934897431052, 36.90470983771924] 2013-09-16 20:55:49 nitrate
7 9.528316e-03 5603906 -0.055507 [-121.934897431052, 36.90470983771924] 2013-09-16 20:55:49 bbp420
8 6.610731e-03 5612571 -0.055507 [-121.934897431052, 36.90470983771924] 2013-09-16 20:55:49 bbp700
9 4.761394e-04 5621236 -0.055507 [-121.934897431052, 36.90470983771924] 2013-09-16 20:55:49 fl700_uncorr
10 9.728126e+09 5638566 -0.055507 [-121.934897431052, 36.90470983771924] 2013-09-16 20:55:49 biolume
11 -1.292509e+01 5647231 -0.055507 [-121.934897431052, 36.90470983771924] 2013-09-16 20:55:49 roll
12 -6.497791e+00 5655896 -0.055507 [-121.934897431052, 36.90470983771924] 2013-09-16 20:55:49 pitch
13 5.802254e+01 5664561 -0.055507 [-121.934897431052, 36.90470983771924] 2013-09-16 20:55:49 yaw
14 NaN 5690705 -0.055507 [-121.934897431052, 36.90470983771924] 2013-09-16 20:55:49 sepCountList
15 NaN 5691417 -0.055507 [-121.934897431052, 36.90470983771924] 2013-09-16 20:55:49 mepCountList
16 2.476093e+01 5664563 -0.082238 [-121.93492018129153, 36.90469289678784] 2013-09-16 20:55:47 sigmat
17 1.270436e+00 5673228 -0.082238 [-121.93492018129153, 36.90469289678784] 2013-09-16 20:55:47 spice
18 2.544076e+01 5690555 -0.082238 [-121.93492018129153, 36.90469289678784] 2013-09-16 20:55:47 altitude
19 1.585611e+01 5577910 -0.082238 [-121.93492018129153, 36.90469289678784] 2013-09-16 20:55:47 temperature
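A long-format DataFrame like the one above could be pivoted so that each parameter becomes its own column. The following is a minimal sketch, not the project's implementation: the `records` list stands in for the output of `mps.values(...)`, uses a few values copied from the table above, and omits `measurement__geom`, since list-valued cells are unhashable and would likely need to be split into separate longitude and latitude columns before they can serve as part of an index.

```python
import pandas as pd

# Stand-in for the records returned by mps.values(...). Values are copied
# from the DataFrame output above; only three parameters are kept for brevity.
records = [
    {"measurement__instantpoint__timevalue": "2013-09-16 20:55:49",
     "measurement__depth": -0.055507, "parameter__name": "temperature", "datavalue": 15.82349},
    {"measurement__instantpoint__timevalue": "2013-09-16 20:55:49",
     "measurement__depth": -0.055507, "parameter__name": "salinity", "datavalue": 33.67453},
    {"measurement__instantpoint__timevalue": "2013-09-16 20:55:49",
     "measurement__depth": -0.055507, "parameter__name": "oxygen", "datavalue": 6.593205},
    {"measurement__instantpoint__timevalue": "2013-09-16 20:55:47",
     "measurement__depth": -0.082238, "parameter__name": "temperature", "datavalue": 15.85611},
]

df = pd.DataFrame.from_records(records)

# One row per (time, depth) measurement, one column per parameter name.
# aggfunc="first" keeps the single value expected for each cell.
wide = df.pivot_table(
    index=["measurement__instantpoint__timevalue", "measurement__depth"],
    columns="parameter__name",
    values="datavalue",
    aggfunc="first",
)

print(wide)
```

Parameters missing for a given measurement appear as NaN in the wide table. Restricting the result to a chosen list of parameters could then be done by filtering `parameter__name` before the pivot or by selecting columns afterwards.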
So instead of manipulating x and y such as
I suggest creating a new file for now. Perhaps it could be a Jupyter Notebook that demonstrates an analysis.
So looking at classify.py, would we need to construct a
We'd need to understand the functional requirements better; perhaps a new option (or implementation of an aspirational option already in classify.py) is an approach. I'd like to see a Jupyter Notebook demonstration - that will help us decide.
Extend the function implemented in https://github.com/stoqs/stoqs/blob/master/stoqs/contrib/analysis/init.py, _getMeasuredPPData, which gets the measured data when given two parameters. Extending this function to get all parameters, or a given list of parameters, for a given platform will provide more data and features for exploration and modeling, which can be vital to improving the performance of a machine learning algorithm. The goal is to get this data into a pandas DataFrame, or similar, as an easier base to work with when implementing further machine learning algorithms.
@MBARIMike, @bretstine, @markmocek, and I will be exploring this issue further as part of Fall Capstone 2018.