
Trajectory Datasets #28

Open
2 of 3 tasks
JustinShenk opened this issue Dec 22, 2020 · 6 comments

@JustinShenk
Collaborator

JustinShenk commented Dec 22, 2020

Enable loading trajectory datasets via the Traja API:

An early attempt, designed for pedestrian datasets (hence ped_id), lives in https://github.com/traja-team/traja/blob/master/traja/datasets/dataset.py and data/loader.py.

It should return a TrajaDataFrame, i.e. a pandas DataFrame converted via trj = traja.TrajaDataFrame(df) (see https://traja.readthedocs.io/en/latest/reading.html for more on this).

An API similar to GeoPandas' would be nice (https://stackoverflow.com/a/51625390/6256888), e.g. traja.datasets.available. Look here for more inspiration: https://github.com/geopandas/geopandas/tree/master/geopandas/datasets.
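For reference, older GeoPandas releases expose geopandas.datasets.available (a list of names) and geopandas.datasets.get_path(name). A minimal, stdlib-only sketch of the same pattern for traja might look like this. The registry dict and the register(), available() and get_path() functions are hypothetical and are not existing traja API.

```python
from pathlib import Path
from typing import Dict, List, Union

# Hypothetical registry mapping dataset names to CSV locations (files or URLs);
# the real traja.datasets module may end up organised differently.
_DATASETS: Dict[str, str] = {}


def register(name: str, location: Union[str, Path]) -> None:
    """Add a dataset so that it shows up in available()."""
    _DATASETS[name] = str(location)


def available() -> List[str]:
    """Sorted names of all registered datasets (cf. geopandas.datasets.available)."""
    return sorted(_DATASETS)


def get_path(name: str) -> str:
    """Return the CSV location for `name`, or fail listing the valid names."""
    try:
        return _DATASETS[name]
    except KeyError:
        raise ValueError(f"{name!r} is not in {available()}") from None
```

A loader would then call pd.read_csv(get_path(name)) and wrap the result in traja.TrajaDataFrame, so the registry itself stays free of pandas.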

@JustinShenk
Collaborator Author

@Saran-nns was the current dataset.py written by you? Do you mind if it is hacked up to output a dataframe instead of a Torch tensor?

@Saran-nns
Member

Saran-nns commented Dec 24, 2020

Apart from what is mentioned above, dataset.py in PR #26 contains additional functions to prepare the data loaders. It borrows several utility functions from datasets.utils to extract and preprocess the data. So I guess it would be convenient to set up a new helper function in datasets.utils that creates a traja dataframe from a CSV file or one of the available datasets; it could then be called inside datasets.utils.generate_dataset(df, n_past, n_future).

At the moment, generate_dataset(df, n_past, n_future) in datasets.utils receives a pd.DataFrame as input and returns tensors of train and test time-series datasets, along with the corresponding categories (IDs), which are then fed into data loaders.
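Roughly, the windowing behind n_past/n_future can be pictured as below. This is a stdlib-only sketch assuming rows arrive as time-ordered (ID, x, y) tuples; make_windows() and split_samples() are hypothetical names, and the actual generate_dataset() in PR #26 returns Torch tensors and may split the data differently.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Point = Tuple[float, float]
Window = Tuple[List[Point], List[Point], Hashable]


def make_windows(rows: Sequence[Tuple[Hashable, float, float]],
                 n_past: int, n_future: int) -> List[Window]:
    """Split each ID's trajectory into (past, future, id) samples."""
    # Group positions by trajectory ID, keeping the incoming time order
    tracks: Dict[Hashable, List[Point]] = defaultdict(list)
    for traj_id, x, y in rows:
        tracks[traj_id].append((x, y))

    samples: List[Window] = []
    size = n_past + n_future
    for traj_id, points in tracks.items():
        # Slide a window of n_past + n_future steps forward one step at a time;
        # trajectories shorter than the window produce no samples
        for start in range(len(points) - size + 1):
            past = points[start:start + n_past]
            future = points[start + n_past:start + size]
            samples.append((past, future, traj_id))
    return samples


def split_samples(samples: List[Window],
                  train_fraction: float = 0.8) -> Tuple[List[Window], List[Window]]:
    """Ordered train/test split; a real loader might shuffle or split by ID."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]
```

The category (ID) travels with each sample, which is what lets the data loaders return categories alongside the past and future sequences.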

So we would expect a separate utility function for the available datasets, along these lines:

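import pandas as pd
import traja


def load_data(dataset: str) -> traja.TrajaDataFrame:
    # Precheck: available() is the proposed helper listing bundled datasets
    # (not implemented yet), mirroring geopandas.datasets.available
    available = list(traja.datasets.utils.available())
    if dataset not in available:
        raise ValueError(f"{dataset!r} is not in {available}")

    # Load the data: get_path() is a hypothetical helper that resolves a
    # dataset name to its CSV file, like geopandas.datasets.get_path
    df = pd.read_csv(traja.datasets.utils.get_path(dataset))

    return traja.TrajaDataFrame(df)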

Once this is done, we can easily make the traja dataframe the default data format by replacing isinstance(df, pd.DataFrame) with isinstance(df, traja.TrajaDataFrame) inside traja.datasets.utils.generate_dataset().

@WolfByttner
Contributor

@JustinShenk the current handling is intended to be a middle ground between Torch and Pandas. The neural networks require time series and just about nothing else does, so time series are handled as tensors. However, I agree that the networks should output dataframes when they are 'done' so things can interoperate with the rest of Traja. I am just a bit unclear on the finer details of this interface.

@Saran-nns
Member

We haven't added the functions for post-training predictions/inference yet. I will update Trainer to return the network's predictions on the test dataset as a TrajaDataFrame.
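One way the Trainer output could be turned back into a table: flatten a (batch, n_future, features) prediction into one row per ID and time step. This sketch assumes the tensor has already been converted to nested lists with tensor.tolist() and that each predicted sequence belongs to a single ID; predictions_to_records() and the id/step/x/y column names are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence


def predictions_to_records(predictions: Sequence[Sequence[Sequence[float]]],
                           ids: Sequence[Hashable],
                           columns: Sequence[str] = ("x", "y")) -> List[Dict[str, object]]:
    """Flatten (batch, n_future, n_features) predictions into one row per step."""
    if len(predictions) != len(ids):
        raise ValueError("need exactly one ID per predicted sequence")
    records: List[Dict[str, object]] = []
    for traj_id, sequence in zip(ids, predictions):
        for step, values in enumerate(sequence):
            # Keep the trajectory ID and step index next to the predicted features
            row: Dict[str, object] = {"id": traj_id, "step": step}
            row.update(zip(columns, values))
            records.append(row)
    return records
```

The records could then be passed to pd.DataFrame and wrapped with traja.TrajaDataFrame, which keeps Torch out of everything downstream of the network.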

@Saran-nns
Member

@WolfByttner I am preparing a UML diagram for traja PR #26. That should help guide collaborators.

@WolfByttner
Contributor

This (rather huge) mallard dataset has temperature as a possible regression parameter: https://www.movebank.org/cms/webapp?gwt_fragment=page=studies,path=study3109235

There are also geese here, again with temperatures (slightly less volatile ones): https://www.movebank.org/cms/webapp?gwt_fragment=page=studies,path=study83912796

https://www.movebank.org/cms/webapp?gwt_fragment=page=studies,path=study577905925: this dataset has genders and temporal classes. Very interesting.

https://www.movebank.org/cms/webapp?gwt_fragment=page=studies,path=study933711994
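Feeding these Movebank studies into a dataset loader mostly means renaming columns. A rough stdlib sketch follows. It assumes the usual Movebank CSV export headers (timestamp, location-long, location-lat, individual-local-identifier), which should be checked against the actual study files, and takes extra columns such as a temperature field by name, since that header varies by sensor. read_movebank_csv() and the x/y/time/id target names are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List

# Assumed Movebank export headers -> traja-style column names (verify per study)
MOVEBANK_TO_TRAJA = {
    "timestamp": "time",
    "location-long": "x",
    "location-lat": "y",
    "individual-local-identifier": "id",
}


def read_movebank_csv(text: str, extra: Iterable[str] = ()) -> List[Dict[str, object]]:
    """Rename Movebank columns to x/y/time/id, keeping any `extra` columns as-is."""
    columns = dict(MOVEBANK_TO_TRAJA)
    columns.update({name: name for name in extra})  # e.g. a temperature column
    records: List[Dict[str, object]] = []
    for row in csv.DictReader(io.StringIO(text)):
        # Skip fixes without a position (empty longitude or latitude)
        if not row.get("location-long") or not row.get("location-lat"):
            continue
        rec: Dict[str, object] = {dst: row.get(src, "") for src, dst in columns.items()}
        # Longitude becomes x and latitude becomes y, as floats
        rec["x"] = float(row["location-long"])
        rec["y"] = float(row["location-lat"])
        records.append(rec)
    return records
```

Other values (timestamps, IDs, temperatures) stay as strings here; parsing them is left to pandas once the records become a DataFrame.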


4 participants