US-TransportationMode

Built with Python 2.7.

US-Transportation is the name of our dataset, which contains sensor data from over 13 users. Given the lack of a common benchmark for transportation mode detection (TMD) in the literature, we collected a large set of measurements from different subjects through a simple Android application. We openly release the dataset so that other researchers can benefit from it for further improvements and research reproducibility.
The dataset was collected from people of different genders, ages and occupations. Moreover, we impose no restrictions on how the application is used, so every user records data while performing the activity as she/he normally would, which lets us assess real-world conditions.
On this page, in addition to the downloadable datasets, you can find Python code for extracting features and building machine learning models to make predictions.
You can find more information about the dataset and our work at: http://cs.unibo.it/projects/us-tm2017/index.html.

Please cite the paper below in your publications if it helps your research:

@article{carpineti18,
  Author = {Claudia Carpineti and Vincenzo Lomonaco and Luca Bedogni and Marco Di Felice and Luciano Bononi},
  Journal = {Proc. of the 14th Workshop on Context and Activity Modeling and Recognition (IEEE COMOREA 2018)},
  Title = {Custom Dual Transportation Mode Detection by Smartphone Devices Exploiting Sensor Diversity},
  Year = {2018},
  DOI = {https://doi.org/10.1109/PERCOMW.2018.8480119}
}

On-line version available here: https://ieeexplore.ieee.org/abstract/document/8480119

Menu

Dependencies

In order to execute the code in the repository you'll need to install the following dependencies:

Documentation

Code

This section describes the functions developed in our work and their parameters.

TMDataset.py

| Function name | Parameter | Description |
|---|---|---|
| clean_file() | | Fix problems in the original raw files: delete measurements from the sensors in sensor_to_exclude; if sound or speed rows have a negative time, use its absolute value; if **time** contains invalid values ("/", ">", "<", "-", "_"...), delete the file; if the file is empty, delete the file |
| transform_raw_data() | | Transform raw sensor data into orientation-independent data (using the magnitude metric) |
| __fill_data_structure() | | Fill the tm, users and sensors data structures with the corresponding data from the dataset |
| __range_position_in_header_with_features(sensor_name) | sensor_name: name of the sensor | Return the position of the given sensor in the header with features |
| create_header_files() | | Fill the directory with all files consistent with the header without features |
| __create_time_files() | | Fill the directory with all files consistent with the featured header, divided into time windows |
| __create_dataset() | | Create the dataset file |
| __split_dataset() | | Split the passed dataframe into test and train sets |
| preprocessing_files() | | Clean the files and transform them into orientation-independent data |
| analyze_sensors_support() | | For each sensor, analyze user support and write the result to sensor_support.csv [sensor,nr_user,list_users,list_classes] |
| create_balanced_dataset(sintetic) | sintetic: whether the data are synthetic or not. The default value is False. | Analyze the dataset composition in terms of class and user contribution; fill balance_time with the minimum number of windows per transportation mode |
| get_excluded_sensors(sensor_set) | sensor_set: type of sensor dataset used with different sensor data | Return the list of excluded sensors based on the corresponding classification level |
| get_remained_sensors(sensor_set) | sensor_set: type of sensor dataset used with different sensor data | Return the list of considered sensors based on the corresponding classification level |
| get_sensors_set_features() | | Return the list of the sensors set with their features |
| get_sensor_features(sensor) | sensor: data of a specific sensor | Return the features of a specific sensor |
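
To make the orientation-independent transformation and the time-window step more concrete, here is a minimal standalone sketch. It does not reproduce the code in TMDataset.py: the helper names `magnitude` and `window_features`, the 5000 ms window, the min/max/mean/std feature set and the sample rows are all assumptions chosen for illustration.

```python
import math
from statistics import mean, pstdev


def magnitude(x, y, z):
    """Orientation-independent magnitude of a three-axis sample."""
    return math.sqrt(x * x + y * y + z * z)


def window_features(samples, window_ms=5000):
    """Group (time_ms, value) samples into fixed windows and summarise each.

    Returns one dict per non-empty window, in time order.
    The window length is an assumed value, not taken from the repository.
    """
    windows = {}
    for time_ms, value in samples:
        windows.setdefault(time_ms // window_ms, []).append(value)
    features = []
    for index in sorted(windows):
        values = windows[index]
        features.append({
            "window": index,
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
            "std": pstdev(values),
        })
    return features


# Hypothetical accelerometer rows: (time in ms, x, y, z) in m/s^2.
raw = [
    (0, 0.0, 0.0, 9.81),
    (2500, 3.0, 4.0, 0.0),
    (6000, 0.0, 6.0, 8.0),
]
samples = [(t, magnitude(x, y, z)) for t, x, y, z in raw]
features = window_features(samples)
```

Because the magnitude discards the direction of the vector, the same movement yields the same value however the phone is held, which is presumably why the raw three-axis data is reduced this way before features are computed.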

TMDetection.py

| Function name | Parameter | Description |
|---|---|---|
| decision_tree(sensors_set) | sensors_set: type of sensor dataset used with different sensor data | Decision tree algorithm, trained on the whole training set and tested on the whole test set |
| random_forest(sensors_set) | sensors_set: type of sensor dataset used with different sensor data | Random forest algorithm, trained on the whole training set and tested on the whole test set |
| neural_network(sensors_set) | sensors_set: type of sensor dataset used with different sensor data | Neural network algorithm, trained on the whole training set and tested on the whole test set |
| support_vector_machine(sensors_set) | sensors_set: type of sensor dataset used with different sensor data | Support vector machine algorithm, trained on the whole training set and tested on the whole test set |
| classes_combination(sensors_set) | sensors_set: type of sensor dataset used with different sensor data | Run the different algorithms while changing the target classes, trying every combination of two target classes |
| leave_one_subject_out(sensors_set) | sensors_set: type of sensor dataset used with different sensor data | Run the different algorithms leaving one subject out of training and testing only on that subject, considering all classes in the dataset and only the user's classes |
| single_sensor_accuracy() | | Use the features of a single sensor to build and evaluate a model |
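
The two evaluation schemes described for classes_combination and leave_one_subject_out can be sketched without any machine learning library. The snippet below is only an illustration of how the splits and class pairs could be enumerated. The row layout, the "user" and "target" keys, the user ids and the class labels are assumptions, and the function `class_pairs` is a hypothetical helper, not part of TMDetection.py.

```python
from itertools import combinations


def leave_one_subject_out(rows):
    """Yield (held_out_user, train_rows, test_rows) for every user.

    Each row is a dict with at least a "user" key.
    """
    users = sorted({row["user"] for row in rows})
    for held_out in users:
        train = [row for row in rows if row["user"] != held_out]
        test = [row for row in rows if row["user"] == held_out]
        yield held_out, train, test


def class_pairs(rows):
    """Return every unordered pair of target classes present in rows."""
    return list(combinations(sorted({row["target"] for row in rows}), 2))


# Hypothetical feature rows; user ids and class labels are made up.
rows = [
    {"user": "U1", "target": "Walking", "mean": 10.2},
    {"user": "U1", "target": "Car", "mean": 9.9},
    {"user": "U2", "target": "Walking", "mean": 10.4},
    {"user": "U3", "target": "Bus", "mean": 9.8},
]
splits = list(leave_one_subject_out(rows))
pairs = class_pairs(rows)
```

Each split keeps all windows of one subject out of training, so the resulting score reflects how a model might behave on a person it has never seen. Any classifier can then be trained on `train` and scored on `test` for each split.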

Get started

Before starting, you must first download the data:

python download_dataset.py

Then you have to clean the raw data and extract the features:

python TMDataset.py

Finally you can build models:

python TMDetection.py

For further, more detailed information about our code, see our tutorial section.
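
If you prefer to run the three steps above in one go, a small wrapper along these lines may help. It is a sketch, not a script shipped with the repository: `run_pipeline` and its `interpreter` and `runner` parameters are hypothetical names, and it assumes it is started from the repository root.

```python
import subprocess

# The three scripts named in the steps above, in the order they must run.
PIPELINE = ["download_dataset.py", "TMDataset.py", "TMDetection.py"]


def run_pipeline(interpreter="python", runner=subprocess.check_call):
    """Run each step in order and stop at the first one that fails.

    subprocess.check_call raises CalledProcessError on a non-zero exit
    code, so a failed download never reaches feature extraction.
    """
    commands = [[interpreter, script] for script in PIPELINE]
    for command in commands:
        runner(command)
    return commands
```

Calling `run_pipeline()` runs the same commands as above. Since the code is built with Python 2.7, `interpreter` may need to point at a 2.7 executable on systems where `python` resolves to a different version.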

Project Structure

The project is currently structured as follows:

.
├── TransportationData
│   ├── datasetBalanced
│   │     └── ...
│   └── _RawDataOriginal
│         └── ...
├── README.md
├── LICENSE
├── const.py
├── function.py
├── TMDataset.py
├── TMDetection.py
├── util.py
├── sintetic_dataset_generator.py
├── sintetic_dataset_config.json
├── download_dataset.py
└── cleanLog.log

License

This work is licensed under an MIT License.

Team of collaborators

This project has been developed at the University of Bologna with the effort of different people:

Past collaborators

FAQ

I need to know the units of the timestamps of each sensor measurement. In your article, you mention that the sampling frequency is approximately 20 Hz. However, you do not specify the units of these timestamps. Are they given in seconds or milliseconds?

The timestamps are in milliseconds!
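
As a quick sanity check on the unit, millisecond timestamps sampled at roughly 20 Hz should be about 50 ms apart. The following sketch converts timestamps to seconds and estimates the sampling rate from their spacing. The function names and the sample values are hypothetical, and real recordings will be less regular than this example.

```python
def to_seconds(timestamps_ms):
    """Convert millisecond timestamps to seconds."""
    return [t / 1000.0 for t in timestamps_ms]


def estimate_sampling_rate_hz(timestamps_ms):
    """Estimate the sampling rate from the mean gap between samples."""
    if len(timestamps_ms) < 2:
        raise ValueError("need at least two timestamps")
    span_ms = timestamps_ms[-1] - timestamps_ms[0]
    if span_ms <= 0:
        raise ValueError("timestamps must be increasing")
    return 1000.0 * (len(timestamps_ms) - 1) / span_ms


# Made-up timestamps spaced 50 ms apart.
timestamps_ms = [0, 50, 100, 150, 200]
seconds = to_seconds(timestamps_ms)
rate_hz = estimate_sampling_rate_hz(timestamps_ms)
```

If the estimate for a file comes out near 0.02 instead of near 20, the values were most likely treated as seconds by mistake.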

Can I assume that the units of the sensor data (accelerometer, gyroscope and magnetometer) are the standard ones (m/s^2, rad/s and uT, respectively)?

Yes.
