RunPandas - Python Package for handling running data, from GPS-enabled devices to worldwide race results.

Introduction

RunPandas is a project that adds support for data collected by GPS-enabled tracking devices and heart rate monitors to [pandas](http://pandas.pydata.org) objects. It is a Python package that provides infrastructure for importing tracking data from such devices, enabling statistical and visual analysis for running enthusiasts. Its goal is to fill the gap between the routine collection of data and its manual analysis in pandas and Python.

Since release 0.6.0 it also supports race event results, so you can analyze split times, finish times, demographics, etc. The goal is to support results from many races, available to anyone interested in running race results analytics.

Documentation

Stable documentation is available on github.io. A second copy of the stable documentation is hosted on Read the Docs.

Development documentation is available for the latest changes in master.

==> Check out this Blog post for the reasoning and philosophy behind Runpandas, as well as a detailed tutorial with code examples.

==> Follow this Runpandas live book in Jupyter notebook format, built with Jupyter Book.

Install

RunPandas depends on the following packages:

pandas
fitparse
stravalib
pydantic
pyaml
haversine
thefuzz

Runpandas was tested to work on *nix-like systems, including macOS.

Install latest release version via pip

$ pip install runpandas

Install latest release version via conda

$ conda install -c marcelcaraciolo runpandas

Install latest development version

$ pip install git+https://github.com/corriporai/runpandas.git

or

$ git clone https://github.com/corriporai/runpandas.git
$ cd runpandas
$ python setup.py install

Examples

Install using pip, then import runpandas and use one of the tracking readers. This example loads a local file, sample.tcx. From the data file we get time, altitude, distance, heart rate and geo position (lat/long).

# !pip install runpandas
import runpandas as rpd
activity = rpd.read_file('./sample.tcx')

activity.head(5)


	alt	dist	hr	lon	lat
time
00:00:00	178.942627	0.000000	62.0	-79.093187	35.951880
00:00:01	178.942627	0.000000	62.0	-79.093184	35.951880
00:00:06	178.942627	1.106947	62.0	-79.093172	35.951868
00:00:12	177.500610	13.003035	62.0	-79.093228	35.951774
00:00:16	177.500610	22.405027	60.0	-79.093141	35.951732
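TCX is Garmin's XML-based Training Center format, in which each observation is a Trackpoint element. The following is a minimal standard-library sketch of how the columns above can be pulled out of such a file. It is not runpandas' reader, and the one-trackpoint document is hand-written for illustration, reusing values from the first row above.

```python
import xml.etree.ElementTree as ET

# Namespace of Garmin's Training Center Database v2 schema.
NS = {"tcx": "http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"}

# Hand-written one-trackpoint document (NOT the real sample.tcx).
SAMPLE_TCX = """<TrainingCenterDatabase xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2">
  <Activities><Activity Sport="Running"><Lap StartTime="2012-12-26T21:29:53Z"><Track>
    <Trackpoint>
      <Time>2012-12-26T21:29:53Z</Time>
      <Position>
        <LatitudeDegrees>35.951880</LatitudeDegrees>
        <LongitudeDegrees>-79.093187</LongitudeDegrees>
      </Position>
      <AltitudeMeters>178.942627</AltitudeMeters>
      <DistanceMeters>0.0</DistanceMeters>
      <HeartRateBpm><Value>62</Value></HeartRateBpm>
    </Trackpoint>
  </Track></Lap></Activity></Activities>
</TrainingCenterDatabase>"""


def _float_at(node, path):
    """Float value of the child element at `path`, or None if it is missing."""
    text = node.findtext(path, namespaces=NS)
    return float(text) if text is not None else None


def parse_trackpoints(xml_text):
    """Return one dict per Trackpoint with the columns shown in the table above."""
    root = ET.fromstring(xml_text)
    rows = []
    for tp in root.iter("{%s}Trackpoint" % NS["tcx"]):
        rows.append({
            "time": tp.findtext("tcx:Time", namespaces=NS),
            "alt": _float_at(tp, "tcx:AltitudeMeters"),
            "dist": _float_at(tp, "tcx:DistanceMeters"),
            "hr": _float_at(tp, "tcx:HeartRateBpm/tcx:Value"),
            "lon": _float_at(tp, "tcx:Position/tcx:LongitudeDegrees"),
            "lat": _float_at(tp, "tcx:Position/tcx:LatitudeDegrees"),
        })
    return rows


print(parse_trackpoints(SAMPLE_TCX))
```

runpandas additionally turns the timestamps into the elapsed-time index seen in the table.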

The data frames that runpandas returns when loading files are similar across file types. The dataframe in the example above is a subclass of pandas.DataFrame and provides some additional features. Certain columns also return specific pandas.Series subclasses, which provide useful methods:

print(type(activity))
print(type(activity.alt))

For instance, to get the base unit of the altitude (alt) data, alongside a plain aggregate such as its sum:

print(activity.alt.base_unit)
print(activity.alt.sum())

m
65883.68151855901

print(activity.dist.base_unit)
print(activity.dist.iloc[-1])

m
4686.31103516

The Activity dataframe also exposes special properties and methods that present statistics about the workout, such as the elapsed time, the mean heart rate, the moving time and the distance of the workout in meters.

#total time elapsed for the activity
print(activity.ellapsed_time)
#distance of workout in meters
print(activity.distance)
#mean heartrate
print(activity.mean_heart_rate())

0 days 00:33:11
4686.31103516
156.65274151436032

Occasionally, some observations such as speed and distance must be calculated from the data available in the activity. runpandas provides special accessors (runpandas.acessors) that compute some of these metrics. Below we compute the distance between consecutive positions from the latitude and longitude of each record (the haversine distance, in meters) and the speed in meters per second.

#compute the distance using haversine formula between two consecutive latitude, longitudes observations.
activity['distpos']  = activity.compute.distance()
activity['distpos'].head()

time
00:00:00          NaN
00:00:01     0.333146
00:00:06     1.678792
00:00:12    11.639901
00:00:16     9.183847
Name: distpos, dtype: float64

#compute the speed (m/s) from the per-position distances and the time between observations.
activity['speed']  = activity.compute.speed(from_distances=True)
activity['speed'].head()

time
00:00:00         NaN
00:00:01    0.333146
00:00:06    0.335758
00:00:12    1.939984
00:00:16    2.295962
Name: speed, dtype: float64
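The two accessor calls above can be pictured with a plain-Python sketch (not runpandas' code): the haversine great-circle distance between consecutive fixes, then speed as that distance divided by the elapsed seconds. The printed speeds are consistent with this: 1.678792 m over the 5 s between 00:00:01 and 00:00:06 gives 0.335758 m/s. The Earth radius of 6,371 km is an assumed constant; libraries differ slightly.

```python
import math

# Mean Earth radius in meters; an assumed constant (libraries differ slightly).
EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def distances_and_speeds(points):
    """points: list of (seconds_since_start, lat, lon) tuples in time order.

    Returns (distpos, speed): meters and m/s between consecutive points.
    The first entry of each list is None, like the NaN in the first row above.
    """
    distpos, speed = [None], [None]
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(points, points[1:]):
        step = haversine_m(lat0, lon0, lat1, lon1)
        distpos.append(step)
        speed.append(step / (t1 - t0))
    return distpos, speed


# One thousandth of a degree of longitude along the equator, covered in 10 s.
distpos, speed = distances_and_speeds([(0, 0.0, 0.0), (10, 0.0, 0.001)])
print(round(distpos[1], 2), round(speed[1], 3))
```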

Popular running metrics such as gradient, pace and vertical speed are also available through the runpandas accessors.

activity['vam'] = activity.compute.vertical_speed()
activity['vam'].head()

time
00:00:00         NaN
00:00:01    0.000000
00:00:06    0.000000
00:00:12   -0.240336
00:00:16    0.000000
Name: vam, dtype: float64
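A minimal sketch, assuming vertical speed is the altitude change divided by the time between observations and pace is the time per kilometer (1000 / speed in m/s). These definitions are assumptions rather than runpandas' documented formulas, although the first one reproduces the -0.240336 shown above (1.442017 m lost over 6 s).

```python
def vertical_speed(times_s, alts_m):
    """Altitude change per second between consecutive observations (m/s); None first."""
    rates = [None]
    for t0, t1, a0, a1 in zip(times_s, times_s[1:], alts_m, alts_m[1:]):
        rates.append((a1 - a0) / (t1 - t0))
    return rates


def pace_s_per_km(speed_m_s):
    """Seconds needed per kilometer at a given speed in m/s; None when not moving."""
    if not speed_m_s:
        return None
    return 1000 / speed_m_s


# Offsets (s) and altitudes (m) taken from the first five rows of the activity above.
times = [0, 1, 6, 12, 16]
alts = [178.942627, 178.942627, 178.942627, 177.500610, 177.500610]
print(vertical_speed(times, alts))
print(pace_s_per_km(2.295962))
```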

Sporadically, there will be a large time gap between consecutive observations in the same workout. This can happen when the athlete pauses the device, or when proprietary algorithms controlling the device's sampling rate auto-pause recording because no significant change in position is detected. runpandas includes an algorithm that attempts to calculate the moving time based on the GPS locations, distances and speed of the activity.

To compute the moving time, the only_moving method detects the periods of inactivity and adds a boolean moving column that marks each observation as moving (True) or stopped (False).

activity_only_moving = activity.only_moving()
print(activity_only_moving['moving'].head())

time
00:00:00    False
00:00:01    False
00:00:06    False
00:00:12     True
00:00:16     True
Name: moving, dtype: bool

Now we can compute the moving time, i.e. how long the athlete was actually moving.

activity_only_moving.moving_time

Timedelta('0 days 00:33:05')
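The section does not spell out runpandas' detection algorithm, so here is a deliberately naive stand-in: a fixed speed threshold labels each observation, and the moving time is the sum of the intervals that end at a moving observation. The 1.0 m/s cut-off is a hypothetical choice; on the five rows above it happens to give the same True/False pattern as the moving column.

```python
from datetime import timedelta

# Hypothetical cut-off: below this speed an observation counts as stopped.
# runpandas' own detection rule may differ.
STOP_SPEED_M_S = 1.0


def moving_flags(speeds, threshold=STOP_SPEED_M_S):
    """True where the speed (m/s) reaches the threshold; missing speeds count as stopped."""
    return [s is not None and s >= threshold for s in speeds]


def moving_time(times_s, flags):
    """Sum the intervals (in seconds) that end at a moving observation."""
    total = 0.0
    for t0, t1, moving in zip(times_s, times_s[1:], flags[1:]):
        if moving:
            total += t1 - t0
    return timedelta(seconds=total)


speeds = [None, 0.333146, 0.335758, 1.939984, 2.295962]  # from the speed column above
flags = moving_flags(speeds)
print(flags)
print(moving_time([0, 1, 6, 12, 16], flags))
```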

Runpandas also provides a summary method that summarises the activity through common statistics. This session summary includes estimates of several of the metrics computed above in a single call.

activity_only_moving.summary()

Session                           Running: 26-12-2012 21:29:53
Total distance (meters)                                4686.31
Total ellapsed time                            0 days 00:33:11
Total moving time                              0 days 00:33:05
Average speed (km/h)                                   8.47656
Average moving speed (km/h)                            8.49853
Average pace (per 1 km)                        0 days 00:07:04
Average pace moving (per 1 km)                 0 days 00:07:03
Average cadence                                            NaN
Average moving cadence                                     NaN
Average heart rate                                     156.653
Average moving heart rate                                157.4
Average temperature                                        NaN
dtype: object
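As a rough cross-check, assuming average pace is simply time divided by distance (truncated to whole seconds per km), the totals in the summary reproduce both pace values. runpandas may compute them differently, for example from the speed series.

```python
from datetime import timedelta


def average_pace(total_distance_m, duration):
    """Duration divided by distance in km, truncated to whole seconds per km."""
    seconds_per_km = duration.total_seconds() / (total_distance_m / 1000)
    return timedelta(seconds=int(seconds_per_km))


# Totals taken from the summary above.
print(average_pace(4686.31, timedelta(minutes=33, seconds=11)))  # elapsed time -> 0:07:04
print(average_pace(4686.31, timedelta(minutes=33, seconds=5)))   # moving time  -> 0:07:03
```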

Now, let’s play with the data. As an example of the visualizations we can create, let’s plot distance vs. time using the built-in, matplotlib-based plot function.

activity[['dist']].plot()


<AxesSubplot:xlabel='time'>

And here is altitude versus time.

activity[['alt']].plot()

<AxesSubplot:xlabel='time'>

Next, let’s show the altitude vs. distance profile. Here is a scatterplot of altitude vs. distance as recorded.

activity.plot.scatter(x='dist', y='alt', c='DarkBlue')

<AxesSubplot:xlabel='dist', ylabel='alt'>

Finally, let’s get a glimpse of the route by plotting a 2D map of longitude vs. latitude.

activity.plot(x='lon', y='lat')

<AxesSubplot:xlabel='lon'>

The runpandas package also comes with extra batteries, such as our runpandas.datasets package, which includes a range of example data for testing purposes. There is a dedicated repository with all the data available. An index of the data is kept here.

You can use the example data available:

example_fit = rpd.activity_examples(path='Garmin_Fenix_6S_Pro-Running.fit')
print(example_fit.summary)
print('Included metrics:', example_fit.included_data)

Synced from watch Garmin Fenix 6S

Included metrics: [<MetricsEnum.latitude: 'latitude'>, <MetricsEnum.longitude: 'longitude'>, <MetricsEnum.elevation: 'elevation'>, <MetricsEnum.heartrate: 'heartrate'>, <MetricsEnum.cadence: 'cadence'>, <MetricsEnum.distance: 'distance'>, <MetricsEnum.temperature: 'temperature'>]

rpd.read_file(example_fit.path).head()


	enhanced_speed	enhanced_altitude	unknown_87	fractional_cadence	lap	session	unknown_108	dist	cad	hr	lon	lat	temp
time
00:00:00	0.000	254.0	0	0.0	0	0	NaN	0.00	0	101	13.843376	51.066280	8
00:00:01	0.000	254.0	0	0.0	0	0	NaN	0.00	0	101	13.843374	51.066274	8
00:00:10	1.698	254.0	0	0.0	0	1	2362.0	0.00	83	97	13.843176	51.066249	8
00:00:12	2.267	254.0	0	0.0	0	1	2362.0	3.95	84	99	13.843118	51.066250	8
00:00:21	2.127	254.6	0	0.5	0	1	2552.0	16.67	87	100	13.842940	51.066231	8

If you only want to see the activities of a specific file type, you can filter runpandas.activity_examples, which returns a filter iterable that you can iterate over:

fit_examples = rpd.activity_examples(file_type=rpd.FileTypeEnum.FIT)
for example in fit_examples:
    #Download and play with the filtered examples
    print(example.path)

https://raw.githubusercontent.com/corriporai/runpandas-data/master/activities/Garmin_Fenix_6S_Pro-Running.fit
https://raw.githubusercontent.com/corriporai/runpandas-data/master/activities/Garmin_Fenix2_running_with_hrm.fit
https://raw.githubusercontent.com/corriporai/runpandas-data/master/activities/Garmin_Forerunner_910XT-Running.fit
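runpandas filters on each example's own file-type metadata (rpd.FileTypeEnum). As a stand-in, the sketch below matches on the file extension instead, which is an assumption. Like the call above, it returns a lazy filter object that can be iterated only once.

```python
from pathlib import PurePosixPath


def by_extension(paths, extension):
    """Lazily keep the paths whose file suffix matches `extension` (case-insensitive)."""
    wanted = extension.lower()
    return filter(lambda p: PurePosixPath(p).suffix.lower() == wanted, paths)


paths = [
    "https://raw.githubusercontent.com/corriporai/runpandas-data/master/activities/Garmin_Fenix_6S_Pro-Running.fit",
    "./sample.tcx",
    "https://raw.githubusercontent.com/corriporai/runpandas-data/master/activities/Garmin_Fenix2_running_with_hrm.fit",
]
for path in by_extension(paths, ".fit"):
    print(path)
```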

Exploring sessions

The package runpandas provides utilities to import a group of activities data, and after careful processing, organises them into a MultiIndex Dataframe.

A pandas.MultiIndex lets multiple levels act together as the row identifier (and, on the columns, as the header). In our scenario the first identifier (level) is the timestamp at which each workout started, and the second is the elapsed time (timedelta) of each observation within that workout.

The MultiIndex Runpandas Activity Dataframe

The MultiIndex dataframe results from the function runpandas.read_dir_aggregate, which takes as input a directory of tracking data files and uses the read*() functions to build runpandas.Activity objects. The resulting dataframes are then sorted by their start timestamps and combined into a single runpandas.Activity indexed by a two-level pandas.MultiIndex.
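The two-level keying can be pictured in plain Python as (start, offset) tuples. This is an illustration of the idea, not read_dir_aggregate's implementation, which builds a real pandas.MultiIndex; the row values are toy data. Counting the distinct first-level keys gives the number of workouts.

```python
from datetime import datetime, timedelta


def aggregate_activities(activities):
    """activities: iterable of (start, records), where records is a list of
    (offset, row) pairs. Returns ((start, offset), row) pairs ordered by start,
    then by offset: the same two-level keying a MultiIndex provides."""
    combined = []
    for start, records in sorted(activities, key=lambda item: item[0]):
        for offset, row in sorted(records, key=lambda rec: rec[0]):
            combined.append(((start, offset), row))
    return combined


def count_activities(combined):
    """Number of distinct first-level keys, i.e. of workouts."""
    return len({start for (start, _offset), _row in combined})


# Two toy workouts, deliberately given out of chronological order.
later = (datetime(2021, 7, 4, 11, 23, 19),
         [(timedelta(seconds=1), {"hr": 150.0}), (timedelta(0), {"hr": None})])
earlier = (datetime(2020, 8, 30, 9, 8, 51), [(timedelta(0), {"hr": None})])

combined = aggregate_activities([later, earlier])
for key, row in combined:
    print(key, row)
print("Total Activities:", count_activities(combined))
```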

Let’s illustrate this by loading 68 running activities of a female runner recorded between 2020 and 2021.

import warnings
warnings.filterwarnings('ignore')

import runpandas
session = runpandas.read_dir_aggregate(dirname='session/')

session


		alt	hr	lon	lat
start	time
2020-08-30 09:08:51.012	00:00:00	NaN	NaN	-34.893609	-8.045055
	00:00:01.091000	NaN	NaN	-34.893624	-8.045054
	00:00:02.091000	NaN	NaN	-34.893641	-8.045061
	00:00:03.098000	NaN	NaN	-34.893655	-8.045063
	00:00:04.098000	NaN	NaN	-34.893655	-8.045065
...	...	...	...	...	...
2021-07-04 11:23:19.418	00:52:39.582000	0.050001	189.0	-34.894534	-8.046602
	00:52:43.582000	NaN	NaN	-34.894465	-8.046533
	00:52:44.582000	NaN	NaN	-34.894443	-8.046515
	00:52:45.582000	NaN	NaN	-34.894429	-8.046494
	00:52:49.582000	NaN	190.0	-34.894395	-8.046398

48794 rows × 4 columns

Now let’s see how many activities are available for analysis. For this, there is also an accessor, runpandas.types.acessors.session._SessionAcessor, that holds several methods for computing basic running metrics and summary statistics across all the activities in this kind of frame.

#count the number of activities in the session
print ('Total Activities:', session.session.count())

Total Activities: 68

We can compute the main running metrics (speed, pace, moving, etc.) using the session accessor methods, which mirror those available in runpandas.types.metrics.MetricsAcessor. Under the hood, each session method calls the corresponding metric method, applying it to each activity separately.

#In this example we compute the distance and the distance per position across all workouts
session = session.session.distance()
session


		alt	hr	lon	lat	distpos	dist
start	time
2020-08-30 09:08:51.012	00:00:00	NaN	NaN	-34.893609	-8.045055	NaN	NaN
	00:00:01.091000	NaN	NaN	-34.893624	-8.045054	1.690587	1.690587
	00:00:02.091000	NaN	NaN	-34.893641	-8.045061	2.095596	3.786183
	00:00:03.098000	NaN	NaN	-34.893655	-8.045063	1.594298	5.380481
	00:00:04.098000	NaN	NaN	-34.893655	-8.045065	0.163334	5.543815
...	...	...	...	...	...	...	...
2021-07-04 11:23:19.418	00:52:39.582000	0.050001	189.0	-34.894534	-8.046602	12.015437	8220.018885
	00:52:43.582000	NaN	NaN	-34.894465	-8.046533	10.749779	8230.768664
	00:52:44.582000	NaN	NaN	-34.894443	-8.046515	3.163638	8233.932302
	00:52:45.582000	NaN	NaN	-34.894429	-8.046494	2.851535	8236.783837
	00:52:49.582000	NaN	190.0	-34.894395	-8.046398	11.300740	8248.084577

48794 rows × 6 columns
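The dist column above restarts at every workout and is the running sum of distpos within it (1.690587 + 2.095596 = 3.786183, and so on). A plain-Python sketch of that per-activity application follows, with "A" and "B" standing in for two start timestamps and small integers for the offsets; the second workout's value is a toy number. The real session method works on the MultiIndex frame.

```python
from itertools import groupby


def cumulative_distance(rows):
    """rows: ((start, offset), distpos) pairs already ordered by start and offset,
    with distpos None on each workout's first record. Returns ((start, offset), dist)
    pairs in which the running total restarts for every workout."""
    result = []
    for _start, group in groupby(rows, key=lambda pair: pair[0][0]):
        running = None
        for key, step in group:
            if step is not None:
                running = step if running is None else running + step
            result.append((key, running))
    return result


rows = [
    (("A", 0), None), (("A", 1), 1.690587), (("A", 2), 2.095596),
    (("A", 3), 1.594298), (("A", 4), 0.163334),
    (("B", 0), None), (("B", 1), 10.0),
]
for key, dist in cumulative_distance(rows):
    print(key, dist)
```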

#compute the speed for each activity
session = session.session.speed(from_distances=True)
#compute the pace for each activity
session = session.session.pace()
#compute the inactivity periods for each activity
session = session.session.only_moving()

After loading and computing the metrics for all the activities, let’s explore the data further and get basic summaries of the session: time spent, total distance, mean speed and other insightful statistics for each running activity. We can do this by calling the method runpandas.types.session._SessionAcessor.summarize, which returns a basic DataFrame with the aggregated statistics per activity from the session frame.

summary = session.session.summarize()
summary


	moving_time	mean_speed	max_speed	mean_pace	max_pace	mean_moving_speed	mean_moving_pace	mean_cadence	max_cadence	mean_moving_cadence	mean_heart_rate	max_heart_rate	mean_moving_heart_rate	mean_temperature	min_temperature	max_temperature	total_distance	ellapsed_time
start
2020-07-03 09:50:53.162	00:25:29.838000	2.642051	4.879655	00:06:18	00:03:24	2.665008	00:06:15	NaN	NaN	NaN	178.819923	188.0	178.872587	NaN	NaN	NaN	4089.467333	00:25:47.838000
2020-07-05 09:33:20.999	00:05:04.999000	2.227637	6.998021	00:07:28	00:02:22	3.072098	00:05:25	NaN	NaN	NaN	168.345455	176.0	168.900000	NaN	NaN	NaN	980.162640	00:07:20.001000
2020-07-05 09:41:59.999	00:18:19	1.918949	6.563570	00:08:41	00:02:32	2.729788	00:06:06	NaN	NaN	NaN	173.894180	185.0	174.577143	NaN	NaN	NaN	3139.401118	00:27:16
2020-07-13 09:13:58.718	00:40:21.281000	2.509703	8.520387	00:06:38	00:01:57	2.573151	00:06:28	NaN	NaN	NaN	170.808176	185.0	170.795527	NaN	NaN	NaN	6282.491059	00:41:43.281000
2020-07-17 09:33:02.308	00:32:07.691000	2.643278	8.365431	00:06:18	00:01:59	2.643278	00:06:18	NaN	NaN	NaN	176.436242	186.0	176.436242	NaN	NaN	NaN	5095.423045	00:32:07.691000
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
2021-06-13 09:22:30.985	01:32:33.018000	2.612872	23.583956	00:06:22	00:00:42	2.810855	00:05:55	NaN	NaN	NaN	169.340812	183.0	169.655879	NaN	NaN	NaN	15706.017295	01:40:11.016000
2021-06-20 09:16:55.163	00:59:44.512000	2.492640	6.065895	00:06:41	00:02:44	2.749453	00:06:03	NaN	NaN	NaN	170.539809	190.0	171.231392	NaN	NaN	NaN	9965.168311	01:06:37.837000
2021-06-23 09:37:44.000	00:26:49.001000	2.501796	5.641343	00:06:39	00:02:57	2.568947	00:06:29	NaN	NaN	NaN	156.864865	171.0	156.957031	NaN	NaN	NaN	4165.492241	00:27:45.001000
2021-06-27 09:50:08.664	00:31:42.336000	2.646493	32.734124	00:06:17	00:00:30	2.661853	00:06:15	NaN	NaN	NaN	166.642857	176.0	166.721116	NaN	NaN	NaN	5074.217061	00:31:57.336000
2021-07-04 11:23:19.418	00:47:47.583000	2.602263	4.212320	00:06:24	00:03:57	2.856801	00:05:50	NaN	NaN	NaN	177.821862	192.0	177.956967	NaN	NaN	NaN	8248.084577	00:52:49.582000

68 rows × 18 columns

print('Session Interval:', (summary.index.to_series().max() - summary.index.to_series().min()).days, 'days')
print('Total Workouts:', len(summary), 'runnings')
print('Total KM Distance:', summary['total_distance'].sum() / 1000)
print('Average Pace (all runs):', summary.mean_pace.mean())
print('Average Moving Pace (all runs):', summary.mean_moving_pace.mean())
print('Average KM Distance (all runs):', round(summary.total_distance.mean()/ 1000,2))

Session Interval: 366 days
Total Workouts: 68 runnings
Total KM Distance: 491.77377537338896
Average Pace (all runs): 0 days 00:07:18.411764
Average Moving Pace (all runs): 0 days 00:06:02.147058
Average KM Distance (all runs): 7.23
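To plot pace on a numeric axis it helps to express it as a float number of minutes. Dividing a timedelta by a one-minute timedelta does exactly that; the snippet shows it with the standard-library timedelta and the average moving pace printed above, and the chart code applies the same division to the pandas columns. The M:SS formatting helper is a hypothetical addition for display only.

```python
from datetime import timedelta

ONE_MINUTE = timedelta(minutes=1)


def format_minutes(minutes):
    """Render a float number of minutes as M:SS (seconds rounded down)."""
    whole = int(minutes)
    seconds = int((minutes - whole) * 60)
    return f"{whole}:{seconds:02d}"


# Average moving pace from the output above: 6 min 2.147058 s per km.
pace = timedelta(minutes=6, seconds=2.147058)
pace_minutes = pace / ONE_MINUTE  # timedelta / timedelta -> float
print(round(pace_minutes, 4), format_minutes(pace_minutes))
```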

With this summary data we can start some visualization and analysis. The charts below illustrate her pace and distance evolution over time.

import matplotlib.pyplot as plt
import datetime

#let's convert the pace to float number in minutes
summary['mean_moving_pace_float'] = summary['mean_moving_pace'] / datetime.timedelta(minutes=1)
summary['pace_moving_all_mean'] = summary.mean_moving_pace.mean()
summary['pace_moving_all_mean_float'] = summary['pace_moving_all_mean'] / datetime.timedelta(minutes=1)

plt.subplots(figsize=(8, 5))

plt.plot(summary.index, summary.mean_moving_pace_float, color='silver')
plt.plot(summary.index, summary.pace_moving_all_mean_float, color='purple', linestyle='dashed', label='average')
plt.title("Pace Evolution")
plt.xlabel("Runs")
plt.ylabel("Pace (min/km)")
plt.legend()

<matplotlib.legend.Legend at 0x7f82d8d83cd0>

plt.subplots(figsize=(8, 5))

summary['distance_all_mean'] = round(summary.total_distance.mean()/1000,2)

plt.plot(summary.index, summary.total_distance / 1000, color='silver')
plt.plot(summary.index, summary.distance_all_mean, color='purple', linestyle='dashed', label='average')
plt.title("Distance Evolution")
plt.xlabel("Runs")
plt.ylabel("Distance (km)")
plt.legend()


plt.show()

Accessing historical data from running race results

One of the great features of Runpandas is access to race result datasets across several races around the world, from majors to local ones (if available in our data repository). In this example we analyze the 2022 Berlin Marathon using runpandas methods tailored to race result data.

First, let’s load the Berlin Marathon data using the runpandas.get_events method. This function gives access to the race data and results of several marathons available in our datasets repository. Given the year and the marathon identifier, you can filter for any marathon dataset you want to analyze. The result is a list of runpandas.EventData instances with the race result and its metadata. Let’s look for the Berlin Marathon results.

import pandas as pd
import runpandas as rpd
import warnings
warnings.filterwarnings('ignore')

results = rpd.get_events('Berlin')
results

[<Event: name=Berlin Marathon Results from 2022., country=DE, edition=2022>]

The result contains the Berlin Marathon results from 2022. Let’s take a look inside the race event, which comes with a handful of methods to describe its attributes and a special method to load the race result data into a runpandas.datasets.schema.RaceData instance.

berlin_result = results[0]
print('Event type', berlin_result.run_type)
print('Country', berlin_result.country)
print('Year', berlin_result.edition)
print('Name', berlin_result.summary)

Event type RunTypeEnum.MARATHON
Country DE
Year 2022
Name Berlin Marathon Results from 2022.

Now that we have confirmed we requested the right marathon dataset, we load it into a DataFrame so we can explore it further.

#loading the race data into a RaceData Dataframe
race_result = berlin_result.load()
race_result


	position	position_gender	country	sex	division	bib	firstname	lastname	club	starttime	...	10k	15k	20k	25k	30k	35k	40k	grosstime	nettime	category
0	1	1	KEN	M	1	1	Eliud	Kipchoge	–	09:15:00	...	0 days 00:28:23	0 days 00:42:33	0 days 00:56:45	0 days 01:11:08	0 days 01:25:40	0 days 01:40:10	0 days 01:54:53	0 days 02:01:09	0 days 02:01:09	M35
1	2	2	KEN	M	1	5	Mark	Korir	–	09:15:00	...	0 days 00:28:56	0 days 00:43:35	0 days 00:58:14	0 days 01:13:07	0 days 01:28:06	0 days 01:43:25	0 days 01:59:05	0 days 02:05:58	0 days 02:05:58	M30
2	3	3	ETH	M	1	8	Tadu	Abate	–	09:15:00	...	0 days 00:29:46	0 days 00:44:40	0 days 00:59:40	0 days 01:14:44	0 days 01:30:01	0 days 01:44:55	0 days 02:00:03	0 days 02:06:28	0 days 02:06:28	MH
3	4	4	ETH	M	2	26	Andamlak	Belihu	–	09:15:00	...	0 days 00:28:23	0 days 00:42:33	0 days 00:56:45	0 days 01:11:09	0 days 01:26:11	0 days 01:42:14	0 days 01:59:14	0 days 02:06:40	0 days 02:06:40	MH
4	5	5	KEN	M	3	25	Abel	Kipchumba	–	09:15:00	...	0 days 00:28:55	0 days 00:43:35	0 days 00:58:14	0 days 01:13:07	0 days 01:28:03	0 days 01:43:08	0 days 01:59:14	0 days 02:06:49	0 days 02:06:49	MH
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
35566	DNF	–	USA	M	–	65079	michael	perkowski	–	–	...	NaT	NaT	NaT	NaT	NaT	NaT	NaT	NaT	NaT	M65
35567	DNF	–	USA	M	–	62027	Karl	Mann	–	–	...	NaT	NaT	NaT	NaT	NaT	NaT	NaT	NaT	NaT	M55
35568	DNF	–	THA	F	–	27196	oraluck	pichaiwongse	STATE to BERLIN 2022	–	...	NaT	NaT	NaT	NaT	NaT	NaT	NaT	NaT	NaT	W55
35569	DNF	–	SUI	M	–	56544	Gerardo	GARCIA CALZADA	–	–	...	NaT	NaT	NaT	NaT	NaT	NaT	NaT	NaT	NaT	M50
35570	DNF	–	AUT	M	–	63348	Harald	Mori	Albatros	–	...	NaT	NaT	NaT	NaT	NaT	NaT	NaT	NaT	NaT	M60

35571 rows × 23 columns

Now you can get some quick insights about the 2022 Berlin Marathon using its tailored methods, for example the number of participants, the number of finishers and the winner's info.

print('Total participants', race_result.total_participants)
print('Total finishers', race_result.total_finishers)
print('Total Non-Finishers', race_result.total_nonfinishers)

Total participants 35571
Total finishers 34844
Total Non-Finishers 727

race_result.winner

position                          1
position_gender                   1
country                         KEN
sex                               M
division                          1
bib                               1
firstname                     Eliud
lastname                   Kipchoge
club                              –
starttime                  09:15:00
start_raw_time             09:15:00
half                0 days 00:59:51
5k                  0 days 00:14:14
10k                 0 days 00:28:23
15k                 0 days 00:42:33
20k                 0 days 00:56:45
25k                 0 days 01:11:08
30k                 0 days 01:25:40
35k                 0 days 01:40:10
40k                 0 days 01:54:53
grosstime           0 days 02:01:09
nettime             0 days 02:01:09
category                        M35
Name: 0, dtype: object

Eliud Kipchoge of Kenya won the 2022 Berlin Marathon in 2:01:09. Kipchoge’s victory was his fourth in Berlin and 17th overall in a career of 19 marathon starts. And who was the women’s race winner?

race_result[(race_result['position_gender'] == 1) & (race_result['sex'] == 'F')].T


	32
position	33
position_gender	1
country	ETH
sex	F
division	1
bib	F24
firstname	Tigist
lastname	Assefa
club	–
starttime	09:15:00
start_raw_time	09:15:00
half	0 days 01:08:13
5k	0 days 00:16:22
10k	0 days 00:32:36
15k	0 days 00:48:44
20k	0 days 01:04:43
25k	0 days 01:20:48
30k	0 days 01:36:41
35k	0 days 01:52:27
40k	0 days 02:08:42
grosstime	0 days 02:15:37
nettime	0 days 02:15:37
category	WH

Tigist Assefa of Ethiopia won the women’s race in a stunning time of 2:15:37 to set a new course record in Berlin.

Runpandas also provides a race summary method that compiles some general insights, such as the number of finishers and participants (by gender and overall).

race_result.summary()

Event name                      berlin marathon
Event type                                  42k
Event country                                DE
Event date                           25-09-2022
Number of participants                    35571
Number of finishers                       34844
Number of non-finishers                     727
Number of male finishers                  23314
Number of female finishers                11523
Winner Nettime                  0 days 02:01:09
dtype: object

For some races, runpandas results come with split times for the partial distances of the race. The method runpandas.acessors.splits.pick_athlete fetches the splits of any runner, so for direct access to all splits of a specific runner we use the splits accessor.

race_result.splits.pick_athlete(identifier='1')


	time	distance_meters	distance_miles
split
0k	0 days 00:00:00	0	0.0000
5k	0 days 00:14:14	5000	3.1069
10k	0 days 00:28:23	10000	6.2137
15k	0 days 00:42:33	15000	9.3206
20k	0 days 00:56:45	20000	12.4274
half	0 days 00:59:51	21097	13.1091
25k	0 days 01:11:08	25000	15.5343
30k	0 days 01:25:40	30000	18.6411
35k	0 days 01:40:10	35000	21.7480
40k	0 days 01:54:53	40000	24.8548
nettime	0 days 02:01:09	42195	26.2187
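Segment pace is the time between consecutive checkpoints divided by the distance between them. The stdlib sketch below applies that arithmetic to Kipchoge's splits from the table above (our illustration, not output of the splits accessor): segment paces range from 2:50 to 2:57 per km, and the overall mean, 2:01:09 over 42.195 km, is about 2:52 per km.

```python
from datetime import timedelta


def parse_hms(text):
    """'H:MM:SS' -> timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


# (split, elapsed time, distance in meters) copied from the table above;
# the half-marathon row is left out because it falls inside the 20k-25k segment.
SPLITS = [
    ("0k", "0:00:00", 0), ("5k", "0:14:14", 5000), ("10k", "0:28:23", 10000),
    ("15k", "0:42:33", 15000), ("20k", "0:56:45", 20000), ("25k", "1:11:08", 25000),
    ("30k", "1:25:40", 30000), ("35k", "1:40:10", 35000), ("40k", "1:54:53", 40000),
    ("nettime", "2:01:09", 42195),
]


def segment_paces(splits):
    """Pace (time per km, rounded to the second) of each segment between checkpoints."""
    paces = []
    for (_, t0, d0), (label, t1, d1) in zip(splits, splits[1:]):
        seconds = (parse_hms(t1) - parse_hms(t0)).total_seconds()
        paces.append((label, timedelta(seconds=round(seconds / ((d1 - d0) / 1000)))))
    return paces


for label, pace in segment_paces(SPLITS):
    print(label, pace)

_, finish, total_m = SPLITS[-1]
print("mean", timedelta(seconds=round(parse_hms(finish).total_seconds() / (total_m / 1000))))
```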

With plotting libraries such as matplotlib you can visualize the splits data in an impressive way!

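import datetime

import matplotlib.pyplot as plt
import matplotlib.ticker

eliud_kipchoge_splits = race_result.splits.pick_athlete(identifier='1')

#keep the 5 km checkpoints and the finish; the half-marathon split falls inside the 20k-25k segment
eliud_kipchoge_splits_filtered = eliud_kipchoge_splits.drop(index='half').copy()
#time and distance (km) covered in each segment between consecutive checkpoints
eliud_kipchoge_splits_filtered['split_time'] = eliud_kipchoge_splits_filtered['time'].diff()
segment_km = eliud_kipchoge_splits_filtered['distance_meters'].diff() / 1000
#pace per segment in seconds per km
eliud_kipchoge_splits_filtered['pace'] = eliud_kipchoge_splits_filtered['split_time'].dt.total_seconds() / segment_km
#overall mean pace: net time divided by the full distance
finish = eliud_kipchoge_splits.loc['nettime']
eliud_kipchoge_splits_filtered['mean_pace'] = finish['time'].total_seconds() / (finish['distance_meters'] / 1000)
#the starting '0k' row has no preceding segment
eliud_kipchoge_splits_filtered = eliud_kipchoge_splits_filtered.drop(index='0k')

def timeTicks(x, pos):
    #format a tick value given in seconds as H:MM:SS
    return str(datetime.timedelta(seconds=int(x)))

fig, ax2 = plt.subplots()
#format the y-axis to show the labels as timedelta.
formatter = matplotlib.ticker.FuncFormatter(timeTicks)
#plot the paces per segment
line2, = ax2.plot(eliud_kipchoge_splits_filtered.index, eliud_kipchoge_splits_filtered['pace'], linestyle='dashed', color='cyan', lw=5, alpha=0.8)
#plot the overall mean pace
line3, = ax2.plot(eliud_kipchoge_splits_filtered.index, eliud_kipchoge_splits_filtered['mean_pace'], color='#1b9e77', linestyle='dashed', lw=5, alpha=0.8)

#annotate the pace line with the time of each split
yvalues = line2.get_ydata()
for index, y in zip(eliud_kipchoge_splits_filtered.index, yvalues):
    formated_time = datetime.timedelta(seconds=eliud_kipchoge_splits_filtered.loc[index, 'split_time'].total_seconds())
    ax2.text(index, y, str(formated_time), weight="bold", size=12)

ax2.yaxis.set_major_formatter(formatter)

ax2.grid(False)

ax2.legend(
    (line2, line3),
    ('Splits Pace', 'Mean Pace'),
    loc='lower right',
    frameon=False
)

ax2.set_title("Eliud Kipchoge splits time and pace in Berlin Marathon 2022")
ax2.set_xlabel("Splits in kms")
ax2.set_ylabel("Pace (per km)")

plt.show()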

Get in touch

Report bugs, suggest features or view the source code [on GitHub](https://github.com/corriporai/runpandas).

I'm very interested in your experience with runpandas. Please drop me a note with any feedback you have.

Contributions welcome!

- Marcel Caraciolo

License

Runpandas is licensed under the MIT License, a copy of which is included in LICENSE.
