Data sciencing women's cycling, pt 2

Posted by Amelia on Sunday, August 25, 2019

tl;dr The number of seasons a rider has been professional is a much better predictor of UCI ranking than age in female professional cycling and the 6th season as a professional is when the most riders have their best ranking.

In part 1 of my data sciencing of women’s professional cycling, I focused on the demographics of the women’s peloton (how the number of riders has changed over time, where the riders are from, how many seasons they stay). In this post, I’ve focused on rider rankings and their relationship to age and the number of seasons a rider has been professional for.

Raw data covering women’s road rankings from 2009-2018 was obtained from the website of the UCI (Union Cycliste Internationale; the world governing body for professional cycling) in August of 2019. The rider database available on the site actually goes back a bit further than the ranking datasets, so it was possible to infer how many seasons riders had been professional prior to the ranking starting as well. If you are interested in the code used to generate this post, it can be found here.

Caveats: there is a large amount of variability in the dataset due to the athletes themselves and the stochastic nature of the sport, but I have focused my analyses on identifying overall trends from the data. I also acknowledge that focusing on UCI ranking is not without its problems due to the team nature of the sport and the presence of domestiques (riders who work solely for the team or lead rider, rather than their own placing).

Data processing note: the UCI reports numeric rankings (1 to XXX) for each rider, but to normalize for the two-fold increase in the number of riders that occurred in the last 10 years, I transformed the data into percent rank across each year, where the top ranked rider becomes the 100% percentile and the bottom rider is at the 0% percentile.

Riders in their late 20s are the highest ranked on average, but riders of all age groups can be found across the rankings.

As covered in my previous post, the median age for professional cyclist according to the UCI dataset is 25, but the distribution is skewed towards young riders in the U23 range with a long tail of older riders. However, cycling is somewhat unique among professional sports in that it is not uncommon for people to pick it up later in life or transition to it from other sports (whereas I cannot imagine someone “picking up” the sport of soccer with no prior experience and turning professional at the age of 30).

If we do a smoothing of the dataset we find that, on average, riders in their mid to late 20s are the highest ranked.

Young riders (18-19 years old) are the lowest ranked on average, even lower than the 40+ riders, but the uncertainty among older riders is higher because there just aren’t that many of them. I also examined whether this curve has changed over the past 10 years and found it has been largely consistent, though there have been some fluctuations in the tail end of the curve due to dominant older riders in particular seasons.

However, this is a bit of an oversimplification, as there are riders of all rankings across the age spectrum as we can see in the graph below where each dot corresponds to an individual rider.

From this, we can conclude that age alone is not sufficient as a predictor of ranking.

Data visualization detour: I also made a raster map showing the density of different ages and rankings that gets around some of the overplotting issues from such a large, overlapping dataset. While it’s message might not be as intuitive as a simple smoothed curve or a dotplot, I think it really highlights the concentration of the U23 rider population in the bottom 50% of the peloton and how the top 20th percentiles are really dominated by riders from in their mid to late 20s.

Rider rank is highly correlated with the number of seasons as a professional

So, if age itself is not strongly predictive of rider ranking, what might do a better job? Perhaps not surprisingly, we find a stronger correlation with the number of seasons a rider has been professional.

It makes sense that rider skill, and consequently ranking, increases with the number of seasons as a professional. Additionally, riders who are not improving or getting good results are likely falling out of the dataset because they can’t get a contract. Lastly, I think the exponential increase during the first few seasons can be partially explained by the hierarchical nature of the sport, where new riders often serve as domestiques or workers for the team’s lead rider(s).

However, it is a little surprising that ranking doesn’t seem to decrease after a certain point. Is it truly the case that riders are quitting at their peak? Or, are rider’s ranks dropping off, but not dramatically enough to be visible at this scale?

If we look at the individual career trajectories for the 73 riders in the data with at least 8 seasons, we see that they actually stratify into two populations: riders who eventually go onto have a placing in the top 10% of riders peak in their 5th-7th season and then decline a little bit in later seasons, and riders who never crack the top 10% who steadily increase until the end of the dataset, with the sharpest increase around season 5.

The underlying cause of this difference remains unknown, but one potential explanation for top riders decline is that the competition at the top of the sport is simply too hard to maintain as riders age. Alternately, these top riders transition from being the protected rider towards a road captain working for other instead of their own placing, whereas the riders who never rank as highly were always working as domestiques instead of riders going for their own placing.

(This graph also highlights the noise in the dataset. There are trends, but there is by no means a “standard” career trajectory among riders.)

The other interesting observation from this data is that the median first season ranking for riders who go onto have long professional careers (54th percentile) is much higher than the overall ranking for first season riders (32nd percentile), indicating that the riders who go onto have long professional careers start even their first season well above average.

Riders have their highest ranked season during their 6th year as a professional

Looking at the overall trend of when riders peak during their career, the 6th season was the best ranked for cyclists who were professional for at least eight seasons. Not surprisingly, no rider had their best season during their first year as a pro, but some riders had their best ranked year after 10 years in the professional peloton!!

Career length stratifies the interaction between age and ranking

Finally, circling back to age and its relationship to ranking, we can see that rider ranking by age stratifies quite nicely by the length of a rider’s career. In the figure below, the colored line represents a generalized additive model for each group, while grey indicates a 95% confidence interval.

For riders who are only professional for 1-3 years, age does not signifiantly effect their rankings, which stay largely constant across the age spectrum whether they start at age 18 or age 40. Looking at rankings during when riders are in their 20s, riders with long, 10+ year careers are the highest ranked during this age period, followed by those who are professional for 7-9 years, and then 4-6 years.

If you are interested in following additional content related to women’s professional cycling, you can check out the following resources:

  • UCI YouTube Channel Video coverage of women’s races are often hard to find so it’s nice that the UCI often has race recap videos on it’s YouTube channel

  • Voxwomen Videos, rider blogs, and a podcast to take you behind the scenes of women’s bike racing

  • Velo Focus Great race and candid photographs of womens cycling. Also on Twitter

  • Follow Sarah Connoll on Twitter Former host of the podcast “Pro Women’s Cycling” who still tweets about women’s racing