When city meets bike: how do people in Oslo use the shared bike service Bysykkel?
As a small foreign creature walking the streets of Oslo, I have mixed feelings. Car drivers are usually friendlier to me than their peers elsewhere. Unlike drivers in some cities, they do not drive as if they were practicing for the next Formula 1 race, and they forgive my accidental traffic-rule transgressions with a smile. On the other hand, bicycles in Oslo move around in a way very different from what I have experienced in cities like Shanghai, Copenhagen and Amsterdam. As a statistician, I cannot really say what that difference is without data. Knowing that data on the use of private bikes are difficult to get, I took a look at the city’s shared bikes.
Purpose of study
We did a simple study of how Oslo residents use the shared bike service Bysykkel. Bysykkel literally means “city bike” in Norwegian, and it is the only shared bike service in Oslo with a network across the whole city. We hope this study will help me and other pedestrians learn how to move around Oslo safely. Perhaps it can also help the company that owns Bysykkel plan new stations, Statens Vegvesen plan new bike lanes, and Ruter plan its bus routes [1].
When it comes to helping city planners account for cyclists’ usage patterns, it would obviously be better to include data on private bikes. Since such data are not available on a large scale, we think Bysykkel’s data are a good starting point.
Data: source, preprocessing and feature engineering
We extracted data on shared bike usage from Bysykkel’s developer site (https://developer.oslobysykkel.no/data) for four full months, April to July 2018 [2]. The dataset contains the start station ID, the end station ID, the start time (yyyy-mm-dd hh-mm-ss) and the end time of each trip. 99% of the trips started and ended on the same day. We assume people use shared bikes for short trips and that, for trips lasting more than one day, people probably forgot the bike somewhere. So we removed all trips that did not end on the day they started. (Some trips that started right before midnight and ended right after it could be relevant for this analysis, but it is hard to define how close to midnight a trip must be to count as relevant. Since there are not many of them, we removed them as well for simplicity.)
In the next step, we created more features, including time elapsed (in seconds), weekday and holiday indicators (using the Norwegian calendar), a same-start-end-station flag, hour of start time, hour of end time, etc. We then removed records in which the end time is before the start time, the trip started and ended at the same station, or the time elapsed is longer than the mean plus 3 standard deviations.
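For readers who want to try this at home, here is a minimal sketch of the cleaning and feature steps described above, in plain Python. It is not the code we ran (we worked in SPSS Modeler and Tableau). The field names (`start_station`, `end_station`, `start_time`, `end_time`), the timestamp layout in `TIME_FORMAT`, and the choice to compute the mean and standard deviation after the other filters are all assumptions made for illustration.

```python
from datetime import datetime
from statistics import mean, pstdev

# Assumed timestamp layout; adjust it to whatever the downloaded file uses.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def add_features(trip, holidays=frozenset()):
    """Return a copy of one trip record with the derived features added.

    `holidays` is a set of datetime.date objects (hypothetical input).
    """
    start = datetime.strptime(trip["start_time"], TIME_FORMAT)
    end = datetime.strptime(trip["end_time"], TIME_FORMAT)
    return {
        **trip,
        "duration_s": (end - start).total_seconds(),
        "same_day": start.date() == end.date(),
        "same_station": trip["start_station"] == trip["end_station"],
        "start_hour": start.hour,
        "end_hour": end.hour,
        "weekday": start.weekday(),  # 0 = Monday, 6 = Sunday
        "is_holiday": start.date() in holidays,
    }


def clean(trips, holidays=frozenset()):
    """Apply the filters from the text and return the remaining records."""
    rows = [add_features(t, holidays) for t in trips]
    # Keep same-day trips that do not end before they start
    # and that do not return to the station they started from.
    rows = [
        r for r in rows
        if r["same_day"] and r["duration_s"] >= 0 and not r["same_station"]
    ]
    if len(rows) < 2:
        return rows  # too few rows to estimate a standard deviation
    durations = [r["duration_s"] for r in rows]
    # Assumption: the cutoff is computed on the rows that survived so far.
    cutoff = mean(durations) + 3 * pstdev(durations)
    return [r for r in rows if r["duration_s"] <= cutoff]
```

Passing a set of Norwegian public holidays as `holidays` fills in the holiday indicator; without it the flag is simply `False` for every trip.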
Exploratory data analysis: 3 Ws questions
We use the good old 3 Ws questions to understand how people use Bysykkel: when, where and how. Why is there no “who” question? Again, because of the strict privacy protection laws, we never expected to get users’ personal data, even in our wildest dreams. Nor did we make any effort to reverse-engineer the data to identify the people who made the trips. If one day we manage to do that, we will not update you here, but call the GDPR lawyers to make some extra bucks.
1. When: time
When do people use Bysykkel? Does the pattern differ between weekdays and weekends or holidays? A plot of trip counts against hour of the day confirmed that it does.
This graph also sheds some light on Oslo people’s work schedules. Unlike the 9-to-5 routine in many countries, Norwegians start working long before 9 am, and some start to leave the office around 3 pm. If we assume that the morning peak is mostly commuting to work and that the afternoon peak comes both from commuting home and from other activities, we can see that the peak comes slightly later on Monday and Tuesday than on other weekdays (Monday blues?). We can also see that people use Bysykkel for much more than commuting to and from work, especially on Monday and Tuesday afternoons.
How long do they use Bysykkel for each trip? Most trips lasted less than 20 minutes, and very long trips are rare. We plotted the distribution of trip duration and found that it is skewed to the right, while its logarithm is much closer to a normal distribution, which is convenient if we later want to do statistical analysis or machine learning.
[Image 2] [Image 3]
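The effect of the log transform on a right-skewed duration variable can be checked with a few lines of Python. This is only a sketch: `skewness` is our own small helper (population moments, no bias correction), and the five durations are made-up numbers chosen so that the effect is easy to see, not Bysykkel data.

```python
import math


def skewness(values):
    """Sample skewness from population moments: m3 / m2 ** 1.5."""
    n = len(values)
    avg = sum(values) / n
    m2 = sum((v - avg) ** 2 for v in values) / n
    m3 = sum((v - avg) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def log_durations(durations_s):
    """Natural log of each positive duration; zero durations are skipped."""
    return [math.log(d) for d in durations_s if d > 0]


# Made-up durations in seconds: each one is twice the previous one.
example_durations = [60, 120, 240, 480, 960]

raw_skew = skewness(example_durations)                 # clearly positive
log_skew = skewness(log_durations(example_durations))  # essentially zero
```

A positive skewness means a long right tail. After taking logs the made-up durations are evenly spaced, so the skewness drops to about zero, which is the same kind of change we saw in the two plots.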
2. Where: stations and routes
Which stations do Oslo people use most often? The graph below answers this question with text size: the larger the text, the more often the station is used.
It seems that the most popular stations, except Alexander Kiellands Plass, are in the very center of the city, especially near the coastline.
Visualizing the routes turned out to be somewhat hard because there are too many possible routes.
Not all routes are equally popular. The graph below uses the darkness of the line to indicate how popular a pair of start and end stations is, with a blue circle for the start station and a red circle for the end station.
Still hard to read? We list the five most popular uphill and downhill routes. We obtained altitude data from Google Maps for all stations (except 15 for which we do not have location data). We define the elevation of a trip as the end station’s altitude minus the start station’s altitude, both in meters above sea level. An uphill trip has elevation > 0 and a downhill trip has elevation < 0.
The five most popular uphill routes are:
1. Sukkerbiten - Paléhaven
2. Sukkerbiten - Sjøsiden øst
3. Sukkerbiten - Sjøsiden vest
4. Frognerstranda - Filipstadveien
5. Alexander Kiellands Plass - Ringnes Park
The five most popular downhill routes are:
1. Paléhaven - Sukkerbiten
2. Sjøsiden øst - Sukkerbiten
3. Saga Kino - Tjuvholmen
4. Jernbanetorget - Sukkerbiten
5. Tjuvholmen - Frognerstranda
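The elevation definition and the two rankings above can be sketched in a few lines of Python. The function name `top_routes` and the input shapes (a list of start-end station pairs and a dictionary of altitudes) are assumptions for illustration, and stations without altitude data are simply skipped, as the 15 stations without location data were in our study.

```python
from collections import Counter


def top_routes(trips, altitude_m, n=5):
    """Return the n most frequent uphill and downhill (start, end) pairs.

    trips: iterable of (start_station, end_station) pairs.
    altitude_m: dict mapping station -> meters above sea level.
    """
    uphill, downhill = Counter(), Counter()
    for start, end in trips:
        if start not in altitude_m or end not in altitude_m:
            continue  # no altitude for one of the stations
        # Elevation = altitude of end station minus altitude of start station.
        elevation = altitude_m[end] - altitude_m[start]
        if elevation > 0:
            uphill[(start, end)] += 1
        elif elevation < 0:
            downhill[(start, end)] += 1
        # elevation == 0 is neither uphill nor downhill
    return uphill.most_common(n), downhill.most_common(n)
```

Each returned list holds `((start, end), count)` pairs sorted from most to least frequent, so the first five entries correspond to lists like the ones above.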
These graphs and data show that the most used bike stations are in the heart of Oslo, in the area framed by the coastline, Central Station and National Theatre, and that short trips within this area are the most popular.
3. How: speed
We found the coordinates of all stations, except 15, on Bysykkel’s site as well. With these we calculated the distance between the start and end station of each trip using the Haversine formula [3]. From this distance and the time elapsed, we computed the average speed of each trip. The real speed must be higher than this for at least two reasons: 1) The distance is the direct distance between two stations rather than the actual road distance, which must be longer. 2) People might stop to do grocery shopping, to wait at traffic lights, or to do something else, so the actual riding time is less than the time recorded by Bysykkel.
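For the curious, the distance and speed calculation can be sketched in Python as follows. The code follows the Haversine formula as written in footnote 3. The function names and the value used for the Earth’s radius (6,371 km, a commonly used mean radius) are assumptions for illustration, since the footnote only calls it R.

```python
import math

# Assumed mean Earth radius in meters; the footnote only says "R".
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return EARTH_RADIUS_M * c


def average_speed_ms(lat1, lon1, lat2, lon2, duration_s):
    """Straight-line distance between the stations divided by the recorded time."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return haversine_m(lat1, lon1, lat2, lon2) / duration_s
```

Because the numerator is the straight-line distance and the denominator is the full rental time, `average_speed_ms` is a lower bound on the riding speed, for the two reasons given above.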
Three trips reached speeds over 10 meters per second, and they were all short trips of between 200 and 300 meters recorded in the city center. The mean speed of 2.287 m/s is equivalent to 8.233 km/h, which is much lower than the average speed of cyclists in Oslo according to a survey conducted this summer by the Institute of Transport Economics in Oslo, with GPS data from 720 cyclists and 56,000 trips [4].
We suspected that downhill trips would be faster than uphill trips, so we plotted elevation against speed.
This shows clearly that speed is on average higher for downhill trips (negative elevation) than for uphill trips (positive elevation). It also shows that a lot of trips happened between stations of similar altitude (close-to-zero elevation).
During our four-month observation period, Oslo people registered over 140,000 more downhill trips than uphill trips with their city bikes. This hints that people probably go downhill on a shared city bike and come back by bus or other means. Below we plotted the distribution of speed for downhill (left) and uphill (right) trips. We can see that the speed of downhill trips has a wider distribution and a higher average than that of uphill trips. This makes sense given our own experience: going uphill towards the north is a struggle, while the speed going downhill can vary a lot depending on whether the rider uses the brakes.
We plotted the speed on a heatmap using binned latitude and longitude for trips, i.e., we took the middle point of the trips’ start and end stations and binned them. The darker color indicates higher speed.
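The binning behind such a heatmap can be sketched in Python like this. The bin size of 0.005 degrees and the input layout (start coordinates, end coordinates, speed) are assumptions for illustration; we did not record the bin size that the plotting tool chose for us.

```python
import math
from collections import defaultdict


def bin_speeds(trips, bin_deg=0.005):
    """Average speed per latitude/longitude cell of the trip midpoints.

    trips: iterable of ((lat1, lon1), (lat2, lon2), speed_ms).
    Returns a dict mapping (lat_bin, lon_bin) -> mean speed in that cell.
    """
    totals = defaultdict(lambda: [0.0, 0])  # cell -> [sum of speeds, trip count]
    for (lat1, lon1), (lat2, lon2), speed in trips:
        # Midpoint of the start and end stations.
        mid_lat = (lat1 + lat2) / 2
        mid_lon = (lon1 + lon2) / 2
        cell = (math.floor(mid_lat / bin_deg), math.floor(mid_lon / bin_deg))
        totals[cell][0] += speed
        totals[cell][1] += 1
    return {cell: total / count for cell, (total, count) in totals.items()}
```

Each key is a pair of integer bin indices; multiplying an index by `bin_deg` gives the lower edge of that cell in degrees, which is what a plotting library would need to draw the colored squares.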
Judging by eye from the heatmap, we think the average speed is lower in the west-central part of Oslo than elsewhere, and we suspect that this is the Frogner area.
We were curious about the stations between which people rode the city bikes fastest. To find out, we calculated the average speed at which people arrive at and leave each station, using all trips. The average speed is highest when people arrive at these stations:
And when they leave these stations:
Of the five stations at which people arrive fastest, four are around Skøyen, which is the main traffic hub by the sea. We guess people come here in the morning to catch their train or bus after a quick and easy downhill trip on a city bike. All five stations that people leave fastest are in the middle to northern part of the city. Tåsen Senter is the second northernmost station and the one with the highest altitude of all stations (121 m above sea level). This is consistent with our guess that people who leave a place higher up in the north and head south are the fastest riders.
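The two station rankings discussed above boil down to grouping trips by end station and by start station and averaging the speed in each group. Here is a minimal Python sketch; the function name and the `(start_station, end_station, speed)` tuple layout are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def station_speed_rankings(trips, n=5):
    """Rank stations by mean speed of arriving and of leaving trips.

    trips: iterable of (start_station, end_station, speed_ms).
    Returns two lists of (station, mean_speed), fastest first.
    """
    arriving = defaultdict(list)  # end station -> speeds of trips ending there
    leaving = defaultdict(list)   # start station -> speeds of trips starting there
    for start, end, speed in trips:
        leaving[start].append(speed)
        arriving[end].append(speed)

    def rank(groups):
        averages = {station: mean(speeds) for station, speeds in groups.items()}
        return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:n]

    return rank(arriving), rank(leaving)
```

With real data it would be sensible to require a minimum number of trips per station before ranking, so that a handful of fast riders at a quiet station does not dominate the list. That threshold is our suggestion here, not something we applied in the study.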
1. Oslo is not as famous for using bikes as a daily means of transportation as cities like Shanghai, Copenhagen and Amsterdam. The altitude data from Google Maps prompted us to ask what else, in addition to culture, climate and the construction of bike lanes, could be the reason for this. All four are cities by the sea, but while the highest altitude in Shanghai and Amsterdam is less than 10 meters above sea level and in Copenhagen less than 20 meters, Oslo’s highest Bysykkel station is 121 meters above sea level and the Sognsvann area is more than 200 meters above sea level. This gives a lot of momentum to downhill trips and makes uphill trips harder. When the weather does not cooperate with cyclists, riding can be dangerous or tiring. Is Oslo really a city suitable for bicycling? Politicians who promote bicycling for daily transport claim it is for the environment. But improving public transportation will also encourage people to drive less and help the city become more environmentally friendly, won’t it?
2. In Shanghai, where half of the shared bikes carry GPS, the data are much richer. Studies found that shared bikes are mostly used as a supplement to the metro, i.e., for travelling between metro stations and offices in the city center or between metro stations and homes outside the city center. In Oslo, the use of city bikes in the city center follows a similar pattern, but there are no stations in the vast residential areas outside the city center. Bydeler (districts) such as Alna, Østensjø and Nordstrand are some of the biggest districts in Oslo with many residents, but sadly they do not have a single city bike station. Also, while there are marked bike lanes on almost all roads in Shanghai, marked bike lanes in Oslo are limited and sometimes come to an abrupt stop. In the heart of the city, where the city bike stations and routes are densest, the roads seldom have bike lanes [5]. Around some of the city’s most popular bike stations, there are no bike lanes at all. This pushes cyclists onto sidewalks, making pedestrians nervous.
3. Finance matters. A positive consequence of Oslo’s lukewarm shared bike market is that it has not attracted too many players or too much investment. Oslo people still use the shared bike as a bike, not as a business idea for collecting investments and/or deposits. Once people view the shared bike as a business concept to exploit, an over-supply of bikes and clogged sidewalks quickly follow, and then the need for bike graveyards… I don’t want to tell you where I got to know this.
What else can we do? Just a few thoughts.
1. Use Google Maps data to compare our Bysykkel trips against Google’s suggested time and route/distance for the same start and end points, and guesstimate what people did during those trips. If we see, for example, that on some routes people tend to spend much more or much less time than Google suggests, then there is something to explore. Maybe they spend more time because there is a grocery store on the way where people tend to do their shopping. Then we can suggest building a station there so that people do not take their bike into the shop or leave it unattended. If people spend much less time on some other routes, that could be the result of taking shortcuts through parks or pedestrian areas. Then it is up to the police whether these cyclists should get the same fines as car drivers who break the traffic rules. But before we do that, we have one concern: can we use Google Maps’ estimated time for a bike trip? Hmm… Not sure. Google Maps does not take weather conditions into consideration. (Google has not explicitly said this, but it is my observation. Last winter, for a long while, Oslo was simply a big rough ice rink. Google’s estimate of my walking time was no different from that on warm days, while I often found myself stuck on the icy street, or fallen on it, and unable to move at all.)
2. Predict the demand for (removal of) bikes at each station by hour (or minute) using, of course, machine learning methods. Machine learning is super-hot nowadays. It seems as if nobody makes “statistical models” or “economic models” anymore. Everybody is machine learning. I am not arguing against it. I am just thinking about how to get the features I will need. It probably takes only one line of code to call xgboost [6], but many lines and hours to do feature engineering, which, as I have read, helps predictive performance more than a more complicated model does. In this case, I can think of two main types of features that will be useful in prediction: weather features and sociodemographic features. For the first type, it is not enough just to have data on temperature, precipitation, wind speed, etc.; we also need to study what these elements leave on the city: whether the streets are full of water or covered by ice, and whether the bikes are wet in the morning. In the morning people wear nice clothes to work and don’t want to get wet, while in the afternoon they probably care less about their appearance and just want to go home. The sociodemographic features include where people live and work, what kind of people they are, what kind of work they do, etc. If people have to drop off kids at kindergarten on the way to work, they are probably less likely to use shared bikes on the way to kindergarten. These features probably deserve their own models.
3. Include information on private bikes. As shown above, we cannot really blame the city bike users for being the fastest riders and scaring pedestrians. It might be more interesting to use private bike data to catch those who practice for the next Tour de France on the streets of Oslo. However, we do not know how to get private bike data without paying for a survey or breaking the law. Maybe data from Strava, the site where people can share their physical activity on social media, would work, but would that be a biased or an unbiased sample? Self-reported social media data tend to depict people in a way that looks better. I have never used that site, and even if I did, I would never share a trip from National Theatre to Pilestredet Park that took 35 minutes, either with a city bike or with my short legs…
Anyway, have a safe trip!
Total number of records in data set: 1,196,873
Number of trips started and ended on the same day: 1,187,194
Number of trips that ended on the next day: 9,364
Number of trips that ended on the third day: 157
Longest period of keeping the bike: 21 days
Number of trips in the analysis (after data cleaning): 1,097,852
Mean trip duration: 11.8 minutes, median: 9 minutes
Average speed: 2.287 m/s
Median speed: 2.346 m/s
Uphill average speed: 2.085 m/s, median: 2.142 m/s, percentage of trips in total: 43%
Downhill average speed: 2.448 m/s, median: 2.477 m/s, percentage of trips in total: 57%
[1] To be honest, this is only partly true. I used Bysykkel data because my manager wanted me to test two software tools: SPSS Modeler and Tableau. So I needed some data to play with. Due to the strict privacy protection laws in Norway, it is hard to find high-quality, real data related to people’s daily life (to be interesting), and the Bysykkel data meet all these requirements.
[2] The dataset is fully online and can be obtained by anyone without any security clearance. So please kindly do not raise suspicions about my foreign background.
[3] Haversine formula: dlon = difference in longitude; dlat = difference in latitude; a = (sin(dlat/2))^2 + cos(lat1) * cos(lat2) * (sin(dlon/2))^2; c = 2 * atan2(sqrt(a), sqrt(1-a)); d = R * c, where R is the radius of the Earth and lat1, lat2 are the latitudes of the two points (all angles in radians).
[6] An open-source software library that provides a gradient boosting framework. Gradient boosting is a machine learning method that uses an ensemble of many weaker learners to build a strong learner, for both regression and classification.