Monthly Archives: March 2015

DB & Google Transit & The Best Laid Plans Of Mice & Men

Rome2rio CTO Bernie Tschirren and I have just returned from a couple of weeks in Europe; we visited Amadeus at their grand campus on the Côte d’Azur, attended the ITB conference in Berlin, and met with a number of companies in Paris. On the in-between weekend Bernie headed to the Netherlands while I visited friends in Italy.

All that travel was made easier by journey planning on Rome2rio — no surprise there — but we also took the opportunity to see what our friends at Google would suggest for our travels. And that, as they say in the classics, is quite a story!

There was quite a stir back in 2012 with the announcement of an exclusive data sharing deal between German Railways (DB) and Google. DB defended that decision in an open letter, saying (forgive the translation if it’s not quite perfect) “The quality of information for our customers is our top priority here.” By that, we assume DB meant that the schedule data must be displayed carefully and in line with various protocols that ensure customers are always seeing accurate, timely information. Restricting distribution of the data to a single partner, Google, was DB’s way of making the data more readily available to consumers while at the same time maintaining control over the quality.

Or so they thought. Check out this Google Transit result for “Amsterdam to Paris”:

[Screenshot: Google Transit result for Amsterdam to Paris]

Amsterdam to Paris via Cologne and Frankfurt? Eight hours? I don’t think so! I suspect that seeing one of their services proposed for this route is actually quite an embarrassment for the folks at DB. They entrusted their schedule data to Google to ensure just this sort of ham-fisted result wouldn’t occur, and here is their chosen partner doing exactly what they were trying to avoid. Of course the correct result for this journey is the excellent Thalys high-speed train, a direct service that takes just over three hours.

The wildly inappropriate use of DB routes doesn’t end there. For my trip from Turin to Paris, Google proposed a scenic, 22-hour route that included back-tracking to Milan, meandering through Switzerland, a handful of train changes in Germany, and… well, you get the idea. Here’s Google Transit’s suggestion:

[Screenshot: Google Transit result for Turin to Paris]

Goodness me! Twenty-two hours, seven train changes… DB had sought to protect travelers from seeing poor-quality information on third-party sites, but surely didn’t imagine that its schedules, even in cases like this where they are displayed quite accurately, could form part of such a bad result for users. They could be forgiven for feeling let down by Google, who don’t appear to be holding up their end of the bargain.

Anyway, back to my travels. I took the more obvious solution, the direct TGV service from Turin to Paris, which runs five times each day and really is a delightful, low-stress way to travel from Italy to the French capital. While on board, I poked around in Google Transit a little more. I found all sorts of routes where, rather than display no results — which might be a better option for their users — Google is displaying similarly inappropriate results along with this disclaimer:

“These results may be incomplete – not all transit agencies in this area have provided their info.”

While some operators have joined DB in sharing data with Google (ÖBB in Austria and SBB in Switzerland, for example), others, including France’s SNCF, have not. Laying the blame on the holdouts seems a little unfair, and is unlikely to convince any of them that they should play ball. All of this is clearly an argument for more data openness; after all, the sky hasn’t yet fallen in the UK, the Netherlands or Sweden, all places where government has legislated for public access to all transport data.

We hope the industry sees another lesson here: closed, exclusive arrangements deprive consumers of the benefits that flow from open markets. Making schedule data available to all comers, including highly focused and innovative startups like Rome2rio, Wanderio, GoEuro, FromAtoB and others, will always lead to better outcomes for consumers than exclusive arrangements with corporate giants. The proof is out there, plain to see.

Rod Cuthbert


Political and geographic complexity in multi-modal search

We have blogged previously about two interesting challenges that present themselves when you get into the nitty-gritty of building a multi-modal search system:

1) Detecting landmasses. Accurate routing needs to be aware of landmasses and islands, and of what is connected to what by land, road, ferry and air. This is critical for finding all possible routes between two points, and for working out which combination of ferries and flights makes that connection possible.

2) Political borders. Certain borders cannot be easily crossed; suggesting a driving route from South Korea to North Korea is both unrealistic and an embarrassing user experience. Similar complexities exist in places such as Israel, Pakistan, and Afghanistan.
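To make these two challenges concrete, here is a minimal Python sketch. It is not Rome2rio's actual implementation: the place names, the link list, the country codes and the function names are all illustrative assumptions. It treats landmasses as nodes in a graph joined by ferry or flight links, and refuses a driving leg when the two endpoints sit on different landmasses or straddle a closed border.

```python
from collections import deque

# Hypothetical toy data: links between landmasses (ferry or flight).
LINKS = [
    ("mainland", "anderson", "ferry"),
    ("mainland", "ketron", "ferry"),
]

# Hypothetical set of land borders that should never be offered for driving.
CLOSED_BORDERS = {frozenset({"KR", "KP"})}


def can_drive(country_a, country_b, landmass_a, landmass_b):
    """Driving needs a shared landmass and no closed border between the countries.

    Simplification: only the direct pair is checked, not detours via third countries.
    """
    if landmass_a != landmass_b:
        return False
    return frozenset({country_a, country_b}) not in CLOSED_BORDERS


def landmass_path(start, goal, links):
    """Breadth-first search over landmasses.

    Returns a list of (from, to, mode) hops, or None if no connection exists.
    """
    graph = {}
    for a, b, mode in links:
        graph.setdefault(a, []).append((b, mode))
        graph.setdefault(b, []).append((a, mode))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        for nxt, mode in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + [(node, nxt, mode)]))
    return None
```

With this toy data, `landmass_path("anderson", "ketron", LINKS)` returns two ferry hops via the mainland, and `can_drive("KR", "KP", "asia", "asia")` returns `False` even though both points share a landmass.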

Political data is also important for displaying accurate place names in our geocoder system. For example, North Elizabeth Station is located in the state of New Jersey in the USA, hence the fully qualified name “North Elizabeth Station, NJ, USA”.
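As a rough illustration of how political data can feed a place name, the sketch below joins a place with the codes of the region and country that contain it. The `PoliticalRegion` record and the `qualified_name` function are hypothetical names invented for this example, not part of our geocoder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PoliticalRegion:
    """Hypothetical record returned by a political lookup for one coordinate."""
    country_code: str
    state_code: Optional[str] = None  # not every country needs a state qualifier


def qualified_name(place: str, region: PoliticalRegion) -> str:
    """Build a display name such as 'Place, STATE, COUNTRY'."""
    parts = [place]
    if region.state_code:
        parts.append(region.state_code)
    parts.append(region.country_code)
    return ", ".join(parts)
```

Calling `qualified_name("North Elizabeth Station", PoliticalRegion("USA", "NJ"))` gives the name quoted above; a region without a state code simply drops the middle part.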

This week we launched significant improvements to the accuracy of our internal system for detecting both landmasses and political regions. Our original implementation used data from the Natural Earth dataset; however, this data was limited by insufficient resolution and some landmasses were missing. We have now transitioned to the more comprehensive OpenStreetMap (OSM) planet data.

The difference is illustrated in the maps below of the Puget Sound area near Seattle, with each landmass represented by a unique color. The original data (left) has much smoother, lower resolution coastlines that are missing much of the detail in the landmass shapes. Some of the smaller islands are completely missing. The new OSM data (right) is more detailed and includes the smaller islands.

[Figure: landmasses of the Puget Sound area, original data (left) and OSM data (right)]

Steilacoom to Anderson Island is an example query that has been improved. The original, low resolution data caused the routing system to suggest a ferry to nearby Ketron Island instead of Anderson Island (left). The new data fixes this problem (right).

[Figure: Steilacoom to Anderson Island, original result (left) and new result (right)]

On the political front, Cairo to Amman was a problematic query where Rome2rio suggested driving through Israel (left). The more popular Taba-Aqaba ferry route is now displayed (right) as well as various bus and ferry combinations.

[Figure: Cairo to Amman, original result (left) and new result (right)]

Whilst developing the new technology, Miles on our team also tackled the engineering challenges involved with implementing a system for very fast lookups of this data. Each lookup provides the landmass and political information for a latitude / longitude co-ordinate, and a search on the Rome2rio site requires thousands of such lookups.
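We are not describing Miles's actual design here, but one common way to make this kind of lookup fast is to bucket polygons into a coarse grid, so that each query only runs a point-in-polygon test against the few shapes that overlap its cell. The sketch below assumes simple polygons given as lists of (longitude, latitude) vertices, ignores holes and the antimeridian, and uses made-up names (`RegionIndex`, `CELL_DEG`).

```python
import math
from collections import defaultdict

CELL_DEG = 1.0  # hypothetical grid cell size in degrees


def point_in_polygon(lon, lat, ring):
    """Ray-casting test; ring is a list of (lon, lat) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


class RegionIndex:
    """Grid of cells, each listing the polygons whose bounding box touches it."""

    def __init__(self, cell_deg=CELL_DEG):
        self.cell_deg = cell_deg
        self.cells = defaultdict(list)

    def _cell(self, lon, lat):
        return (math.floor(lon / self.cell_deg), math.floor(lat / self.cell_deg))

    def add(self, region_id, ring):
        lons = [p[0] for p in ring]
        lats = [p[1] for p in ring]
        cx0, cy0 = self._cell(min(lons), min(lats))
        cx1, cy1 = self._cell(max(lons), max(lats))
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                self.cells[(cx, cy)].append((region_id, ring))

    def lookup(self, lon, lat):
        """Return the id of the first polygon containing the point, or None."""
        for region_id, ring in self.cells.get(self._cell(lon, lat), []):
            if point_in_polygon(lon, lat, ring):
                return region_id
        return None
```

The same index could hold landmass polygons or political regions; a search would then call `lookup` once per coordinate. The cell size is a tuning assumption: smaller cells mean fewer candidate polygons per query at the cost of more memory.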

Miles learnt a few interesting facts in the process:

  • The OSM data contains 497,040 separate landmasses (that is, land with a closed coastline).
  • 89% of those landmasses have a coastline perimeter of less than 2 kilometers.
  • The longest bridge between two landmasses is the Donghai Bridge near Shanghai.
  • The region with the greatest density of separate landmasses is around Horsey Island in the UK.

[Figure: Horsey Island]
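For readers who want to try a statistic like the perimeter figure on their own data, here is a small sketch of one way to compute it. It assumes each coastline is a closed ring of (longitude, latitude) vertices and measures each segment with the haversine formula on a spherical Earth; the function names and the 6371 km radius are our choices for this example, not a description of how the numbers above were produced.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two points in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def perimeter_km(ring):
    """Sum of segment lengths around a closed ring of (lon, lat) vertices."""
    n = len(ring)
    return sum(haversine_km(*ring[i], *ring[(i + 1) % n]) for i in range(n))


def share_below(rings, threshold_km=2.0):
    """Fraction of rings whose perimeter is under the threshold."""
    if not rings:
        return 0.0
    small = sum(1 for ring in rings if perimeter_km(ring) < threshold_km)
    return small / len(rings)
```

Run over every coastline ring in a dataset, `share_below(rings, 2.0)` gives the share of landmasses with a perimeter under 2 kilometers.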

We will continue geeking out on geospatial data as we keep refining Rome2rio’s search results.