Data maturity challenges for the Scottish transport network

23 May 2024

The reality of ‘open data’ often falls short of what is promised, and as it looks to implement the practice across its functions, Scotland’s transport network is finding there are a multitude of hurdles to clear. Consultants from Zühlke have been assessing the network to address siloes – and have identified three key fronts on which improvement is needed.

Open data is the principle that data should be freely accessible, usable, editable and shareable by anyone for any purpose. It is published under an open licence, on the premise that sharing data equally can help the organisations involved advance their mutual commercial interests or do social good.

However, according to a new report from experts at Zühlke, despite the ongoing discourse and efforts, “the reality of open data often falls short of its idealistic vision”. As a result, “data that may at first glance seem ‘open’ still isn’t.”

Zühlke’s Lead Data Engineer Charles Roadnight, Professional Data Engineers Sherri Chuah and Tabitha Day, Principal UX Consultant Neelesh Sonawane, and Expert UX Designer Anna Ronco found that this disparity came to the fore in an engagement exploring Scotland’s transport network data.

They explain, “With Scotland investing significantly into its data infrastructure, we decided to take a look at the country’s frontline services and investigate how operational data across various settings could create meaningful efficiencies and service improvements. We started with the transport sector and what unfolded was a revealing journey into the challenges that hinder the realisation of truly open, interoperable, mature data.”

How open data can help

Before getting into the key hurdles faced by Scotland’s transport department, however, the team are keen to highlight what defines a “good open data” standard. In particular, four characteristics are important. Data should be available, and easy to share or discuss; structured, in a standardised format that is easy to process; consistent, with guaranteed availability and reliability to help ensure the quality of research based on it; and traceable, with clear lines drawn back to its origins so that users can determine how trustworthy it is.

During Zühlke’s exploration of the Scottish transport network’s data, the team examined 14 datasets, including public transport access points, road network details, and routes and timetables for trains, ferries and buses. Measured against the four criteria above, many of the datasets turned out to be publicly available but not easy to use.

The researchers add, “Of the 14 datasets we explored only Network Rail’s open data was in good condition. It was easily accessible via an API and provided wiki documentation with clear instructions that were short and understandable. The issues we encountered are real roadblocks to achieving data maturity and interoperability. They are also not unique to the Scottish transport network. So, if you recognise your organisation as we outline the top challenges below, it might be time to innovate your data infrastructure.”

Outdated datasets
The first of three recurring difficulties Zühlke encountered was outdated datasets. Often, information that could have been very interesting and relevant had not been updated for ten years.

They note, “This immediately raised the question of, ‘how can there be any effective decision-making if these decisions are not even relying on the latest information?’.”

One example of an outdated dataset was NaPTAN, Great Britain’s dataset of all public transport access points, whose schema was last updated in 2014. This meant that users had to perform ‘manual scraping’ to discover what each code meant. In addition, the data itself was available either as one batch or individually for each local authority, so obtaining data for Scotland meant manually selecting and downloading all 32 local authorities.
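To illustrate the kind of manual stitching this implies, the following is a minimal sketch in Python of combining per-authority extracts into a single table. It is not Zühlke’s method and does not use any real NaPTAN interface: the column names (stop_code, name), the authority labels and the sample rows are all hypothetical, and it assumes each extract is a CSV file that has already been downloaded.

```python
import csv
import io


def merge_extracts(extracts):
    """Combine per-authority CSV texts into one list of row dicts.

    `extracts` maps a (hypothetical) authority name to CSV text. Each row is
    tagged with the authority it came from and de-duplicated on `stop_code`.
    """
    merged = {}
    for authority, text in extracts.items():
        for row in csv.DictReader(io.StringIO(text)):
            row["source_authority"] = authority
            # Keep the first occurrence of each stop code.
            merged.setdefault(row["stop_code"], row)
    return list(merged.values())


# Made-up sample data for illustration only, not real NaPTAN records.
sample = {
    "Authority A": "stop_code,name\nA1,High Street\nA2,Harbour\n",
    "Authority B": "stop_code,name\nB1,Station Road\nA2,Harbour\n",
}
rows = merge_extracts(sample)
```

Tagging each row with its source keeps the merged table traceable, one of the four characteristics of good open data described earlier.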

Unstructured data
“Another major roadblock to data interoperability is data that is poorly structured or doesn’t follow a standard format,” Zühlke’s experts continue, “making it impossible to seamlessly connect datasets and requiring time-consuming manual work. Our analysis revealed several datasets with unstructured data, including odd names, no instructions, and generally poor organisation.”

They point to two key examples of this. The first was the Traveline national dataset, which contains Great Britain’s public transport timetables for bus, light rail, tram and ferry services. Its 300-page schema did not explain the dataset in words but relied on UML diagrams instead. Additionally, the schema was in zip folders that were not explicitly described, making it difficult to know where the relevant data was located.

The second was ferry data. The timetables and operator statistics are “provided by individual companies and not centralised”. As a result, each operator’s data is formatted differently, making it difficult to access and compare. For example, CalMac published monthly statistics while NorthLink published yearly ones.
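As a rough illustration of what harmonising such figures involves, the sketch below rolls monthly statistics up into yearly totals so they can be set alongside an operator that only reports annually. The operator labels, data layout and all numbers are invented placeholders, not real ferry statistics, and the function is an assumption about how one might approach the problem rather than anything described in the report.

```python
from collections import defaultdict


def monthly_to_yearly(monthly):
    """Sum {(year, month): value} figures into {year: total}.

    Years with fewer than 12 months of data are dropped, because a partial
    total would not be comparable with a full-year figure.
    """
    totals = defaultdict(int)
    months_seen = defaultdict(set)
    for (year, month), value in monthly.items():
        totals[year] += value
        months_seen[year].add(month)
    return {y: totals[y] for y in sorted(totals) if len(months_seen[y]) == 12}


# Placeholder figures for two hypothetical operators.
operator_a_monthly = {(2023, m): 100 for m in range(1, 13)}
operator_a_monthly[(2024, 1)] = 100  # incomplete year, excluded from output
operator_b_yearly = {2023: 1500}

operator_a_yearly = monthly_to_yearly(operator_a_monthly)
comparable_years = sorted(set(operator_a_yearly) & set(operator_b_yearly))
```

Aggregating to the coarser granularity is the safe direction: yearly figures cannot be split back into months without further information.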

Missing information
Finally, some datasets lacked the intended information. This presents users with a major challenge when trying to connect different systems and carry out an analysis – as even after downloading huge amounts of data, they might later find it was not useful, or incomplete.

One example of this issue came from the Traveline national dataset mentioned above. After working through its entire schema, Zühlke’s researchers discovered that some timetables “didn’t include the bus times at all, making the manual efforts redundant.”
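A simple completeness check run before any deeper analysis could surface gaps like this early. The sketch below assumes timetable records have already been parsed into dictionaries; the field names (route, stop, departure) and sample records are hypothetical and do not reflect the actual Traveline schema.

```python
def find_missing_fields(journeys, required=("route", "stop", "departure")):
    """Return (index, missing_fields) for each record lacking required data.

    A field counts as missing if the key is absent, None, or blank.
    """
    problems = []
    for i, journey in enumerate(journeys):
        missing = [f for f in required if not str(journey.get(f) or "").strip()]
        if missing:
            problems.append((i, missing))
    return problems


# Made-up records for illustration only.
sample = [
    {"route": "X1", "stop": "Stop A", "departure": "08:15"},
    {"route": "X1", "stop": "Stop B", "departure": ""},
    {"route": "X2", "stop": "Stop C"},
]
report = find_missing_fields(sample)
```

Here the second and third records would be flagged because their departure time is blank or absent.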

The road ahead

So, what is the effect of these data issues on residents in Scotland? According to Zühlke’s experts, after digitising Scotland’s transport schedule data internally, “it became evident that residents in rural Scotland face significant challenges when attempting certain public transport journeys during the work week, particularly if their departure time is not precisely timed.”

This issue stems from the lack of accessible open data and established data practices, making it challenging to integrate information across different transport systems. But how can Scotland’s transport authorities resolve this?

To address these challenges and build a more efficient and connected transport system, the researchers recommend a strategic approach to data engineering. Cleaning and organising datasets, ensuring they are updated regularly, and promoting standardised formats are all crucial steps toward achieving true data maturity.

They conclude, “Our analysis of Scotland’s transport network data has shed light on the key barriers to achieving true data maturity and interoperability for organisations, not just within Scotland, but on a global scale. As we move forward, it is essential for stakeholders, including government bodies and transport organisations, to recognise the transformative power of data interoperability. Investing in robust data engineering practices, fostering collaboration between various data providers, and promoting a culture of openness and transparency are key components in building Scotland’s data future.”