European directives state that transport data must be available to the public to fuel the development of new and innovative ICT solutions for the transport sector. However, the data sets currently available to the public have so far not led to many innovative solutions.
The number of data sets published as open data is limited, even though the amount of data from systems and sensors in vehicles and infrastructure is increasing. The existing ICT solutions were not designed with data sharing in mind, so preparing the data and making them open is very resource demanding. Opening all these data will require substantial resources, and prioritizing among them is a challenge. More knowledge on data needs, user expectations, and related success factors and barriers is needed to unlock the potential of open transport data. Research is required to generate knowledge on:
- Which data to publish?
- What are the user expectations and related success factors and barriers?
- How to better publish open data to maximize the usability for the users?
The project aims to establish a virtual lab offering:
- A federated catalogue system composed of catalogue services for discovery of open data and data characteristics (security issues, quality, type of data, etc.) across different data sources and providers. Semantic technologies based on, for example, metadata and ontologies will enable applications to find relevant data at runtime, including location-dependent data.
- Tools that ease the publishing and use of open transport data. Today, systems generally have their own strategies for publishing data, or none at all, and each project more or less has to work out for itself how to access data. The tools will, for example, support data transformations by means of semantic mappings; data management operations such as efficient data store access, data import/export, data enrichment and linking, search and indexing, and caching; and efficient data management in a distributed infrastructure that addresses Big Data concerns such as velocity, volume, and variety.
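As an illustration of a data transformation driven by a semantic mapping, the following minimal Python sketch renames source fields to terms of a shared vocabulary and converts units along the way. The field names, the vocabulary terms (`speedKmh`, `observedAt`, `locationId`) and the mapping format are hypothetical and chosen only for the example; they are not taken from the project's tools.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical mapping format: source field -> (shared vocabulary term, value converter).
Mapping = Dict[str, Tuple[str, Callable[[Any], Any]]]

# Hypothetical mapping for one road sensor feed that reports speed in mph.
ROAD_SENSOR_MAPPING: Mapping = {
    "spd_mph": ("speedKmh", lambda v: round(float(v) * 1.609344, 1)),
    "ts": ("observedAt", str),
    "loc_id": ("locationId", str),
}


def apply_mapping(record: Dict[str, Any], mapping: Mapping) -> Dict[str, Any]:
    """Rename and convert the mapped fields; fields without a mapping are dropped."""
    result: Dict[str, Any] = {}
    for source_field, (target_term, convert) in mapping.items():
        if source_field in record:
            result[target_term] = convert(record[source_field])
    return result


# Example record as it might arrive from the source system (invented values).
raw = {"spd_mph": "50", "ts": "2017-03-01T08:00:00Z", "loc_id": 42, "internal": "x"}
mapped = apply_mapping(raw, ROAD_SENSOR_MAPPING)
```

Keeping the mapping as data rather than code means a new source can be supported by adding a mapping table, without changing the transformation routine.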
The Lean Start-up method is used in the development of the virtual lab and in the opening of data. The main idea is an iterative approach in which feedback is collected, among other ways, automatically by mechanisms integrated into the systems being developed. Catalogue search requests will, for example, be logged to support decisions on the opening of new data sources. Similar mechanisms will be used to get input for the refinement of tools and catalogue services. In addition, users of open data and the virtual lab will be asked to provide feedback on their data needs and on their experience of using the virtual lab.
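How logged search requests could inform such decisions can be sketched as follows. The example assumes, hypothetically, that each log entry holds the query text and the number of catalogue entries returned; queries that repeatedly return nothing then indicate data that users look for but that is not yet open. The `SearchEvent` structure and the sample log are invented for illustration.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class SearchEvent(NamedTuple):
    """One logged catalogue search (hypothetical log format)."""
    query: str
    result_count: int


def unmet_demand(events: Iterable[SearchEvent], top: int = 3) -> List[Tuple[str, int]]:
    """Return the most frequent queries that returned no catalogue entries."""
    misses = Counter(
        e.query.strip().lower() for e in events if e.result_count == 0
    )
    return misses.most_common(top)


# Invented sample log.
log = [
    SearchEvent("ferry timetable", 0),
    SearchEvent("Ferry timetable", 0),
    SearchEvent("road works", 4),
    SearchEvent("parking availability", 0),
]
ranking = unmet_demand(log)
```

In this sample, "ferry timetable" ranks first with two unanswered requests, which would make it a candidate when prioritizing new data sources.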
The catalogue prototypes are built on the CKAN open source software and are continuously refined during the project. Several system components are developed, such as:
- Harvesters for harvesting of data from other catalogues.
- A CKAN plug-in for definition of ontologies and annotations of catalogue entries based on these ontologies.
- A CKAN front-end (Web service) for semantic search in the catalogue by means of terms from ontologies.
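The idea behind semantic search by means of ontology terms can be sketched in Python: a search term is expanded with its narrower terms, and the expanded set is sent to the catalogue as one query. The ontology fragment and the catalogue address below are hypothetical, and the sketch is not the project's front-end; it only assumes CKAN's standard Action API endpoint `package_search` with its `q` and `rows` parameters.

```python
from typing import Dict, List
from urllib.parse import urlencode

# Hypothetical ontology fragment: term -> directly narrower terms.
NARROWER: Dict[str, List[str]] = {
    "traffic": ["traffic flow", "travel time"],
    "traffic flow": ["vehicle count"],
}


def expand(term: str, narrower: Dict[str, List[str]]) -> List[str]:
    """Return the term followed by all transitively narrower terms, without duplicates."""
    seen: List[str] = []
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            # Reverse so that narrower terms are visited in their listed order.
            stack.extend(reversed(narrower.get(current, [])))
    return seen


def package_search_url(base_url: str, term: str, rows: int = 10) -> str:
    """Build a CKAN package_search URL that matches any of the expanded terms."""
    query = " OR ".join(f'"{t}"' for t in expand(term, NARROWER))
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


# Hypothetical catalogue address; no request is sent here.
terms = expand("traffic", NARROWER)
url = package_search_url("https://catalogue.example.org/", "traffic", rows=5)
```

A search for "traffic" thus also finds entries annotated only with "vehicle count", which a plain keyword search would miss.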
The tool prototypes are based on the DataGraft and BigML open source software. They make access to and use of data easier by supporting:
- Processing (clean, transform, combine, etc.) of historical data.
- Processing (clean, transform, combine, etc.) of real-time data streams.
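The clean, transform, and combine steps can be illustrated with a small Python sketch that works on both kinds of input: historical readings are cleaned and reduced to a mean speed per location, and a real-time stream is cleaned and combined with that baseline. The record layout (`location`, `speed_kmh`), the plausibility limits, and the sample values are assumptions made for the example and do not describe the DataGraft or BigML interfaces.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, Iterable, Iterator

Reading = Dict[str, Any]


def clean(readings: Iterable[Reading]) -> Iterator[Reading]:
    """Drop readings whose speed is missing or outside an assumed 0-200 km/h range."""
    for r in readings:
        speed = r.get("speed_kmh")
        if speed is not None and 0 <= speed <= 200:
            yield r


def baseline_by_location(history: Iterable[Reading]) -> Dict[str, float]:
    """Transform cleaned historical readings into a mean speed per location."""
    grouped = defaultdict(list)
    for r in clean(history):
        grouped[r["location"]].append(r["speed_kmh"])
    return {location: mean(speeds) for location, speeds in grouped.items()}


def with_deviation(stream: Iterable[Reading], baseline: Dict[str, float]) -> Iterator[Reading]:
    """Combine each live reading with the historical baseline for its location."""
    for r in stream:
        expected = baseline.get(r["location"])
        if expected is not None:
            yield {**r, "deviation_kmh": r["speed_kmh"] - expected}


# Invented sample data.
history = [
    {"location": "E6-1", "speed_kmh": 80.0},
    {"location": "E6-1", "speed_kmh": 90.0},
    {"location": "E6-1", "speed_kmh": None},
]
live = iter([
    {"location": "E6-1", "speed_kmh": 60.0},
    {"location": "E6-1", "speed_kmh": -5.0},
])
baseline = baseline_by_location(history)
enriched = list(with_deviation(clean(live), baseline))
```

Because the stream functions are generators, each live reading is processed as it arrives, without holding the whole stream in memory.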
The prototypes are demonstrated on traffic and traffic condition data from road transport and on AIS data from sea transport.