Open and Reproducible Technology

Reproducibility is necessary for high-quality research: the same analysis applied to the same data should produce the same outcome. Open source approaches support reproducibility by enabling access not only to the software or tools but also to their building blocks, such as methods, data, code, analysis workflows and documentation. However, open and reproducible research practices are often not integrated into projects from the beginning, which makes it extremely challenging to ensure that all results can be easily accessed, openly examined, reused and built upon by others. Furthermore, the importance of ethics in AI, though widely acknowledged, is rarely taught in practical terms. Open, reproducible and ethical data practices require a common practical framework for development, computational reproducibility, project management, co-creation and transparent communication processes.

Open source code and open data are open science practices that make code and data available under a permissive licence, allowing third-party users to reuse different components of research outcomes for any purpose. A commitment to open source practices strengthens the transferability of data models by making them discoverable via online repositories and understandable through thorough documentation.

Recognising the importance of open science in environmental research and data science more widely, ASG includes an array of projects that integrate open research and reproducibility principles from the start. Scivision’s adoption of MapReader, developed for humanities researchers by the Living with Machines programme, is already a great example of open source creating new opportunities for technological solutions. The software underlying IceNet, TRU-NET and PV solar panel forecasting is also open source and has been shared online in public GitHub repositories. Each project is accompanied by the data and additional code needed to fully reproduce all the results and figures from the published articles, allowing users to evaluate or understand the study and identify future uses. For instance, data generated by IceNet are published on the Polar Data Centre, whereas TRU-NET uses model field data from IFS-ERA5 linked to the Copernicus Knowledge Base. The PV solar panel forecasting project, currently hosted under Open Climate Fix on GitHub, makes extensive use of government data alongside open data collected by massive online crowdsourcing projects such as OpenStreetMap and PVoutput.org.

The IceNet codebase and infrastructure closely adhere to open source and reproducibility practices for creating sustainable software that can be used across multiple problem areas. IceNet is already moving in new directions and has found another practical use in a BAS project to plan routes in the Antarctic for the research vessel Royal Research Ship Sir David Attenborough (RRS-SDA). IceNet forecasts are being used to identify the most fuel-efficient routes for the RRS-SDA through sea ice. The data products from IceNet are also being incorporated into efforts to build a Digital Twin of the ship and the polar regions. The RRS-SDA route planner receives real-time data from the ship itself and provides a decision-making aid to the navigator on the bridge. Efforts are also underway to use IceNet in conservation through an active collaboration with the WWF. Lead developers of this project have also participated in a Cambridge Venture Project at the Judge Business School to explore how to build future sustainability pathways for the project as a non-profit initiative.

Developed under the Turing’s Climate Action project Solar nowcasting with machine vision, the PV solar panel forecasting data was established through a crowdsourcing effort in collaboration with Open Climate Fix (an open science initiative to computationally address climate change issues) and OpenStreetMap (a free community-edited geographic database of the world). Volunteers and citizen scientists tagged the locations of solar panels, mapping 25% of all the solar panels in the UK on OpenStreetMap. The team is also working on machine learning methods to detect solar panels from satellite images, which would fill gaps in the PV solar panel location data caused by unregulated or independent service providers. This project will establish a worldwide open data “clearinghouse”: a collection of data from various sources that systematises billing or service allocation. By offering solar PV geodata, formatted and transformed for machine learning algorithms, the project will make these data useful to regional and national operators such as National Grid, and to commercial market participants, for predicting energy production. These outputs will support short-term solar power forecasting, demand forecasting and fleet management, and will enable new demand management and energy-trading innovations, subsequently helping to cut carbon emissions.

Open science workflows extend beyond open data and open source code. The transparent development processes have allowed researchers to learn about software engineering, while real-world applications of their work have resulted in new collaborations and the co-creation of innovative tools. Together, these practices will extend the use of predictive models in creating new forecasting technologies and enable users to develop need-based solutions, contributing significantly to reducing the impact of the climate crisis on a regional, national and international scale.