Cross-disciplinary data integration and real-world deployment

While climate change is evident in observational data, many details of its underlying mechanisms and potential impacts on society have not yet been studied in depth. Data gathered at different spatio-temporal resolutions provide additional features and information, each adding a new dimension to help assess how climate change affects a specific area, such as agriculture. These data also come with limitations: discipline-specific standards and formats, biases, variable quality, missingness, differing contexts and scopes of application, and unknown factors introduced by human and technological interventions at different stages of the data lifecycle. By building a complete digital picture of our natural environment, we can find better ways to track the impacts of climate change on agriculture, biodiversity, oceans, land, water, and the cryosphere. To do this, we need to computationally integrate data from different scales, modalities, sources and disciplines while considering a range of parameters, which makes it a challenging and extremely expensive endeavour in terms of both time and resources. These requirements make it even more important to embed interventions throughout the project lifecycle to enable real-world deployment of existing data-based solutions. It is critical to work with stakeholders across academia, industry, government and the public to respond to climate crises on an unprecedented scale. Unfortunately, doing so is extremely risky in practice.

Turing researchers have applied innovative approaches to facilitate the integration of datasets across disciplines, build robust tools and technology, and enable the deployment of existing software tools in ways that their creators may not have envisioned. ASG projects have actively convened data scientists, algorithm developers, data owners, software engineers, practitioners and potential end-users from across disciplines. Working on projects like Scivision, DeepSensor, IceNet, DyME-CHH and ‘Impact of climate change on agriculture’, they have created highly successful frameworks for data and model integration. Open-source, reproducible and ethical practices have further amplified the iterative improvement and reuse of those computational frameworks by different users. For instance, many models are supplemented by a collection of interactive notebooks with research narratives and code describing real-world applications of data models, exemplifying reproducible processes. The Environmental Data Science Book has published several of them as executable Python-based notebooks showcasing sensor data and models that could be highlighted through Scivision. Scivision’s tree crown use case was initially published as an Environmental Data Science notebook. Thanks to its incorporation in Scivision’s public catalogue, the tree crown model and data are discoverable by the scientific image analysis community. These projects share common challenges in communicating science to experts and non-experts through reproducible and shareable computational notebooks.

Leading the real-world deployment and reuse of research components, RAMs bring new synergies and collaborative opportunities for core developers and external stakeholders. For example, the RAM team members in DyME-CHH helped identify the value of creating the CLIM-RECAL project as a starting point, so that the Turing research team could share its extensive data processing and literature review work on climate projection data with other researchers working in the space, sparing them from duplicating effort. By engaging potential adopters of ASG projects through workshops (AIUK 2022), hackathon-like events (Data Study Group) and partnership meetings, RAM members further amplified the value of the projects’ outcomes and impacts, and created a feedback loop that let potential users get involved in the projects during the development stage. For several projects described in this paper, the RAM team spearheaded the adoption of an open, regularly maintained GitHub repository so that progress, code, and outputs are available to external stakeholders.