
Architecture

Legend

In the following diagrams we use colours to indicate who has access to, or control over, particular resources. These are mapped to our roles:

Pink items indicate externally controlled resources, outside the scope of our roles.

Arrows indicate the flow of permitted traffic. Solid lines indicate pushes, that is, they are triggered from the beginning of the arrow. Dotted lines indicate pulls, triggered from the end of the arrow.

Satellite TRE

Figure 1 demonstrates the high-level concept of a satellite TRE. It shows the connection of an existing TRE to a satellite TRE instance deployed on remote infrastructure.

The secure TRE tenancy boundary enables the extension of existing governance to the satellite TRE. On the remote infrastructure, a dashed line indicates the boundary of the TRE tenancy. All resources within the tenancy are within the governance domain of the Home TRE, through the governance boundary extension agreed in the shared responsibility model.

A block diagram depicting a satellite TRE. It shows how the satellite TRE is an adjunct to an existing TRE. The TRE admins configure the satellite TRE on remote infrastructure, while the remote infrastructure admins configure the TRE tenancy. TRE Researchers are able to dispatch jobs and manage data from their home TRE workspace.

Figure 1: A schematic of the satellite TRE concept, showing the home TRE and TRE Tenancy.

FRIDGE and TRE Tenancy

Overview

Figure 2 gives an overview of the design of FRIDGE. Compared to Figure 1, Figure 2 reveals detail of the structure of a FRIDGE satellite TRE. It shows how management traffic is isolated from research traffic and part of how the TRE Tenancy is defined through network isolation.

A block diagram depicting a high-level overview of FRIDGE architecture. Shown at the home TRE, FRIDGE instance and the TRE Tenancy boundary. Arrows show the direction of data flow.

Figure 2: A high-level overview of a FRIDGE instance, showing the home TRE and TRE Tenancy.

Requirements

Figure 2 represents a generic FRIDGE deployment, and specific details may vary between implementations. However, there are some requirements which must be met by all implementations:

Dual Network

The FRIDGE instance is split into two networks, each of which contains a K8s cluster. The Access Cluster is responsible for routing traffic from the Home TRE to the Isolated Cluster. The Isolated Cluster has access to sensitive data, and runs jobs on that data. Traffic between the two clusters is strongly restricted by a firewall, with only the connections shown in Figure 2 permitted. In addition, the Isolated Network has no outbound access, beyond the Container Runtime being able to pull container images from the container repository in the Access Network.

The dual-network design forms an important part of our approach to Defence in Depth, in addition to K8s-native network control. In the event of a container breakout, or other compromise of the K8s nodes, there is still no route to exfiltrate sensitive data.

Connection from Home TRE

Bastion

To avoid publicly exposing the Kube API of the Access Cluster, some form of bastion (for example, a virtual machine running an SSH server, or WireGuard) should be used. The nature of this bastion may vary between implementations.
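As a sketch of the SSH-server variant, a client on the Home TRE side could reach the Access Cluster's Kube API through the bastion with a configuration along these lines (all hostnames, usernames and key paths are illustrative, not part of any FRIDGE specification):

```
# ~/.ssh/config (hostnames, users and paths are illustrative)
Host fridge-bastion
    HostName bastion.example.org
    User tunnel
    IdentityFile ~/.ssh/fridge_ed25519

Host fridge-access
    HostName access-cluster.internal
    ProxyJump fridge-bastion
    # Forward the Kube API locally rather than exposing it publicly
    LocalForward 6443 localhost:6443
```

With this in place, `ssh -N fridge-access` would open the tunnel, making the Kube API reachable at `localhost:6443` on the Home TRE side.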

Router and Ingress

To correctly route traffic intended for the Access Cluster, a router or reverse proxy is used. This may route traffic based on port, hostname, prefix or some combination, and its nature may vary between implementations. In all cases, traffic must be directed to the Access Cluster, where a K8s Ingress Controller will route it to the correct service.
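As an illustration of hostname-based routing at the cluster edge, an Ingress resource in the Access Cluster might look like the following sketch (hostnames, service names, namespace and ingress class are all hypothetical):

```yaml
# Illustrative Ingress; hostnames, services and class are hypothetical
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: fridge-services
  namespace: access
spec:
  ingressClassName: nginx
  rules:
    - host: api.fridge.example.org
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: fridge-api-proxy
                port:
                  number: 443
    - host: registry.fridge.example.org
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: harbor
                port:
                  number: 443
```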

Proxies

For Job Submitters, the local API interface and FRIDGE proxy provide transparent access to the FRIDGE API. It will appear to them as a service in the network of their TRE workspace with endpoints for submitting and managing jobs dispatched to the FRIDGE instance. Similarly, TRE Administrators are able to manage the K8s components of their FRIDGE instance through their own API interface.

The proxies and Access Cluster's Kube API are distinct pods. Proxy pods run an SSH daemon and are used to pass requests through to the Isolated Cluster's Kube API or FRIDGE API via an SSH tunnel. Each API Interface at the Home TRE is required to generate an SSH key pair. Hence by installing the correct public key on each proxy, the TRE Operator Organisation can control who has access to the APIs in the Isolated Cluster. It would also be possible to further restrict traffic through network controls such as IP allowlists or exposing the Access Cluster only through a VPN.
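As an illustration of this per-key access control, each entry in a proxy pod's `authorized_keys` file could be limited to tunnelling only (the key material and forwarding target below are placeholders):

```
# authorized_keys on a proxy pod; key and target are placeholders
restrict,port-forwarding,permitopen="localhost:6443" ssh-ed25519 AAAA... api-interface@home-tre
```

Here `restrict` disables all key options (PTY allocation, forwarding, X11), `port-forwarding` selectively re-enables forwarding, and `permitopen` limits the destinations a tunnel may reach.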

FRIDGE internal

A block diagram showing the internal components of FRIDGE K8s clusters.

Figure 3: A diagram showing the key internal components of the FRIDGE Kubernetes clusters. Lines indicate access to private volumes.

Network Policy

Network traffic within the FRIDGE clusters is restricted. This is achieved using the Cilium CNI plugin, in addition to the isolation enforced by the dual-network design.
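As a sketch, a CiliumNetworkPolicy restricting egress from the Job Namespace to the container registry alone might take the following shape (the namespace, labels and port are hypothetical):

```yaml
# Illustrative policy; namespace, labels and port are hypothetical
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: jobs-egress-registry-only
  namespace: jobs
spec:
  # Select every endpoint in the namespace
  endpointSelector: {}
  egress:
    # Presence of an egress section default-denies all other egress
    - toEndpoints:
        - matchLabels:
            app: container-registry
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
```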

TLS

cert-manager will automatically provision and renew TLS certificates for services which can be reached over HTTPS, such as the container repository.
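For example, a certificate for the container repository could be requested with a Certificate resource along these lines (the DNS name, namespace and issuer are hypothetical):

```yaml
# Illustrative Certificate; names and issuer are hypothetical
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: registry-tls
  namespace: registry
spec:
  # cert-manager stores the issued certificate in this Secret
  secretName: registry-tls
  dnsNames:
    - registry.fridge.example.org
  issuerRef:
    name: fridge-ca-issuer
    kind: ClusterIssuer
```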

Proxies

FRIDGE API

The FRIDGE API provides users with endpoints to manage data, and to submit and monitor jobs. Writing a custom API separates Job Submitters from the underlying implementation, so that they may use a single FRIDGE interface irrespective of the components behind it. The API is therefore resilient to changes to the FRIDGE Workflow Manager and storage. It also enables the creation of user-focused FRIDGE tools such as CLIs or web interfaces for job submission and management.

Workflow Manager

The workflow manager receives job specifications from the FRIDGE API and launches jobs in the Job Namespace. The workflow manager is an instance of Argo Workflows.
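As a rough sketch, a job launched through Argo Workflows might be specified like this (the image, namespace, command and volume names are illustrative, not FRIDGE's actual job schema):

```yaml
# Illustrative Argo Workflow; names and image are hypothetical
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: fridge-job-
  namespace: jobs
spec:
  entrypoint: main
  templates:
    - name: main
      container:
        # Pulled from the in-tenancy container repository
        image: registry.fridge.example.org/project/analysis:1.0
        command: ["python", "run.py"]
        volumeMounts:
          - name: inputs
            mountPath: /data/inputs
            readOnly: true
  volumes:
    - name: inputs
      persistentVolumeClaim:
        claimName: inputs-pvc
```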

Job Namespace

To isolate Job Submitters' processes from the rest of the Isolated Cluster, including components which enforce security, jobs may only be run in a dedicated namespace. This namespace has no access to external resources, other than research data and container images, and jobs are restricted to run without privileges.
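One way to enforce the no-privileges restriction, sketched here, is to label the dedicated namespace with the Kubernetes Pod Security Admission `restricted` profile (the namespace name is hypothetical):

```yaml
# Illustrative namespace; the name is hypothetical
apiVersion: v1
kind: Namespace
metadata:
  name: jobs
  labels:
    # Reject pods that request privilege escalation, host access, etc.
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
```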

Container Repository

An instance of the Harbor container registry provides access to container images for the isolated cluster. It acts both as a read-through cache for allowed public registries (such as Docker Hub, Quay and GitHub Container Registry) and as a repository for Job Submitters' own container images. This allows Job Submitters to easily use custom software, by building a container image and pushing to the repository.

Storage

Storage classes

FRIDGE defines two storage classes. One is for holding sensitive data, and the other for non-sensitive data. These storage classes need to be implemented for each target platform, as the appropriate CSI and options will vary.

For secure storage, if an available CSI supports encryption with keys provided by Kubernetes, that can be used. Otherwise, FRIDGE can deploy Longhorn which will create Kubernetes volumes, backed by block storage, with data encrypted at rest.
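As a sketch of the Longhorn option, a secure storage class with encryption at rest could be defined along the lines below; the class name, replica count and secret location are illustrative, and a Secret holding the encryption key must already exist:

```yaml
# Illustrative StorageClass; names and parameters are assumptions
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fridge-secure
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "2"
  # Encrypt volume data at rest with a key from a Kubernetes Secret
  encrypted: "true"
  csi.storage.k8s.io/provisioner-secret-name: longhorn-crypto
  csi.storage.k8s.io/provisioner-secret-namespace: longhorn-system
  csi.storage.k8s.io/node-publish-secret-name: longhorn-crypto
  csi.storage.k8s.io/node-publish-secret-namespace: longhorn-system
  csi.storage.k8s.io/node-stage-secret-name: longhorn-crypto
  csi.storage.k8s.io/node-stage-secret-namespace: longhorn-system
```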

Object storage

An object storage system is used for managing data assets in the FRIDGE instance. This provides a convenient way to handle the ingress of inputs and egress of results.

Buckets are created for inputs and results. The inputs bucket is read-only to jobs, to prevent the corruption of input data.
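A policy granting jobs read-only access to the inputs bucket could, for example, take the usual S3-style form MinIO accepts (the bucket name is illustrative):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::inputs",
        "arn:aws:s3:::inputs/*"
      ]
    }
  ]
}
```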

The object storage is provided by an instance of Minio and uses a volume of the secure storage class for its backend.

Secure volumes

For higher performance than object storage, encrypted block devices can be accessed directly by jobs.

Insecure volumes

Unencrypted volumes are used by the Container Repository for caching container images.

Glossary

Access Cluster

A Kubernetes cluster with services to manage the connection of the Home TRE to the Isolated Cluster, where sensitive-data workloads are run. It also hosts the Container Repository, which enables the Isolated Cluster to pull container images despite being isolated.

Access Network

The FRIDGE network hosting the Access Cluster. This network acts as a bridge connecting the Home TRE to FRIDGE job execution components.

Container Runtime

The container runtime is the component of a Kubernetes distribution which is responsible for running containers. Between distributions, the particular container runtime may differ, but all will communicate with Kubernetes through a standard interface.

In FRIDGE, it is important that the container runtime of the Isolated Cluster is configured to fetch container images from the container repository, as it will not be able to access public container registries.

Home TRE

An existing TRE, complete with infrastructure, data governance and processes. Research questions are established in this TRE, before data and job specifications are dispatched to the satellite TRE for execution. The satellite TRE formally belongs within the governance boundary of the home TRE.

Isolated Cluster

A Kubernetes cluster with services to run workloads on sensitive data, and to manage inputs and results.

Isolated Network

The FRIDGE network hosting the Isolated Cluster. This network creates a secure boundary around the FRIDGE components which run sensitive-data workloads. It is a key part of the security of a FRIDGE instance.