Deploy a DSH SRE#

Prerequisites#

Building the SRE#

Follow the Data Safe Haven (DSH) deployment guide.

Warning

Make sure you are reading the version of the docs (see the bottom right) that matches the DSH release you recorded in the TRE GitHub issue.

Important:

  1. Ensure you have the correct release of the DSH CLI installed; you should have recorded this in the TRE GitHub issue. Details on installing the DSH CLI via pipx are in the Data Safe Haven documentation (a command sketch follows this list)

  2. You may want to keep a log of the deployment terminal output

  3. A single Safe Haven Management (SHM) environment can support multiple secure research environments, and they do not need to be hosted on the same subscription. These instructions assume the SHM has already been set up; it has been for prod5, and details are in the deployments GitHub board

  4. If the SRE you are building is to be used in a Data Study Group (DSG), follow the Data Study Group (DSG) TRE setup.
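
A minimal command sketch for step 1, assuming the DSH CLI is published on PyPI as data-safe-haven and provides a dsh entry point (package and command names are taken from the DSH docs at the time of writing and may differ between releases; replace <version> with the release recorded in the TRE GitHub issue):

# Install the pinned release of the DSH CLI in an isolated pipx environment
pipx install data-safe-haven==<version>
# Confirm the version now on your PATH matches the TRE GitHub issue
dsh --version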

Create SRE config file#

Configurations for SRE deployments can be adapted from either a blank template or a security-tier-specific template, as outlined in the DSH docs, and are saved as YAML files. To create the config, do the following:

  1. Generate a template config YAML file, or start from scratch

  2. The DSH docs describe the config file structure. Below are suggestions for Turing-specific values:

    • location: typically this will be uksouth

    • subscription_id: This will have been provided by RCP - see Request Azure Credits

    • name: We suggest using only letters and numbers, although hyphens and underscores are allowed

    • admin_email_address: trustedresearch@turing.ac.uk

    • admin_ip_addresses: This must be an IP address (or addresses) that the deployment team have access to. We suggest using 193.60.220.253, which is the IP address associated with the Turing VPN and/or the Turing Guest network.

    • research_user_ip_addresses: For Tier 3 use 193.60.220.240, which is limited to ChromeBooks set up by IT. For Tier 2 use 193.60.220.242, which can be accessed by external users via the GlobalProtect App (VPN Support).

    • workspace_skus: A very basic Standard_D2s_v3 will suffice; for a better user experience, something like Standard_D8s_v5 is recommended

  3. Save the config as a YAML file (an illustrative sketch follows)
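
An illustrative sketch of such a config, using the Turing-specific values above. Field names, nesting and the full set of required fields are defined by the DSH release you are using, so always cross-check against the DSH docs; the subscription ID and SRE name below are placeholders:

# Illustrative only - check the config file structure in the DSH docs for your release
azure:
  location: uksouth
  subscription_id: 00000000-0000-0000-0000-000000000000  # placeholder; use the RCP-provided subscription
name: examplesre                                          # letters and numbers preferred
sre:
  admin_email_address: trustedresearch@turing.ac.uk
  admin_ip_addresses:
    - 193.60.220.253        # Turing VPN / Turing Guest network
  research_user_ip_addresses:
    - 193.60.220.242        # Tier 2 (GlobalProtect App); use 193.60.220.240 for Tier 3 ChromeBooks
  workspace_skus:
    - Standard_D8s_v5       # Standard_D2s_v3 is the minimum; D8s_v5 gives a better user experience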

Uploading and deploying an SRE#

The config can then be uploaded and deployed as described in the DSH docs.
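
With the DSH CLI, the upload-and-deploy step looks roughly like the sketch below. The command syntax is an assumption based on the DSH docs at the time of writing, and examplesre is the placeholder name/file from the config sketch above; check the docs matching your release:

# Upload the saved config so it is available to the Data Safe Haven context
dsh config upload examplesre.yaml
# Deploy the SRE described by that config
dsh sre deploy examplesre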

Data Study Group (DSG) TRE setup#

Our current approach for DSG deployments has four steps:

Initial setup#

Per DSG challenge, we initially create an SRE with a single workspace. A Standard_D2s_v3 VM will usually be sufficient for a single data provider to confirm data has been ingressed correctly prior to the DSG event. This step can be started three weeks before the start of the DSG event week.
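
For this stage, the relevant part of the SRE config is the workspace_skus list, which can hold just the single small VM (a sketch using the field discussed in the config section above):

workspace_skus:
  - Standard_D2s_v3   # single workspace for the data provider's ingress checks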

Deploying additional Workspaces#

DSGs will need more compute power than a single small SRD can provide.

To minimise costs, we deploy additional workspaces ~5 days before the start of the DSG event week. This gives us time to verify that the SRDs are working correctly and to request quota increases if necessary. For teams of 10 participants, we have found that a total of 4 VMs of the following sizes satisfy the team's compute requirements:

  • Dv5 and Dsv5-series (CPU only)

    • One SRD of size Standard_D8_v5.

    • Two SRDs of size Standard_D16_v5.

  • NCasT4 series (GPU enabled)

    • One SRD of size Standard_NC8as_T4_v3.

If you deployed a single small SRD in the initial setup, you can replace it with the appropriate sizes by updating the config file, re-uploading, and re-deploying.
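
For a typical team of 10, the workspace_skus list in the config would then look something like the sketch below; after editing, re-upload and re-deploy as in the deployment section above:

workspace_skus:
  - Standard_D8_v5          # CPU only
  - Standard_D16_v5         # CPU only
  - Standard_D16_v5         # CPU only
  - Standard_NC8as_T4_v3    # GPU enabled; see the next section for quota considerations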

The SRD of size Standard_NC8as_T4_v3 is powered by a GPU, and most of the time we will need to contact Azure support for a quota increase. Check the next section for details.

Important

All VMs should be deployed in the UK South region to ensure that the data remains within the UK and there is no incompatibility with the rest of the SRE infrastructure. Additionally, all VMs should have Intel processors, not AMD.

Adding a GPU-powered Workspace#

For Data Study Groups at the Turing, it’s common for participants to request access to a GPU-enabled compute VM so they can use CUDA-based applications in their research. Allocating a GPU-enabled VM that supports CUDA at short notice in Azure can prove tricky, but TRESA needs to be able to provide this quickly on request, so we’ve decided it’s prudent to set this up in advance. From experience, we have found that a single Standard_NC8as_T4_v3 VM satisfies the GPU needs of most DSG teams.

When you try to deploy a new VM or resize an existing one, you might find that the VM family you want is not available. This is likely due to insufficient quota for your desired VM family. The solution is to request a quota increase for that VM family via the Azure Portal. Once the quota increase has been approved, update the config file, re-upload, and re-deploy. We recommend requesting quota for the NCasT4_v3 family, which will give you access to a Standard_NC8as_T4_v3 VM.
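
You can check the current vCPU quota and usage for a VM family in the subscription with the Azure CLI before raising the request; a sketch, where the grep filter is illustrative:

# Show vCPU quota and current usage per VM family in UK South
az vm list-usage --location uksouth --output table | grep -i ncas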

Even if you have sufficient quota for the desired VM family, you might not be able to deploy a VM of that type due to high demand in the UK South region. In that case, the solution is to contact Microsoft support and request that the VM family be made available to you in the region you want.

Once the GPU machine has been deployed, you can log in to the SRD and verify that the GPU is visible to the OS by running nvidia-smi in a terminal window.

Installing Nvidia drivers on GPU workspaces via the serial console (as dshadmin) can be achieved by:

# List the NVIDIA server drivers available for this GPU
sudo ubuntu-drivers --gpgpu list
# Install the chosen server driver (535 at the time of writing)
sudo ubuntu-drivers --gpgpu install nvidia:535-server
# Reboot so the new driver is loaded
sudo reboot
# After reconnecting, confirm the driver can see the GPU
nvidia-smi

Stopping and restarting VMs#

VMs can be stopped and restarted via the Azure portal.
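
Equivalently, workspaces can be stopped and started with the Azure CLI; a sketch in which the resource group and VM names are placeholders to be read off the portal or the list command:

# List VMs and their power state
az vm list --resource-group <sre-resource-group> --show-details --output table
# Stop and deallocate a workspace so it no longer incurs compute charges
az vm deallocate --resource-group <sre-resource-group> --name <vm-name>
# Start it again when needed
az vm start --resource-group <sre-resource-group> --name <vm-name>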