(deploy_dsh_sre)=
# Deploy a DSH SRE

## Prerequisites

- You must have already {ref}`deployed the SHM ` before deploying any SREs.
- You must have completed {ref}`sysadmin_onboarding` before deploying any SREs.

## Building the SRE

Follow the Data Safe Haven (DSH) [deployment guide](https://data-safe-haven.readthedocs.io/).

:::{warning}
Make sure you are reading the version of the docs (see the bottom right) that matches the DSH release you recorded in the TRE GitHub issue.
:::

**Important:**

1. Ensure you have the correct release of the DSH installed. You should have recorded this in the TRE GitHub issue. Details on installing the DSH CLI via pipx are in the [Data Safe Haven documentation](https://data-safe-haven.readthedocs.io/en/latest/deployment/index.html#install-the-project).
2. You may want to keep a log of the deployment terminal output.
3. A single Safe Haven Management (SHM) environment can support multiple secure research environments, and they do not need to be hosted on the same subscription. These instructions assume the SHM has already been set up. It has for `prod5`, and details are in the deployments GitHub board.
4. If the SRE you are building is to be used in a Data Study Group (DSG), follow the {ref}`dsg_setup`.

(create_tre_config)=
## Create SRE config file

Configurations for SRE deployments can be adapted from either a blank template or a security-tier-specific template, as outlined in the [DSH docs](https://data-safe-haven.readthedocs.io/en/latest/deployment/deploy_sre.html#configuration), and are saved as YAML files.

To create the config, do the following:

1. Generate a template config YAML file, or start from scratch.
2. Fill in the values. The [DSH docs](https://data-safe-haven.readthedocs.io/en/latest/deployment/deploy_sre.html#configuration) describe the config file structure; below are suggestions for Turing-specific values:
   - **location**: typically this will be `uksouth`
   - **subscription_id**: this will have been provided by RCP; see {ref}`request_azure_credits`
   - **name**: we suggest using only letters and numbers, although hyphens and underscores are allowed
   - **admin_email_address**: trustedresearch@turing.ac.uk
   - **admin_ip_addresses**: this must be an IP address (or addresses) that the deployment team have access to. We suggest using `193.60.220.253`, which is the IP address associated with the Turing VPN and/or the Turing Guest network.
   - **research_user_ip_addresses**: for Tier 3, use `193.60.220.240`, which is limited to Chromebooks set up by IT. For Tier 2, use `193.60.220.242`, which external users can reach via the GlobalProtect app; see {ref}`vpn_support`.
   - **workspace_skus**: a very basic `Standard_D2s_v3` will suffice; for a better user experience, something like `Standard_D8s_v5` is recommended
3. Save the config as a YAML file.

(upload_deploy_tre_config)=
## Uploading and deploying an SRE

The config can then be uploaded and deployed as per the [DSH docs](https://data-safe-haven.readthedocs.io/en/latest/deployment/deploy_sre.html#upload-the-configuration-file); a sketch of the full cycle is shown below.
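As a worked example, here is a minimal sketch of a Tier 2 config using the Turing-specific values above. Only the fields discussed in this section are shown, and the nesting (`azure:`, `sre:`) follows the DSH template at the time of writing; the schema changes between releases, so generate the template for the release recorded in your TRE GitHub issue rather than copying this verbatim. The SRE name `sandbox123` is a placeholder.

:::{code} yaml
azure:
  location: uksouth                  # keep all resources in the UK
  subscription_id:                   # provided by RCP; see request_azure_credits
name: sandbox123                     # placeholder; letters and numbers only is safest
sre:
  admin_email_address: trustedresearch@turing.ac.uk
  admin_ip_addresses:
    - 193.60.220.253                 # Turing VPN / Turing Guest network
  research_user_ip_addresses:
    - 193.60.220.242                 # Tier 2: external users via GlobalProtect
  workspace_skus:
    - Standard_D8s_v5                # better user experience than the minimal Standard_D2s_v3
:::

With the config saved (here as `sre_config.yaml`, also a placeholder name), the template/upload/deploy cycle looks like the following. The command forms are those documented for recent DSH releases; check `dsh --help` on your installed version, as the CLI can change between releases.

:::{code} shell
# Generate a blank template to edit (skip if starting from a tier-specific template)
dsh config template --file sre_config.yaml

# Upload the completed configuration
dsh config upload sre_config.yaml

# Deploy the SRE, referring to it by the name given in the config
dsh sre deploy sandbox123
:::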
(dsg_setup)=
## Data Study Group (DSG) TRE setup

Our current approach for DSG deployments has four steps:

### Initial setup

Per DSG challenge, we initially create an SRE with a single workspace. A `Standard_D2s_v3` VM will usually be sufficient for a single data provider to confirm that data has been ingressed correctly prior to the DSG event. This step can be started three weeks before the start of the DSG event week.

### Deploying additional workspaces

DSGs will need more compute power than a single small SRD can provide. To minimise costs, we deploy additional workspaces ~5 days before the start of the DSG event week. This gives us time to verify that the SRDs are working correctly and to request quota increases if necessary.

For teams of 10 participants, we have found that a total of 4 VMs of the following sizes satisfies a team's compute requirements:

- [Dv5 and Dsv5-series](https://learn.microsoft.com/en-us/azure/virtual-machines/dv5-dsv5-series) (CPU only)
  - One SRD of size `Standard_D8_v5`.
  - Two SRDs of size `Standard_D16_v5`.
- [NCasT4_v3-series](https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/gpu-accelerated/ncast4v3-series?tabs=sizebasic#sizes-in-series) (GPU enabled)
  - One SRD of size `Standard_NC8as_T4_v3`.

If you deployed a single small SRD in the initial setup, you can replace it with the appropriate sizes by updating the config file, re-uploading it, and re-deploying. The `Standard_NC8as_T4_v3` SRD is GPU-powered, and most of the time we will need to contact Azure support for a quota increase; see the next section for details.

:::{important}
All VMs should be deployed in the `UK South` region to ensure that the data remains within the UK and that there is no incompatibility with the rest of the SRE infrastructure. Additionally, all VMs should have Intel processors, not AMD.
:::

### Adding a GPU-powered workspace

For Data Study Groups at the Turing, it is common for participants to request access to a GPU-enabled compute VM so that they can use applications such as CUDA in their research. Allocating a GPU-enabled VM that supports CUDA at short notice in Azure can prove tricky, but TRESA needs to be able to provide this quickly on request, so we have decided it is prudent to set this up in advance. From experience, we have found that a single `Standard_NC8as_T4_v3` VM satisfies the GPU needs of most DSG teams.

When you try to deploy a new VM or resize an existing one, you might find that the VM family you want is not available. This is likely due to insufficient quota for the desired VM family. The solution is to request a quota increase for that VM family via the Azure Portal. Once the quota increase has been approved, update the config file, then re-upload and [deploy](https://data-safe-haven.readthedocs.io/en/latest/deployment/deploy_sre.html#upload-the-configuration-file) again.

We recommend requesting quota for the [NCasT4_v3](https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/gpu-accelerated/ncast4v3-series?tabs=sizebasic) family, which will give you access to a `Standard_NC8as_T4_v3` VM.

You might find that, even with sufficient quota for the desired VM family, you are still unable to deploy a VM of that type due to high demand in the UK South region. We have found that the solution in that case is to contact Microsoft support and request that the VM family be made available to you in the region you want.

Once the GPU machine has been deployed, you can log in to the SRD and verify that the GPU is visible to the OS by running `nvidia-smi` in a terminal window. Nvidia drivers can be installed on GPU workspaces via the serial console (as `dshadmin`) by running:

:::{code} shell
# List the driver versions available for this GPU
sudo ubuntu-drivers --gpgpu list
# Install the server variant of the Nvidia driver, then reboot to load it
sudo ubuntu-drivers --gpgpu install nvidia:535-server
sudo reboot
# After reboot, confirm the GPU is visible to the OS
nvidia-smi
:::

### Stopping and restarting VMs

VMs can be stopped and restarted via the Azure portal.
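If you prefer to script these operations (for example, to stop a whole set of DSG workspaces overnight), the Azure CLI can do the same job. A sketch, assuming you are logged in with `az login` on the correct subscription; the resource group and VM names are placeholders:

:::{code} shell
# Check vCPU quota and current usage in UK South (also useful before a GPU quota request)
az vm list-usage --location uksouth --output table

# Stop a workspace: 'deallocate' releases the compute, so the VM stops accruing charges
az vm deallocate --resource-group <sre-resource-group> --name <workspace-vm-name>

# Restart it later
az vm start --resource-group <sre-resource-group> --name <workspace-vm-name>
:::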