(build_tre)=

# Build TRE

> This task is the responsibility of {ref}`role_tresa`.

## Prerequisites

In order to deploy your TRE (Data Safe Haven SRE) to work with the `prod4` Safe Haven Management Environment, you'll need to have completed the {ref}`tresa_onboarding`.

## Building the TRE

Follow the Data Safe Haven (DSH) [Secure Research Environment deployment guide](https://data-safe-haven.readthedocs.io/en/v4.2.2/deployment/deploy_sre.html), making sure you are reading the version of the docs (see the left-hand sidebar) that matches the DSH release you recorded in the TRE GitHub issue.

**Important:**

1. Ensure you have the correct release of the DSH codebase checked out. You should have recorded this in the TRE GitHub issue. Since you will have forked the codebase, it's worth fetching the release tags from the upstream repo first (skip the first line if you have set the upstream remote previously):

   ```shell
   git remote add upstream https://github.com/alan-turing-institute/data-safe-haven
   git fetch --tags upstream
   git checkout tags/vX.X.X
   ```

2. Use `Start-Transcript` as suggested in the guide to save a log of the deployment.
   - Save it somewhere memorable like `logs//deploy.txt`
3. See {ref}`create_tre_config` when you reach the `SRE Configuration properties` step.
4. If the TRE you are building is to be used in a Data Study Group (DSG), follow the {ref}`dsg_setup`.

(create_tre_config)=

## Create TRE config file

Once you reach the [SRE Configuration properties](https://data-safe-haven.readthedocs.io/en/v4.2.2/deployment/deploy_sre.html#sre-configuration-properties) step in the DSH deployment documentation, you should generate a JSON configuration file that will be used to deploy the new environment. This is where you record any requirements specific to this TRE's deployment.

To create the config, do the following:

1. Copy the [template codeblock](https://data-safe-haven.readthedocs.io/en/v4.2.2/deployment/deploy_sre.html#sre-configuration-properties) into a new comment on the GitHub issue for this TRE (this will allow other members of TRESA to have a copy of the config).
2. Set some of the config fields to Turing-specific recommendations:

   - **shmId**: the name of the currently active SHM (as of 21st August 2023 this is `prod4`: the SHM deployed using AzureAD `prod4.turingsafehaven.ac.uk`)
   - **sreId**: you should have recorded this when completing {ref}`create_tre_github_issue`
   - **tier**: this is recorded in the Project Initialisation form on SharePoint
   - **subscriptionName**: this will have been provided by RCP - see {ref}`azure_credits_request`. Either the subscription name or the subscription ID works, but it's safer to use the ID to avoid issues with special characters in the subscription name (if the subscription name cannot be found at deployment time, the deployment fails with the error `Please provide a valid tenant or a valid subscription`). Once RCP has created the subscription, you can find its ID in the Azure Portal under `Subscriptions`, or via the Azure CLI (see the sketch after this list).
   - **computeVmImage.version**: the version of the Data Safe Haven SRD image; you should also have recorded this when completing {ref}`create_tre_github_issue`
   - **deploymentIpAddresses**: this must be an IP address (or addresses) that the deployment team have access to. We suggest using `193.60.220.253`, which is the IP address associated with the Turing VPN and/or the Turing Guest network.
   - **inboundAccessFrom**: for Tier 3 use `193.60.220.240`, which is the IP address associated with the `Turing Secure` network. For Tier 2 use `193.60.220.253`, which is the IP address associated with the Turing VPN and/or the Turing Guest network. For Tier 0/1, simply use `Internet` to allow access from anywhere.

3. Save the config as a JSON file in the location described in the DSH docs. You'll also need to save the `prod4` SHM config file, which can be found [here](https://github.com/alan-turing-institute/trusted-research/issues/121).
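If you have the Azure CLI installed and are logged in to the Turing tenant, you can also look up the subscription ID mentioned in step 2 from the command line rather than the Portal. This is an optional convenience and not part of the DSH deployment guide; the subscription name below is a placeholder for the name provided by RCP.

```shell
# Log in to Azure (opens a browser window); skip if you are already logged in
az login

# List the subscriptions you can see, with their IDs
az account list --query "[].{name:name, id:id}" --output table

# Or look up the ID of a specific subscription by name
# ("<subscription-name>" is a placeholder for the name provided by RCP)
az account show --subscription "<subscription-name>" --query id --output tsv
```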
(dsg_setup)=

## Data Study Group (DSG) TRE setup

Our current approach for DSG deployments has four steps:

### Initial setup

For each DSG challenge, we create an [SRE with a single Secure Research Desktop (SRD)](https://data-safe-haven.readthedocs.io/en/v4.2.2/deployment/deploy_sre.html#id1). In this step, we rely on the default VM size provided by the DSH scripts, which deploy a VM of the smallest size. This is sufficient to run the smoke tests and verify that the SRE is working correctly. This step can be started three weeks before the start of the DSG event week.

### Deploying additional SRDs

DSGs will need more compute power than a single small SRD can provide. To minimise costs, we deploy additional SRDs 10 days before the start of the DSG event week. This gives us time to verify that the SRDs are working correctly and to request quota increases if necessary.

For teams of 10 participants, we found that a total of 4 VMs of the following sizes satisfies the teams' compute requirements:

- [Dv5 and Dsv5-series](https://learn.microsoft.com/en-us/azure/virtual-machines/dv5-dsv5-series) (CPU only)
  - One SRD of size `Standard_D8_v5`.
  - Two SRDs of size `Standard_D16_v5`.
- [NCv3-series](https://learn.microsoft.com/en-us/azure/virtual-machines/ncv3-series) (GPU enabled)
  - One SRD of size `Standard_NC6s_v3`.

If you deployed a single small SRD in the initial setup, you can resize it to the appropriate size and then deploy additional SRDs. The Data Safe Haven docs explain how to [resize the VM](https://data-safe-haven.readthedocs.io/en/v4.2.2/roles/system_manager/manage_deployments.html#resize-the-virtual-machine-vm-of-a-secure-research-desktop-srd) of an existing SRD or [deploy additional SRDs](https://data-safe-haven.readthedocs.io/en/v4.2.2/roles/system_manager/manage_deployments.html#add-a-new-srd); a command-line sketch for resizing follows the notes below.

The SRD of size `Standard_NC6s_v3` is powered by a GPU, and most of the time we will need to contact Azure support for a quota increase. See the next section for details.

```{important}
All VMs should be deployed in the `UK South` region to ensure that the data remains within the UK and there is no incompatibility with the rest of the SRE infrastructure. Additionally, all VMs should have Intel processors, not AMD.
```

```{note}
When resizing a VM in Azure, it sometimes helps to turn off the VM before resizing, especially when the error message says that the size is "not available in the current hardware cluster".
```
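The DSH docs describe resizing through the Azure Portal. If you prefer the command line, a rough Azure CLI equivalent is sketched below; this is not part of the DSH guide, and the resource group and VM name are placeholders you should replace with the ones for your SRE.

```shell
# Placeholders: substitute the resource group and SRD VM name for your SRE
RESOURCE_GROUP="<sre-compute-resource-group>"
VM_NAME="<srd-vm-name>"

# Deallocating first sometimes helps when the new size is reported as
# "not available in the current hardware cluster"
az vm deallocate --resource-group "$RESOURCE_GROUP" --name "$VM_NAME"

# Check which sizes this VM can currently be resized to
az vm list-vm-resize-options --resource-group "$RESOURCE_GROUP" --name "$VM_NAME" --output table

# Resize to one of the recommended sizes and start the VM again
az vm resize --resource-group "$RESOURCE_GROUP" --name "$VM_NAME" --size Standard_D16_v5
az vm start --resource-group "$RESOURCE_GROUP" --name "$VM_NAME"
```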
### Adding a GPU-powered SRD

For Data Study Groups at the Turing, it's common for participants to request access to a GPU-enabled compute VM, so they can use applications such as CUDA in their research. Allocating a GPU-enabled VM that supports CUDA at short notice in Azure can prove tricky, but TRESA needs to be able to provide this quickly on request, so we've decided it's prudent to set this up in advance. From experience, we have found that a single `Standard_NC6s_v3` VM satisfies the GPU needs of most DSG teams.

When you try to deploy a new VM or resize an existing one, you might find that the VM family you want is not available. This is likely due to insufficient quota for your desired VM family. The solution is to request a quota increase for that VM family via the Azure Portal. Follow the instructions under the `Tip` in the [DSH docs](https://data-safe-haven.readthedocs.io/en/v4.2.2/roles/system_manager/manage_deployments.html#resize-the-virtual-machine-vm-of-a-secure-research-desktop-srd), making sure to choose the `UK South` region. We recommend requesting a quota for the `Standard NCSv3 Family vCPUs` with a vCPU quota of `6`, which will give you access to a `Standard_NC6s_v3` VM.

Even if you have sufficient quota for the desired VM family, you might still not be able to deploy a VM of that type due to high demand in the UK South region. We found that the solution in that case is to contact Microsoft support and request that they make the VM family available to you in the region you want. We also found that, even when Microsoft make the VM family available, sometimes you cannot resize an existing VM to the desired VM size. In that case the only solution seems to be to deploy a new VM of the desired type. To do so, follow the instructions in the DSH docs to [add a new SRD](https://data-safe-haven.readthedocs.io/en/v4.2.2/roles/system_manager/manage_deployments.html#add-a-new-srd). When deploying a GPU-enabled VM, make sure to set the `-ipLastOctet` to something different from the CPU-enabled compute VMs; we recommend `180`. Once the GPU machine has been deployed, you can log in to the SRD and verify that the GPU is visible to the OS by running `nvidia-smi` in a terminal window.

### Shut down the VMs

Leaving all VMs running is very expensive (especially the GPU VM), so once deployed and tested they should be stopped and deallocated until a couple of days before the start of the DSG. Deallocation releases the VM's compute resources back to Azure, so you are not charged for compute while the VM is deallocated. You can stop and deallocate a VM by clicking the stop button in the Azure Portal. The easiest way to find all the VMs is to go to the `Virtual Machines` section in the Azure Portal and filter by subscription (or list them from the command line; see the sketch below).

From experience, we found that starting a VM after it has been deallocated is relatively fast (a matter of minutes). But to be on the safe side, you can start all VMs in the afternoon of the Friday prior to the start of the DSG event week (or during the weekend if you are OK with that).
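If you prefer the Azure CLI to the Portal for this step, a minimal sketch is below, assuming you are logged in and have selected the SRE's subscription. The resource group and VM name are placeholders; take them from the output of the list command.

```shell
# List every VM in the currently selected subscription with its power state
az vm list --show-details --query "[].{name:name, resourceGroup:resourceGroup, power:powerState}" --output table

# Stop and deallocate a VM so it no longer incurs compute charges
# (placeholders: substitute the resource group and VM name shown by the command above)
az vm deallocate --resource-group "<resource-group>" --name "<vm-name>"

# Start it again shortly before the DSG event week
az vm start --resource-group "<resource-group>" --name "<vm-name>"
```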