Migrating ResDiary’s TeamCity Server to AKS
In my previous post I described how to run a TeamCity server in AKS, allowing you to gain all the benefits of containerisation and Kubernetes, including efficiency savings, a consistent and easily reproducible way of deploying your infrastructure, and a clear separation of your compute and data.
What I didn’t cover in my previous post was what to do if you already have an existing TeamCity server installed on a virtual machine or on bare metal. So in this post, I want to talk about the steps involved in migrating our existing TeamCity server from a virtual machine to an AKS cluster.
Migration Plan
The rough plan for the migration was as follows:
- Turn off our current TeamCity server.
- Take a backup of the existing TeamCity database.
- Restore the TeamCity database into our new Azure hosted MySQL server.
- Copy our existing TeamCity data onto a new Azure Managed disk.
- Edit our TeamCity config for our new server.
- Start the new TeamCity server.
- Update our DNS entries to point to it.
- Update our existing build agents to connect to our updated TeamCity server.
Resource group structure
I’m going to assume we have the following resource groups set up:
| Group | Description |
|---|---|
| my-group | Contains our AKS cluster |
| my-group-data | Used to store our TeamCity data disk |
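If the data resource group doesn’t already exist, it can be created up front; this is a sketch using the same name and region as the disk commands later in the post:

az group create --name my-group-data --location westeurope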
Although we could have used a Persistent Volume Claim to automatically provision storage for TeamCity, I decided against this for a few reasons:
- I wanted to keep our storage separate from our AKS cluster, and using a PVC would have created it under the AKS cluster’s automatically created resource group.
- I wanted a single resource group that contained all of our data for the cluster, and nothing else.
- Because we had an existing disk I wanted to migrate, it was easier to manually create a new disk and clone the data than allowing a PVC to be created and then copying onto the automatically provisioned disk.
Stopping the Server
Before migrating any data, I shut down the TeamCity service to make sure none of the data changed while it was being migrated:
$ sudo service teamcity stop
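Depending on how the service was installed, it’s worth confirming that the server has actually stopped before touching any of the data:

$ sudo service teamcity status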
Backing up the existing MySQL server
Our existing TeamCity server virtual machine ran both the TeamCity server itself and its MySQL database. As part of the work of moving to AKS, I wanted to separate the data from the compute and make use of the Azure Database for MySQL managed database service. This meant that I needed to back up the current database and restore it into an Azure database.
To create the database backup, I ran the following command on the machine hosting the current TeamCity MySQL database:
$ mysqldump -u teamcityuser -h localhost -p teamcity > teamcity-<date>-dmp.sql
Creating our new MySQL database
I used the `mysql` command to connect to our new Azure MySQL server:
$ mysql -u admin@my-sql-database -h my-sql-database.mysql.database.azure.com -p
I then created a new, empty database and a user account, following the recommendations from JetBrains:
create database teamcity collate utf8_bin;
create user teamcity identified by '<password>';
grant all privileges on teamcity.* to teamcity;
grant process on *.* to teamcity;
Once the database had been created, I switched to it and imported the backup:
use teamcity;
source teamcity-<date>-dmp.sql
TIP: Using the `source` command instead of running the script directly via the `mysql` command handles continuing the import after a connection loss, which was occasionally happening to me.
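As a quick sanity check that the import worked, and that the new `teamcity` user can connect with the same details the server will use, you can list the tables in the restored database; the host and user names here are the placeholder values from earlier:

$ mysql -u teamcity@my-sql-database -h my-sql-database.mysql.database.azure.com -p -e "show tables;" teamcity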
Copying our TeamCity data onto a new disk
To copy our TeamCity data onto a new Azure managed disk, I created a copy of our existing disk, created a new disk, and then used `dd` to copy the data from one to the other. This is a bit of a convoluted process, but it was necessary because of an issue I came across while trying to do the migration, described in https://github.com/kubernetes/kubernetes/issues/63235.
The problem is that the Azure Disk Kubernetes driver doesn’t support disks containing partitions. Since our existing TeamCity server was set up by hand, the data disk contains a single partition. When Kubernetes mounted the disk, it didn’t detect an ext4 filesystem on the device, so it formatted the disk.
To work around this, I had to copy the ext4 partition on our existing TeamCity disk into the root of a new disk. You can visualise what I was starting with, and what I needed to get to in the following diagram:
I started by creating a new disk based on my existing TeamCity disk:
az disk create --name teamcitydata-old --resource-group my-group-data --size-gb 100 --location westeurope --source "https://abcd.blob.core.windows.net/vhds/v123456.87954632212315656.vhd"
You can get the URI for the `--source` parameter by viewing your existing disk in the Azure portal.
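If you’d rather stay in the CLI, `az storage blob url` should give you the same URI; the storage account, container and blob names here are just the placeholders from the example above:

az storage blob url --account-name abcd --container-name vhds --name v123456.87954632212315656.vhd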
Next, I created an empty disk to contain our TeamCity data:
az disk create --name teamcitydata --resource-group my-group-data --size-gb 100 --location westeurope
I then attached both disks to an Azure VM through the portal. This VM is just being used to copy the data from one disk to the other, so you can either use an existing VM or create a new one for the purpose.
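The same attachment can also be done from the CLI; this sketch assumes a helper VM called copy-vm living in my-group, and picks the LUNs to match the listing below:

az vm disk attach --resource-group my-group --vm-name copy-vm --name teamcitydata --lun 0
az vm disk attach --resource-group my-group --vm-name copy-vm --name teamcitydata-old --lun 1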
Once the disks are attached, the physical devices can be found at `/dev/disk/azure/scsi1`. You can list them by running `ls -l /dev/disk/azure/scsi1`:
$ ls -l /dev/disk/azure/scsi1/
total 0
lrwxrwxrwx 1 root root 12 May 24 10:47 lun0 -> ../../../sdd
lrwxrwxrwx 1 root root 12 May 24 10:47 lun1 -> ../../../sdc
lrwxrwxrwx 1 root root 13 May 24 10:47 lun1-part1 -> ../../../sdc1
In the example above, we have two devices (lun0 and lun1), and lun1 has a single partition on it (lun1-part1).
I copied the data from the TeamCity data partition (lun1-part1) onto the new, empty device (lun0) using the following command:
$ sudo dd if=/dev/disk/azure/scsi1/lun1-part1 of=/dev/disk/azure/scsi1/lun0 bs=64M status=progress
42948624384 bytes (43 GB, 40 GiB) copied, 4128.42 s, 10.4 MB/s
This process took just over an hour when I did it.
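At this point it’s worth checking that the filesystem really is sitting on the raw device rather than inside a partition, since that’s what the Azure disk driver will look for; blkid should report an ext4 TYPE directly against lun0:

$ sudo blkid /dev/disk/azure/scsi1/lun0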
Once the data was copied, I resized the ext4 filesystem to use up all the available space on the disk. First I ran `e2fsck` on the new disk:
$ sudo e2fsck -f /dev/disk/azure/scsi1/lun0
e2fsck 1.43.5 (04-Aug-2017)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
/lost+found not found. Create<y>? yes
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/disk/azure/scsi1/lun0: ***** FILE SYSTEM WAS MODIFIED *****
/dev/disk/azure/scsi1/lun0: 186966/2621440 files (6.3% non-contiguous), 8714920/10485504 blocks
I then ran `resize2fs` to extend the filesystem to use up the full disk:
$ sudo resize2fs /dev/disk/azure/scsi1/lun0
resize2fs 1.43.5 (04-Aug-2017)
Resizing the filesystem on /dev/disk/azure/scsi1/lun0 to 26214400 (4k) blocks.
The filesystem on /dev/disk/azure/scsi1/lun0 is now 26214400 (4k) blocks long.
Adjusting the TeamCity data
Before we could actually use our TeamCity data drive, I had to fix a few things:
- I had to reorganise the directory structure slightly, because our existing TeamCity data directory wasn’t in the root of the drive.
- I altered the hostname in the `conf/main-config.xml` file.
- I altered the database connection properties in the `conf/database.properties` file to point at the new database (sketched below).
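For reference, the connection settings in `conf/database.properties` ended up looking roughly like the following; the server, user and password values are the placeholders used earlier in this post, so treat this as a sketch rather than the exact file:

connectionUrl=jdbc:mysql://my-sql-database.mysql.database.azure.com/teamcity
connectionProperties.user=teamcity@my-sql-database
connectionProperties.password=<password>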
I started by mounting the new data disk onto a folder on my VM:
$ sudo mkdir -p /mnt/teamcitydata
$ sudo mount /dev/disk/azure/scsi1/lun0 /mnt/teamcitydata
I could then access the TeamCity data files at `/mnt/teamcitydata` and make the adjustments described above. Once finished, I unmounted the disk and removed the `/mnt/teamcitydata` directory:
$ sudo umount /mnt/teamcitydata
$ sudo rmdir /mnt/teamcitydata/
Finally, I detached both disks from the VM so that they could be attached to a node in the Kubernetes cluster. At this point the `teamcitydata-old` disk can be deleted.
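The intermediate disk can be removed from the CLI once you’re happy everything copied across correctly:

az disk delete --resource-group my-group-data --name teamcitydata-old --yes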
Giving AKS permission to access the disk
By default, your AKS cluster won’t be able to access your disk. To allow it to do so, you need to grant the cluster’s Service Principal the `Contributor` role on the disk. From my testing I seemed to need to grant the access at the disk level, rather than at the subscription level, but it’s possible that I was doing something wrong.
To do this, I first found the Service Principal Id of the AKS cluster:
az aks show --resource-group $AksResourceGroup --name $AksName --query "servicePrincipalProfile.clientId" --output tsv
Next, I got the Id of the disk:
az disk show --resource-group $DiskResourceGroup --name $DiskName --query "id" --output tsv
Finally, I used `az role assignment create` to give the AKS cluster access to the disk:
az role assignment create --assignee $ServicePrincipalId --role Contributor --scope $DiskId
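Putting those three steps together in bash looks something like this; the cluster name `my-aks-cluster` is an assumption, while the other names match the resource groups and disk used earlier:

AksResourceGroup=my-group
AksName=my-aks-cluster   # assumed cluster name
DiskResourceGroup=my-group-data
DiskName=teamcitydata

ServicePrincipalId=$(az aks show --resource-group $AksResourceGroup --name $AksName --query "servicePrincipalProfile.clientId" --output tsv)
DiskId=$(az disk show --resource-group $DiskResourceGroup --name $DiskName --query "id" --output tsv)
az role assignment create --assignee $ServicePrincipalId --role Contributor --scope $DiskId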
Mapping the disk to the TeamCity container
In order for TeamCity to be able to access the data disk, I added an `azureDisk` volume to the pod spec:
- name: teamcitydata
  azureDisk:
    kind: Managed
    cachingMode: ReadWrite
    diskName: teamcitydata
    diskURI: /subscriptions/<subscriptionId>/resourceGroups/my-group-data/providers/Microsoft.Compute/disks/teamcitydata
Next, I mounted it onto the TeamCity container:
containers:
- name: teamcity
  image: jetbrains/teamcity-server:10.0.5
  ...
  volumeMounts:
  - mountPath: "/data/teamcity_server/datadir"
    name: teamcitydata
The full deployment definition looked something like this:
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: teamcity
spec:
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: teamcity
    spec:
      volumes:
      - name: teamcitydata
        azureDisk:
          kind: Managed
          cachingMode: ReadWrite
          diskName: teamcitydata
          diskURI: /subscriptions/<subscriptionId>/resourceGroups/my-group-data/providers/Microsoft.Compute/disks/teamcitydata
      containers:
      - name: teamcity
        image: jetbrains/teamcity-server:10.0.5
        env:
        - name: TEAMCITY_SERVER_MEM_OPTS
          value: "-Xmx4g -XX:ReservedCodeCacheSize=350m"
        volumeMounts:
        - mountPath: "/data/teamcity_server/datadir"
          name: teamcitydata
I was then able to start the TeamCity server the usual way:
kubectl apply -f teamcity.dep.yaml
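To check that the pod picked up the disk and TeamCity came up cleanly, something along these lines does the job:

kubectl get pods -l app=teamcity
kubectl describe pod -l app=teamcity   # check the azureDisk volume attached without errors
kubectl logs -l app=teamcity --tail=50   # TeamCity startup output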
Wrapping up
In this post, I covered how to take an existing TeamCity server and migrate it into an AKS cluster without losing build configurations, history or artifacts. I described a fairly simple process for doing the migration that also left the existing server unmodified in case we needed to switch back for any reason.
Although it took some experimentation to arrive at the plan described in this post, the migration itself went smoothly when I actually performed it.
In future posts I’ll continue describing our journey of moving our build infrastructure over to Kubernetes.