Maintenance

Enter the interactive Ruby shell for maintenance operations. If the control plane is not started yet, start it.

# Start interactive Ruby shell
sudo docker exec -it ubicloud-app ./bin/pry

Graceful reboot

You should not reboot a VM host directly. Instead, follow these steps:

  1. (Optional) Shut down all VMs from inside the guest OS.
  2. From the control plane, issue a command to the VmHost in an interactive shell, which will handle shutdown and recovery. The procedure is defined in prog/vm/host_nexus.rb.
# Get your VmHost. If you have multiple ones, you can select by ID etc.
VmHost.all
VmHost.count
vmh = VmHost.first
vmh = VmHost["vhwkcvts540j1npnsdygqb5jvp"]
 
# List and shut down VMs. Skipped for now because `stop` is not properly implemented.
# Your VMs are likely safe anyway.
#vmh.vms
#vmh.vms.map { |vm| vm.vm.incr_stop }
 
# Debug: list all tasks; there should be a `Vm::HostNexus` strand in the "wait" label.
Strand.all
 
# You can see available semaphores here:
# https://github.com/ubicloud/ubicloud/blob/cf0d7c1c462e6e4bc3ff0c1ef1933e3994cecb1a/prog/vm/host_nexus.rb#L300
vmh.incr_reboot
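 
# (Optional) Watch progress from another shell. The host strand should leave
# "wait", walk the reboot labels, and hop back to "wait" when done.
Strand[prog: "Vm::HostNexus"].label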
 
# After reboot
 
# If ufw rules are broken
sudo nft list ruleset
sudo systemctl restart ufw docker

Start control plane after host failure

WARNING

Only use demo/docker-compose.yml in demo environments. You need a different setup for production.

cd ubicloud/demo/
docker-compose up -d

After a successful start, open the control plane to check the status of your VMs. You can use SSH port forwarding like so:

ssh -NL127.0.0.1:3000:127.0.0.1:3000 ubuntu@YOUR-SERVER-IP

If there was an accidental reset and your control plane node is also a VM host, reboot the VM host the correct way (see Graceful reboot above) and then start the control plane again.

If you encounter the following error, which shouldn’t happen if you have disabled ufw, restart ufw.service and docker.service, then run docker-compose up again.

ERROR: Failed to Setup IP tables: Unable to enable SKIP DNAT rule:  (iptables failed: iptables --wait -t nat -I DOCKER -i br-927c92f74753 -j RETURN: iptables: No chain/target/match by that name.
sudo systemctl restart ufw docker

Recover stopped VMs

Ubicloud treats a power-off from inside the VM guest OS as an unavailable state, while the console still shows the VM as running.

 
  label def unavailable
    # If the VM become unavailable due to host unavailability, it first needs to
    # go through start_after_host_reboot state to be able to recover.
    when_start_after_host_reboot_set? do
      incr_checkup
      hop_start_after_host_reboot
    end
 
    begin
      if available?
        decr_checkup
        hop_wait

In this case, manually start both systemd services for the VM if they are not already running, so that available? returns true and the strand hops back to the wait state.

sudo systemctl start {vm.inhost_name} {vm.inhost_name}-dnsmasq
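
You can confirm the recovery from the control plane as well; a minimal check in the Ruby shell, assuming a single VM strand:

st = Strand[prog: "Vm::Nexus"]
# After the next health check, the label should hop from "unavailable" back to "wait".
st.reload.label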

NOTE

Don’t run vm.incr_start_after_host_reboot by hand. You should reboot the VM host in that case.

Start / Stop VM (manual)

Run poweroff from the guest OS, or stop it from the host with systemctl.

sudo systemctl status vm8rf42j.service
sudo systemctl stop vm8rf42j.service

Start it again with systemctl, which takes about 20 seconds.

sudo systemctl start vm8rf42j.service

Resize disks (disabling bdev_ubi)

QEMU’s vhost-user-blk now supports live resize by means of the new device-sync-config command.

Live resize might make a good feature request for cloud-hypervisor, but let’s wait for Ubicloud’s response as well. See qapi: introduce device-sync-config and https://github.com/qemu/qemu/commit/9eb9350c0e519be97716f6b27f664bd0a3c41a36 for how QEMU implements this.

For now, the workaround is to reboot the VM, which is still better than restarting SPDK, since that would affect other VMs on the same host.

To resize disks this way, we need to disable bdev_ubi by modifying model/spdk_installation.rb before VM creation.
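
A minimal sketch of the kind of change involved; the method name supports_bdev_ubi? is an assumption here, so check the actual model file before patching:

# model/spdk_installation.rb -- hypothetical patch: report bdev_ubi as
# unsupported so that new volumes are created with use_bdev_ubi=false.
class SpdkInstallation < Sequel::Model
  def supports_bdev_ubi?
    false
  end
end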

Disk size in Ubicloud is measured in GiB, with 16 MiB added if bdev_ubi is enabled. Add disk space in GiB as well to match the unit (see the sanity check at the end of this section).

With bdev_ubi disabled, follow these instructions to resize the raw disk file, then reboot the VM so it recognizes the change.

# List vdevs
sudo /opt/spdk-v23.09-ubi-0.3/scripts/rpc.py -s /home/spdk/spdk-v23.09-ubi-0.3.sock bdev_get_bdevs
 
vm=vmtr5tbq
 
# Resize the disk
sudo truncate -c -s +160G /var/storage/${vm}/0/disk.raw
# The SIZE argument is an integer and optional unit (example: 10K is 10*1024).
# Units are K,M,G,T,P,E,Z,Y (powers of 1024) or KB,MB,... (powers of 1000).
# Binary prefixes can be used, too: KiB=K, MiB=M, and so on.
 
# Rescan size of the Linux AIO bdev
sudo /opt/spdk-v23.09-ubi-0.3/scripts/rpc.py -s /home/spdk/spdk-v23.09-ubi-0.3.sock bdev_aio_rescan ${vm}_0

Here is a snippet of bdev_get_bdevs output.

  {
    "name": "vm6kqa7e_0",
    "aliases": [
      "d1c07b28-6a6e-4f92-8f17-57f0bfc120d7"
    ],
    "product_name": "AIO disk",
    "block_size": 512,
    "num_blocks": 167772160,
    "uuid": "d1c07b28-6a6e-4f92-8f17-57f0bfc120d7",
    "assigned_rate_limits": {
      "rw_ios_per_sec": 0,
      "rw_mbytes_per_sec": 0,
      "r_mbytes_per_sec": 0,
      "w_mbytes_per_sec": 0
    },
    "claimed": false,
    "zoned": false,
    "supported_io_types": {
      "read": true,
      "write": true,
      "unmap": false,
      "write_zeroes": true,
      "flush": true,
      "reset": true,
      "compare": false,
      "compare_and_write": false,
      "abort": false,
      "nvme_admin": false,
      "nvme_io": false
    },
    "driver_specific": {
      "aio": {
        "filename": "/var/storage/vm6kqa7e/0/disk.raw",
        "block_size_override": true,
        "readonly": false
      }
    }
  }
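
As a sanity check of the units, num_blocks times block_size for this bdev works out to exactly 80 GiB, with no extra 16 MiB, consistent with bdev_ubi being disabled:

# 167772160 blocks * 512 bytes = 85899345920 bytes
167772160 * 512 / (1024**3)  # => 80 (GiB)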

Stop VM from Ruby (broken)

You can issue commands from the Ruby shell to increment these semaphores, but only the restart command works for now.

  semaphore :restart, :stop

First, check the VM strand’s :label, which represents its current state.

st = Strand[prog: "Vm::Nexus"]
st.label
st.semaphores
 
vm = st.subject

WARNING

A stopped VM might be stuck for an hour and is difficult to recover. Don’t do this until Ubicloud officially supports stopping VMs.

Then, you can operate on the VM. For example, stop it:

vm.incr_stop
 
# Check state
st.reload
st.label
st.semaphores

Start it again (not implemented):

vm.incr_restart
 
# Check state
st.reload
st.label
st.semaphores

You can observe the state with:

Strand[prog: "Vm::Nexus"].label

Project Quota

Modify config/default_quotas.yml and build a new ubicloud/ubicloud:latest image to increase the project quotas.

You can also create multiple projects to work around this.

Set up

Prerequisites

  • CPU architecture: x64 or arm64.
  • OS: according to spdk_setup.rb, recommended Linux distributions for VM hosts are ubuntu-22.04 and ubuntu-24.04.
  • Host upgrades: disable automatic upgrades with sudo apt purge unattended-upgrades.
  • RAM: most memory will be reserved as huge pages. Do not run other services on the host.
  • Storage: VM images and disks are stored under /var/storage/; make sure you have sufficient space.
  • Encryption: VM disks are encrypted by default, affecting performance.
  • Network: VM host must have a public IPv6 prefix of at least /64 that is statically routed. Public IPv4 subnet is optional and can be added in the interactive Ruby shell.
  • NIC: Disable Generic Receive Offload (GRO) on the interface used for NAT masquerading.
  • Firewall: let Ubicloud manage the nftables firewall, and don’t host other services directly on the control plane node.
  • License: Ubicloud is licensed under AGPL-3.0 but bdev_ubi is licensed under Elastic License 2.0. The latter may be revised in the future.

Yes, you can use Ubicloud with your on-prem servers. The prerequisite is that you have a public IPv6 prefix for your servers and SSH access. You can add your servers using one of the non-Hetzner regions in the location picker; it only serves as a region name, and your servers don’t have to be located with that provider. If you want to use IPv4 for your resources, also make sure there is an IPv4 subnet that is statically routed to the server (no ARP).

If you keep ufw enabled on the host, set its default forward policy to ACCEPT so forwarded VM traffic is not dropped.

# /etc/default/ufw
DEFAULT_FORWARD_POLICY="ACCEPT"

To avoid an nftables.service reload clearing the ufw rules, override the flush ruleset commands run by ExecStop and by the /etc/nftables.conf file that ExecReload reads. This breaks the block_ip4 cleanup function, but that only affects public IPv4 addresses, which we don’t use.

mkdir /etc/systemd/system/ufw.service.d
cat >/etc/systemd/system/ufw.service.d/10-nftables.conf <<EOF
[Unit]
After=nftables.service
EOF
 
mkdir /etc/systemd/system/nftables.service.d
cat >/etc/systemd/system/nftables.service.d/15-no-flush.conf <<EOF
[Service]
ExecReload=
ExecReload=/usr/bin/true
ExecStop=
ExecStop=/usr/bin/true
EOF
 
sudo systemctl daemon-reload
 
systemctl cat nftables.service ufw.service

You should test the firewall rules after deploying the control plane.

# ports: [3000:3000] only exposes the port over IPv4
curl -v <control plane IPv4 address>:3000

Disabling GRO

If you are experiencing packet loss in the VM, try disabling NIC offloads on the host.

Create a udev rule file:

sudo vim /etc/udev/rules.d/99-disable-gro.rules

Add the following line (replace enp193s0f0np0 with your interface name):

ACTION=="add", SUBSYSTEM=="net", KERNEL=="enp193s0f0np0", RUN+="/sbin/ethtool -K enp193s0f0np0 gro off"

You can also run sudo ethtool -K enp193s0f0np0 gro off manually to test the fix first.

sudo ethtool -K enp193s0f0np0 gro off
sudo ethtool -k enp193s0f0np0 | grep receive-offload
 
# Test in VM with curl

IPv4 NAT

NOTE

Add the nftables.d configuration by hand and build the image from https://github.com/l2dy-forks/ubicloud/tree/patch-l2dy instead of patching.

WARNING

This may break subnet isolation. Use with care.

Ubicloud does not configure NAT masquerading by default, so outbound packets are dropped at the interface enp193s0f0np0.

ncrxvwnpz3 -> vethivms21c2t -> vethovms21c2t -> enp193s0f0np0

There are two steps to configure IPv4 NAT for VMs.

First, we need to set up masquerading. Save the following as /etc/nftables.d/99-custom-ubicloud-nat.conf,

#!/usr/sbin/nft -f
table ip ubicloud_nat;
delete table ip ubicloud_nat;
table ip ubicloud_nat {
  chain postrouting {
    type nat hook postrouting priority srcnat; policy accept;
    oifname == "enp193s0f0np0" ip saddr 10.0.0.0/8 counter masquerade
  }
}

and apply it with /usr/sbin/nft -f /etc/nftables.d/99-custom-ubicloud-nat.conf. You can verify the result with nft list table ip ubicloud_nat.

Then, we need to let ubicloud set up a route for the VM’s private IPv4 along with the existing public_ipv6 route. To implement this, we have to modify the ubicloud code and build the image.

In vm_setup.rb, there are several addresses involved.

  1. gua: public_ipv6 param from ephemeral_net6.to_s.
  2. ip4: public_ipv4 param from ip4.to_s || "".
  3. local_ip4: local_ipv4 param from local_vetho_ip.to_s.shellescape || "".
  4. nics: [nic.private_ipv6.to_s, nic.private_ipv4.to_s, nic.ubid_to_tap_name, nic.mac, nic.private_ipv4_gateway] deserialized into (:net6, :net4, :tap, :mac, :private_ipv4_gateway).
class VmSetup
  Nic = Struct.new(:net6, :net4, :tap, :mac, :private_ipv4_gateway)
end

and derived addresses (see the sketch after this list):

  1. local_ip = NetAddr::IPv4Net.parse(local_ip4)
    1. local_ip.network.to_s: vetho address.
    2. local_ip.next_sib.network.to_s: vethi address.
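
A quick illustration of that derivation in the Ruby shell; the local_ip4 value below is made up:

require "netaddr"
local_ip = NetAddr::IPv4Net.parse("192.168.12.34/32")  # hypothetical local_ip4
local_ip.network.to_s           # => "192.168.12.34" (vetho address)
local_ip.next_sib.network.to_s  # => "192.168.12.35" (vethi address)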

The address we care about is private_ipv4, which is :net4 from nics.

diff --git a/rhizome/host/lib/vm_setup.rb b/rhizome/host/lib/vm_setup.rb
index 144e1860..15ae0b2c 100644
--- a/rhizome/host/lib/vm_setup.rb
+++ b/rhizome/host/lib/vm_setup.rb
@@ -329,6 +329,10 @@ add element inet drop_unused_ip_packets allowed_ipv4_addresses { #{ip_net} }
 
     r "ip addr replace #{vetho}/32 dev vetho#{q_vm}"
     r "ip route replace #{vm} dev vetho#{q_vm}" if ip4
+    # BEGIN IPv4 NAT
+    local_ip4 = NetAddr::IPv4Net.parse(nics.first.net4).network.to_s
+    r "ip route replace #{local_ip4} dev vetho#{q_vm}"
+    # END IPv4 NAT
     r "echo 1 > /proc/sys/net/ipv4/conf/vetho#{q_vm}/proxy_arp"
 
     r "ip -n #{q_vm} addr replace #{vethi}/32 dev vethi#{q_vm}"

Params: https://github.com/ubicloud/ubicloud/blob/cf0d7c1c462e6e4bc3ff0c1ef1933e3994cecb1a/model/vm.rb#L203-L223, and nic.net4 is nic.private_ipv4.to_s.

Finally, we build the docker image and run it in production.

sudo docker build -t ubicloud/ubicloud:latest .
 
sudo docker-compose down
sudo docker-compose up -d

Note that the changes do not immediately apply to existing VMs. You have to reboot the VM host or recreate the VMs.

Also, the change does not apply to existing VM hosts. You have to modify the files by hand.

sudo -iu rhizome
cd ..
patch -p1 < xxx.diff

Set up control plane

First, you need to patch ubicloud to get IPv4 NAT.

Then, follow https://www.ubicloud.com/docs/quick-start/build-your-own-cloud to set up the control plane, and create a user. You could patch docker-compose.yml to make containers auto-restart on failure:

git clone https://github.com/ubicloud/ubicloud.git
 
cd ubicloud
git apply <<EOF
diff --git a/demo/docker-compose.yml b/demo/docker-compose.yml
index a52d77f5..0f631a7b 100644
--- a/demo/docker-compose.yml
+++ b/demo/docker-compose.yml
@@ -1,6 +1,7 @@
 services:
   postgres:
     image: postgres:15.4
+    restart: unless-stopped
     container_name: ubicloud-postgres
     env_file: .env
     ports:
@@ -25,6 +26,7 @@ services:
 
   app:
     image: ubicloud/ubicloud:latest
+    restart: unless-stopped
     container_name: ubicloud-app
     depends_on:
       db-migrator:
EOF
 
# Generate secrets for demo
./demo/generate_env
 
# Run containers: db-migrator, app (web & respirate), postgresql
docker-compose -f demo/docker-compose.yml up
 
# Visit localhost:3000

Then, download the images you need on all VM hosts, change RACK_ENV=development to RACK_ENV=production in .env, and restart the stack to prevent self-registration.

cd demo
 
# If you started the services with -d
docker-compose down
 
docker-compose up -d

Cloudify a bare-metal server

WARNING

This will reboot the server!

Run the following commands in the interactive Ruby shell on the control plane host, replacing hostname with your server’s IP address and host_id with its host identifier.

strand = Prog::Vm::HostNexus.assemble(
  hostname,
  provider_name: "leaseweb",
  server_identifier: host_id,
  location: "leaseweb-wdc02",
  default_boot_images: ["ubuntu-jammy"]
)
 
puts "Waiting for public SSH keys\n\n"
until (ssh_key = strand.reload.subject.sshable.keys.map(&:public_key).first)
  sleep 2
end
puts "Add the following public SSH key to '/root/.ssh/authorized_keys' on your machine\n\n"
puts ssh_key

Then add the SSH key to the bare-metal server’s root user and cloudification will proceed automatically.

See https://github.com/ubicloud/ubicloud/discussions/2595 for more information. We use the leaseweb provider because the Hetzner provider calls the APIs in Hosting::Apis (backed by lib/hosting/hetzner_apis.rb), which don’t work in a self-hosted environment.

The steps run in the cloudify process are roughly:

  1. setup_ssh_keys
  2. bootstrap_rhizome
  3. prep
  4. wait_prep
  5. setup_hugepages
  6. setup_spdk
  7. download_boot_images
  8. wait_download_boot_images
  9. prep_reboot
  10. reboot
  11. verify_spdk
  12. verify_hugepages
  13. start_slices
  14. start_vms
  15. wait (reached ready state)

Follow hop_* in https://github.com/ubicloud/ubicloud/blob/main/prog/vm/host_nexus.rb#L47.
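
You can watch the strand walk these labels from the Ruby shell:

# `strand` is the object returned by Prog::Vm::HostNexus.assemble above
strand.reload.label  # e.g. "wait_prep", later "reboot", eventually "wait"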

Download images

The default version is defined in https://github.com/ubicloud/ubicloud/blob/a670fc0946abc52d945edb2b72caae679d9f073f/config.rb#L151.

Note that the "github", "postgres", and "ai-" images are not available in the self-hosted version, because we don’t have access to the ubicloud-images bucket. https://github.com/ubicloud/ubicloud/blob/a670fc0946abc52d945edb2b72caae679d9f073f/prog/download_boot_image.rb#L30-L59

# Start interactive Ruby shell
sudo docker exec -it ubicloud-app ./bin/pry
# Get your VmHost. If you have multiple ones, you can select by ID etc.
vmh = VmHost.first
 
# Get a list of current images
vmh.boot_images
 
# I assume your vm's boot image is `ubuntu-noble`. You can see available boot images here: https://github.com/ubicloud/ubicloud/blob/cf0d7c1c462e6e4bc3ff0c1ef1933e3994cecb1a/prog/download_boot_image.rb#L61
st = vmh.download_boot_image("ubuntu-noble", version: "20240702")
 
# Wait for the download to finish. The strand is deleted when done, so reload raises `No Record Found`
st.reload
 
# If you need to abort the task
st.destroy
st.reload
 
# To get a list of all tasks
Strand.all

spdk CPU usage

It is expected that spdk consumes 200% CPU constantly in polling mode. See https://news.ycombinator.com/item?id=37154138 for the background and design decisions.

sudo /opt/spdk-v23.09-ubi-0.3/bin/spdk_top -r /home/spdk/spdk-v23.09-ubi-0.3.sock

VM disk encryption

VM disks are encrypted by default.

[1] ⚠️ clover-production(main)> VmStorageVolume.all
=> [#<VmStorageVolume["v1..."] @values={:vm_id=>"vm...", :boot=>true, :size_gib=>160, :disk_index=>0, :key_encryption_key_1_id=>"k...", :key_encryption_key_2_id=>nil, :spdk_installation_id=>"etw...", :use_bdev_ubi=>true, :skip_sync=>false, :storage_device_id=>"etr...", :boot_image_id=>"et...", :max_ios_per_sec=>nil, :max_read_mbytes_per_sec=>nil, :max_write_mbytes_per_sec=>nil}>]

With encryption disabled:

[1] ⚠️ clover-production(main)> VmStorageVolume.all
=> [#<VmStorageVolume["v1..."] @values={:vm_id=>"vm...", :boot=>true, :size_gib=>320, :disk_index=>0, :key_encryption_key_1_id=>nil, :key_encryption_key_2_id=>nil, :spdk_installation_id=>"etw...", :use_bdev_ubi=>true, :skip_sync=>false, :storage_device_id=>"etr...", :boot_image_id=>"et...", :max_ios_per_sec=>nil, :max_read_mbytes_per_sec=>nil, :max_write_mbytes_per_sec=>nil}>]

With encryption enabled, read performance is:

$ sudo fio --filename=/dev/vda --direct=1 --rw=randread --bs=4k --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 --time_based --group_reporting --name=iops-test-job --eta-newline=1 --readonly
...
iops-test-job: (groupid=0, jobs=4): err= 0: pid=1077: Sun Mar 16 14:15:17 2025
  read: IOPS=211k, BW=825MiB/s (866MB/s)(96.7GiB/120002msec)
    slat (usec): min=3, max=5484, avg=17.18, stdev=42.32
    clat (usec): min=859, max=30932, avg=4827.97, stdev=2601.95
     lat (usec): min=864, max=30983, avg=4845.16, stdev=2608.66
    clat percentiles (usec):
     |  1.00th=[ 1696],  5.00th=[ 2114], 10.00th=[ 2573], 20.00th=[ 3032],
     | 30.00th=[ 3359], 40.00th=[ 3752], 50.00th=[ 4146], 60.00th=[ 4621],
     | 70.00th=[ 5276], 80.00th=[ 6128], 90.00th=[ 7701], 95.00th=[ 9896],
     | 99.00th=[14877], 99.50th=[16712], 99.90th=[21890], 99.95th=[23200],
     | 99.99th=[25035]
   bw (  KiB/s): min=702648, max=1015296, per=100.00%, avg=846076.44, stdev=14476.17, samples=956
   iops        : min=175662, max=253824, avg=211519.07, stdev=3619.04, samples=956
  lat (usec)   : 1000=0.01%
  lat (msec)   : 2=3.68%, 4=42.66%, 10=48.79%, 20=4.67%, 50=0.20%
  cpu          : usr=5.85%, sys=55.20%, ctx=7668070, majf=0, minf=1077
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=25358100,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
   READ: bw=825MiB/s (866MB/s), 825MiB/s-825MiB/s (866MB/s-866MB/s), io=96.7GiB (104GB), run=120002-120002msec

Disk stats (read/write):
  vda: ios=25336953/74, sectors=202696360/968, merge=15/35, ticks=25926982/112, in_queue=25927112, util=100.00%

You can see roughly a 25% performance improvement (264k vs. 211k IOPS) by disabling disk encryption:

$ sudo fio --filename=/dev/vda --direct=1 --rw=randread --bs=4k --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 --time_based --group_reporting --name=iops-test-job --eta-newline=1 --readonly
...
iops-test-job: (groupid=0, jobs=4): err= 0: pid=1212: Mon Mar 17 14:08:35 2025
  read: IOPS=264k, BW=1033MiB/s (1083MB/s)(121GiB/120001msec)
    slat (usec): min=3, max=19537, avg=13.46, stdev=58.75
    clat (usec): min=544, max=46115, avg=3858.24, stdev=1545.45
     lat (usec): min=554, max=46124, avg=3871.70, stdev=1548.92
    clat percentiles (usec):
     |  1.00th=[ 2245],  5.00th=[ 2409], 10.00th=[ 2573], 20.00th=[ 2900],
     | 30.00th=[ 3130], 40.00th=[ 3326], 50.00th=[ 3523], 60.00th=[ 3720],
     | 70.00th=[ 3982], 80.00th=[ 4424], 90.00th=[ 5407], 95.00th=[ 6456],
     | 99.00th=[10028], 99.50th=[12256], 99.90th=[17433], 99.95th=[19530],
     | 99.99th=[24511]
   bw (  MiB/s): min=  898, max= 1154, per=100.00%, avg=1033.76, stdev=11.69, samples=956
   iops        : min=230030, max=295614, avg=264643.08, stdev=2993.21, samples=956
  lat (usec)   : 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.06%, 4=70.03%, 10=28.90%, 20=0.97%, 50=0.04%
  cpu          : usr=7.28%, sys=78.70%, ctx=2902637, majf=0, minf=1076
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=31732883,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
   READ: bw=1033MiB/s (1083MB/s), 1033MiB/s-1033MiB/s (1083MB/s-1083MB/s), io=121GiB (130GB), run=120001-120001msec

Disk stats (read/write):
  vda: ios=31700158/91, sectors=253601480/1233, merge=27/57, ticks=27805348/107, in_queue=27805461, util=99.99%

Deleting projects

If you really want to delete a project, first delete from the console all the resources you can, then run the following in the interactive Ruby shell.

project = Project["pjmppb65bm4a4m241bdfky3v3r"]
 
# These should be done from the console
# project.private_subnets.each(&:destroy) # or use .delete to skip callbacks
# project.firewalls.each(&:destroy) # or use .delete to skip callbacks
 
project.destroy

Creating a VM

You must use a non-Hetzner region in the location picker to avoid calling Hetzner APIs. See https://github.com/ubicloud/ubicloud/discussions/2615.

Configure the firewall for your private subnet, and you can leave ufw or firewalld in the guest OS disabled.
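
You can inspect subnets and firewalls from the Ruby shell as well; a minimal sketch using the same models referenced in the Deleting projects section:

# List private subnets and firewalls known to the control plane
PrivateSubnet.all
Firewall.all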

Load Balancer (need to configure a domain)

When you create a load balancer, Ubicloud attempts to order a certificate for it and gets stuck because the domain is invalid.

https://github.com/ubicloud/ubicloud/blob/cf0d7c1c462e6e4bc3ff0c1ef1933e3994cecb1a/prog/vnet/load_balancer_nexus.rb#L83
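
To see where it is stuck, inspect the load balancer strand in the Ruby shell; the prog name below is inferred from the file path above:

st = Strand[prog: "Vnet::LoadBalancerNexus"]
st.label
st.semaphores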

Upgrading Cloud Hypervisor

A newer version of Cloud Hypervisor may provide better performance.

This can be tested on a per-VM-host basis, because the bootstrap_rhizome setup only runs once during the cloudify process.

# Modify `VERSION =` to customize the cloud-hypervisor version. See the Ruby
# snippet at the end of this block for the struct layout.
sudo -iu rhizome
 
vim host/lib/cloud_hypervisor.rb
sudo ./host/bin/download-cloud-hypervisor 44.0 \
  f58e5d8684a5cbd7c4b8a001a1188ac79b9d4dda8115e1b3d5faa8c29038119c \
  6d268b947adf2b9b72c13cc8bda156e27c9a450474001d762e9bd211f90136fa
 
# Patch for compatibility with newer cloud-hypervisor
patch -p1 host/lib/vm_setup.rb < xxx.diff
# (:version, :sha256_ch_bin, :sha256_ch_remote)
VersionClass.new("44.0", "f58e5d8684a5cbd7c4b8a001a1188ac79b9d4dda8115e1b3d5faa8c29038119c", "6d268b947adf2b9b72c13cc8bda156e27c9a450474001d762e9bd211f90136fa")

Fixes for v44.0:

  1. --disk should only be specified once. https://github.com/cloud-hypervisor/cloud-hypervisor/issues/6130
    1. Note that if there are no volumes attached, the VM unit file may become invalid.
  2. Grant CAP_NET_ADMIN to fix the TapSetIp permission error. This should not have happened according to https://github.com/cloud-hypervisor/cloud-hypervisor/issues/1274, so keep an eye on Ubicloud changes when they upgrade.
Mar 16 12:29:06 vhwkcvts540j1npnsdygqb5jvp cloud-hypervisor[34366]: Error booting VM: VmBoot(DeviceManager(CreateVirtioNet(OpenTap(TapSetIp(IoctlError(35094, Os { code: 1, kind: PermissionDenied, message: "Operation not permitted" }))))))
diff --git a/host/lib/vm_setup.rb.orig b/host/lib/vm_setup.rb
index 15ae0b2..95f4d92 100644
--- a/host/lib/vm_setup.rb.orig
+++ b/host/lib/vm_setup.rb
@@ -657,9 +657,9 @@ DNSMASQ_SERVICE
 
     disk_params = storage_volumes.map { |volume|
       if volume.read_only
-        "--disk path=#{volume.image_path},readonly=on \\"
+        "path=#{volume.image_path},readonly=on \\"
       else
-        "--disk vhost_user=true,socket=#{volume.vhost_sock},num_queues=1,queue_size=256 \\"
+        "vhost_user=true,socket=#{volume.vhost_sock},num_queues=1,queue_size=256 \\"
       end
     }
 
@@ -688,13 +688,14 @@ Wants=#{@vm_name}-dnsmasq.service
 [Service]
 Slice=#{slice_name}
 NetworkNamespacePath=/var/run/netns/#{@vm_name}
+AmbientCapabilities=CAP_NET_ADMIN
 ExecStartPre=/usr/bin/rm -f #{vp.ch_api_sock}
 
 ExecStart=#{CloudHypervisor::VERSION.bin} -v \
 --api-socket path=#{vp.ch_api_sock} \
 --kernel #{CloudHypervisor::FIRMWARE.path} \
-#{disk_params.join("\n")}
---disk path=#{vp.cloudinit_img} \
+--disk #{disk_params.join("\n")}
+path=#{vp.cloudinit_img} \
 --console off --serial file=#{vp.serial_log} \
 --cpus #{cpu_setting} \
 --memory size=#{mem_gib}G,hugepages=on,hugepage_size=1G \

Disk I/O test results seem to imply a performance regression after upgrading to v44.0. This could be related to num_queues=1,queue_size=256 in cloud-hypervisor’s arguments or to the default encryption, but we’d rather hold off and wait for official updates from Ubicloud.

Features missing

  1. Stop and start VMs. https://github.com/ubicloud/ubicloud/issues/2989
    1. But it can be done by hand.
  2. Change VM size when VM is powered off. https://github.com/ubicloud/ubicloud/issues/2989
  3. Granular firewall per-VM without splitting subnets.
    1. With just HTTP services, this does not bother us too much.
    2. A firewall can be attached to multiple subnets, but not to individual VMs, so rules are composable, but using separate subnets creates a connectivity barrier.
  4. Connect to serial console.
    1. Only needed if you or the Linux distribution messed up.