Rathouse

rathouse is the server for capsulv2

OS stuff

buster-backports was enabled to get newer versions of qemu & libvirtd, since
we need them for virsh backup-begin
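
For the record, enabling backports on Debian buster looks roughly like this; the mirror URL and exact package names below are assumptions, not copied from the host:

# add the buster-backports repo (mirror URL is an assumption)
echo 'deb http://deb.debian.org/debian buster-backports main' > /etc/apt/sources.list.d/buster-backports.list
apt update

# pull qemu & libvirt from backports (package names are assumptions)
apt install -t buster-backports qemu-system-x86 libvirt-daemon-system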

Disk Stuff

serial numbers

frontplate number | SSD serial number
0                 | 24010919300014
1                 | 22062712800355
2                 | PHYF209106Y93P8EGN
3                 | PHYF209300LX3P8EGN

use smartctl -x /dev/sdX | grep Serial to print a given disk's serial number.
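
To map frontplate numbers to device names, a quick loop like this works (the /dev/sda through /dev/sdd names are an assumption and can change across reboots):

for d in /dev/sd{a,b,c,d}; do printf '%s: ' "$d"; smartctl -x "$d" | grep Serial; done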

How to check disk health:

root@rathouse:~# smartctl -x /dev/sda | grep -B1 -A4 'Percentage Used Endurance Indicator' 
0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==
0x07  0x008  1               0  ---  Percentage Used Endurance Indicator
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

root@rathouse:~# smartctl -x /dev/sdb | grep -B1 -A4 'Percentage Used Endurance Indicator'
0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==
0x07  0x008  1               1  ---  Percentage Used Endurance Indicator
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

root@rathouse:~# smartctl -x /dev/sdc | grep -B1 -A4 'Percentage Used Endurance Indicator'
0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==
0x07  0x008  1               0  ---  Percentage Used Endurance Indicator
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

root@rathouse:~# smartctl -x /dev/sdd | grep -B1 -A4 'Percentage Used Endurance Indicator'
0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==
0x07  0x008  1               0  ---  Percentage Used Endurance Indicator
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value
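
The same check for all four disks in one go (again assuming sda through sdd are the right device names):

for d in /dev/sd{a,b,c,d}; do echo "== $d =="; smartctl -x "$d" | grep 'Percentage Used Endurance Indicator'; done

The Value column (the number just before the flags) is the percentage of rated write endurance used so far; in the output above, sdb is at 1% and the others are at 0%.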

EFI / bootloader

Right now only one of the two "host" disks has an EFI boot entry :S please fix this so the machine can still boot if either disk dies.

# look at the current EFI boot settings
efibootmgr -v

# let's say the currently selected boot device is the one that has to be removed.
# in order to run grub-install on the new disk, we first have to unmount the current EFI boot partition
umount /boot/efi

# then mount the new disk's EFI system partition in its place
mount /dev/sdb1 /boot/efi

# now that the correct EFI boot partition is mounted at /boot/efi, we can run grub-install
grub-install /dev/sdb

# check our work. Note that this REPLACES the previous boot option instead of adding a new one.
efibootmgr -v

# further checking our work:
ls /boot/efi
ls /boot/efi/EFI

# for info about how to add an additional boot entry, see
# https://www.linuxbabe.com/command-line/how-to-use-linux-efibootmgr-examples#add-uefi-boot-entry
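
For reference, adding a second boot entry (rather than replacing the existing one) would look something like the line below; the disk, partition number, label, and loader path are assumptions that need to be checked against the actual layout:

efibootmgr --create --disk /dev/sdb --part 1 --label "debian (sdb)" --loader '\EFI\debian\grubx64.efi'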

raid setup

setup

# when setting up the partitions, I used +3520G as the size and "raid" as the type
fdisk /dev/sda
fdisk /dev/sdd

# I left the chunk size at the default.
# I also chose the near layout over far because
# > mdadm cannot reshape arrays in far X layouts
mdadm --create --verbose --level=10 --metadata=1.2 --raid-devices=2 --layout=n2 /dev/md/tank /dev/sda1 /dev/sdd1
mkfs.ext4 /dev/md/tank

# create the mountpoint if it doesn't exist yet, then mount the new filesystem
mkdir -p /tank
mount /dev/md/tank /tank
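
The steps above don't cover persisting the array across reboots; on Debian that would look something like this (the fstab line is an assumption about the desired mount options):

# record the array so it gets assembled at boot
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
update-initramfs -u

# mount it automatically at boot
echo '/dev/md/tank /tank ext4 defaults 0 2' >> /etc/fstab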

recovery

If a disk is pulled from a running system, mdadm assumes the worst and disconnects it from the RAID permanently.

If the system had a way of knowing that the removal and restoration was intentional, it could automatically pick it up. But a software RAID has no such knowledge, so it assumes the worst and acts as if the disk or its connection has become unreliable for some reason, until told otherwise.

To re-attach a disconnected disk, do this:

mdadm --manage /dev/md0 --add /dev/<disk><partition>
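
After adding the disk back, the rebuild progress can be watched with the standard mdadm status tools:

# show resync/rebuild progress
cat /proc/mdstat

# more detail on the array state (same device name as the command above)
mdadm --detail /dev/md0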

benchmarks

mdadm + ext4 using disk-benchmark

 hostname | write throughput | write iops | read throughput | read iops
------------------------------------------------------------------------
 rathouse |     440MB/s      |   39.2k    |  1132MB/s       | 134k
  alpine  |     314MB/s      |   36.3k    |  18.1GB/s (wat) | 81.1k

note: redo these tests once the RAID setup is complete

disk-benchmark

this is the script we use to benchmark disks

pls don't change any of the values, because that would make new results incomparable with the numbers above

# write throughput
fio --name=write_throughput --directory=. --numjobs=8 \
--size=10G --time_based --runtime=60s --ramp_time=2s --ioengine=libaio \
--direct=1 --verify=0 --bs=1M --iodepth=64 --rw=write \
--group_reporting=1

# write iops
fio --name=write_iops --directory=. --size=10G \
--time_based --runtime=60s --ramp_time=2s --ioengine=libaio --direct=1 \
--verify=0 --bs=4K --iodepth=64 --rw=randwrite --group_reporting=1

# read throughput (sequential reads)
fio --name=read_throughput --directory=. --numjobs=8 \
--size=10G --time_based --runtime=60s --ramp_time=2s --ioengine=libaio \
--direct=1 --verify=0 --bs=1M --iodepth=64 --rw=read \
--group_reporting=1

# read iops (random reads)
fio --name=read_iops --directory=. --size=10G \
--time_based --runtime=60s --ramp_time=2s --ioengine=libaio --direct=1 \
--verify=0 --bs=4K --iodepth=64 --rw=randread --group_reporting=1

# clean up the test files fio left behind
rm write* read*
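
Note that the jobs all write to --directory=., so run them from a directory on the filesystem you actually want to measure, for example:

cd /tank   # or wherever the filesystem under test is mounted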

alpine vm

ssh -J rathouse.layerze.ro root@192.168.122.9
password: welcome
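
If the VM's address ever changes, it can be looked up from rathouse through libvirt; the domain name "alpine" below is an assumption:

# run on rathouse
virsh list --all
virsh domifaddr alpine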
