Rathouse
rathouse is the physical server that hosts capsulv2
OS stuff
buster-backports was enabled so we could install newer versions of qemu & libvirtd
than stock buster ships, since we need them for virsh backup-begin
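a sketch of how that was probably done (the exact package names here are assumptions, check apt history if it matters):
# enable buster-backports (assumes Debian 10 "buster")
echo 'deb http://deb.debian.org/debian buster-backports main' > /etc/apt/sources.list.d/backports.list
apt update
# install qemu & libvirt from backports instead of stable
apt install -t buster-backports qemu-system-x86 libvirt-daemon-system libvirt-clients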
Disk Stuff
serial numbers
frontplate number | SSD serial number
0 | 24010919300014
1 | 22062712800355
2 | PHYF209106Y93P8EGN
3 | PHYF209300LX3P8EGN
use smartctl -x /dev/sdb | grep Serial
to print serial numbers.
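to grab all four at once, something like this works (a convenience loop, not in the original notes):
for d in /dev/sd{a..d}; do
  echo -n "$d: "
  smartctl -x $d | grep Serial
done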
How to check disk health (the value is the percentage of the drive's rated write endurance that has been used; sdb below is at 1%, the others at 0%):
root@rathouse:~# smartctl -x /dev/sda | grep -B1 -A4 'Percentage Used Endurance Indicator'
0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==
0x07  0x008  1               0  ---  Percentage Used Endurance Indicator
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value
root@rathouse:~# smartctl -x /dev/sdb | grep -B1 -A4 'Percentage Used Endurance Indicator'
0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==
0x07  0x008  1               1  ---  Percentage Used Endurance Indicator
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value
root@rathouse:~# smartctl -x /dev/sdc | grep -B1 -A4 'Percentage Used Endurance Indicator'
0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==
0x07  0x008  1               0  ---  Percentage Used Endurance Indicator
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value
root@rathouse:~# smartctl -x /dev/sdd | grep -B1 -A4 'Percentage Used Endurance Indicator'
0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==
0x07  0x008  1               0  ---  Percentage Used Endurance Indicator
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value
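for a quick pass/fail across all four disks (a convenience loop, not in the original notes):
for d in /dev/sd{a..d}; do
  echo -n "$d: "
  smartctl -H $d | grep -i 'overall-health'
done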
EFI / bootloader
Right now only one of the two "host" disks has an EFI boot entry :S please fix this?
# look at current EFI boot settings
efibootmgr -v
# let's say the currently selected boot device is the one that has to be removed.
# in order to run grub-install on the new disk, we have to first unmount the current EFI boot partition
umount /boot/efi
mount /dev/sdb1 /boot/efi
# now that the correct EFI boot partition is mounted at /boot/efi, we can run grub-install
grub-install /dev/sdb
# check our work. Note that this REPLACES the previous boot option instead of adding a new one.
efibootmgr -v
# further checking our work:
ls /boot/efi
ls /boot/efi/EFI
# for info about how to add an additional boot entry, see
# https://www.linuxbabe.com/command-line/how-to-use-linux-efibootmgr-examples#add-uefi-boot-entry
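the gist of that guide is efibootmgr -c; roughly like this (the label and loader path are assumptions; check ls /boot/efi/EFI for the real directory name):
# add a boot entry for the other disk's EFI partition, without replacing the existing one
efibootmgr -c -d /dev/sda -p 1 -L "debian (sda)" -l '\EFI\debian\grubx64.efi'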
raid setup
setup
# when setting up the partition, I used +3520G as the size and "raid" as the type
fdisk /dev/sda
fdisk /dev/sdd
# I left the chunk size default
# I also chose near over far because
# > mdadm cannot reshape arrays in far X layouts
mdadm --create --verbose --level=10 --metadata=1.2 --raid-devices=2 --layout=n2 /dev/md/tank /dev/sda1 /dev/sdd1
mkfs.ext4 /dev/md/tank
# the mount point has to exist first
mkdir -p /tank
mount /dev/md/tank /tank
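one step these notes don't record: making the array and mount persist across reboots. on Debian that's usually something like this (a sketch, not verified against this machine):
# record the array so the initramfs can assemble it at boot
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
update-initramfs -u
# mount it at boot, too
echo '/dev/md/tank /tank ext4 defaults 0 2' >> /etc/fstab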
recovery
if a disk is pulled from a running system, mdadm assumes the worst & disconnects it from the RAID permanently.
If the system had a way of knowing that the removal and restoration was intentional, it could automatically pick it up. But a software RAID has no such knowledge, so it assumes the worst and acts as if the disk or its connection has become unreliable for some reason, until told otherwise.
to re-attach a disconnected disk (the array may show up as /dev/md0 or /dev/md/tank; check /proc/mdstat for the real name), do this:
mdadm --manage /dev/md0 --add /dev/<disk><partition>
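after the --add, mdadm rebuilds onto the disk in the background. to watch progress (assuming the array is /dev/md0):
cat /proc/mdstat
mdadm --detail /dev/md0
# live view of the rebuild
watch -n5 cat /proc/mdstat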
benchmarks
mdadm + ext4 using disk-benchmark
hostname | write throughput | write iops | read throughput | read iops
---------|------------------|------------|-----------------|----------
rathouse | 440MB/s          | 39.2k      | 1132MB/s        | 134k
alpine   | 314MB/s          | 36.3k      | 18.1GB/s (wat)  | 81.1k
note: redo these tests when the raid thing is complete. (the alpine 18.1GB/s read number is almost certainly host-side caching of the VM's disk, not real disk speed)
disk-benchmark
this is the script we use to benchmark disks
pls don't change any of the values tbh, because then the numbers stop being comparable with the table above
# write throughput
fio --name=write_throughput --directory=. --numjobs=8 \
--size=10G --time_based --runtime=60s --ramp_time=2s --ioengine=libaio \
--direct=1 --verify=0 --bs=1M --iodepth=64 --rw=write \
--group_reporting=1
# write iops
fio --name=write_iops --directory=. --size=10G \
--time_based --runtime=60s --ramp_time=2s --ioengine=libaio --direct=1 \
--verify=0 --bs=4K --iodepth=64 --rw=randwrite --group_reporting=1
# read throughput (sequential reads)
fio --name=read_throughput --directory=. --numjobs=8 \
--size=10G --time_based --runtime=60s --ramp_time=2s --ioengine=libaio \
--direct=1 --verify=0 --bs=1M --iodepth=64 --rw=read \
--group_reporting=1
# read iops (random reads)
fio --name=read_iops --directory=. --size=10G \
--time_based --runtime=60s --ramp_time=2s --ioengine=libaio --direct=1 \
--verify=0 --bs=4K --iodepth=64 --rw=randread --group_reporting=1
# clean up tbh
rm write* read*
alpine vm
ssh -J rathouse.layerze.ro root@192.168.122.9
password: welcome