一个程序员的辩白

25 Jul 2017

Use BBR TCP congestion control in OpenVZ

Assumption

I’m assuming you’re building on a Linux distribution inside a KVM environment (the loop device will be used to mount the image).

Any step that needs root privilege is prefixed with sudo.

Making an Alpine Linux image

The first thing to do is choose a Linux distribution. Among the well-known distros, Alpine Linux is a good choice: it’s small and simple, so that’s what we’ll use.

The core idea is to make a UML (User-mode Linux) image.

Before getting started, we assume the current directory is ~/uml; you can put all your UML-related stuff here.

Now make an empty image file and mount it to a folder:

ROOTFS="alpine_rootfs.img"

# Block size 1M, block count 192, thus the image is 192M in size
dd if=/dev/zero of=$ROOTFS bs=1M count=192

# File system label name
LBL_NAME="ALPINE_ROOT"

# Use the file system of your choice
mkfs.ext4 -L $LBL_NAME $ROOTFS

mkdir alpine

# Mount image file via loop device into alpine/ folder
sudo mount -o loop $ROOTFS alpine/

After that, the lsblk and mount commands will give you some clues:

# lsblk output
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0    7:0    0   192M  0 loop /home/x/uml/alpine

# mount output
/home/x/uml/alpine_rootfs.img on /home/x/uml/alpine type ext4 (rw,relatime,data=ordered)
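
As a quick sanity check before installing anything, you can confirm that the image file really has the size you asked dd for. The helper below is only a sketch; image_size_mib is a name made up for this example, and it uses wc -c so it does not depend on a particular stat implementation.

```shell
# Print the size of a file in whole MiB (hypothetical helper name).
image_size_mib() {
    bytes=$(wc -c < "$1") || return 1
    echo $((bytes / 1024 / 1024))
}

# Usage, assuming $ROOTFS is set as above:
#   image_size_mib "$ROOTFS"
```

For the image created above this should report 192.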

 

Choose your Alpine Linux version (currently 3.6) and use the apk tool to build the base system:

# Or "latest-stable"
REL="v3.6"

# Or use this to fetch the latest version
# curl -sL https://dl-cdn.alpinelinux.org/alpine/ | grep -Eo "v[0-9]+\.[0-9]+" | sort -V | tail -1


# Mirror url
MIRROR="https://dl-cdn.alpinelinux.org/alpine"

# Repository url
REPO="$MIRROR/$REL/main"

# Get architecture of current machine
ARCH=`uname -m`

# Mount point for $ROOTFS
MNTDIR="alpine"

# Get apk tools version
APKV=`curl -s $REPO/$ARCH/APKINDEX.tar.gz | tar -Oxz | grep -a '^P:apk-tools-static$' -A1 | tail -1 | cut -d: -f2`

# Download apk tools and put into sbin/
curl -s $REPO/$ARCH/apk-tools-static-${APKV}.apk | tar -xz sbin/apk.static

# Install alpine-base into $MNTDIR
sudo sbin/apk.static --repository $REPO --update-cache --allow-untrusted --root $MNTDIR --initdb add alpine-base
# (Try again if any error)

# A brief form
# sudo sbin/apk.static -X $REPO -U --allow-untrusted -p $MNTDIR --initdb add alpine-base

# Write repository url into apk mirrorlist
sudo sh -c "echo $REPO > $MNTDIR/etc/apk/repositories"
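
The APKV pipeline above relies on the V: line directly following the P: line in APKINDEX. If you would rather not depend on that ordering, the lookup can be sketched with awk as below. apkindex_version is a hypothetical helper name, and the sample records used to try it out are invented for illustration, not real index data.

```shell
# Print the version (V:) of the package named $1 from an APKINDEX on stdin.
# Records are separated by blank lines; each line has the form "X:value".
apkindex_version() {
    awk -v pkg="$1" '
        /^P:/ { name = substr($0, 3) }
        /^V:/ { ver = substr($0, 3) }
        /^$/  { if (name == pkg) { print ver; found = 1; exit }
                name = ""; ver = "" }
        END   { if (!found && name == pkg) print ver }
    '
}

# Usage (assumes the archive member is named APKINDEX):
#   curl -s $REPO/$ARCH/APKINDEX.tar.gz | tar -Oxz APKINDEX | apkindex_version apk-tools-static
```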

Then add the root file system entry to $MNTDIR/etc/fstab:

sudo sh -c "echo LABEL=$LBL_NAME / auto defaults 1 1 >> $MNTDIR/etc/fstab"

The content of $MNTDIR/etc/fstab may look like this:

#
# /etc/fstab: static file system information
#
# <file system>   <dir> <type>    <options> <dump>    <pass>
/dev/cdrom	/media/cdrom	iso9660	noauto,ro 0 0
/dev/usbdisk	/media/usb	vfat	noauto,ro 0 0
LABEL=ALPINE_ROOT / auto defaults 1 1
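
Because the fstab line is appended with >>, running that step twice leaves a duplicate entry. A minimal sketch of an idempotent version follows; add_root_entry is a made-up name, and it would have to be run from a root shell since the file lives under the mounted image.

```shell
# Append a root entry for LABEL=$2 to the fstab file $1, unless one exists.
add_root_entry() {
    fstab=$1
    label=$2
    if grep -q "^LABEL=$label[[:space:]]" "$fstab" 2>/dev/null; then
        return 0    # already present, nothing to do
    fi
    echo "LABEL=$label / auto defaults 1 1" >> "$fstab"
}

# Usage (as root):
#   add_root_entry "$MNTDIR/etc/fstab" "$LBL_NAME"
```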

 

If you have anything to copy from the host into the UML, you can do it at this stage, but make sure it’s statically built; otherwise its dependent libraries won’t be found.

# Copy into the mounted tree ($MNTDIR), not the image file ($ROOTFS)
sudo mkdir -p $MNTDIR/etc/shadowsocks-libev $MNTDIR/usr/local/bin

# Replace SS_STATIC_PATH with your static shadowsocks path
sudo cp $SS_STATIC_PATH/ss-* $MNTDIR/usr/local/bin

# Replace SS_CFG_PATH with yours
sudo cp $SS_CFG_PATH/config.json $MNTDIR/etc/shadowsocks-libev

 

Also, you may want to change some system preferences, which can be found at $MNTDIR/etc/sysctl.conf

For example, a commonly used set of network optimizations would be:

# max open files
fs.file-max = 51200
# max read buffer
net.core.rmem_max = 67108864
# max write buffer
net.core.wmem_max = 67108864
# default read buffer
net.core.rmem_default = 65536
# default write buffer
net.core.wmem_default = 65536
# max processor input queue
net.core.netdev_max_backlog = 4096
# max backlog
net.core.somaxconn = 4096
# resist SYN flood attacks
net.ipv4.tcp_syncookies = 1
# reuse timewait sockets when safe
net.ipv4.tcp_tw_reuse = 1
# turn off fast timewait sockets recycling
# (this key was removed in Linux 4.12; drop the line if sysctl reports it as unknown)
net.ipv4.tcp_tw_recycle = 0
# short FIN timeout
net.ipv4.tcp_fin_timeout = 30
# short keepalive time
net.ipv4.tcp_keepalive_time = 1200
# outbound port range
net.ipv4.ip_local_port_range = 10000 65000
# max SYN backlog
net.ipv4.tcp_max_syn_backlog = 4096
# max timewait sockets held by system simultaneously
net.ipv4.tcp_max_tw_buckets = 5000
# turn on TCP Fast Open on both client and server side
net.ipv4.tcp_fastopen = 3
# TCP receive buffer
net.ipv4.tcp_rmem = 4096 87380 67108864
# TCP write buffer
net.ipv4.tcp_wmem = 4096 65536 67108864
# turn on path MTU discovery
net.ipv4.tcp_mtu_probing = 1
# BBR
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
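
Before unmounting, it can be worth confirming that the two BBR lines actually made it into the file. The following is a small sketch of such a check; sysctl_conf_get is a hypothetical helper, not part of any sysctl tooling, and it simply reports the last value assigned to a key.

```shell
# Print the last value assigned to key $2 in the sysctl-style file $1.
# Exits non-zero if the key is not set anywhere in the file.
sysctl_conf_get() {
    awk -v key="$2" '
        /^[[:space:]]*[#;]/ { next }          # skip comment lines
        index($0, "=") {
            k = substr($0, 1, index($0, "=") - 1)
            v = substr($0, index($0, "=") + 1)
            gsub(/^[[:space:]]+|[[:space:]]+$/, "", k)
            gsub(/^[[:space:]]+|[[:space:]]+$/, "", v)
            if (k == key) { val = v; seen = 1 }
        }
        END { if (seen) print val; else exit 1 }
    ' "$1"
}

# Usage:
#   sysctl_conf_get "$MNTDIR/etc/sysctl.conf" net.ipv4.tcp_congestion_control
```

Once the guest is running, cat /proc/sys/net/ipv4/tcp_congestion_control shows the algorithm that is really in effect.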

 

When everything is done, remember to unmount the image:

sudo umount $MNTDIR

 

Making a User-mode Linux boot image(vmlinux)

In this phase, we need to make a User-mode Linux image. Choose the kernel according to your needs; you can find the kernel source at kernel.org.

I highly recommend choosing a stable/longterm version. Here I assume 4.12.3 is used; it was a stable version when I wrote this article.

Before that, you should install the build dependencies:

# Change to your package manager
sudo pacman -S ncurses bc screen
wget --no-check-certificate https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.12.3.tar.xz
# Extract to current directory(~/uml)
tar xvf linux-4.12.3.tar.xz
cd linux-4.12.3
# Generate a default .config file
make defconfig ARCH=um
# Configure through the ncurses menu (CLI)
make menuconfig ARCH=um

You can configure it to your own taste (the * indicates checked); the following is an example:

UML-specific options
    ==> [*] Force a static link
Device Drivers
    ==> [*] Network device support
        ==> <*> Universal TUN/TAP device driver support
[*] Networking support
    ==> Networking options
        ==> [*] IP: TCP syncookie support
        ==> [*] TCP: advanced congestion control
            ==> <*> BBR TCP
            ==> <*> Default TCP congestion control (BBR)
        ==> [*] QoS and/or fair queueing
            ==> <*> Quick Fair Queueing scheduler (QFQ)
            ==> <*> Controlled Delay AQM (CODEL)
            ==> <*> Fair Queue Controlled Delay AQM (FQ_CODEL)
            ==> <*> Fair Queue
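
After leaving menuconfig, you can double-check the resulting .config from the command line. The sketch below lists any symbols that are not built in. missing_kconfig is a made-up helper, and the symbol names in the usage line are the ones I believe correspond to the menu entries above, so verify them against your own .config.

```shell
# Print every symbol from the argument list that is not set to "y" in $1.
missing_kconfig() {
    cfg=$1
    shift
    for sym in "$@"; do
        grep -q "^CONFIG_${sym}=y\$" "$cfg" || echo "$sym"
    done
}

# Usage, from inside the kernel source directory:
#   missing_kconfig .config STATIC_LINK TUN SYN_COOKIES TCP_CONG_BBR DEFAULT_BBR NET_SCH_FQ
```

No output means every listed option is enabled.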

After that, you can build the boot image and strip all symbols:

# Compile UML with nproc jobs (this may take some time to finish)
make -j`nproc` ARCH=um vmlinux

VMLINUX="vmlinux-4.12.3-`uname -m`"
mv vmlinux $VMLINUX

strip -s $VMLINUX

# Move $VMLINUX image into ~/uml
mv $VMLINUX ..

 

Final setup on the host machine

Before you launch the UML, you should set up a tunnel (a tap device) to allow network connections between host and guest.

The following script can do this for you (on the host machine):

tap.sh (needs root privilege)

#!/bin/bash
# [NOTE] Make sure the TUN/TAP is available

if [ $EUID -ne 0 ]; then
    echo "[ERROR] Must run as root"
    exit 1
fi

# If you want to delete it
# ip tuntap del tap0 mode tap

# Pick the first ethernet (preferred) or wireless interface;
# ls sorts names, so e* entries come before w* ones
NET=`ls /sys/class/net | grep '^[ew]' | head -1`

# Port forwarding range
RNG="8000:10000"

# Host tunnel address
SRC="10.0.0.1"

# UML tunnel address
DST="10.0.0.2"

ip tuntap add tap0 mode tap
ip addr add $SRC/24 dev tap0
ip link set tap0 up
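# NAT only works if IPv4 forwarding is enabled (no-op if it already is)
sysctl -w net.ipv4.ip_forward=1
iptables -P FORWARD ACCEPT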
iptables -t nat -A POSTROUTING -o $NET -j MASQUERADE
iptables -t nat -A PREROUTING -i $NET -p tcp --dport $RNG -j DNAT --to-destination $DST
iptables -t nat -A PREROUTING -i $NET -p udp --dport $RNG -j DNAT --to-destination $DST

Note that $RNG determines which host ports are forwarded to the guest. You should not forward all ports to the guest: that would be inefficient, it would put unnecessary load on the guest, and services on the host itself would no longer be reachable on those ports.
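
The interface lookup in tap.sh relies on ls sorting ethernet names ahead of wireless ones. A more explicit version of that preference might look like the sketch below. pick_iface is a hypothetical helper, and the directory argument exists only so it can be tried against a scratch directory instead of /sys/class/net.

```shell
# Print the first ethernet-style (e*) interface name, or failing that the
# first wireless-style (w*) one. $1 defaults to /sys/class/net.
pick_iface() {
    dir=${1:-/sys/class/net}
    # "grep ." passes the name through and fails when there is none
    ls "$dir" | grep '^e' | head -n 1 | grep . && return 0
    ls "$dir" | grep '^w' | head -n 1 | grep .
}

# Usage in tap.sh:
#   NET=$(pick_iface) || { echo "[ERROR] No usable interface"; exit 1; }
```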

 

You may also want to pack everything up and transfer it to a remote host using scp:

XZ_NAME="alpine_uml.tar.xz"
# Use xz to pack things up
# Make sure the remote machine has xz or xz-utils installed
tar cvfJ $XZ_NAME $ROOTFS $VMLINUX tap.sh uml.sh
scp -P<port> $XZ_NAME root@<remote_ip>:<some_dir>

# Extract it on your remote machine (xz is needed)
# tar xvf $XZ_NAME
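
Since the archive travels over the network, you may want to confirm it arrived intact before extracting it. One way, sketched here with sha256sum, is to record a checksum next to the archive, copy both files, and verify on the remote side. The helper names are invented for this example.

```shell
# Write <file>.sha256 next to the file (hypothetical helper names).
make_checksum() {
    sha256sum "$1" > "$1.sha256"
}

# Succeed only if the file still matches its recorded checksum.
# Run it from the directory that holds both files.
verify_checksum() {
    sha256sum -c "$1.sha256" > /dev/null 2>&1
}

# Usage:
#   make_checksum "$XZ_NAME"        # on the build machine, before scp
#   verify_checksum "$XZ_NAME"      # on the remote machine, after scp
```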

 

Start UML in user space

You can run the following script to start your UML up:

uml.sh (needs root privilege)

#!/bin/bash

if [[ $EUID -ne 0 ]]; then
    echo "[ERROR] Must run as root"
    exit 1
fi

VMLINUX="vmlinux-4.12.3-`uname -m`"
ROOTFS="alpine_rootfs.img"

# Yes, Alpine Linux can run with 64 MB of memory
./$VMLINUX ubda=$ROOTFS rw eth0=tuntap,tap0 mem=64m

# If it prompts you:
#	...
#	Checking environment variables for a tempdir...none found
#	Checking if /dev/shm is on tmpfs...OK
#	Checking PROT_EXEC mmap in /dev/shm...Operation not permitted
#	/dev/shm must be not mounted noexec
# You should:
# export TMPDIR=/tmp
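
Rather than waiting for the PROT_EXEC error, you could test for a noexec /dev/shm up front and set TMPDIR only when it is needed. This is a sketch under the assumption that /proc/mounts is available; shm_is_noexec is a made-up name, and the optional argument only exists so the function can be tried against a sample file.

```shell
# Succeed if /dev/shm is mounted with the noexec option.
# $1 defaults to /proc/mounts.
shm_is_noexec() {
    mounts=${1:-/proc/mounts}
    awk '$2 == "/dev/shm" { opts = $4 }
         END { if (opts ~ /(^|,)noexec(,|$)/) exit 0; exit 1 }' "$mounts"
}

# Usage in uml.sh, before starting the kernel:
#   if shm_is_noexec; then export TMPDIR=/tmp; fi
```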

After the UML has booted, you should see output like:

Virtual console 1 assigned device ‘/dev/pts/2’

Virtual console 6 assigned device ‘/dev/pts/8’

Use the screen tool to connect to the pseudo-tty:

# If screen cannot find a terminfo entry for 'xxx'
# export TERM="xterm-256color"

# The X is any available number of your pts
sudo screen /dev/pts/X

# NOTE:
# Once screen starts up, the terminal turns blank
# Press [Enter] and the login prompt shows up
# The root user has no password
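
If you save the boot output to a file, the pts number does not have to be read off by eye. The sketch below pulls it out of the "Virtual console" lines. console_pts is a hypothetical helper, and uml.log is an assumed file name for wherever you redirected the output.

```shell
# Read UML boot output on stdin and print the pts device that was
# assigned to virtual console number $1.
console_pts() {
    grep "^Virtual console $1 assigned device" |
        grep -o '/dev/pts/[0-9][0-9]*' |
        head -n 1
}

# Usage, assuming the boot output was saved to uml.log:
#   sudo screen "$(console_pts 1 < uml.log)"
```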

Useful screen shortcuts and commands you may need:

Terminate current screen: Control-A + \

Detach current screen: Control-A + D

Restore detached screen: screen -r

If you want to shut down the UML, type halt or poweroff in Alpine Linux.

 

Once logged in to the guest machine, run setup-alpine to auto-configure it.

Remember to set your:

IP Address for eth0: 10.0.0.2

Netmask: 255.255.255.0

Gateway: 10.0.0.1

DNS nameserver(s): 8.8.8.8 8.8.4.4

Or configure it manually. The first thing is to set up the network; generally there are three things you need to do:

  • Bring your ethernet device up

  • Add the $DST IP address to that ethernet device

  • Make $SRC the default gateway of that ethernet device

And write some DNS servers to /etc/resolv.conf

nameserver 8.8.8.8
nameserver 8.8.4.4

If you have trouble with networking, you may need to restart it:

/etc/init.d/networking restart
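
The networking service takes its settings from /etc/network/interfaces, so it helps to know what a static setup for the addresses above looks like. The sketch below writes such a file. write_interfaces is a made-up helper, the defaults mirror the 10.0.0.2 and 10.0.0.1 addresses used in this article, and whatever setup-alpine generated for you may differ in detail.

```shell
# Write a static eth0 configuration to the file $1.
# $2 = guest address (default 10.0.0.2), $3 = gateway (default 10.0.0.1).
write_interfaces() {
    cat > "$1" <<EOF
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
    address ${2:-10.0.0.2}
    netmask 255.255.255.0
    gateway ${3:-10.0.0.1}
EOF
}

# Usage inside the guest (as root):
#   write_interfaces /etc/network/interfaces
#   /etc/init.d/networking restart
```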

 

Another nice touch is to make a swapfile:

# Make a 64M swapfile (make sure your $ROOTFS has enough space)
dd if=/dev/zero of=/swapfile bs=1M count=64
chmod 600 /swapfile

The following script can do all of those things for you at boot:

/etc/local.d/local.start

#!/bin/sh
# (alpine-base ships BusyBox sh; bash is not installed by default)

# swap on
/sbin/mkswap /swapfile
/sbin/swapon /swapfile

# fix net (choose one and leave the other commented out)

#   Auto-configured
/etc/init.d/networking restart

#   Manually-configured
# /sbin/ip link set eth0 up
# /sbin/ip addr add 10.0.0.2/24 dev eth0
# /sbin/ip route add default via 10.0.0.1 dev eth0

# shadowsocks-libev
/usr/bin/nohup /usr/local/bin/ss-server -c /etc/shadowsocks-libev/config.json &

Enable the local service and make sure the script has execute permission:

# Add local service to runlevel(Done once)
rc-update add local default

# Files inside /etc/local.d belong to the local service
chmod a+x /etc/local.d/local.start
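
After the next boot you can confirm that the startup script really enabled the swapfile by looking at /proc/swaps. A minimal sketch of that check follows; swap_active is a hypothetical helper, and the second argument is only there so it can be pointed at a sample file.

```shell
# Succeed if the swap file or device $1 is listed as active.
# $2 defaults to /proc/swaps.
swap_active() {
    awk -v f="$1" 'NR > 1 && $1 == f { found = 1 }
                   END { exit !found }' "${2:-/proc/swaps}"
}

# Usage inside the guest:
#   swap_active /swapfile && echo "swap is on"
```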

If you want to disable some unnecessary (NOT all) pseudo-ttys, you may want to edit /etc/inittab and comment out the tty*::... lines.
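
The inittab edit can also be scripted. The sketch below comments out the entries for tty2 through tty6 and leaves tty1 alone, assuming the lines start with ttyN:: as in a stock Alpine inittab. disable_extra_ttys is a made-up name and the range is just an example.

```shell
# Comment out the tty2..tty6 entries in the inittab file $1.
# Lines that are already commented out are left untouched.
disable_extra_ttys() {
    sed 's/^tty[2-6]::/#&/' "$1" > "$1.new" &&
        cat "$1.new" > "$1" &&      # keep the original file's permissions
        rm -f "$1.new"
}

# Usage inside the guest (as root):
#   disable_extra_ttys /etc/inittab
```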

Finally, apk is the default package manager of Alpine Linux, and installing a tool is easy:

apk add vim

Fortunately, $ROOTFS and $VMLINUX are reusable across Linux systems, so you don’t need to rebuild the image and kernel again.

Alpine UML built under Debian 8 x64 (3.16.0-4-amd64)

The rootfs.img is 192M in size, with a 64M swapfile inside

The tap.sh is used to create a tunnel adapter
The uml.sh is used to launch the Alpine UML


Inside Alpine UML, two major tools are available:
    1) vi
    2) shadowsocks-libev

The shadowsocks-libev binary files are located at:
    /usr/local/bin

Its config file is located at:
    /etc/shadowsocks-libev/config.json

The startup script is located at:
    /etc/local.d/local.start

NOTE: This UML must not be used in any form of production scenario

References

Alpine Linux Wiki

User-mode Linux - Wikipedia

OpenVZ下开启BBR拥塞控制

Alpine Linux Init System - Alpine Linux

User-mode Linux - ArchWiki

User Mode Linux HOWTO

iproute2 cheat sheet