Intro

SadServers is a LeetCode-style collection of puzzles for Sysadmins/Site Reliability Engineers/DevOps Engineers, or whatever Ops people in IT are called nowadays. The following is a writeup of my walk through the challenges.

Saint John

A developer created a testing program that is continuously writing to a log file /var/log/bad.log and filling up disk. You can check for example with tail -f /var/log/bad.log. This program is no longer needed. Find it and terminate it.

So let’s see what is accessing this file with lsof:

$ lsof /var/log/bad.log
COMMAND   PID   USER   FD   TYPE DEVICE SIZE/OFF  NODE NAME
badlog.py 621 ubuntu    3w   REG  259,1    10629 67701 /var/log/bad.log

The second column lists the ID of the process that writes to the log.

We’ll kill it with kill:

kill 621

As a oneliner:

kill $(lsof -t /var/log/bad.log)
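
Alternatively, if fuser is around, it can find and kill the writer in one step (note that it sends SIGKILL by default):

fuser -k /var/log/bad.log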

Santiago

Alice the spy has hidden a secret number combination, find it using these instructions:

1) Find the number of lines where the string Alice occurs in *.txt files in the /home/admin directory 2) There’s a file where “Alice” appears exactly once. In the line after that occurrence there’s a number. Write both numbers consecutively as one (no new line or spaces) to the solution file (e.g. if the first number from 1) is 11 and the second 22, you can do echo -n 11 > /home/admin/solution; echo 22 >> /home/admin/solution

A little shell tool magic… grep -Hc gives us file names and per-file match counts, cut splits lines on a delimiter, and grep -oEe extracts only the parts of a line that match a regex.

Ad 1):

$ grep Alice -Hnc *.txt | \
	cut -d : -f 2 | \
	python3 -c "import sys; print(sum(int(l) for l in sys.stdin))"

Find the word “Alice” in the target *.txt files and count the matching lines per file, |
select only the counters from the previous grep output and |
sum up the resulting list of numbers with a short Python script.
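
A simpler equivalent for step 1, assuming every file ends with a newline so lines can’t run together:

cat /home/admin/*.txt | grep -c Alice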

Ad 2):

grep Alice -Hnc *.txt | grep -Ee :1$ | cut -d: -f1

Find and count Alice in all target files, |
keep only the file with exactly one matching line |
and get the filename.

$ grep -A1 Alice \
	$(grep Alice -Hnc *.txt | grep -Ee :1$ | cut -d: -f1) | \
	grep -oEe "[0-9]+"

Search the file found earlier again, but this time also print the line after the match with -A1, |
then select only the number from that output.

echo ${PART1}${PART2} > /home/admin/solution

Concatenate both parts, add a newline at the end and write the result to the given file.

As a oneliner:

echo \
	$(grep Alice -Hnc *.txt | cut -d : -f 2 | python3 -c "import sys; print(sum(int(l) for l in sys.stdin))")$(grep -A1 Alice $(grep Alice -Hnc *.txt | grep -Ee :1$ | cut -d: -f1) | grep -oEe "[0-9]+") \
	> /home/admin/solution

Saskatoon

There’s a web server access log file at /home/admin/access.log. The file consists of one line per HTTP request, with the requester’s IP address at the beginning of each line.

Find what’s the IP address that has the most requests in this file (there’s no tie; the IP is unique). Write the solution into a file /home/admin/highestip.txt. For example, if your solution is “1.2.3.4”, you can do echo "1.2.3.4" > /home/admin/highestip.txt

# -o only print the matching part, -E extended regex, -e the pattern is a regex
# Rough IPv4 address regex, one to three digits per octet
# (not a strict 0-255 check, but good enough for pulling IPs out of a log)
grep -oEe \
  '[12]?[0-9]{1,2}[.][12]?[0-9]{1,2}[.][12]?[0-9]{1,2}[.][12]?[0-9]{1,2}' \
	access.log

This yields a list of IP addresses. Now we need to sort and count (sort | uniq -c), sort by the leading count, highest at the bottom (sort -n), pick the last one (tail -n1), and from the resulting line, pick the second field as it’s the IP address. Write it to the result file > /home/admin/highestip.txt.

As a oneliner:

grep -oEe '[12]?[0-9]{1,2}[.][12]?[0-9]{1,2}[.][12]?[0-9]{1,2}[.][12]?[0-9]{1,2}' access.log | \
	sort | \
	uniq -c | \
	sort -n | \
	tail -n1 | \
	awk '{ print $2 }' \
	> /home/admin/highestip.txt
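
Since the IP is the first whitespace-separated field of each line, the same result can be had in a single awk pass (a sketch):

awk '{ count[$1]++ }
     END { for (ip in count) if (count[ip] > max) { max = count[ip]; best = ip }
           print best }' /home/admin/access.log > /home/admin/highestip.txt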

Tokyo

There’s a web server serving a file /var/www/html/index.html with content “hello sadserver” but when we try to check it locally with an HTTP client like curl 127.0.0.1:80, nothing is returned. This scenario is not about the particular web server configuration and you only need to have general knowledge about how web servers work.

ss tells us that we’re dealing with Apache here:

# ss -panetl | grep 80
LISTEN 0      511                *:80              *:*    users:(("apache2",pid=774,fd=4),("apache2",pid=773,fd=4),("apache2",pid=635,fd=4)) ino:17412 sk:1002 cgroup:/system.slice/apache2.service v6only:0 <->
LISTEN 0      4096               *:8080            *:*    users:(("gotty",pid=551,fd=6)) ino:16863 sk:1003 cgroup:/system.slice/gotty.service v6only:0 <->

As it’s a Debian-family system (Ubuntu, going by the server signature), apache2ctl -S tells us that there’s no special config here; it’s pretty much boilerplate:

# apache2ctl -S
VirtualHost configuration:
*:80                   ip-172-31-21-14.us-east-2.compute.internal (/etc/apache2/sites-enabled/000-default.conf:1)
ServerRoot: "/etc/apache2"
Main DocumentRoot: "/var/www/html"
Main ErrorLog: "/var/log/apache2/error.log"
Mutex default: dir="/var/run/apache2/" mechanism=default
Mutex watchdog-callback: using_defaults
PidFile: "/var/run/apache2/apache2.pid"
Define: DUMP_VHOSTS
Define: DUMP_RUN_CFG
User: name="www-data" id=33
Group: name="www-data" id=33

So let’s just try it:

# curl localhost
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>403 Forbidden</title>
</head><body>
<h1>Forbidden</h1>
<p>You don't have permission to access this resource.</p>
<hr>
<address>Apache/2.4.52 (Ubuntu) Server at localhost Port 80</address>
</body></html>

So, what does Apache say?

# cat /var/log/apache2/error.log
[Mon Dec 26 19:06:44.506725 2022] [mpm_event:notice] [pid 635:tid 140509137360768] AH00489: Apache/2.4.52 (Ubuntu) configured -- resuming normal operations
[Mon Dec 26 19:06:44.506755 2022] [core:notice] [pid 635:tid 140509137360768] AH00094: Command line: '/usr/sbin/apache2'
[Mon Dec 26 19:11:36.248299 2022] [core:error] [pid 773:tid 140508974245440] (13)Permission denied: [client ::1:52158] AH00132: file permissions deny server access: /var/www/html/index.html

Ah. File permissions. Check it out…

# sudo -u www-data namei -mx /var/www/html/index.html
f: /var/www/html/index.html
 Drwxr-xr-x /
 drwxr-xr-x var
 drwxr-xr-x www
 drwxr-xr-x html
 -rw------- index.html

Ok, path to the file is fine. The permissions on the file itself seem off though…

# ls -lhA /var/www/html/
total 4.0K
-rw------- 1 root root 16 Aug  1 00:40 index.html

Yep, Apache’s user www-data is not able to access the file.

So change the file’s permissions:

chmod a+r /var/www/html/index.html
# curl localhost
hello sadserver

curl 127.0.0.1:80 hangs though. Strange. The curl localhost above went over IPv6 (note the ::1 client in Apache’s error log), while 127.0.0.1 forces IPv4. ss and apache2ctl show that Apache listens on all addresses, and SELinux isn’t in play here. So… iptables shenanigans?

# iptables-save
# Generated by iptables-save v1.8.7 on Mon Dec 26 19:19:54 2022
*filter
:INPUT ACCEPT [1677:121936]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [1690:328212]
-A INPUT -p tcp -m tcp --dport 80 -j DROP
COMMIT
# Completed on Mon Dec 26 19:19:54 2022

Indeed: incoming TCP port 80 is dropped. iptables -F flushes the rules and takes care of that, done.

As a oneliner:

chmod a+r /var/www/html/index.html ; iptables -F
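
If you’d rather not flush every rule, deleting just the offending one (matching the rule exactly as iptables-save listed it) also works:

iptables -D INPUT -p tcp -m tcp --dport 80 -j DROP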

Manhattan

Your objective is to be able to insert a row in an existing Postgres database. The issue is not specific to Postgres and you don’t need to know details about it (although it may help).

Helpful Postgres information: it’s a service that listens to a port (:5432) and writes to disk in a data directory, the location of which is defined in the data_directory parameter of the configuration file /etc/postgresql/14/main/postgresql.conf. In our case Postgres is managed by systemd as a unit with name postgresql.

Can’t write to disk? First check whether a filesystem is full, with df -h:

# df -h
Filesystem       Size  Used Avail Use% Mounted on
udev             224M     0  224M   0% /dev
tmpfs             47M  1.5M   46M   4% /run
/dev/nvme1n1p1   7.7G  1.2G  6.1G  17% /
tmpfs            233M     0  233M   0% /dev/shm
tmpfs            5.0M     0  5.0M   0% /run/lock
tmpfs            233M     0  233M   0% /sys/fs/cgroup
/dev/nvme1n1p15  124M  278K  124M   1% /boot/efi
/dev/nvme0n1     8.0G  8.0G   28K 100% /opt/pgdata

Let’s look into /opt/pgdata…

# ls -lhA
total 8.0G
-rw-r--r--  1 root     root       69 May 21  2022 deleteme
-rw-r--r--  1 root     root     7.0G May 21  2022 file1.bk
-rw-r--r--  1 root     root     923M May 21  2022 file2.bk
-rw-r--r--  1 root     root     488K May 21  2022 file3.bk
drwx------ 19 postgres postgres 4.0K May 21  2022 main

Purge the backup files and restart the database (the unit is named postgresql, per the scenario text):

# rm -f /opt/pgdata/*.bk
# systemctl restart postgresql

As a oneliner:

rm -f /opt/pgdata/*.bk ; systemctl restart postgresql
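
A quick sanity check afterwards (not part of the scenario’s own verification):

df -h /opt/pgdata                  # the .bk files are gone, so there is free space again
systemctl is-active postgresql     # should print "active"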

Capetown

There’s an Nginx web server installed and managed by systemd. Running curl -I 127.0.0.1:80 returns curl: (7) Failed to connect to localhost port 80: Connection refused, fix it so when you curl you get the default Nginx page.

Broken nginx, eh? Let’s ask systemd what’s up with it:

# systemctl status nginx
● nginx.service - The NGINX HTTP and reverse proxy server
     Loaded: loaded (/etc/systemd/system/nginx.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Mon 2022-12-26 19:32:01 UTC; 1min 21s ago
    Process: 570 ExecStartPre=/usr/sbin/nginx -t (code=exited, status=1/FAILURE)
        CPU: 30ms

Dec 26 19:32:01 ip-172-31-40-160 nginx[570]: nginx: [emerg] unexpected ";" in /etc/nginx/sites-enabled/default:1
Dec 26 19:32:01 ip-172-31-40-160 nginx[570]: nginx: configuration file /etc/nginx/nginx.conf test failed
Dec 26 19:32:00 ip-172-31-40-160 systemd[1]: Starting The NGINX HTTP and reverse proxy server...
Dec 26 19:32:01 ip-172-31-40-160 systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Dec 26 19:32:01 ip-172-31-40-160 systemd[1]: nginx.service: Failed with result 'exit-code'.
Dec 26 19:32:01 ip-172-31-40-160 systemd[1]: Failed to start The NGINX HTTP and reverse proxy server.

Ok, so the config’s invalid - the first line is only a ‘;’. Remove it and restart nginx:

$ sed -i -e 1d /etc/nginx/sites-enabled/default
$ systemctl restart nginx

Test:

# curl localhost
<html>
<head><title>500 Internal Server Error</title></head>
<body>
<center><h1>500 Internal Server Error</h1></center>
<hr><center>nginx/1.18.0</center>
</body>
</html>

Damn it, it’s still not working. Check its log file:

# cat /var/log/nginx/error.log
2022/12/26 19:37:18 [alert] 961#961: socketpair() failed while spawning "worker process" (24: Too many open files)
2022/12/26 19:37:18 [emerg] 962#962: eventfd() failed (24: Too many open files)
2022/12/26 19:37:18 [alert] 962#962: socketpair() failed (24: Too many open files)
2022/12/26 19:37:22 [crit] 962#962: *1 open() "/var/www/html/index.nginx-debian.html" failed (24: Too many open files), client: 127.0.0.1, server: _, request: "GET / HTTP/1.1", host: "localhost"

Too many open files… Normally we would check /etc/security/limits.conf, but as the service is managed by systemd, the limit comes from its unit:

# systemctl cat nginx
# /etc/systemd/system/nginx.service
[Unit]
Description=The NGINX HTTP and reverse proxy server
After=syslog.target network-online.target remote-fs.target nss-lookup.target
Wants=network-online.target

[Service]
Type=forking
PIDFile=/run/nginx.pid
ExecStartPre=/usr/sbin/nginx -t
ExecStart=/usr/sbin/nginx
ExecReload=/usr/sbin/nginx -s reload
ExecStop=/bin/kill -s QUIT $MAINPID
PrivateTmp=true
LimitNOFILE=10

[Install]
WantedBy=multi-user.target

Ah, great: LimitNOFILE=10 is far too low for a web server. Remove the limit from the unit and reload:

# sed -i -e /LimitNOFILE/d /etc/systemd/system/nginx.service
# systemctl daemon-reload
# systemctl restart nginx
# !curl
curl localhost
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

Done.

As a oneliner:

sed -i -e 1d /etc/nginx/sites-enabled/default ; \
sed -i -e /LimitNOFILE/d /etc/systemd/system/nginx.service ; \
systemctl daemon-reload ; \
systemctl restart nginx
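
Deleting LimitNOFILE works, but a gentler variant would be to override it with a sane value via a drop-in instead (a sketch; 65536 is an arbitrary but common choice):

mkdir -p /etc/systemd/system/nginx.service.d
cat > /etc/systemd/system/nginx.service.d/override.conf << EOF
[Service]
LimitNOFILE=65536
EOF
systemctl daemon-reload
systemctl restart nginx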

Salta

There’s a “dockerized” Node.js web application in the /home/admin/app directory. Create a Docker container so you get a web app on port :8888 and can curl to it. For the solution to be valid, there should be only one running Docker container.

First we check if it’s even broken:

admin@ip-172-31-34-225:~/app$ curl localhost:8888
these are not the droids you're looking for

Huh, odd: something’s already listening on that port. What’s going on here? Tell us, ss!

$ ss -panetl | grep 8888
LISTEN 0      511          0.0.0.0:8888      0.0.0.0:*    ino:11536 sk:2 cgroup:/system.slice/nginx.service <->                                                       
LISTEN 0      511             [::]:8888         [::]:*    ino:11537 sk:6 cgroup:/system.slice/nginx.service v6only:1 <->                                              

Ah, Nginx - disable it:

$ sudo systemctl disable --now nginx
Synchronizing state of nginx.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install disable nginx
Removed /etc/systemd/system/multi-user.target.wants/nginx.service.

Ok, so let’s check out what docker is giving us:

# docker ps -a
CONTAINER ID   IMAGE     COMMAND                  CREATED        STATUS                    PORTS     NAMES
124a4fb17a1c   app       "docker-entrypoint.s…"   3 months ago   Exited (1) 3 months ago             elated_taussig
# docker logs elated_taussig
node:internal/modules/cjs/loader:928
  throw err;
  ^

Error: Cannot find module '/usr/src/app/serve.js'
    at Function.Module._resolveFilename (node:internal/modules/cjs/loader:925:15)
    at Function.Module._load (node:internal/modules/cjs/loader:769:27)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:76:12)
    at node:internal/main/run_main_module:17:47 {
  code: 'MODULE_NOT_FOUND',
  requireStack: []
}

So it seems that the container image is broken somehow. We’re given the image’s source in admin’s home directory.

admin@ip-xxx:~/app$ ls
Dockerfile  package-lock.json  package.json  server.js
admin@ip-xxx:~/app$ cat Dockerfile
# documentation https://nodejs.org/en/docs/guides/nodejs-docker-webapp/

# most recent node (security patches) and alpine (minimal, adds to security, possible libc issues)
FROM node:15.7-alpine

# Create app directory & copy app files
WORKDIR /usr/src/app

# we copy first package.json only, so we take advantage of cached Docker layers
COPY ./package*.json ./

# RUN npm ci --only=production
RUN npm install

# Copy app source
COPY ./* ./

# port used by this app
EXPOSE 8880

# command to run
CMD [ "node", "serve.js" ]

The exposed port should be 8888, and the file name in CMD should be server.js (the file that actually exists in the directory), not serve.js.
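
The relevant Dockerfile lines then end up as:

# port used by this app
EXPOSE 8888

# command to run
CMD [ "node", "server.js" ]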

Make the changes and rebuild the container image:

admin@ip-xxx:~/app$ sudo docker build -t app:latest .
Sending build context to Docker daemon  101.9kB
Step 1/7 : FROM node:15.7-alpine
 ---> 706d12284dd5
Step 2/7 : WORKDIR /usr/src/app
 ---> Using cache
 ---> 463b1571f18e
Step 3/7 : COPY ./package*.json ./
 ---> Using cache
 ---> acfb467c80ba
Step 4/7 : RUN npm install
 ---> Using cache
 ---> 5cad5aa08c7a
Step 5/7 : COPY ./* ./
 ---> 59c0ca1ef224
Step 6/7 : EXPOSE 8888
 ---> Running in 2e5faf8ee253
Removing intermediate container 2e5faf8ee253
 ---> c3127219be52
Step 7/7 : CMD [ "node", "server.js" ]
 ---> Running in 5d24a99a1a9e
Removing intermediate container 5d24a99a1a9e
 ---> 3774cc41c752
Successfully built 3774cc41c752
Successfully tagged app:latest

Start the container and check the results:

admin@ip-xxx:~/app$ sudo docker run -d -p 8888:8888 app:latest
397a69aa6832fb6edc922b733fd55c6df169963f30875308287f2298ab99730e
admin@ip-xxx:~/app$ !curl
curl localhost:8888
Hello World!

Done.

As a oneliner:

sudo systemctl disable --now nginx ; \
cd ~/app ; \
sed -i \
	-e 's/8880/8888/g' \
	-e 's/serve.js/server.js/g' \
	Dockerfile ; \
sudo docker build -t app:latest . ; \
sudo docker run -d -p 8888:8888 app:latest

Jakarta

Can’t ping google.com. It returns ping: google.com: Name or service not known. Expected is being able to resolve the hostname. (Note: currently the VMs can’t ping outside so there’s no automated check for the solution).

Check files connected to DNS resolution (/etc/hosts, /etc/resolv.conf), check relevant services:

ubuntu@ip-172-31-42-233:/$ cat /etc/hosts
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
ubuntu@ip-172-31-42-233:/$ cat /etc/resolv.conf
# This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
#
# This file might be symlinked as /etc/resolv.conf. If you're looking at
# /etc/resolv.conf and seeing this text, you have followed the symlink.
#
# This is a dynamic resolv.conf file for connecting local clients to the
# internal DNS stub resolver of systemd-resolved. This file lists all
# configured search domains.
#
# Run "resolvectl status" to see details about the uplink DNS servers
# currently in use.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 127.0.0.53
options edns0 trust-ad
search us-east-2.compute.internal
ubuntu@ip-172-31-42-233:/$ systemctl status systemd-resolved
● systemd-resolved.service - Network Name Resolution
     Loaded: loaded (/lib/systemd/system/systemd-resolved.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2022-12-26 20:12:19 UTC; 1min 27s ago
       Docs: man:systemd-resolved.service(8)
             man:org.freedesktop.resolve1(5)
             https://www.freedesktop.org/wiki/Software/systemd/writing-network-configuration-managers
             https://www.freedesktop.org/wiki/Software/systemd/writing-resolver-clients
   Main PID: 434 (systemd-resolve)
     Status: "Processing requests..."
      Tasks: 1 (limit: 521)
     Memory: 8.2M
        CPU: 115ms
     CGroup: /system.slice/systemd-resolved.service
             └─434 /lib/systemd/systemd-resolved

Dec 26 20:12:18 ip-172-31-42-233 systemd[1]: Starting Network Name Resolution...
Dec 26 20:12:18 ip-172-31-42-233 systemd-resolved[434]: Positive Trust Anchors:
Dec 26 20:12:18 ip-172-31-42-233 systemd-resolved[434]: . IN DS 20326 8 2 e06d44b80b8f1d39a95c0b0d7c65d0>
Dec 26 20:12:18 ip-172-31-42-233 systemd-resolved[434]: Negative trust anchors: home.arpa 10.in-addr.arp>
Dec 26 20:12:19 ip-172-31-42-233 systemd-resolved[434]: Using system hostname 'ip-172-31-42-233'.
Dec 26 20:12:19 ip-172-31-42-233 systemd[1]: Started Network Name Resolution.
Dec 26 20:12:22 ip-172-31-42-233 systemd-resolved[434]: ens5: Failed to read DNSSEC negative trust ancho>

It’s all good. Is DNS resolution really broken?

ubuntu@ip-172-31-42-233:/$ ping www.google.com
ping: www.google.com: Name or service not known

Yes, it is.

Does DNS resolution work, or is the network to blame?

$ dig www.google.at

; <<>> DiG 9.18.1-1ubuntu1.1-Ubuntu <<>> www.google.at
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 52104
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;www.google.at.                 IN      A

;; ANSWER SECTION:
www.google.at.          300     IN      A       142.250.191.163

;; Query time: 116 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Mon Dec 26 20:21:23 UTC 2022
;; MSG SIZE  rcvd: 58

So DNS itself works fine: dig queries the systemd-resolved stub at 127.0.0.53 directly and gets an answer. The breakage must be in the name-lookup path that ping uses.

So… is DNS really used by the system? Check /etc/nsswitch.conf:

ubuntu@ip-172-31-42-233:/$ cat /etc/nsswitch.conf
# /etc/nsswitch.conf
#
# Example configuration of GNU Name Service Switch functionality.
# If you have the `glibc-doc-reference' and `info' packages installed, try:
# `info libc "Name Service Switch"' for information about this file.

passwd:         files systemd
group:          files systemd
shadow:         files
gshadow:        files

hosts:          files
networks:       files

protocols:      db files
services:       db files
ethers:         db files
rpc:            db files

netgroup:       nis

The hosts: line is the problem: it only consults files, never DNS. On my box, it looks like this:

hosts: mymachines resolve [!UNAVAIL=return] files myhostname dns

So let’s put that in:

ubuntu@ip-172-31-42-233:/$ sudo vim /etc/nsswitch.conf
sudo: unable to resolve host ip-172-31-42-233: Name or service not known
ubuntu@ip-172-31-42-233:/$ ping www.google.com
PING www.google.com (142.251.32.4) 56(84) bytes of data.
^C
--- www.google.com ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1005ms

Yep, name resolution works now. The pings themselves are lost because the VM can’t reach the outside, as the scenario notes.

As a oneliner:

sed \
	-i -e 's/hosts:.*/hosts: mymachines resolve [!UNAVAIL=return] files myhostname dns/' \
	/etc/nsswitch.conf
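
To exercise the NSS path specifically (ping and getent go through nsswitch.conf, while dig does not), a quick check:

getent hosts www.google.com    # resolves via the hosts: line in nsswitch.conf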

Bern

There are two Docker containers running, a web application (Wordpress or WP) and a database (MariaDB) as back-end, but if we look at the web page, we see that it cannot connect to the database. curl -s localhost:80 |tail -4 returns:

<body id="error-page"> <div class="wp-die-message"><h1>Error establishing a database connection</h1></div></body> </html>

This is not a Wordpress code issue (the image is :latest with some network utilities added). What you need to know is that WP uses “WORDPRESS_DB_” environment variables to create the MySQL connection string. See the ./html/wp-config.php WP config file for example (from /home/admin).

root@ip-172-31-19-232:~# docker ps
CONTAINER ID   IMAGE            COMMAND                  CREATED        STATUS              PORTS                    NAMES
6ffb084b515c   wordpress:sad    "docker-entrypoint.s…"   4 months ago   Up About a minute   0.0.0.0:80->80/tcp       wordpress
0eef97284c44   mariadb:latest   "docker-entrypoint.s…"   4 months ago   Up About a minute   0.0.0.0:3306->3306/tcp   mariadb
root@ip-172-31-19-232:~# docker inspect wordpress
[
    {
        "Id": "6ffb084b515ca482ac58fad406b10837b44fb55610acbb35b8ed4a0fb24de50c",
        "Created": "2022-08-04T03:22:49.885388997Z",
        "Path": "docker-entrypoint.sh",
        "Args": [
            "apache2-foreground"
        ],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 894,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2022-12-26T20:26:21.348936894Z",
            "FinishedAt": "2022-08-31T15:40:20.366876609Z"
        },
        "Image": "sha256:0a3bdd32ae210e34ed07cf49669dccbdb9deac143ebf27d5cc83136a0e3d9063",
        "ResolvConfPath": "/var/lib/docker/containers/6ffb084b515ca482ac58fad406b10837b44fb55610acbb35b8ed4a0fb24de50c/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/6ffb084b515ca482ac58fad406b10837b44fb55610acbb35b8ed4a0fb24de50c/hostname",
        "HostsPath": "/var/lib/docker/containers/6ffb084b515ca482ac58fad406b10837b44fb55610acbb35b8ed4a0fb24de50c/hosts",
        "LogPath": "/var/lib/docker/containers/6ffb084b515ca482ac58fad406b10837b44fb55610acbb35b8ed4a0fb24de50c/6ffb084b515ca482ac58fad406b10837b44fb55610acbb35b8ed4a0fb24de50c-json.log",
        "Name": "/wordpress",
        "RestartCount": 0,
        "Driver": "overlay2",
        "Platform": "linux",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "docker-default",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": [
                "html:/var/www/html"
            ],
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "json-file",
                "Config": {}
            },
            "NetworkMode": "default",
            "PortBindings": {
                "80/tcp": [
                    {
                        "HostIp": "",
                        "HostPort": "80"
                    }
                ]
            },
            "RestartPolicy": {
                "Name": "always",
                "MaximumRetryCount": 0
            },
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": null,
            "CapAdd": null,
            "CapDrop": null,
            "CgroupnsMode": "private",
            "Dns": [],
            "DnsOptions": [],
            "DnsSearch": [],
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "private",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": false,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": null,
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "ConsoleSize": [
                0,
                0
            ],
            "Isolation": "",
            "CpuShares": 0,
            "Memory": 0,
            "NanoCpus": 0,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": [],
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": [],
            "DeviceCgroupRules": null,
            "DeviceRequests": null,
            "KernelMemory": 0,
            "KernelMemoryTCP": 0,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": null,
            "OomKillDisable": null,
            "PidsLimit": null,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0,
            "MaskedPaths": [
                "/proc/asound",
                "/proc/acpi",
                "/proc/kcore",
                "/proc/keys",
                "/proc/latency_stats",
                "/proc/timer_list",
                "/proc/timer_stats",
                "/proc/sched_debug",
                "/proc/scsi",
                "/sys/firmware"
            ],
            "ReadonlyPaths": [
                "/proc/bus",
                "/proc/fs",
                "/proc/irq",
                "/proc/sys",
                "/proc/sysrq-trigger"
            ]
        },
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/ce38c2ce47ad7d4bbbd8f5c7c4ed567023d2121aef7d943d0fa2550593cfbeea-init/diff:/var/lib/docker/overlay2/94106dc67f8c47bdc7f21b7607cae55e2cf94644b34d7698b3e174a3755275a6/diff:/var/lib/docker/overlay2/2936726c07d706d506728ed14cf8cab6e23916647e0838f0f79f86552c7010b3/diff:/var/lib/docker/overlay2/069acc5fa9bb5677b2aeb93fc0ef5773a23d6d7b1218cf75ef82aca71c6bda47/diff:/var/lib/docker/overlay2/c8afefd30aa9208d8c1d54d5fd332fc36608816b4dc77f94c1137c882a4ec821/diff:/var/lib/docker/overlay2/420654ce2976260a0830db88a78d5a347fb80aa7b790b2d9309ec7ae76d91d83/diff:/var/lib/docker/overlay2/422f6df2f56d58836998a749e631cab9406fd37800a3d4f7e4eda1f14f008d08/diff:/var/lib/docker/overlay2/3dfa1839ce3bd0b607fcddd25305125d66c7d2870a6e78dc5a01ba82134e1c42/diff:/var/lib/docker/overlay2/acdc673318ff934a6a1735e4a7c3ebfa98ba0b0e1fa7753bed3c36f7be05c5d7/diff:/var/lib/docker/overlay2/3b59127b861d3c1821590e1bbe53361762ec93b9e2471688cf1f16ed75698536/diff:/var/lib/docker/overlay2/3cdfd4f802d2ed04fe9f783aea4f7875b46338caa8d9cc83df921644933a7697/diff:/var/lib/docker/overlay2/c905fa690140ab324cdda21324f759f356b6672242462ddef823f3a8f2456463/diff:/var/lib/docker/overlay2/922e6ceb9fdab2f37d87a09a651497fa659d3c319b013db7427e8eed7487749b/diff:/var/lib/docker/overlay2/904795f943098776cc0a2c8385029bbeb398ae81f6aeedde6f459b7a06130bdd/diff:/var/lib/docker/overlay2/deea2b08a5738a2f6b20e9a9086884a3f181a21408bb140147395e2d3b6b2b88/diff:/var/lib/docker/overlay2/8fc975a0a76589eeb04b246a231d986b411913521b6bedf2e3a5d05908cce7c2/diff:/var/lib/docker/overlay2/0b6935279d515d0669f8a5565cbc0e4bf36dd21bd6632069d3192c34ec4f7d01/diff:/var/lib/docker/overlay2/11de313511ee3247d7543f251bf73d074c52306b38cb468d4406033996bb5579/diff:/var/lib/docker/overlay2/19a9f2c255dac237587a18b8f354abd11410dafc3284412d728c2bf7056313a0/diff:/var/lib/docker/overlay2/04234c51c40e87365dc7808e823f054571bbf2af947366973be0d82abe2d0a60/diff:/var/lib/docker/overlay2/80fe0530020bda727c7a7e05f8261966265ca1a07871152cf5c223d362ec80c3/diff:/var/lib/docker/overlay2/1abcbc54c55abf66822fefb38dbb5816fc59bdb685ced0528db4a3bcf62ebe9b/diff:/var/lib/docker/overlay2/51d8beae216b4b10c5755ae738c4abd12efa13403c8d1c5b0ced1fd44b9b48f3/diff",
                "MergedDir": "/var/lib/docker/overlay2/ce38c2ce47ad7d4bbbd8f5c7c4ed567023d2121aef7d943d0fa2550593cfbeea/merged",
                "UpperDir": "/var/lib/docker/overlay2/ce38c2ce47ad7d4bbbd8f5c7c4ed567023d2121aef7d943d0fa2550593cfbeea/diff",
                "WorkDir": "/var/lib/docker/overlay2/ce38c2ce47ad7d4bbbd8f5c7c4ed567023d2121aef7d943d0fa2550593cfbeea/work"
            },
            "Name": "overlay2"
        },
        "Mounts": [
            {
                "Type": "volume",
                "Name": "html",
                "Source": "/var/lib/docker/volumes/html/_data",
                "Destination": "/var/www/html",
                "Driver": "local",
                "Mode": "z",
                "RW": true,
                "Propagation": ""
            }
        ],
        "Config": {
            "Hostname": "6ffb084b515c",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "ExposedPorts": {
                "80/tcp": {}
            },
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "WORDPRESS_DB_PASSWORD=password",
                "WORDPRESS_DB_USER=root",
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "PHPIZE_DEPS=autoconf \t\tdpkg-dev \t\tfile \t\tg++ \t\tgcc \t\tlibc-dev \t\tmake \t\tpkg-config \t\tre2c",
                "PHP_INI_DIR=/usr/local/etc/php",
                "APACHE_CONFDIR=/etc/apache2",
                "APACHE_ENVVARS=/etc/apache2/envvars",
                "PHP_CFLAGS=-fstack-protector-strong -fpic -fpie -O2 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64",
                "PHP_CPPFLAGS=-fstack-protector-strong -fpic -fpie -O2 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64",
                "PHP_LDFLAGS=-Wl,-O1 -pie",
                "GPG_KEYS=42670A7FE4D0441C8E4632349E4FDC074A4EF02D 5A52880781F755608BF815FC910DEB46F53EA312",
                "PHP_VERSION=7.4.30",
                "PHP_URL=https://www.php.net/distributions/php-7.4.30.tar.xz",
                "PHP_ASC_URL=https://www.php.net/distributions/php-7.4.30.tar.xz.asc",
                "PHP_SHA256=ea72a34f32c67e79ac2da7dfe96177f3c451c3eefae5810ba13312ed398ba70d"
            ],
            "Cmd": [
                "apache2-foreground"
            ],
            "Image": "wordpress:sad",
            "Volumes": {
                "/var/www/html": {}
            },
            "WorkingDir": "/var/www/html",
            "Entrypoint": [
                "docker-entrypoint.sh"
            ],
            "OnBuild": null,
            "Labels": {
                "com.docker.compose.config-hash": "fa7c866c125b6bdbe158555fca49a8d871d32282817db2f396e0ff43d7a93eb4",
                "com.docker.compose.container-number": "1",
                "com.docker.compose.oneoff": "False",
                "com.docker.compose.project": "admin",
                "com.docker.compose.project.config_files": "docker-compose.yaml",
                "com.docker.compose.project.working_dir": "/home/admin",
                "com.docker.compose.service": "wordpress",
                "com.docker.compose.version": "1.25.0"
            },
            "StopSignal": "SIGWINCH"
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "9f28df1912dade5b0d3001c875851cff085b100daa98390d546418207a0b962e",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {
                "80/tcp": [
                    {
                        "HostIp": "0.0.0.0",
                        "HostPort": "80"
                    }
                ]
            },
            "SandboxKey": "/var/run/docker/netns/9f28df1912da",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "3928232aebd334a5aca3706dd3739f070e99bae3723e515a7792874a5f434bc0",
            "Gateway": "172.17.0.1",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "172.17.0.3",
            "IPPrefixLen": 16,
            "IPv6Gateway": "",
            "MacAddress": "02:42:ac:11:00:03",
            "Networks": {
                "bridge": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "NetworkID": "e305027dab8764ad7d11a17ff5228c7b77fdcc18656a55788e9faf9ebe978ecd",
                    "EndpointID": "3928232aebd334a5aca3706dd3739f070e99bae3723e515a7792874a5f434bc0",
                    "Gateway": "172.17.0.1",
                    "IPAddress": "172.17.0.3",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:ac:11:00:03",
                    "DriverOpts": null
                }
            }
        }
    }
]

This gives us the necessary parameters to recreate the container, and the passwords to access the database. docker inspect mariadb | grep IPAddress gives us the IP Address to access MariaDB.

root@ip-172-31-19-232:~# docker exec -it wordpress mysql -u root -ppassword -h 172.17.0.2
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 6
Server version: 10.8.3-MariaDB-1:10.8.3+maria~jammy mariadb.org binary distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> use wordpress
Database changed
MariaDB [wordpress]> show tables;
Empty set (0.000 sec)

MariaDB [wordpress]>

So MariaDB is reachable just fine from the wordpress container by IP; what’s missing is a hostname for the connection string, since no WORDPRESS_DB_HOST is set and there’s no --link between the containers. So let’s recreate the container:

# docker stop wordpress
# docker rm wordpress
# docker create \
	--name wordpress \
	-v html:/var/www/html \
	--link mariadb:mysql \
	-p 80:80 \
	-e WORDPRESS_DB_HOST=mysql \
	-e WORDPRESS_DB_NAME=wordpress \
	-e WORDPRESS_DB_USER=root \
	-e WORDPRESS_DB_PASSWORD=password \
	wordpress:sad
# docker start wordpress

As a oneliner:

docker stop wordpress ; \
docker rm wordpress ; \
docker run -d \
	--name wordpress \
	-v html:/var/www/html \
	--link mariadb:mysql \
	-p 80:80 \
	-e WORDPRESS_DB_HOST=mysql \
	-e WORDPRESS_DB_NAME=wordpress \
	-e WORDPRESS_DB_USER=root \
	-e WORDPRESS_DB_PASSWORD=password \
	wordpress:sad
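
Then re-run the check from the scenario description:

curl -s localhost:80 | tail -4    # the “Error establishing a database connection” page should be gone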

Singara

There’s a k3s Kubernetes install you can access with kubectl. The Kubernetes YAML manifests under /home/admin have been applied. The objective is to access from the host the “webapp” web server deployed and find what message it serves (it’s a name of a town or city btw). In order to pass the check, the webapp Docker container should not be run separately outside Kubernetes as a shortcut.

Let’s have a look at the cluster’s state:

root@ip-10-0-0-64:~# kubectl get -A pod
NAMESPACE     NAME                                      READY   STATUS             RESTARTS        AGE
kube-system   helm-install-traefik-crd-nml28            0/1     Completed          0               101d
kube-system   helm-install-traefik-z54r4                0/1     Completed          2               101d
web           webapp-deployment-666b67994b-5sffz        0/1     ImagePullBackOff   0               101d
kube-system   coredns-b96499967-scfhc                   1/1     Running            7 (101d ago)    101d
kube-system   local-path-provisioner-7b7dc8d6f5-r8777   1/1     Running            7 (101d ago)    101d
kube-system   metrics-server-668d979685-gkzrn           1/1     Running            11 (101d ago)   101d
kube-system   svclb-traefik-7cfc151c-hqm5p              2/2     Running            8 (101d ago)    101d
kube-system   traefik-7cd4fcff68-g8t6k                  1/1     Running            7 (101d ago)    101d
kube-system   svclb-traefik-7cfc151c-m6842              2/2     Running            2 (115s ago)    100d

Apparently, the desired container image is missing from the container runtime’s image store. The scenario text hints that it exists in the Docker runtime that is also installed on the system, though:

root@ip-10-0-0-64:/home/admin# docker images
REPOSITORY   TAG        IMAGE ID       CREATED        SIZE
webapp       latest     9c082e2983bc   3 months ago   135MB
python       3.7-slim   c1d0bab51bbf   3 months ago   123MB
registry     2          3a0f7b0a13ef   4 months ago   24.1MB

There it is. But it’s not in k3s’ image store:

root@ip-10-0-0-64:~# k3s ctr i ls | grep webapp
root@ip-10-0-0-64:~#

Let’s add it to k3s’ containerd:

root@ip-10-0-0-64:~# sudo docker save webapp:latest | sudo k3s ctr images import -
unpacking docker.io/library/webapp:latest (sha256:25b7b5c6f8ff5fc3b3b89bb5a5c1b96c619e4152275638aeaa82841b4c14ebf8)...done
root@ip-10-0-0-64:~# k3s ctr i ls | grep webapp
docker.io/library/webapp:latest                                                                                    application/vnd.docker.distribution.manifest.v2+json      sha256:25b7b5c6f8ff5fc3b3b89bb5a5c1b96c619e4152275638aeaa82841b4c14ebf8 84.1 MiB  linux/amd64                                                                                             io.cri-containerd.image=managed

Excellent. But the pod is still in ImagePullBackOff:

root@ip-10-0-0-64:~# kubectl get -n web pod
NAME                                 READY   STATUS             RESTARTS   AGE
webapp-deployment-666b67994b-5sffz   0/1     ImagePullBackOff   0          101d

Let’s have a look at the manifests:

root@ip-10-0-0-64:~# cd ~admin
root@ip-10-0-0-64:/home/admin#
root@ip-10-0-0-64:/home/admin# cat deployment.yml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp-deployment
  namespace: web
spec:
  selector:
    matchLabels:
      app: webapp
  replicas: 1
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: webapp
        image: webapp
        imagePullPolicy: Always
        ports:
        - containerPort: 8880
root@ip-10-0-0-64:/home/admin# cat nodeport.yml
apiVersion: v1
kind: Service
metadata:
  name: webapp-service
  namespace: web
spec:
  type: NodePort
  selector:
    app.kubernetes.io/name: webapp
  ports:
    - port: 80
      targetPort: 8888
      nodePort: 30007

The service doesn’t match the deployment: its selector (app.kubernetes.io/name) doesn’t match the pod label (app), its port is 80 instead of 8888, and as a NodePort it exposes 30007 on the node rather than the desired 8888 (switching the type to LoadBalancer lets k3s’ built-in service load balancer expose port 8888 on the host).

The deployment shouldn’t try to pull the image on every start (it only exists locally, hence the ImagePullBackOff), and its containerPort is wrong too; it should be 8888.

The manifests should look like this:

root@ip-10-0-0-64:/home/admin# cat deployment.yml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp-deployment
  namespace: web
spec:
  selector:
    matchLabels:
      app: webapp
  replicas: 1
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: webapp
        image: webapp
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8888
root@ip-10-0-0-64:/home/admin# cat nodeport.yml 
apiVersion: v1
kind: Service
metadata:
  name: webapp-service
  namespace: web
spec:
  type: LoadBalancer
  selector:
    app: webapp
  ports:
    - port: 8888
      targetPort: 8888
      nodePort: 30007

Apply the corrected manifests and check the results.
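
A sketch of the commands, run from /home/admin where the files live (the rollout check is optional):

kubectl apply -f deployment.yml -f nodeport.yml
kubectl -n web rollout status deployment/webapp-deployment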

root@ip-10-0-0-64:/home/admin# kubectl -n web get svc -o wide
NAME             TYPE           CLUSTER-IP    EXTERNAL-IP    PORT(S)          AGE    SELECTOR
webapp-service   LoadBalancer   10.43.35.97   172.31.33.52   8888:30007/TCP   101d   app=webapp
root@ip-10-0-0-64:/home/admin# iptables-save  | grep 8888
# Warning: iptables-legacy tables present, use iptables-legacy-save to see them
-A CNI-DN-7a71a21266205d85a33bf -s 10.42.1.0/24 -p tcp -m tcp --dport 8888 -j CNI-HOSTPORT-SETMARK
-A CNI-DN-7a71a21266205d85a33bf -s 127.0.0.1/32 -p tcp -m tcp --dport 8888 -j CNI-HOSTPORT-SETMARK
-A CNI-DN-7a71a21266205d85a33bf -p tcp -m tcp --dport 8888 -j DNAT --to-destination 10.42.1.10:8888
-A CNI-HOSTPORT-DNAT -p tcp -m comment --comment "dnat name: \"cbr0\" id: \"1c91e84359f8f6ac9bdcde61f61f3fedf231aedbc8ca174fa473231ec41298e0\"" -m multiport --dports 8888 -j CNI-DN-7a71a21266205d85a33bf
-A KUBE-SEP-72FXFZAWXXLVOFL6 -p tcp -m comment --comment "web/webapp-service" -m tcp -j DNAT --to-destination 10.42.1.9:8888
-A KUBE-SERVICES -d 10.43.35.97/32 -p tcp -m comment --comment "web/webapp-service cluster IP" -m tcp --dport 8888 -j KUBE-SVC-6PCM5WOO3Q3DGU2X
-A KUBE-SERVICES -d 172.31.33.52/32 -p tcp -m comment --comment "web/webapp-service loadbalancer IP" -m tcp --dport 8888 -j KUBE-EXT-6PCM5WOO3Q3DGU2X
-A KUBE-SVC-6PCM5WOO3Q3DGU2X ! -s 10.42.0.0/16 -d 10.43.35.97/32 -p tcp -m comment --comment "web/webapp-service cluster IP" -m tcp --dport 8888 -j KUBE-MARK-MASQ
-A KUBE-SVC-6PCM5WOO3Q3DGU2X -m comment --comment "web/webapp-service -> 10.42.1.9:8888" -j KUBE-SEP-72FXFZAWXXLVOFL6

We’re done:

root@ip-10-0-0-64:/home/admin# curl localhost:8888
Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch

Karakorum

There’s a binary at /home/admin/wtfit that nobody knows how it works or what it does (“what the fun is this”). Someone remembers something about wtfit needing to communicate to a service in order to start. Run this wtfit program so it doesn’t exit with an error, fixing or working around things that you need but are broken in this server. (Note that you can open more than one web “terminal”).

admin@ip-172-31-42-20:~$ file wtfit 
wtfit: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, Go BuildID=_iHDgQr9_DkcDPzFkdEV/KeopbEbY41Cs4Jx-plaD/bLd_UQ3kiokG80g9hhbY/qtKkikMAZmQbLl9qoMCF, not stripped
admin@ip-172-31-42-20:~$ ls -lhA
total 6.2M
-rw------- 1 admin admin    5 Sep 13 18:31 .bash_history
-rw-r--r-- 1 admin admin  220 Aug  4  2021 .bash_logout
-rw-r--r-- 1 admin admin 3.5K Aug  4  2021 .bashrc
-rw-r--r-- 1 admin admin  807 Aug  4  2021 .profile
drwx------ 2 admin admin 4.0K Aug 31 21:26 .ssh
drwxr-xr-x 2 admin admin 4.0K Sep 13 18:18 agent
-rw-r--r-- 1 admin admin 6.1M Sep 13 18:17 wtfit
-rw-r--r-- 1 admin admin  127 Dec 26 21:21 xxx
admin@ip-172-31-42-20:~$ chmod +x wtfit 
bash: /usr/bin/chmod: Permission denied

Wat? chmod is broken? Great… The chmod binary itself has lost its execute bit, so run it via the dynamic loader (which only needs to read the binary) to give the bit back:

sudo /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 /usr/bin/chmod +x /usr/bin/chmod

strace to the rescue…

admin@ip-172-31-42-20:~$ strace -o xxx -f ./wtfit 
ERROR: can't open config file

Let’s look at xxx:

972   openat(AT_FDCWD, "/home/admin/wtfitconfig.conf", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or d
irectory)
972   write(1, "ERROR: can't open config file\n", 30) = 30

OK, it wants a config file; create an empty one with touch /home/admin/wtfitconfig.conf, then do the strace-and-read-the-dump dance again:

986   connect(3, {sa_family=AF_INET, sin_port=htons(7777), sin_addr=inet_addr("127.0.0.1")}, 16 <unfinished ...>
987   <... nanosleep resumed>NULL)      = 0
987   nanosleep({tv_sec=0, tv_nsec=320000},  <unfinished ...>
986   <... connect resumed>)            = -1 EINPROGRESS (Operation now in progress)
986   epoll_ctl(4, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=3781652200, u64=140337543081704}}) = 0
990   <... epoll_pwait resumed>[{EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|EPOLLRDHUP, {u32=3781652200, u64=140337543081704}}], 128, 29999, NULL, 580717888312) = 1
986   write(6, "\0", 1 <unfinished ...>
990   epoll_pwait(4,  <unfinished ...>
986   <... write resumed>)              = 1
990   <... epoll_pwait resumed>[{EPOLLIN, {u32=8851208, u64=8851208}}], 128, 29989, NULL, 580717888262) = 1
986   futex(0xc000034948, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
990   read(5,  <unfinished ...>
986   <... futex resumed>)              = 1
990   <... read resumed>"\0", 16)       = 1
989   <... futex resumed>)              = 0
987   <... nanosleep resumed>NULL)      = 0
986   getsockopt(3, SOL_SOCKET, SO_ERROR,  <unfinished ...>
990   futex(0xc000080148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
989   epoll_pwait(4,  <unfinished ...>
986   <... getsockopt resumed>[ECONNREFUSED], [4]) = 0
986   futex(0xc000080148, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
987   nanosleep({tv_sec=0, tv_nsec=640000},  <unfinished ...>
986   <... futex resumed>)              = 1
990   <... futex resumed>)              = 0
986   epoll_ctl(4, EPOLL_CTL_DEL, 3, 0xc000093324 <unfinished ...>
990   futex(0xc000080148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
986   <... epoll_ctl resumed>)          = 0
986   close(3)                          = 0
986   futex(0xc000080148, FUTEX_WAKE_PRIVATE, 1) = 1
990   <... futex resumed>)              = 0
990   nanosleep({tv_sec=0, tv_nsec=3000},  <unfinished ...>
986   write(1, "ERROR: can't connect to server\n", 31 <unfinished ...>
987   <... nanosleep resumed>NULL)      = 0
987   nanosleep({tv_sec=0, tv_nsec=1280000},  <unfinished ...>
986   <... write resumed>)              = 31
986   exit_group(1 <unfinished ...>
990   <... nanosleep resumed>NULL)      = 0
986   <... exit_group resumed>)         = ?
989   <... epoll_pwait resumed> <unfinished ...>) = ?
988   <... futex resumed>)              = ?
987   <... nanosleep resumed> <unfinished ...>) = ?
990   +++ exited with 1 +++
987   +++ exited with 1 +++
988   +++ exited with 1 +++
989   +++ exited with 1 +++
986   +++ exited with 1 +++

So it wants to connect to localhost:7777 but can’t, as nothing is listening there (ss confirms it). There’s no mention of the port anywhere in /etc either.

Sigh, so I guess we have to improvise a server. Let’s try with Netcat:

admin@ip-172-31-42-20:~$ nc -p 7777 -l &
[1] 1042
admin@ip-172-31-42-20:~$ strace -f -o xxx ./wtfit 
GET / HTTP/1.1
Host: localhost:7777
User-Agent: Go-http-client/1.1
Accept-Encoding: gzip

^Z
[2]+  Stopped                 strace -f -o xxx ./wtfit
admin@ip-172-31-42-20:~$ bg
[2]+ strace -f -o xxx ./wtfit &
admin@ip-172-31-42-20:~$ fg
nc -p 7777 -l
^C
admin@ip-172-31-42-20:~$ ERROR: can't connect to server

[2]+  Exit 1                  strace -f -o xxx ./wtfit
admin@ip-172-31-42-20:~$

So the binary sends an HTTP request to the endpoint in question and waits for an answer. We need to point it at an actual web server, but we can’t install anything. Python’s built-in web server to the rescue:

admin@ip-172-31-42-20:~$ python3 -m http.server --bind 127.0.0.1 7777 &
[1] 1056
admin@ip-172-31-42-20:~$ Serving HTTP on 127.0.0.1 port 7777 (http://127.0.0.1:7777/) ...

admin@ip-172-31-42-20:~$ strace -f -o xxx ./wtfit 
127.0.0.1 - - [26/Dec/2022 21:35:23] "GET / HTTP/1.1" 200 -
OK.

As a oneliner:

sudo /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 /usr/bin/chmod +x /usr/bin/chmod ; \
cd ; \
chmod +x wtfit ; \
touch wtfitconfig.conf ; \
python3 -m http.server --bind localhost 7777 & \
./wtfit

Oaxaca

lsof /home/admin/somefile and echo $$ reveal that the bash shell we’re in is the process holding /home/admin/somefile open, and the goal is to close that file handle without killing the shell. So we need to find out which file descriptor it is:

# Find the file descriptor that points at the file
fd=$(find /proc/$$/fd -lname '/home/admin/somefile' | grep -oEe "[0-9]+$")
# Close that descriptor in the current shell
eval "exec ${fd}<&-"

As a oneliner:

fd=$(find /proc/$$/fd -lname '/home/admin/somefile' | grep -oEe "[0-9]+$") ; eval "exec ${fd}<&-"
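
To confirm the descriptor is actually gone:

lsof /home/admin/somefile    # should print nothing now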

Venice

Try and figure out if you are inside a container (like a Docker one for example) or inside a Virtual Machine (like in the other scenarios).

So at first glance we’re in a VM :-) since the kernel reports a KVM hypervisor. But let’s look closer:

root@ip-172-31-38-151:/# dmesg | grep Hyper
[    0.000000] Hypervisor detected: KVM
root@ip-172-31-38-151:/# mount
overlay on / type overlay (rw,relatime,lowerdir=/var/lib/containers/storage/overlay/l/4UGO27476EXYY2UQSGBWL6EZC4:/var/lib/containers/storage/overlay/l/3JZQV4UY3FCO3W7SL3ERYENEZN:/var/lib/containers/storage/overlay/l/SCP4AZOKFN5HY4R5CQ5UVOYS7K:/var/lib/containers/storage/overlay/l/LPA46WOYFQ5ZHRJPEGNEEUCN36:/var/lib/containers/storage/overlay/l/WYTOBRCWJIZJALTLB3T5GBAQAB,upperdir=/var/lib/containers/storage/overlay/fb7c300d0cb0c61b15087365f7e0043090a7be0369605bf1d6efb574f78bd65a/diff,workdir=/var/lib/containers/storage/overlay/fb7c300d0cb0c61b15087365f7e0043090a7be0369605bf1d6efb574f78bd65a/work)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev type tmpfs (rw,nosuid,size=65536k,mode=755)
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,relatime)
tmpfs on /run type tmpfs (rw,nosuid,nodev,relatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=64000k)
tmpfs on /etc/hosts type tmpfs (rw,nosuid,nodev,noexec,relatime,size=46632k,mode=755)
tmpfs on /etc/resolv.conf type tmpfs (rw,nosuid,nodev,noexec,relatime,size=46632k,mode=755)
tmpfs on /etc/hostname type tmpfs (rw,nosuid,nodev,noexec,relatime,size=46632k,mode=755)
tmpfs on /run/.containerenv type tmpfs (rw,nosuid,nodev,noexec,relatime,size=46632k,mode=755)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,relatime)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666)
tmpfs on /var/log/journal type tmpfs (rw,nosuid,nodev,relatime)
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,relatime,nsdelegate,memory_recursiveprot)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=28,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=13272)
debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
tracefs on /sys/kernel/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
fusectl on /sys/fs/fuse/connections type fusectl (rw,nosuid,nodev,noexec,relatime)
configfs on /sys/kernel/config type configfs (rw,nosuid,nodev,noexec,relatime)

But the root filesystem being an overlay out of /var/lib/containers/storage and the /run/.containerenv mount heavily hint that this system is actually a container (a Podman one, judging by the .containerenv file). As we’re able to call poweroff and it works, it’s likely a privileged container.
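
Other quick checks that point the same way (assuming the usual tools are present in the image):

systemd-detect-virt --container    # prints the detected container technology, if any
cat /proc/1/cgroup                 # container runtimes often leave traces in the cgroup paths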

Melbourne

There is a Python WSGI web application file at /home/admin/wsgi.py, the purpose of which is to serve the string “Hello, world!”. This file is served by a Gunicorn server which is fronted by an nginx server (both servers managed by systemd). So the flow of an HTTP request is: Web Client (curl) -> Nginx -> Gunicorn -> wsgi.py . The objective is to be able to curl the localhost (on default port :80) and get back “Hello, world!”, using the current setup.

Let’s see what we have there:

admin@ip-172-31-42-37:/$ curl -s http://localhost
admin@ip-172-31-42-37:/$ ss -panelt
State        Recv-Q        Send-Q               Local Address:Port               Peer Address:Port       Process                                                                                                  
LISTEN       0             128                        0.0.0.0:22                      0.0.0.0:*           ino:10776 sk:1 cgroup:/system.slice/ssh.service <->                                                     
LISTEN       0             4096                             *:6767                          *:*           users:(("sadagent",pid=600,fd=7)) uid:1000 ino:10760 sk:2 cgroup:/system.slice/sadagent.service v6only:0 <->
LISTEN       0             4096                             *:8080                          *:*           users:(("gotty",pid=599,fd=6)) uid:1000 ino:11485 sk:3 cgroup:/system.slice/gotty.service v6only:0 <->  
LISTEN       0             128                           [::]:22                         [::]:*           ino:10778 sk:4 cgroup:/system.slice/ssh.service v6only:1 <->                                            

Great: an empty response, and nothing is listening on port 80. Let’s start Nginx then:

admin@ip-172-31-42-37:/$ sudo systemctl enable --now nginx
Synchronizing state of nginx.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable nginx
Created symlink /etc/systemd/system/multi-user.target.wants/nginx.service → /lib/systemd/system/nginx.service.
admin@ip-172-31-42-37:/$ systemctl status nginx
● nginx.service - A high performance web server and a reverse proxy server
     Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2022-12-27 19:06:49 UTC; 5s ago
       Docs: man:nginx(8)
    Process: 936 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_process on; (code=exited, statu>
    Process: 937 ExecStart=/usr/sbin/nginx -g daemon on; master_process on; (code=exited, status=0/SUCCE>
   Main PID: 938 (nginx)
      Tasks: 3 (limit: 524)
     Memory: 11.1M
        CPU: 36ms
     CGroup: /system.slice/nginx.service
             ├─938 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;
             ├─939 nginx: worker process
             └─940 nginx: worker process

Dec 27 19:06:49 ip-172-31-42-37 systemd[1]: Starting A high performance web server and a reverse proxy s>
Dec 27 19:06:49 ip-172-31-42-37 systemd[1]: Started A high performance web server and a reverse proxy se>

Ok, nginx is up. So is gunicorn:

admin@ip-172-31-42-37:/$ systemctl status gunicorn
● gunicorn.service - gunicorn daemon
     Loaded: loaded (/etc/systemd/system/gunicorn.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2022-12-27 19:05:12 UTC; 6min ago
TriggeredBy: ● gunicorn.socket
   Main PID: 615 (gunicorn)
      Tasks: 2 (limit: 524)
     Memory: 17.1M
        CPU: 333ms
     CGroup: /system.slice/gunicorn.service
             ├─615 /usr/bin/python3 /usr/local/bin/gunicorn --bind unix:/run/gunicorn.sock wsgi
             └─670 /usr/bin/python3 /usr/local/bin/gunicorn --bind unix:/run/gunicorn.sock wsgi

Dec 27 19:05:12 ip-172-31-42-37 systemd[1]: Started gunicorn daemon.
Dec 27 19:05:13 ip-172-31-42-37 gunicorn[615]: [2022-12-27 19:05:13 +0000] [615] [INFO] Starting gunicorn 20.1.0
Dec 27 19:05:13 ip-172-31-42-37 gunicorn[615]: [2022-12-27 19:05:13 +0000] [615] [INFO] Listening at: unix:/run/gunicorn.sock (615)
Dec 27 19:05:13 ip-172-31-42-37 gunicorn[615]: [2022-12-27 19:05:13 +0000] [615] [INFO] Using worker: sync
Dec 27 19:05:13 ip-172-31-42-37 gunicorn[670]: [2022-12-27 19:05:13 +0000] [670] [INFO] Booting worker with pid: 670

And it’s listening on a unix socket. So let’s see what it’s doing:

admin@ip-172-31-42-37:/$ curl -s http://localhost
<html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.18.0</center>
</body>
</html>

A 502 means nginx can’t reach its upstream. Have a look at the nginx config then:

admin@ip-172-31-42-37:/$ cat /etc/nginx/sites-enabled/default 
server {
    listen 80;

    location / {
        include proxy_params;
        proxy_pass http://unix:/run/gunicorn.socket;
    }
}

The proxy_pass points at /run/gunicorn.socket, but gunicorn actually listens on /run/gunicorn.sock (see the unit status above). Correct the socket path, here using an upstream block:

upstream gunicorn {
    server unix:/run/gunicorn.sock; 
}
server {
    listen 80;

    location / {
        include proxy_params;
        proxy_pass http://gunicorn;
    }
}

Now Nginx correctly proxies through to Gunicorn:

root@ip-172-31-42-1:~# curl -v http://localhost/
* Connected to localhost (127.0.0.1) port 80 (#0)
> GET / HTTP/1.1
> Host: localhost
> User-Agent: curl/7.74.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: gunicorn
< Date: Tue, 27 Dec 2022 19:27:56 GMT
< Connection: close
< Content-Type: text/html
< Content-Length: 0
< 
* Closing connection 0

But the body is empty? Ok, so rule Nginx out and check Gunicorn’s socket directly:

root@ip-172-31-42-1:~# curl -v --unix-socket /run/gunicorn.sock http://localhost/
*   Trying /run/gunicorn.sock:0...
* Connected to localhost (/run/gunicorn.sock) port 80 (#0)
> GET / HTTP/1.1
> Host: localhost
> User-Agent: curl/7.74.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: gunicorn
< Date: Tue, 27 Dec 2022 19:27:56 GMT
< Connection: close
< Content-Type: text/html
< Content-Length: 0
< 
* Closing connection 0

Hm. Gunicorn itself replies with Content-Length: 0, so Nginx is off the hook. What’s going on? We need to look at the application’s source code:

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html'), ('Content-Length', 0)])
    return [b'Hello, world!']

The application announces a zero-length body while actually returning one. It should either set the payload’s real content length or leave the header out, in which case Gunicorn falls back to chunked transfer encoding. Change the source, Luke:

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [b'Hello, world!']

Restart gunicorn and test:

root@ip-172-31-42-1:~# systemctl restart gunicorn
root@ip-172-31-42-1:~# curl -v --unix-socket /run/gunicorn.sock http://localhost/
*   Trying /run/gunicorn.sock:0...
* Connected to localhost (/run/gunicorn.sock) port 80 (#0)
> GET / HTTP/1.1
> Host: localhost
> User-Agent: curl/7.74.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: gunicorn
< Date: Tue, 27 Dec 2022 19:32:32 GMT
< Connection: close
< Transfer-Encoding: chunked
< Content-Type: text/html
< 
* Closing connection 0
Hello, world!

Excellent, we’re done.

As a short script:

cat > /etc/nginx/sites-enabled/default << EOF
upstream gunicorn {
    server unix:/run/gunicorn.sock; 
}
server {
    listen 80;

    location / {
        include proxy_params;
        proxy_pass http://gunicorn;
    }
}
EOF
cat > /home/admin/wsgi.py << EOF
def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [b'Hello, world!']
EOF
systemctl restart gunicorn
systemctl enable --now nginx
# if nginx was already running with the old config, make it re-read it
systemctl reload nginx
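
And a final end-to-end check through Nginx; with the wsgi.py above it should return the greeting:

curl -s http://localhost/
# Hello, world!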

Hong-Kong

(Similar to “Manhattan” scenario but harder). Your objective is to be able to insert a row in an existing Postgres database. The issue is not specific to Postgres and you don’t need to know details about it (although it may help).

Postgres information: it’s a service that listens to a port (:5432) and writes to disk in a data directory, the location of which is defined in the data_directory parameter of the configuration file /etc/postgresql/14/main/postgresql.conf. In our case Postgres is managed by systemd as a unit with name postgresql.
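
A first thing worth checking is which data directory the cluster is configured to use; the config file path is given in the task:

grep data_directory /etc/postgresql/14/main/postgresql.conf   # prints the configured data_directory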

Let’s have a looksie at the service:

root@ip-172-31-25-11:/# systemctl status postgresql
● postgresql.service - PostgreSQL RDBMS
   Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; vendor preset: enabled)
   Active: active (exited) since Tue 2022-12-27 19:50:14 UTC; 2min 13s ago
  Process: 595 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
 Main PID: 595 (code=exited, status=0/SUCCESS)

Dec 27 19:50:14 ip-172-31-25-11 systemd[1]: Starting PostgreSQL RDBMS...
Dec 27 19:50:14 ip-172-31-25-11 systemd[1]: Started PostgreSQL RDBMS.

Great. That unit is just a dummy wrapper (ExecStart=/bin/true, hence active (exited)). According to Debian’s docs, the real cluster runs as the templated postgresql@.service unit, matching the /etc/postgresql/14/main tree as mentioned:

root@ip-172-31-25-11:/# systemctl status postgresql@14-main
● postgresql@14-main.service - PostgreSQL Cluster 14-main
   Loaded: loaded (/lib/systemd/system/postgresql@.service; enabled-runtime; vendor preset: enabled)
   Active: failed (Result: protocol) since Tue 2022-12-27 20:29:00 UTC; 47s ago
  Process: 581 ExecStart=/usr/bin/pg_ctlcluster --skip-systemctl-redirect 14-main start (code=exited, sta

Dec 27 20:29:00 ip-172-31-25-11 systemd[1]: Starting PostgreSQL Cluster 14-main...
Dec 27 20:29:00 ip-172-31-25-11 postgresql@14-main[581]: Error: /opt/pgdata/main is not accessible or doe
Dec 27 20:29:00 ip-172-31-25-11 systemd[1]: postgresql@14-main.service: Can't open PID file /run/postgres
Dec 27 20:29:00 ip-172-31-25-11 systemd[1]: postgresql@14-main.service: Failed with result 'protocol'.
Dec 27 20:29:00 ip-172-31-25-11 systemd[1]: Failed to start PostgreSQL Cluster 14-main.
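
The unit complains that /opt/pgdata/main is not accessible. A quick look at the mount point; mountpoint(1) reports whether anything is actually mounted there:

ls -la /opt/pgdata       # empty if the data volume is not mounted
mountpoint /opt/pgdata   # expected to report that it is not a mountpoint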

Ok, so the data is missing: /opt/pgdata exists but nothing is mounted on it. Let’s look at /etc/fstab:

root@ip-172-31-25-11:/# cat /etc/fstab
# /etc/fstab: static file system information
UUID=5db68868-2d70-449f-8b1d-f3c769ec01c7 / ext4 rw,discard,errors=remount-ro,x-systemd.growfs 0 1
UUID=72C9-F191 /boot/efi vfat defaults 0 0
/dev/xvdb /opt/pgdata xfs defaults,nofail 0 0

Verify that the disk is present:

root@ip-172-31-25-11:/# lsblk
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
nvme0n1      259:0    0    8G  0 disk 
nvme1n1      259:1    0    8G  0 disk 
├─nvme1n1p1  259:2    0  7.9G  0 part /
├─nvme1n1p14 259:3    0    3M  0 part 
└─nvme1n1p15 259:4    0  124M  0 part /boot/efi
root@ip-172-31-25-11:/# blkid /dev/nvme0n1
/dev/nvme0n1: UUID="9a2e1bf9-50e8-41a9-9f8c-5dd837869c50" TYPE="xfs"

No it’s not: there is no /dev/xvdb on this box. But /dev/nvme0n1 carries an XFS filesystem and isn’t mounted anywhere, so it’s the likely candidate. The nofail option in /etc/fstab is why the boot didn’t complain about the missing device. Since systemd generates mount units from /etc/fstab, run systemctl daemon-reload after editing it, then mount:

root@ip-172-31-25-11:/# cat /etc/fstab
# /etc/fstab: static file system information
UUID=5db68868-2d70-449f-8b1d-f3c769ec01c7 / ext4 rw,discard,errors=remount-ro,x-systemd.growfs 0 1
UUID=72C9-F191 /boot/efi vfat defaults 0 0
/dev/nvme0n1 /opt/pgdata xfs defaults,nofail 0 0
root@ip-172-31-25-11:/# systemctl daemon-reload
root@ip-172-31-25-11:/# mount /opt/pgdata
root@ip-172-31-25-11:/# ls /opt/pgdata
deleteme  file1.bk  file2.bk  file3.bk  main
root@ip-172-31-25-11:/# df -h
Filesystem       Size  Used Avail Use% Mounted on
udev             224M     0  224M   0% /dev
tmpfs             47M  1.5M   45M   4% /run
/dev/nvme1n1p1   7.7G  1.2G  6.1G  17% /
tmpfs            233M     0  233M   0% /dev/shm
tmpfs            5.0M     0  5.0M   0% /run/lock
tmpfs            233M     0  233M   0% /sys/fs/cgroup
/dev/nvme1n1p15  124M  278K  124M   1% /boot/efi
/dev/nvme0n1     8.0G  8.0G   28K 100% /opt/pgdata

Mounting works, but the disk is full as it was in Manhattan.
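
A quick look at what is eating the space:

du -sh /opt/pgdata/*   # the *.bk files should account for most of it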

Purge the *.bk files from disk, start the cluster unit again and we’re done:

root@ip-172-31-25-11:/# rm -f /opt/pgdata/*.bk
root@ip-172-31-25-11:/# systemctl start postgresql@14-main
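
To confirm the objective is actually met, insert a row. Table and column names below are placeholders, as the scenario’s schema isn’t shown here:

sudo -u postgres psql -c "INSERT INTO persons(name) VALUES ('test');"
# add -d <dbname> if the table lives in a specific database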

Lisbon

There’s an etcd server running on https://localhost:2379, get the value for the key “foo”, ie etcdctl get foo or curl https://localhost:2379/v2/keys/foo

admin@ip-10-0-0-196:/$ date
Tue Jan  2 07:41:39 UTC 2024
admin@ip-10-0-0-196:/$ curl https://localhost:2379/v2/keys/foo
curl: (60) SSL certificate problem: certificate has expired
More details here: https://curl.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
admin@ip-10-0-0-196:/$ etcdctl get foo
Error:  client: etcd cluster is unavailable or misconfigured; error #0: x509: certificate has expired or is not yet valid: current time 2024-01-02T07:42:07Z is after 2023-01-30T00:02:48Z

error #0: x509: certificate has expired or is not yet valid: current time 2024-01-02T07:42:07Z is after 2023-01-30T00:02:48Z
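
The error already names the validity window, but it can be read straight off the certificate too. The path /etc/ssl/certs/localhost.crt is taken from etcd’s unit further down:

openssl x509 -in /etc/ssl/certs/localhost.crt -noout -dates   # prints notBefore / notAfter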

The certificate isn’t the problem, the clock is: the server thinks it’s January 2024, a year past the certificate’s notAfter date of 2023-01-30. Set it back:

admin@ip-10-0-0-196:/$ sudo date -s "2023-01-02 08:30:00+0100"
admin@ip-10-0-0-196:/$ curl https://localhost:2379/v2/keys/foo
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.18.0</center>
</body>
</html>

Good, TLS verification passes now and curl can reach the host. But that 404 is nginx answering, not etcd?


admin@ip-10-0-0-196:/$ systemctl status nginx
● nginx.service - A high performance web server and a reverse proxy server
     Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2023-01-02 07:41:28 UTC; 6min ago
       Docs: man:nginx(8)
    Process: 581 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
    Process: 601 ExecStart=/usr/sbin/nginx -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
   Main PID: 610 (nginx)
      Tasks: 3 (limit: 521)
     Memory: 12.9M
        CPU: 203ms
     CGroup: /system.slice/nginx.service
             ├─610 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;
             ├─611 nginx: worker process
             └─612 nginx: worker process

Jan 02 07:41:27 ip-10-0-0-196 systemd[1]: Starting A high performance web server and a reverse proxy server...
Jan 02 07:41:28 ip-10-0-0-196 systemd[1]: nginx.service: Failed to parse PID from file /run/nginx.pid: Invalid argument
Jan 02 07:41:28 ip-10-0-0-196 systemd[1]: Started A high performance web server and a reverse proxy server.
admin@ip-10-0-0-196:/$ ss -panetl
State          Recv-Q         Send-Q                 Local Address:Port                 Peer Address:Port        Process                                                                                                              
LISTEN         0              4096                       127.0.0.1:2379                      0.0.0.0:*            uid:108 ino:12125 sk:1 cgroup:/system.slice/etcd.service <->                                                        
LISTEN         0              4096                       127.0.0.1:2380                      0.0.0.0:*            uid:108 ino:12121 sk:2 cgroup:/system.slice/etcd.service <->                                                        
LISTEN         0              128                          0.0.0.0:22                        0.0.0.0:*            ino:11458 sk:3 cgroup:/system.slice/ssh.service <->                                                                 
LISTEN         0              511                          0.0.0.0:443                       0.0.0.0:*            ino:11464 sk:4 cgroup:/system.slice/nginx.service <->                                                               
LISTEN         0              4096                               *:6767                            *:*            users:(("sadagent",pid=563,fd=7)) uid:1000 ino:10633 sk:5 cgroup:/system.slice/sadagent.service v6only:0 <->        
LISTEN         0              4096                               *:8080                            *:*            users:(("gotty",pid=562,fd=6)) uid:1000 ino:10642 sk:6 cgroup:/system.slice/gotty.service v6only:0 <->              
LISTEN         0              128                             [::]:22                           [::]:*            ino:11460 sk:7 cgroup:/system.slice/ssh.service v6only:1 <->                                                        
admin@ip-10-0-0-196:/$ grep -ri 2379 /etc/nginx/

Nginx’s config doesn’t mention port 2379 anywhere, and ss shows etcd itself listening on 127.0.0.1:2379, so everything looks as it should be. Yet requests to that port come back from nginx. What does etcd have to say?

admin@ip-10-0-0-196:/$ systemctl status etcd
● etcd.service - etcd - highly-available key value store
     Loaded: loaded (/lib/systemd/system/etcd.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2023-01-02 07:46:08 UTC; 4min 1s ago
       Docs: https://etcd.io/docs
             man:etcd
   Main PID: 873 (etcd)
      Tasks: 9 (limit: 521)
     Memory: 7.8M
        CPU: 2.050s
     CGroup: /system.slice/etcd.service
             └─873 /usr/bin/etcd --cert-file /etc/ssl/certs/localhost.crt --key-file /etc/ssl/certs/localhost.key --advertise-client-urls=https://localhost:2379 --listen-client-urls=https://localhost:2379

Jan 02 07:46:08 ip-10-0-0-196 etcd[873]: 8e9e05c52164694d became candidate at term 19
Jan 02 07:46:08 ip-10-0-0-196 etcd[873]: 8e9e05c52164694d received MsgVoteResp from 8e9e05c52164694d at term 19
Jan 02 07:46:08 ip-10-0-0-196 etcd[873]: 8e9e05c52164694d became leader at term 19
Jan 02 07:46:08 ip-10-0-0-196 etcd[873]: raft.node: 8e9e05c52164694d elected leader 8e9e05c52164694d at term 19
Jan 02 07:46:08 ip-10-0-0-196 etcd[873]: published {Name:ip-10-0-0-196 ClientURLs:[https://localhost:2379]} to cluster cdf818194e3a8c32
Jan 02 07:46:08 ip-10-0-0-196 etcd[873]: ready to serve client requests
Jan 02 07:46:08 ip-10-0-0-196 systemd[1]: Started etcd - highly-available key value store.
Jan 02 07:46:09 ip-10-0-0-196 etcd[873]: serving client requests on 127.0.0.1:2379
Jan 02 07:46:11 ip-10-0-0-196 etcd[873]: WARNING: 2023/01/02 07:46:11 grpc: addrConn.createTransport failed to connect to {localhost:2379  <nil> 0 <nil>}. Err :tls: use of closed connection. Reconnecting...
Jan 02 07:46:23 ip-10-0-0-196 etcd[873]: WARNING: 2023/01/02 07:46:23 grpc: addrConn.createTransport failed to connect to {localhost:2379  <nil> 0 <nil>}. Err :tls: use of closed connection. Reconnecting...

etcd is up and even logs some connection attempts, yet our requests clearly land on nginx instead. So what’s intercepting the traffic on the way there? iptables?

admin@ip-10-0-0-196:/$ sudo iptables-save 
# Generated by iptables-save v1.8.7 on Mon Jan  2 07:52:37 2023
*nat
:PREROUTING ACCEPT [13:1812]
:INPUT ACCEPT [11:660]
:OUTPUT ACCEPT [13:988]
:POSTROUTING ACCEPT [50:3208]
:DOCKER - [0:0]
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT -o lo -p tcp -m tcp --dport 2379 -j REDIRECT --to-ports 443
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A DOCKER -i docker0 -j RETURN
COMMIT
# Completed on Mon Jan  2 07:52:37 2023
# Generated by iptables-save v1.8.7 on Mon Jan  2 07:52:37 2023
*filter
:INPUT ACCEPT [2199:212561]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [1644:453532]
:DOCKER - [0:0]
:DOCKER-ISOLATION-STAGE-1 - [0:0]
:DOCKER-ISOLATION-STAGE-2 - [0:0]
:DOCKER-USER - [0:0]
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN
COMMIT
# Completed on Mon Jan  2 07:52:37 2023

The -A OUTPUT -o lo -p tcp -m tcp --dport 2379 -j REDIRECT --to-ports 443 line is the smoking gun: locally generated traffic to port 2379 gets redirected to port 443, which is exactly where nginx listens. sudo docker ps -a verifies that no containers are running, so Docker’s NAT rules aren’t needed either; flush the nat table:

admin@ip-10-0-0-196:/$ sudo iptables -t nat -F
admin@ip-10-0-0-196:/$ etcdctl cluster-health
member 8e9e05c52164694d is healthy: got healthy result from https://localhost:2379
cluster is healthy
admin@ip-10-0-0-196:/$ etcdctl get foo
bar
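
Flushing the whole nat table is fine here, but the surgical alternative would be deleting just the offending rule (same rule spec, -D instead of -A):

sudo iptables -t nat -D OUTPUT -o lo -p tcp -m tcp --dport 2379 -j REDIRECT --to-ports 443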

And presto, all is well with etcd!