- Intro
- Saint John
- Santiago
- Saskatoon
- Tokyo
- Manhattan
- Capetown
- Salta
- Jakarta
- Bern
- Singara
- Karakorum
- Oaxaca
- Venice
- Melbourne
- Hong-Kong
- Lisbon
Intro
SadServers is a LeetCode-style set of puzzles for Sysadmins/Site Reliability Engineers/DevOps Engineers, or whatever Ops people in IT are called nowadays. The following is a writeup of working through the challenges.
Saint John
A developer created a testing program that is continuously writing to a log file /var/log/bad.log and filling up disk. You can check for example with tail -f /var/log/bad.log. This program is no longer needed. Find it and terminate it.
So let’s see what is accessing this file with lsof:
$ lsof /var/log/bad.log
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
badlog.py 621 ubuntu 3w REG 259,1 10629 67701 /var/log/bad.log
The second column lists the PID of the process that writes to the log. We’ll kill it with kill:
kill 621
As a oneliner:
kill $(lsof -t /var/log/bad.log)
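Alternatively, if fuser is available on the box, it can find and signal the writer in one step (just a sketch of an equivalent approach, not what the walkthrough above used):
# fuser -k signals every process that has the file open; -TERM asks politely instead of SIGKILL
sudo fuser -k -TERM /var/log/bad.log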
Santiago
Alice the spy has hidden a secret number combination, find it using these instructions:
1) Find the number of lines where the string Alice occurs in *.txt files in the /home/admin directory
2) There’s a file where “Alice” appears exactly once. In the line after that occurrence there’s a number. Write both numbers consecutively as one (no new line or spaces) to the solution file (eg if the first number from 1) is 11 and the second 22, you can do echo -n 11 > /home/admin/solution; echo 22 >> /home/admin/solution
A little shell tool magic… grep -Hc gives us file names and match counts, cut splits lines on delimiters, and grep -oEe allows regex-based extraction of values.
Ad 1):
$ grep Alice -Hnc *.txt | \
cut -d : -f 2 | \
python3 -c "import sys; print(sum(int(l) for l in sys.stdin))"
Find the word “Alice” in the target *.txt files and count the matching lines per file, |
select only the counts from the previous grep output and |
sum up the resulting list of numbers with a short Python script.
Ad 2):
grep Alice -Hnc *.txt | grep -Ee :1$ | cut -d: -f1
Find and count Alice in all target files, |
keep the file containing exactly one occurrence |
and extract its filename.
$ grep -A1 Alice \
$(grep Alice -Hnc *.txt | grep -Ee :1$ | cut -d: -f1) | \
grep -oEe "[0-9]+"
Find the occurrence in that file again, but this time also print the line after the match with -A1, |
then select only the numbers from the output.
echo ${PART1}${PART2} > /home/admin/solution
Concatenate both parts (echo adds the trailing newline) and write the result to the given file.
As a oneliner:
echo \
$(grep Alice -Hnc *.txt | cut -d : -f 2 | python3 -c "import sys; print(sum(int(l) for l in sys.stdin))")$(grep -A1 Alice $(grep Alice -Hnc *.txt | grep -Ee :1$ | cut -d: -f1) | grep -oEe "[0-9]+") \
> /home/admin/solution
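The same approach reads a bit easier as a small script; this is just a sketch of the pipeline above and assumes the *.txt files live in /home/admin:
cd /home/admin
# part 1: total number of lines containing "Alice" across all *.txt files
part1=$(grep -c Alice *.txt | awk -F: '{ sum += $2 } END { print sum }')
# part 2: the file where "Alice" appears exactly once, then the number on the line after the match
file=$(grep -c Alice *.txt | awk -F: '$2 == 1 { print $1 }')
part2=$(grep -A1 Alice "$file" | grep -oE '[0-9]+')
# write both numbers as one, with a trailing newline
printf '%s%s\n' "$part1" "$part2" > /home/admin/solution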
Saskatoon
There’s a web server access log file at /home/admin/access.log. The file consists of one line per HTTP request, with the requester’s IP address at the beginning of each line. Find what’s the IP address that has the most requests in this file (there’s no tie; the IP is unique). Write the solution into a file /home/admin/highestip.txt. For example, if your solution is “1.2.3.4”, you can do echo "1.2.3.4" > /home/admin/highestip.txt
# -o Only print matching part, -E extended regex, -e pattern is a regex
# rough regex for dotted-quad IPv4 addresses (each octet pattern matches 0-299, close enough here)
grep -oEe \
'[12]?[0-9]{1,2}[.][12]?[0-9]{1,2}[.][12]?[0-9]{1,2}[.][12]?[0-9]{1,2}' \
access.log
This yields a list of IP addresses. Now we need to sort and count them (sort | uniq -c), sort by the leading count with the highest at the bottom (sort -n), pick the last line (tail -n1), and from that line pick the second field, which is the IP address. Write it to the result file > /home/admin/highestip.txt.
As a oneliner:
grep -oEe '[12]?[0-9]{1,2}[.][12]?[0-9]{1,2}[.][12]?[0-9]{1,2}[.][12]?[0-9]{1,2}' access.log | \
sort | \
uniq -c | \
sort -n | \
tail -n1 | \
awk '{ print $2 }' \
> /home/admin/highestip.txt
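Since the task guarantees the IP is the first thing on each line, the same thing can be done in awk alone; a sketch:
awk '{ cnt[$1]++ } END { for (ip in cnt) if (cnt[ip] > max) { max = cnt[ip]; best = ip }; print best }' \
    /home/admin/access.log > /home/admin/highestip.txt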
Tokyo
There’s a web server serving a file /var/www/html/index.html with content “hello sadserver” but when we try to check it locally with an HTTP client like curl 127.0.0.1:80, nothing is returned. This scenario is not about the particular web server configuration and you only need to have general knowledge about how web servers work.
ss tells us that we’re dealing with Apache here:
# ss -panetl | grep 80
LISTEN 0 511 *:80 *:* users:(("apache2",pid=774,fd=4),("apache2",pid=773,fd=4),("apache2",pid=635,fd=4)) ino:17412 sk:1002 cgroup:/system.slice/apache2.service v6only:0 <->
LISTEN 0 4096 *:8080 *:* users:(("gotty",pid=551,fd=6)) ino:16863 sk:1003 cgroup:/system.slice/gotty.service v6only:0 <->
As it’s a Debian-style setup, apache2ctl -S tells us that there’s no special config here, it’s pretty much boilerplate:
# apache2ctl -S
VirtualHost configuration:
*:80 ip-172-31-21-14.us-east-2.compute.internal (/etc/apache2/sites-enabled/000-default.conf:1)
ServerRoot: "/etc/apache2"
Main DocumentRoot: "/var/www/html"
Main ErrorLog: "/var/log/apache2/error.log"
Mutex default: dir="/var/run/apache2/" mechanism=default
Mutex watchdog-callback: using_defaults
PidFile: "/var/run/apache2/apache2.pid"
Define: DUMP_VHOSTS
Define: DUMP_RUN_CFG
User: name="www-data" id=33
Group: name="www-data" id=33
So let’s just try it:
# curl localhost
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>403 Forbidden</title>
</head><body>
<h1>Forbidden</h1>
<p>You don't have permission to access this resource.</p>
<hr>
<address>Apache/2.4.52 (Ubuntu) Server at localhost Port 80</address>
</body></html>
So, what does Apache say?
# cat /var/log/apache2/error.log
[Mon Dec 26 19:06:44.506725 2022] [mpm_event:notice] [pid 635:tid 140509137360768] AH00489: Apache/2.4.52 (Ubuntu) configured -- resuming normal operations
[Mon Dec 26 19:06:44.506755 2022] [core:notice] [pid 635:tid 140509137360768] AH00094: Command line: '/usr/sbin/apache2'
[Mon Dec 26 19:11:36.248299 2022] [core:error] [pid 773:tid 140508974245440] (13)Permission denied: [client ::1:52158] AH00132: file permissions deny server access: /var/www/html/index.html
Ah. File permissions. Check it out…
# sudo -u www-data namei -mx /var/www/html/index.html
f: /var/www/html/index.html
Drwxr-xr-x /
drwxr-xr-x var
drwxr-xr-x www
drwxr-xr-x html
-rw------- index.html
Ok, path to the file is fine. The permissions on the file itself seem off though…
# ls -lhA /var/www/html/
total 4.0K
-rw------- 1 root root 16 Aug 1 00:40 index.html
Yep, Apache’s user www-data is not able to access the file.
So change the file’s permissions:
chmod a+r /var/www/html/index.html
# curl localhost
hello sadserver
curl 127.0.0.1:80 hangs though. Strange. (curl localhost worked because it went over IPv6; the error log above shows client ::1, while 127.0.0.1 is plain IPv4.) The earlier ss and apache2ctl output reveal that Apache listens on all addresses, and SELinux et al. aren’t active here. So… IPTables shenanigans?
# iptables-save
# Generated by iptables-save v1.8.7 on Mon Dec 26 19:19:54 2022
*filter
:INPUT ACCEPT [1677:121936]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [1690:328212]
-A INPUT -p tcp -m tcp --dport 80 -j DROP
COMMIT
# Completed on Mon Dec 26 19:19:54 2022
Indeed. iptables -F takes care of that, done.
As a oneliner:
chmod a+r /var/www/html/index.html ; iptables -F
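Flushing all rules works here, but a more surgical alternative would be to delete just the offending rule (a sketch, matching the rule shown by iptables-save above):
iptables -D INPUT -p tcp -m tcp --dport 80 -j DROP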
Manhattan
Your objective is to be able to insert a row in an existing Postgres database. The issue is not specific to Postgres and you don’t need to know details about it (although it may help).
Helpful Postgres information: it’s a service that listens to a port (:5432) and writes to disk in a data directory, the location of which is defined in the data_directory parameter of the configuration file /etc/postgresql/14/main/postgresql.conf. In our case Postgres is managed by systemd as a unit with name postgresql.
Can’t write a file? Check if the disk is full, with df -h:
# df -h
Filesystem Size Used Avail Use% Mounted on
udev 224M 0 224M 0% /dev
tmpfs 47M 1.5M 46M 4% /run
/dev/nvme1n1p1 7.7G 1.2G 6.1G 17% /
tmpfs 233M 0 233M 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 233M 0 233M 0% /sys/fs/cgroup
/dev/nvme1n1p15 124M 278K 124M 1% /boot/efi
/dev/nvme0n1 8.0G 8.0G 28K 100% /opt/pgdata
Let’s look at what’s filling up /opt/pgdata…
# ls -lhA
total 8.0G
-rw-r--r-- 1 root root 69 May 21 2022 deleteme
-rw-r--r-- 1 root root 7.0G May 21 2022 file1.bk
-rw-r--r-- 1 root root 923M May 21 2022 file2.bk
-rw-r--r-- 1 root root 488K May 21 2022 file3.bk
drwx------ 19 postgres postgres 4.0K May 21 2022 main
Purge files and restart the database:
# rm -f *.bk
# systemctl restart postgresql
As a oneliner:
rm -f /opt/pgdata/*.bk ; systemctl restart postgresql
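A quick sanity check that the database is back (a sketch; it only verifies that the server accepts connections, since the scenario’s table name isn’t shown here):
pg_lsclusters                          # Debian helper, the 14/main cluster should show as "online"
sudo -u postgres psql -c 'SELECT 1;'   # confirms we can connect and run a query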
Capetown
There’s an Nginx web server installed and managed by systemd. Running curl -I 127.0.0.1:80 returns curl: (7) Failed to connect to localhost port 80: Connection refused, fix it so when you curl you get the default Nginx page.
Broken nginx, eh? Let’s ask SystemD what’s up with it:
# systemctl status nginx
● nginx.service - The NGINX HTTP and reverse proxy server
Loaded: loaded (/etc/systemd/system/nginx.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Mon 2022-12-26 19:32:01 UTC; 1min 21s ago
Process: 570 ExecStartPre=/usr/sbin/nginx -t (code=exited, status=1/FAILURE)
CPU: 30ms
Dec 26 19:32:01 ip-172-31-40-160 nginx[570]: nginx: [emerg] unexpected ";" in /etc/nginx/sites-enabled/default:1
Dec 26 19:32:01 ip-172-31-40-160 nginx[570]: nginx: configuration file /etc/nginx/nginx.conf test failed
Dec 26 19:32:00 ip-172-31-40-160 systemd[1]: Starting The NGINX HTTP and reverse proxy server...
Dec 26 19:32:01 ip-172-31-40-160 systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Dec 26 19:32:01 ip-172-31-40-160 systemd[1]: nginx.service: Failed with result 'exit-code'.
Dec 26 19:32:01 ip-172-31-40-160 systemd[1]: Failed to start The NGINX HTTP and reverse proxy server.
Ok, so the config’s invalid - the first line is only a ‘;’. Remove it and restart nginx:
$ sed -i -e 1d /etc/nginx/sites-enabled/default
$ systemctl restart nginx
Test:
# curl localhost
<html>
<head><title>500 Internal Server Error</title></head>
<body>
<center><h1>500 Internal Server Error</h1></center>
<hr><center>nginx/1.18.0</center>
</body>
</html>
Damn it, it’s still not working. Check its log file:
# cat /var/log/nginx/error.log
2022/12/26 19:37:18 [alert] 961#961: socketpair() failed while spawning "worker process" (24: Too many open files)
2022/12/26 19:37:18 [emerg] 962#962: eventfd() failed (24: Too many open files)
2022/12/26 19:37:18 [alert] 962#962: socketpair() failed (24: Too many open files)
2022/12/26 19:37:22 [crit] 962#962: *1 open() "/var/www/html/index.nginx-debian.html" failed (24: Too many open files), client: 127.0.0.1, server: _, request: "GET / HTTP/1.1", host: "localhost"
Too many open files… Normally we would check /etc/security/limits.conf, but as it’s SystemD-managed we have to look at the unit:
# systemctl cat nginx
# /etc/systemd/system/nginx.service
[Unit]
Description=The NGINX HTTP and reverse proxy server
After=syslog.target network-online.target remote-fs.target nss-lookup.target
Wants=network-online.target
[Service]
Type=forking
PIDFile=/run/nginx.pid
ExecStartPre=/usr/sbin/nginx -t
ExecStart=/usr/sbin/nginx
ExecReload=/usr/sbin/nginx -s reload
ExecStop=/bin/kill -s QUIT $MAINPID
PrivateTmp=true
LimitNOFILE=10
[Install]
WantedBy=multi-user.target
Ah great. Edit the unit and reload.
# sed -i -e /LimitNOFILE/d /etc/systemd/system/nginx.service
# systemctl daemon-reload
# systemctl restart nginx
# !curl
curl localhost
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
Done.
As a oneliner:
sed -i -e 1d /etc/nginx/sites-enabled/default ; \
sed -i -e /LimitNOFILE/d /etc/systemd/system/nginx.service ; \
systemctl daemon-reload ; \
systemctl restart nginx
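Deleting LimitNOFILE just falls back to the default limit. An alternative sketch, raising the limit via a systemd drop-in instead of editing the unit file directly (65535 is simply an assumed sensible value):
mkdir -p /etc/systemd/system/nginx.service.d
printf '[Service]\nLimitNOFILE=65535\n' > /etc/systemd/system/nginx.service.d/override.conf
systemctl daemon-reload
systemctl restart nginx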
Salta
There’s a “dockerized” Node.js web application in the /home/admin/app directory. Create a Docker container so you get a web app on port :8888 and can curl to it. For the solution to be valid, there should be only one running Docker container.
First we check if it’s even broken:
admin@ip-172-31-34-225:~/app$ curl localhost:8888
these are not the droids you're looking for
Huh, odd - something’s listening on that port. What’s going on here, tell us, ss!
$ ss -panetl | grep 8888
LISTEN 0 511 0.0.0.0:8888 0.0.0.0:* ino:11536 sk:2 cgroup:/system.slice/nginx.service <->
LISTEN 0 511 [::]:8888 [::]:* ino:11537 sk:6 cgroup:/system.slice/nginx.service v6only:1 <->
Ah, Nginx - disable it:
$ sudo systemctl disable --now nginx
Synchronizing state of nginx.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install disable nginx
Removed /etc/systemd/system/multi-user.target.wants/nginx.service.
Ok, so let’s check out what docker is giving us:
# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
124a4fb17a1c app "docker-entrypoint.s…" 3 months ago Exited (1) 3 months ago elated_taussig
# docker logs elated_taussig
node:internal/modules/cjs/loader:928
throw err;
^
Error: Cannot find module '/usr/src/app/serve.js'
at Function.Module._resolveFilename (node:internal/modules/cjs/loader:925:15)
at Function.Module._load (node:internal/modules/cjs/loader:769:27)
at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:76:12)
at node:internal/main/run_main_module:17:47 {
code: 'MODULE_NOT_FOUND',
requireStack: []
}
So it seems that the container image is broken somehow. We’re given the image’s source in admin’s home directory.
admin@ip-xxx:~/app$ ls
Dockerfile package-lock.json package.json server.js
admin@ip-xxx:~/app$ cat Dockerfile
# documentation https://nodejs.org/en/docs/guides/nodejs-docker-webapp/
# most recent node (security patches) and alpine (minimal, adds to security, possible libc issues)
FROM node:15.7-alpine
# Create app directory & copy app files
WORKDIR /usr/src/app
# we copy first package.json only, so we take advantage of cached Docker layers
COPY ./package*.json ./
# RUN npm ci --only=production
RUN npm install
# Copy app source
COPY ./* ./
# port used by this app
EXPOSE 8880
# command to run
CMD [ "node", "serve.js" ]
The exposed port should be 8888, and the file name in CMD should be server.js.
Make the changes and rebuild the container image:
admin@ip-xxx:~/app$ sudo docker build -t app:latest .
Sending build context to Docker daemon 101.9kB
Step 1/7 : FROM node:15.7-alpine
---> 706d12284dd5
Step 2/7 : WORKDIR /usr/src/app
---> Using cache
---> 463b1571f18e
Step 3/7 : COPY ./package*.json ./
---> Using cache
---> acfb467c80ba
Step 4/7 : RUN npm install
---> Using cache
---> 5cad5aa08c7a
Step 5/7 : COPY ./* ./
---> 59c0ca1ef224
Step 6/7 : EXPOSE 8888
---> Running in 2e5faf8ee253
Removing intermediate container 2e5faf8ee253
---> c3127219be52
Step 7/7 : CMD [ "node", "server.js" ]
---> Running in 5d24a99a1a9e
Removing intermediate container 5d24a99a1a9e
---> 3774cc41c752
Successfully built 3774cc41c752
Successfully tagged app:latest
Start the container and check the results:
admin@ip-xxx:~/app$ sudo docker run -d -p 8888:8888 app:latest
397a69aa6832fb6edc922b733fd55c6df169963f30875308287f2298ab99730e
admin@ip-xxx:~/app$ !curl
curl localhost:8888
Hello World!
Done
As a oneliner:
sudo systemctl disable --now nginx ; \
cd ~/app ; \
sed -i \
-e 's/8880/8888/g' \
-e 's/serve.js/server.js/g' \
Dockerfile ; \
sudo docker build -t app:latest . ; \
sudo docker run -d -p 8888:8888 app:latest
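A quick check that the solution holds (a sketch): exactly one container should be running, and it should answer on 8888:
sudo docker ps --format '{{.Names}}: {{.Status}} {{.Ports}}'
curl -s localhost:8888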
Jakarta
Can’t ping google.com. It returns ping: google.com: Name or service not known. Expected is being able to resolve the hostname. (Note: currently the VMs can’t ping outside so there’s no automated check for the solution.)
Check the files connected to DNS resolution (/etc/hosts, /etc/resolv.conf), and check the relevant services:
ubuntu@ip-172-31-42-233:/$ cat /etc/hosts
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
ubuntu@ip-172-31-42-233:/$ cat /etc/resolv.conf
# This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
#
# This file might be symlinked as /etc/resolv.conf. If you're looking at
# /etc/resolv.conf and seeing this text, you have followed the symlink.
#
# This is a dynamic resolv.conf file for connecting local clients to the
# internal DNS stub resolver of systemd-resolved. This file lists all
# configured search domains.
#
# Run "resolvectl status" to see details about the uplink DNS servers
# currently in use.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.
nameserver 127.0.0.53
options edns0 trust-ad
search us-east-2.compute.internal
ubuntu@ip-172-31-42-233:/$ systemctl status systemd-resolved
● systemd-resolved.service - Network Name Resolution
Loaded: loaded (/lib/systemd/system/systemd-resolved.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2022-12-26 20:12:19 UTC; 1min 27s ago
Docs: man:systemd-resolved.service(8)
man:org.freedesktop.resolve1(5)
https://www.freedesktop.org/wiki/Software/systemd/writing-network-configuration-managers
https://www.freedesktop.org/wiki/Software/systemd/writing-resolver-clients
Main PID: 434 (systemd-resolve)
Status: "Processing requests..."
Tasks: 1 (limit: 521)
Memory: 8.2M
CPU: 115ms
CGroup: /system.slice/systemd-resolved.service
└─434 /lib/systemd/systemd-resolved
Dec 26 20:12:18 ip-172-31-42-233 systemd[1]: Starting Network Name Resolution...
Dec 26 20:12:18 ip-172-31-42-233 systemd-resolved[434]: Positive Trust Anchors:
Dec 26 20:12:18 ip-172-31-42-233 systemd-resolved[434]: . IN DS 20326 8 2 e06d44b80b8f1d39a95c0b0d7c65d0>
Dec 26 20:12:18 ip-172-31-42-233 systemd-resolved[434]: Negative trust anchors: home.arpa 10.in-addr.arp>
Dec 26 20:12:19 ip-172-31-42-233 systemd-resolved[434]: Using system hostname 'ip-172-31-42-233'.
Dec 26 20:12:19 ip-172-31-42-233 systemd[1]: Started Network Name Resolution.
Dec 26 20:12:22 ip-172-31-42-233 systemd-resolved[434]: ens5: Failed to read DNSSEC negative trust ancho>
It’s all good. Is DNS resolution really broken?
ubuntu@ip-172-31-42-233:/$ ping www.google.com
ping: www.google.com: Name or service not known
Yes, it is.
Does DNS resolution work, or is the network to blame?
$ dig www.google.at
; <<>> DiG 9.18.1-1ubuntu1.1-Ubuntu <<>> www.google.at
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 52104
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;www.google.at. IN A
;; ANSWER SECTION:
www.google.at. 300 IN A 142.250.191.163
;; Query time: 116 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Mon Dec 26 20:21:23 UTC 2022
;; MSG SIZE rcvd: 58
So systemd-resolved itself resolves names just fine; DNS works on that end. But ping resolves hostnames through the C library (NSS) rather than talking to the DNS server directly like dig does. So… is DNS actually used by the system? Check /etc/nsswitch.conf:
ubuntu@ip-172-31-42-233:/$ cat /etc/nsswitch.conf
# /etc/nsswitch.conf
#
# Example configuration of GNU Name Service Switch functionality.
# If you have the `glibc-doc-reference' and `info' packages installed, try:
# `info libc "Name Service Switch"' for information about this file.
passwd: files systemd
group: files systemd
shadow: files
gshadow: files
hosts: files
networks: files
protocols: db files
services: db files
ethers: db files
rpc: db files
netgroup: nis
The hosts: line is missing dns (and systemd-resolved’s resolve). On my box, it looks like this:
hosts: mymachines resolve [!UNAVAIL=return] files myhostname dns
So let’s put that in:
ubuntu@ip-172-31-42-233:/$ sudo vim /etc/nsswitch.conf
sudo: unable to resolve host ip-172-31-42-233: Name or service not known
ubuntu@ip-172-31-42-233:/$ ping www.google.com
PING www.google.com (142.251.32.4) 56(84) bytes of data.
^C
--- www.google.com ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1005ms
Yep, name resolution works now. (The 100% packet loss is expected; per the task note the VM can’t ping outside anyway.)
As a oneliner:
sed \
-i -e 's/hosts:.*/hosts: mymachines resolve [!UNAVAIL=return] files myhostname dns/' \
/etc/nsswitch.conf
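getent resolves names through NSS just like ping does, so it’s a good way to confirm the fix (a small sketch):
getent hosts www.google.com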
Bern
There are two Docker containers running, a web application (Wordpress or WP) and a database (MariaDB) as back-end, but if we look at the web page, we see that it cannot connect to the database.
curl -s localhost:80 | tail -4 returns:
<body id="error-page"> <div class="wp-die-message"><h1>Error establishing a database connection</h1></div></body> </html>
This is not a Wordpress code issue (the image is :latest with some network utilities added). What you need to know is that WP uses “WORDPRESS_DB_” environment variables to create the MySQL connection string. See the ./html/wp-config.php WP config file for example (from /home/admin).
root@ip-172-31-19-232:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
6ffb084b515c wordpress:sad "docker-entrypoint.s…" 4 months ago Up About a minute 0.0.0.0:80->80/tcp wordpress
0eef97284c44 mariadb:latest "docker-entrypoint.s…" 4 months ago Up About a minute 0.0.0.0:3306->3306/tcp mariadb
root@ip-172-31-19-232:~# docker inspect wordpress
[
{
"Id": "6ffb084b515ca482ac58fad406b10837b44fb55610acbb35b8ed4a0fb24de50c",
"Created": "2022-08-04T03:22:49.885388997Z",
"Path": "docker-entrypoint.sh",
"Args": [
"apache2-foreground"
],
"State": {
"Status": "running",
"Running": true,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 894,
"ExitCode": 0,
"Error": "",
"StartedAt": "2022-12-26T20:26:21.348936894Z",
"FinishedAt": "2022-08-31T15:40:20.366876609Z"
},
"Image": "sha256:0a3bdd32ae210e34ed07cf49669dccbdb9deac143ebf27d5cc83136a0e3d9063",
"ResolvConfPath": "/var/lib/docker/containers/6ffb084b515ca482ac58fad406b10837b44fb55610acbb35b8ed4a0fb24de50c/resolv.conf",
"HostnamePath": "/var/lib/docker/containers/6ffb084b515ca482ac58fad406b10837b44fb55610acbb35b8ed4a0fb24de50c/hostname",
"HostsPath": "/var/lib/docker/containers/6ffb084b515ca482ac58fad406b10837b44fb55610acbb35b8ed4a0fb24de50c/hosts",
"LogPath": "/var/lib/docker/containers/6ffb084b515ca482ac58fad406b10837b44fb55610acbb35b8ed4a0fb24de50c/6ffb084b515ca482ac58fad406b10837b44fb55610acbb35b8ed4a0fb24de50c-json.log",
"Name": "/wordpress",
"RestartCount": 0,
"Driver": "overlay2",
"Platform": "linux",
"MountLabel": "",
"ProcessLabel": "",
"AppArmorProfile": "docker-default",
"ExecIDs": null,
"HostConfig": {
"Binds": [
"html:/var/www/html"
],
"ContainerIDFile": "",
"LogConfig": {
"Type": "json-file",
"Config": {}
},
"NetworkMode": "default",
"PortBindings": {
"80/tcp": [
{
"HostIp": "",
"HostPort": "80"
}
]
},
"RestartPolicy": {
"Name": "always",
"MaximumRetryCount": 0
},
"AutoRemove": false,
"VolumeDriver": "",
"VolumesFrom": null,
"CapAdd": null,
"CapDrop": null,
"CgroupnsMode": "private",
"Dns": [],
"DnsOptions": [],
"DnsSearch": [],
"ExtraHosts": null,
"GroupAdd": null,
"IpcMode": "private",
"Cgroup": "",
"Links": null,
"OomScoreAdj": 0,
"PidMode": "",
"Privileged": false,
"PublishAllPorts": false,
"ReadonlyRootfs": false,
"SecurityOpt": null,
"UTSMode": "",
"UsernsMode": "",
"ShmSize": 67108864,
"Runtime": "runc",
"ConsoleSize": [
0,
0
],
"Isolation": "",
"CpuShares": 0,
"Memory": 0,
"NanoCpus": 0,
"CgroupParent": "",
"BlkioWeight": 0,
"BlkioWeightDevice": [],
"BlkioDeviceReadBps": null,
"BlkioDeviceWriteBps": null,
"BlkioDeviceReadIOps": null,
"BlkioDeviceWriteIOps": null,
"CpuPeriod": 0,
"CpuQuota": 0,
"CpuRealtimePeriod": 0,
"CpuRealtimeRuntime": 0,
"CpusetCpus": "",
"CpusetMems": "",
"Devices": [],
"DeviceCgroupRules": null,
"DeviceRequests": null,
"KernelMemory": 0,
"KernelMemoryTCP": 0,
"MemoryReservation": 0,
"MemorySwap": 0,
"MemorySwappiness": null,
"OomKillDisable": null,
"PidsLimit": null,
"Ulimits": null,
"CpuCount": 0,
"CpuPercent": 0,
"IOMaximumIOps": 0,
"IOMaximumBandwidth": 0,
"MaskedPaths": [
"/proc/asound",
"/proc/acpi",
"/proc/kcore",
"/proc/keys",
"/proc/latency_stats",
"/proc/timer_list",
"/proc/timer_stats",
"/proc/sched_debug",
"/proc/scsi",
"/sys/firmware"
],
"ReadonlyPaths": [
"/proc/bus",
"/proc/fs",
"/proc/irq",
"/proc/sys",
"/proc/sysrq-trigger"
]
},
"GraphDriver": {
"Data": {
"LowerDir": "/var/lib/docker/overlay2/ce38c2ce47ad7d4bbbd8f5c7c4ed567023d2121aef7d943d0fa2550593cfbeea-init/diff:/var/lib/docker/overlay2/94106dc67f8c47bdc7f21b7607cae55e2cf94644b34d7698b3e174a3755275a6/diff:/var/lib/docker/overlay2/2936726c07d706d506728ed14cf8cab6e23916647e0838f0f79f86552c7010b3/diff:/var/lib/docker/overlay2/069acc5fa9bb5677b2aeb93fc0ef5773a23d6d7b1218cf75ef82aca71c6bda47/diff:/var/lib/docker/overlay2/c8afefd30aa9208d8c1d54d5fd332fc36608816b4dc77f94c1137c882a4ec821/diff:/var/lib/docker/overlay2/420654ce2976260a0830db88a78d5a347fb80aa7b790b2d9309ec7ae76d91d83/diff:/var/lib/docker/overlay2/422f6df2f56d58836998a749e631cab9406fd37800a3d4f7e4eda1f14f008d08/diff:/var/lib/docker/overlay2/3dfa1839ce3bd0b607fcddd25305125d66c7d2870a6e78dc5a01ba82134e1c42/diff:/var/lib/docker/overlay2/acdc673318ff934a6a1735e4a7c3ebfa98ba0b0e1fa7753bed3c36f7be05c5d7/diff:/var/lib/docker/overlay2/3b59127b861d3c1821590e1bbe53361762ec93b9e2471688cf1f16ed75698536/diff:/var/lib/docker/overlay2/3cdfd4f802d2ed04fe9f783aea4f7875b46338caa8d9cc83df921644933a7697/diff:/var/lib/docker/overlay2/c905fa690140ab324cdda21324f759f356b6672242462ddef823f3a8f2456463/diff:/var/lib/docker/overlay2/922e6ceb9fdab2f37d87a09a651497fa659d3c319b013db7427e8eed7487749b/diff:/var/lib/docker/overlay2/904795f943098776cc0a2c8385029bbeb398ae81f6aeedde6f459b7a06130bdd/diff:/var/lib/docker/overlay2/deea2b08a5738a2f6b20e9a9086884a3f181a21408bb140147395e2d3b6b2b88/diff:/var/lib/docker/overlay2/8fc975a0a76589eeb04b246a231d986b411913521b6bedf2e3a5d05908cce7c2/diff:/var/lib/docker/overlay2/0b6935279d515d0669f8a5565cbc0e4bf36dd21bd6632069d3192c34ec4f7d01/diff:/var/lib/docker/overlay2/11de313511ee3247d7543f251bf73d074c52306b38cb468d4406033996bb5579/diff:/var/lib/docker/overlay2/19a9f2c255dac237587a18b8f354abd11410dafc3284412d728c2bf7056313a0/diff:/var/lib/docker/overlay2/04234c51c40e87365dc7808e823f054571bbf2af947366973be0d82abe2d0a60/diff:/var/lib/docker/overlay2/80fe0530020bda727c7a7e05f8261966265ca1a07871152cf5c223d362ec80c3/diff:/var/lib/docker/overlay2/1abcbc54c55abf66822fefb38dbb5816fc59bdb685ced0528db4a3bcf62ebe9b/diff:/var/lib/docker/overlay2/51d8beae216b4b10c5755ae738c4abd12efa13403c8d1c5b0ced1fd44b9b48f3/diff",
"MergedDir": "/var/lib/docker/overlay2/ce38c2ce47ad7d4bbbd8f5c7c4ed567023d2121aef7d943d0fa2550593cfbeea/merged",
"UpperDir": "/var/lib/docker/overlay2/ce38c2ce47ad7d4bbbd8f5c7c4ed567023d2121aef7d943d0fa2550593cfbeea/diff",
"WorkDir": "/var/lib/docker/overlay2/ce38c2ce47ad7d4bbbd8f5c7c4ed567023d2121aef7d943d0fa2550593cfbeea/work"
},
"Name": "overlay2"
},
"Mounts": [
{
"Type": "volume",
"Name": "html",
"Source": "/var/lib/docker/volumes/html/_data",
"Destination": "/var/www/html",
"Driver": "local",
"Mode": "z",
"RW": true,
"Propagation": ""
}
],
"Config": {
"Hostname": "6ffb084b515c",
"Domainname": "",
"User": "",
"AttachStdin": false,
"AttachStdout": false,
"AttachStderr": false,
"ExposedPorts": {
"80/tcp": {}
},
"Tty": false,
"OpenStdin": false,
"StdinOnce": false,
"Env": [
"WORDPRESS_DB_PASSWORD=password",
"WORDPRESS_DB_USER=root",
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"PHPIZE_DEPS=autoconf \t\tdpkg-dev \t\tfile \t\tg++ \t\tgcc \t\tlibc-dev \t\tmake \t\tpkg-config \t\tre2c",
"PHP_INI_DIR=/usr/local/etc/php",
"APACHE_CONFDIR=/etc/apache2",
"APACHE_ENVVARS=/etc/apache2/envvars",
"PHP_CFLAGS=-fstack-protector-strong -fpic -fpie -O2 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64",
"PHP_CPPFLAGS=-fstack-protector-strong -fpic -fpie -O2 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64",
"PHP_LDFLAGS=-Wl,-O1 -pie",
"GPG_KEYS=42670A7FE4D0441C8E4632349E4FDC074A4EF02D 5A52880781F755608BF815FC910DEB46F53EA312",
"PHP_VERSION=7.4.30",
"PHP_URL=https://www.php.net/distributions/php-7.4.30.tar.xz",
"PHP_ASC_URL=https://www.php.net/distributions/php-7.4.30.tar.xz.asc",
"PHP_SHA256=ea72a34f32c67e79ac2da7dfe96177f3c451c3eefae5810ba13312ed398ba70d"
],
"Cmd": [
"apache2-foreground"
],
"Image": "wordpress:sad",
"Volumes": {
"/var/www/html": {}
},
"WorkingDir": "/var/www/html",
"Entrypoint": [
"docker-entrypoint.sh"
],
"OnBuild": null,
"Labels": {
"com.docker.compose.config-hash": "fa7c866c125b6bdbe158555fca49a8d871d32282817db2f396e0ff43d7a93eb4",
"com.docker.compose.container-number": "1",
"com.docker.compose.oneoff": "False",
"com.docker.compose.project": "admin",
"com.docker.compose.project.config_files": "docker-compose.yaml",
"com.docker.compose.project.working_dir": "/home/admin",
"com.docker.compose.service": "wordpress",
"com.docker.compose.version": "1.25.0"
},
"StopSignal": "SIGWINCH"
},
"NetworkSettings": {
"Bridge": "",
"SandboxID": "9f28df1912dade5b0d3001c875851cff085b100daa98390d546418207a0b962e",
"HairpinMode": false,
"LinkLocalIPv6Address": "",
"LinkLocalIPv6PrefixLen": 0,
"Ports": {
"80/tcp": [
{
"HostIp": "0.0.0.0",
"HostPort": "80"
}
]
},
"SandboxKey": "/var/run/docker/netns/9f28df1912da",
"SecondaryIPAddresses": null,
"SecondaryIPv6Addresses": null,
"EndpointID": "3928232aebd334a5aca3706dd3739f070e99bae3723e515a7792874a5f434bc0",
"Gateway": "172.17.0.1",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"IPAddress": "172.17.0.3",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"MacAddress": "02:42:ac:11:00:03",
"Networks": {
"bridge": {
"IPAMConfig": null,
"Links": null,
"Aliases": null,
"NetworkID": "e305027dab8764ad7d11a17ff5228c7b77fdcc18656a55788e9faf9ebe978ecd",
"EndpointID": "3928232aebd334a5aca3706dd3739f070e99bae3723e515a7792874a5f434bc0",
"Gateway": "172.17.0.1",
"IPAddress": "172.17.0.3",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"MacAddress": "02:42:ac:11:00:03",
"DriverOpts": null
}
}
}
}
]
This gives us the necessary parameters to recreate the container, and the passwords to access the database. docker inspect mariadb | grep IPAddress gives us the IP address to reach MariaDB.
root@ip-172-31-19-232:~# docker exec -it wordpress mysql -u root -ppassword -h 172.17.0.2
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 6
Server version: 10.8.3-MariaDB-1:10.8.3+maria~jammy mariadb.org binary distribution
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> use wordpress
Database changed
MariaDB [wordpress]> show tables;
Empty set (0.000 sec)
MariaDB [wordpress]>
So MariaDB is reachable just fine from the wordpress container; it’s the hostname that isn’t set up: the Env list above has no WORDPRESS_DB_HOST, and there’s no --link to the mariadb container either. So let’s recreate the container:
# docker stop wordpress
# docker rm wordpress
# docker create \
--name wordpress \
-v html:/var/www/html \
--link mariadb:mysql \
-p 80:80 \
-e WORDPRESS_DB_HOST=mysql \
-e WORDPRESS_DB_NAME=wordpress \
-e WORDPRESS_DB_USER=root \
-e WORDPRESS_DB_PASSWORD=password \
wordpress:sad
# docker start wordpress
As a oneliner:
docker stop wordpress ; \
docker rm wordpress ; \
docker run -d \
--name wordpress \
-v html:/var/www/html \
--link mariadb:mysql \
-p 80:80 \
-e WORDPRESS_DB_HOST=mysql \
-e WORDPRESS_DB_NAME=wordpress \
-e WORDPRESS_DB_USER=root \
-e WORDPRESS_DB_PASSWORD=password \
wordpress:sad
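--link is a legacy Docker feature; a sketch of the same fix using a user-defined network instead (assumes nothing else relies on the containers staying on the default bridge):
docker network create wpnet
docker network connect wpnet mariadb
docker rm -f wordpress
docker run -d --name wordpress -v html:/var/www/html -p 80:80 --network wpnet \
  -e WORDPRESS_DB_HOST=mariadb \
  -e WORDPRESS_DB_NAME=wordpress \
  -e WORDPRESS_DB_USER=root \
  -e WORDPRESS_DB_PASSWORD=password \
  wordpress:sad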
Singara
There’s a k3s Kubernetes install you can access with kubectl. The Kubernetes YAML manifests under /home/admin have been applied. The objective is to access from the host the “webapp” web server deployed and find what message it serves (it’s a name of a town or city btw). In order to pass the check, the webapp Docker container should not be run separately outside Kubernetes as a shortcut.
Let’s have a look at the cluster’s state:
root@ip-10-0-0-64:~# kubectl get -A pod
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system helm-install-traefik-crd-nml28 0/1 Completed 0 101d
kube-system helm-install-traefik-z54r4 0/1 Completed 2 101d
web webapp-deployment-666b67994b-5sffz 0/1 ImagePullBackOff 0 101d
kube-system coredns-b96499967-scfhc 1/1 Running 7 (101d ago) 101d
kube-system local-path-provisioner-7b7dc8d6f5-r8777 1/1 Running 7 (101d ago) 101d
kube-system metrics-server-668d979685-gkzrn 1/1 Running 11 (101d ago) 101d
kube-system svclb-traefik-7cfc151c-hqm5p 2/2 Running 8 (101d ago) 101d
kube-system traefik-7cd4fcff68-g8t6k 1/1 Running 7 (101d ago) 101d
kube-system svclb-traefik-7cfc151c-m6842 2/2 Running 2 (115s ago) 100d
Apparently, the desired container image is missing from the container runtime’s image store. The intro text hints at it being present in the Docker runtime that’s also installed on the system though:
root@ip-10-0-0-64:/home/admin# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
webapp latest 9c082e2983bc 3 months ago 135MB
python 3.7-slim c1d0bab51bbf 3 months ago 123MB
registry 2 3a0f7b0a13ef 4 months ago 24.1MB
There it is. But it’s not in k3s’ image store:
root@ip-10-0-0-64:~# k3s ctr i ls | grep webapp
root@ip-10-0-0-64:~#
Let’s add it to k3s’ containerd:
root@ip-10-0-0-64:~# sudo docker save webapp:latest | sudo k3s ctr images import -
unpacking docker.io/library/webapp:latest (sha256:25b7b5c6f8ff5fc3b3b89bb5a5c1b96c619e4152275638aeaa82841b4c14ebf8)...done
root@ip-10-0-0-64:~# k3s ctr i ls | grep webapp
docker.io/library/webapp:latest application/vnd.docker.distribution.manifest.v2+json sha256:25b7b5c6f8ff5fc3b3b89bb5a5c1b96c619e4152275638aeaa82841b4c14ebf8 84.1 MiB linux/amd64 io.cri-containerd.image=managed
Excellent. But the pod is still in ImagePullBackOff:
root@ip-10-0-0-64:~# kubectl get -n web pod
NAME READY STATUS RESTARTS AGE
webapp-deployment-666b67994b-5sffz 0/1 ImagePullBackOff 0 101d
Let’s have a look at the manifests:
root@ip-10-0-0-64:~# cd ~admin
root@ip-10-0-0-64:/home/admin#
root@ip-10-0-0-64:/home/admin# cat deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
name: webapp-deployment
namespace: web
spec:
selector:
matchLabels:
app: webapp
replicas: 1
template:
metadata:
labels:
app: webapp
spec:
containers:
- name: webapp
image: webapp
imagePullPolicy: Always
ports:
- containerPort: 8880
root@ip-10-0-0-64:/home/admin# cat nodeport.yml
apiVersion: v1
kind: Service
metadata:
name: webapp-service
namespace: web
spec:
type: NodePort
selector:
app.kubernetes.io/name: webapp
ports:
- port: 80
targetPort: 8888
nodePort: 30007
The service doesn’t match the deployment: its selector doesn’t target the pods’ labels, the port is 80 instead of 8888, and the service type doesn’t expose the desired port 8888 on the host.
The deployment shouldn’t try to pull the image every time (it only exists locally), and its containerPort is wrong; it should be 8888.
The manifests should look like this:
root@ip-10-0-0-64:/home/admin# cat deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
name: webapp-deployment
namespace: web
spec:
selector:
matchLabels:
app: webapp
replicas: 1
template:
metadata:
labels:
app: webapp
spec:
containers:
- name: webapp
image: webapp
imagePullPolicy: IfNotPresent
ports:
- containerPort: 8888
root@ip-10-0-0-64:/home/admin# cat nodeport.yml
apiVersion: v1
kind: Service
metadata:
name: webapp-service
namespace: web
spec:
type: LoadBalancer
selector:
app: webapp
ports:
- port: 8888
targetPort: 8888
nodePort: 30007
Apply them with kubectl apply -f deployment.yml -f nodeport.yml and verify the results:
root@ip-10-0-0-64:/home/admin# kubectl -n web get svc -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
webapp-service LoadBalancer 10.43.35.97 172.31.33.52 8888:30007/TCP 101d app=webapp
root@ip-10-0-0-64:/home/admin# iptables-save | grep 8888
# Warning: iptables-legacy tables present, use iptables-legacy-save to see them
-A CNI-DN-7a71a21266205d85a33bf -s 10.42.1.0/24 -p tcp -m tcp --dport 8888 -j CNI-HOSTPORT-SETMARK
-A CNI-DN-7a71a21266205d85a33bf -s 127.0.0.1/32 -p tcp -m tcp --dport 8888 -j CNI-HOSTPORT-SETMARK
-A CNI-DN-7a71a21266205d85a33bf -p tcp -m tcp --dport 8888 -j DNAT --to-destination 10.42.1.10:8888
-A CNI-HOSTPORT-DNAT -p tcp -m comment --comment "dnat name: \"cbr0\" id: \"1c91e84359f8f6ac9bdcde61f61f3fedf231aedbc8ca174fa473231ec41298e0\"" -m multiport --dports 8888 -j CNI-DN-7a71a21266205d85a33bf
-A KUBE-SEP-72FXFZAWXXLVOFL6 -p tcp -m comment --comment "web/webapp-service" -m tcp -j DNAT --to-destination 10.42.1.9:8888
-A KUBE-SERVICES -d 10.43.35.97/32 -p tcp -m comment --comment "web/webapp-service cluster IP" -m tcp --dport 8888 -j KUBE-SVC-6PCM5WOO3Q3DGU2X
-A KUBE-SERVICES -d 172.31.33.52/32 -p tcp -m comment --comment "web/webapp-service loadbalancer IP" -m tcp --dport 8888 -j KUBE-EXT-6PCM5WOO3Q3DGU2X
-A KUBE-SVC-6PCM5WOO3Q3DGU2X ! -s 10.42.0.0/16 -d 10.43.35.97/32 -p tcp -m comment --comment "web/webapp-service cluster IP" -m tcp --dport 8888 -j KUBE-MARK-MASQ
-A KUBE-SVC-6PCM5WOO3Q3DGU2X -m comment --comment "web/webapp-service -> 10.42.1.9:8888" -j KUBE-SEP-72FXFZAWXXLVOFL6
We’re done:
root@ip-10-0-0-64:/home/admin# curl localhost:8888
Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch
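As a short script (a sketch; assumes the manifests in /home/admin have already been edited as shown above):
sudo docker save webapp:latest | sudo k3s ctr images import -
kubectl apply -f /home/admin/deployment.yml -f /home/admin/nodeport.yml
kubectl -n web rollout restart deployment webapp-deployment
curl localhost:8888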
Karakorum
There’s a binary at /home/admin/wtfit that nobody knows how it works or what it does (“what the fun is this”). Someone remembers something about wtfit needing to communicate to a service in order to start. Run this wtfit program so it doesn’t exit with an error, fixing or working around things that you need but are broken in this server. (Note that you can open more than one web “terminal”.)
admin@ip-172-31-42-20:~$ file wtfit
wtfit: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, Go BuildID=_iHDgQr9_DkcDPzFkdEV/KeopbEbY41Cs4Jx-plaD/bLd_UQ3kiokG80g9hhbY/qtKkikMAZmQbLl9qoMCF, not stripped
admin@ip-172-31-42-20:~$ ls -lhA
total 6.2M
-rw------- 1 admin admin 5 Sep 13 18:31 .bash_history
-rw-r--r-- 1 admin admin 220 Aug 4 2021 .bash_logout
-rw-r--r-- 1 admin admin 3.5K Aug 4 2021 .bashrc
-rw-r--r-- 1 admin admin 807 Aug 4 2021 .profile
drwx------ 2 admin admin 4.0K Aug 31 21:26 .ssh
drwxr-xr-x 2 admin admin 4.0K Sep 13 18:18 agent
-rw-r--r-- 1 admin admin 6.1M Sep 13 18:17 wtfit
-rw-r--r-- 1 admin admin 127 Dec 26 21:21 xxx
admin@ip-172-31-42-20:~$ chmod +x wtfit
bash: /usr/bin/chmod: Permission denied
Wat? Chmod is broken? Great… (/usr/bin/chmod has lost its execute bit, so the kernel refuses to run it. We can still run it indirectly via the dynamic loader, and let chmod fix itself:)
sudo /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 /usr/bin/chmod +x /usr/bin/chmod
STrace to the rescue…
admin@ip-172-31-42-20:~$ strace -o xxx -f ./wtfit
ERROR: can't open config file
Let’s look at xxx
:
972 openat(AT_FDCWD, "/home/admin/wtfitconfig.conf", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or d
irectory)
972 write(1, "ERROR: can't open config file\n", 30) = 30
OK, give it an empty config with touch wtfitconfig.conf, then again with the strace-and-looking-at-the-dump dance:
986 connect(3, {sa_family=AF_INET, sin_port=htons(7777), sin_addr=inet_addr("127.0.0.1")}, 16 <unfinished ...>
987 <... nanosleep resumed>NULL) = 0
987 nanosleep({tv_sec=0, tv_nsec=320000}, <unfinished ...>
986 <... connect resumed>) = -1 EINPROGRESS (Operation now in progress)
986 epoll_ctl(4, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=3781652200, u64=140337543081704}}) = 0
990 <... epoll_pwait resumed>[{EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|EPOLLRDHUP, {u32=3781652200, u64=140337543081704}}], 128, 29999, NULL, 580717888312) = 1
986 write(6, "\0", 1 <unfinished ...>
990 epoll_pwait(4, <unfinished ...>
986 <... write resumed>) = 1
990 <... epoll_pwait resumed>[{EPOLLIN, {u32=8851208, u64=8851208}}], 128, 29989, NULL, 580717888262) = 1
986 futex(0xc000034948, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
990 read(5, <unfinished ...>
986 <... futex resumed>) = 1
990 <... read resumed>"\0", 16) = 1
989 <... futex resumed>) = 0
987 <... nanosleep resumed>NULL) = 0
986 getsockopt(3, SOL_SOCKET, SO_ERROR, <unfinished ...>
990 futex(0xc000080148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
989 epoll_pwait(4, <unfinished ...>
986 <... getsockopt resumed>[ECONNREFUSED], [4]) = 0
986 futex(0xc000080148, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
987 nanosleep({tv_sec=0, tv_nsec=640000}, <unfinished ...>
986 <... futex resumed>) = 1
990 <... futex resumed>) = 0
986 epoll_ctl(4, EPOLL_CTL_DEL, 3, 0xc000093324 <unfinished ...>
990 futex(0xc000080148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
986 <... epoll_ctl resumed>) = 0
986 close(3) = 0
986 futex(0xc000080148, FUTEX_WAKE_PRIVATE, 1) = 1
990 <... futex resumed>) = 0
990 nanosleep({tv_sec=0, tv_nsec=3000}, <unfinished ...>
986 write(1, "ERROR: can't connect to server\n", 31 <unfinished ...>
987 <... nanosleep resumed>NULL) = 0
987 nanosleep({tv_sec=0, tv_nsec=1280000}, <unfinished ...>
986 <... write resumed>) = 31
986 exit_group(1 <unfinished ...>
990 <... nanosleep resumed>NULL) = 0
986 <... exit_group resumed>) = ?
989 <... epoll_pwait resumed> <unfinished ...>) = ?
988 <... futex resumed>) = ?
987 <... nanosleep resumed> <unfinished ...>) = ?
990 +++ exited with 1 +++
987 +++ exited with 1 +++
988 +++ exited with 1 +++
989 +++ exited with 1 +++
986 +++ exited with 1 +++
So it wants to connect to localhost:7777 but can’t, as there’s no service listening there (as evidenced by ss), and there’s no mention of the port anywhere in /etc.
Sigh, so I guess we have to improvise a server. Let’s try with netcat:
admin@ip-172-31-42-20:~$ nc -p 7777 -l &
[1] 1042
admin@ip-172-31-42-20:~$ strace -f -o xxx ./wtfit
GET / HTTP/1.1
Host: localhost:7777
User-Agent: Go-http-client/1.1
Accept-Encoding: gzip
^Z
[2]+ Stopped strace -f -o xxx ./wtfit
admin@ip-172-31-42-20:~$ bg
[2]+ strace -f -o xxx ./wtfit &
admin@ip-172-31-42-20:~$ fg
nc -p 7777 -l
^C
admin@ip-172-31-42-20:~$ ERROR: can't connect to server
[2]+ Exit 1 strace -f -o xxx ./wtfit
admin@ip-172-31-42-20:~$
So the binary sends an HTTP request to the endpoint in question and waits for an answer. So we need to point it to a web server, but we can’t install stuff. Python’s embedded webserver to the rescue:
admin@ip-172-31-42-20:~$ python3 -m http.server --bind 127.0.0.1 7777 &
[1] 1056
admin@ip-172-31-42-20:~$ Serving HTTP on 127.0.0.1 port 7777 (http://127.0.0.1:7777/) ...
admin@ip-172-31-42-20:~$ strace -f -o xxx ./wtfit
127.0.0.1 - - [26/Dec/2022 21:35:23] "GET / HTTP/1.1" 200 -
OK.
As a oneliner:
sudo /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 /usr/bin/chmod +x /usr/bin/chmod ; \
cd ; \
chmod +x wtfit ; \
touch wtfitconfig.conf ; \
python3 -m http.server --bind localhost 7777 &
./wtfit
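If Python weren’t available, a bare netcat could answer the single request too (a rough sketch; nc serves exactly one connection and then exits, so the Python server above is the more robust choice):
printf 'HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n' | nc -l -p 7777 &
./wtfit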
Oaxaca
lsof /home/admin/somefile and echo $$ reveal that the very bash we’re typing in holds an open file descriptor on the file. So we need to find out which descriptor it is and close it without killing our shell:
# Find the file descriptor our shell has open on the file
fd=$(find /proc/$$/fd -lname '/home/admin/somefile' | grep -oEe "[0-9]+$")
# Close that descriptor in the current shell ({fd} makes bash take the number from the variable)
exec {fd}<&-
As a oneliner:
fd=$(find /proc/$$/fd -lname '/home/admin/somefile' | grep -oEe "[0-9]+$") && exec {fd}<&-
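To verify, re-run lsof (a sketch): it should print nothing once the descriptor is closed.
lsof /home/admin/somefile || echo "no process holds the file open anymore"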
Venice
Try and figure out if you are inside a container (like a Docker one for example) or inside a Virtual Machine (like in the other scenarios).
At first glance it looks like we’re in a VM :-)
root@ip-172-31-38-151:/# dmesg | grep Hyper
[ 0.000000] Hypervisor detected: KVM
root@ip-172-31-38-151:/# mount
overlay on / type overlay (rw,relatime,lowerdir=/var/lib/containers/storage/overlay/l/4UGO27476EXYY2UQSGBWL6EZC4:/var/lib/containers/storage/overlay/l/3JZQV4UY3FCO3W7SL3ERYENEZN:/var/lib/containers/storage/overlay/l/SCP4AZOKFN5HY4R5CQ5UVOYS7K:/var/lib/containers/storage/overlay/l/LPA46WOYFQ5ZHRJPEGNEEUCN36:/var/lib/containers/storage/overlay/l/WYTOBRCWJIZJALTLB3T5GBAQAB,upperdir=/var/lib/containers/storage/overlay/fb7c300d0cb0c61b15087365f7e0043090a7be0369605bf1d6efb574f78bd65a/diff,workdir=/var/lib/containers/storage/overlay/fb7c300d0cb0c61b15087365f7e0043090a7be0369605bf1d6efb574f78bd65a/work)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev type tmpfs (rw,nosuid,size=65536k,mode=755)
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,relatime)
tmpfs on /run type tmpfs (rw,nosuid,nodev,relatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=64000k)
tmpfs on /etc/hosts type tmpfs (rw,nosuid,nodev,noexec,relatime,size=46632k,mode=755)
tmpfs on /etc/resolv.conf type tmpfs (rw,nosuid,nodev,noexec,relatime,size=46632k,mode=755)
tmpfs on /etc/hostname type tmpfs (rw,nosuid,nodev,noexec,relatime,size=46632k,mode=755)
tmpfs on /run/.containerenv type tmpfs (rw,nosuid,nodev,noexec,relatime,size=46632k,mode=755)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,relatime)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666)
tmpfs on /var/log/journal type tmpfs (rw,nosuid,nodev,relatime)
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,relatime,nsdelegate,memory_recursiveprot)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=28,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=13272)
debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
tracefs on /sys/kernel/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
fusectl on /sys/fs/fuse/connections type fusectl (rw,nosuid,nodev,noexec,relatime)
configfs on /sys/kernel/config type configfs (rw,nosuid,nodev,noexec,relatime)
The overlay mount on / and the /run/.containerenv mount heavily hint that this system is actually a container (a Podman one, judging by the .containerenv marker and the /var/lib/containers storage paths). As we’re able to call poweroff and it works, it’s likely that our container is privileged.
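A few quick checks that point the same way (a sketch; no single one is conclusive on its own):
ls -l /run/.containerenv /.dockerenv 2>/dev/null   # Podman / Docker marker files; the former exists here
cat /proc/1/cgroup                                 # container runtimes often leave traces here
systemd-detect-virt                                # prints e.g. "kvm", "podman", "docker" or "none"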
Melbourne
There is a Python WSGI web application file at /home/admin/wsgi.py, the purpose of which is to serve the string “Hello, world!”. This file is served by a Gunicorn server which is fronted by an nginx server (both servers managed by systemd). So the flow of an HTTP request is: Web Client (curl) -> Nginx -> Gunicorn -> wsgi.py. The objective is to be able to curl the localhost (on default port :80) and get back “Hello, world!”, using the current setup.
Let’s see what we have there:
admin@ip-172-31-42-37:/$ curl -s http://localhost
admin@ip-172-31-42-37:/$ ss -panelt
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 128 0.0.0.0:22 0.0.0.0:* ino:10776 sk:1 cgroup:/system.slice/ssh.service <->
LISTEN 0 4096 *:6767 *:* users:(("sadagent",pid=600,fd=7)) uid:1000 ino:10760 sk:2 cgroup:/system.slice/sadagent.service v6only:0 <->
LISTEN 0 4096 *:8080 *:* users:(("gotty",pid=599,fd=6)) uid:1000 ino:11485 sk:3 cgroup:/system.slice/gotty.service v6only:0 <->
LISTEN 0 128 [::]:22 [::]:* ino:10778 sk:4 cgroup:/system.slice/ssh.service v6only:1 <->
Great. Nothing is listening on port 80. Let’s start Nginx then:
admin@ip-172-31-42-37:/$ sudo systemctl enable --now nginx
Synchronizing state of nginx.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable nginx
Created symlink /etc/systemd/system/multi-user.target.wants/nginx.service → /lib/systemd/system/nginx.service.
admin@ip-172-31-42-37:/$ systemctl status nginx
● nginx.service - A high performance web server and a reverse proxy server
Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2022-12-27 19:06:49 UTC; 5s ago
Docs: man:nginx(8)
Process: 936 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_process on; (code=exited, statu>
Process: 937 ExecStart=/usr/sbin/nginx -g daemon on; master_process on; (code=exited, status=0/SUCCE>
Main PID: 938 (nginx)
Tasks: 3 (limit: 524)
Memory: 11.1M
CPU: 36ms
CGroup: /system.slice/nginx.service
├─938 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;
├─939 nginx: worker process
└─940 nginx: worker process
Dec 27 19:06:49 ip-172-31-42-37 systemd[1]: Starting A high performance web server and a reverse proxy s>
Dec 27 19:06:49 ip-172-31-42-37 systemd[1]: Started A high performance web server and a reverse proxy se>
Ok, nginx is up. So is gunicorn:
admin@ip-172-31-42-37:/$ systemctl status gunicorn
● gunicorn.service - gunicorn daemon
Loaded: loaded (/etc/systemd/system/gunicorn.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2022-12-27 19:05:12 UTC; 6min ago
TriggeredBy: ● gunicorn.socket
Main PID: 615 (gunicorn)
Tasks: 2 (limit: 524)
Memory: 17.1M
CPU: 333ms
CGroup: /system.slice/gunicorn.service
├─615 /usr/bin/python3 /usr/local/bin/gunicorn --bind unix:/run/gunicorn.sock wsgi
└─670 /usr/bin/python3 /usr/local/bin/gunicorn --bind unix:/run/gunicorn.sock wsgi
Dec 27 19:05:12 ip-172-31-42-37 systemd[1]: Started gunicorn daemon.
Dec 27 19:05:13 ip-172-31-42-37 gunicorn[615]: [2022-12-27 19:05:13 +0000] [615] [INFO] Starting gunicorn 20.1.0
Dec 27 19:05:13 ip-172-31-42-37 gunicorn[615]: [2022-12-27 19:05:13 +0000] [615] [INFO] Listening at: unix:/run/gunicorn.sock (615)
Dec 27 19:05:13 ip-172-31-42-37 gunicorn[615]: [2022-12-27 19:05:13 +0000] [615] [INFO] Using worker: sync
Dec 27 19:05:13 ip-172-31-42-37 gunicorn[670]: [2022-12-27 19:05:13 +0000] [670] [INFO] Booting worker with pid: 670
And it’s listening on a unix socket. So let’s see what it’s doing:
admin@ip-172-31-42-37:/$ curl -s http://localhost
<html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.18.0</center>
</body>
</html>
Nothing useful. Have a look at the config then:
admin@ip-172-31-42-37:/$ cat /etc/nginx/sites-enabled/default
server {
listen 80;
location / {
include proxy_params;
proxy_pass http://unix:/run/gunicorn.socket;
}
}
That socket doesn’t exist: the config points at /run/gunicorn.socket, while Gunicorn (see the systemctl status gunicorn output above) listens on /run/gunicorn.sock. Point Nginx at the correct socket, here via an upstream block:
upstream gunicorn {
server unix:/run/gunicorn.sock;
}
server {
listen 80;
location / {
include proxy_params;
proxy_pass http://gunicorn;
}
}
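After editing the site config it’s worth validating and reloading (a quick sketch):
nginx -t && systemctl reload nginx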
Now Nginx correctly sends Gunicorn’s output:
root@ip-172-31-42-1:~# curl -v http://localhost/
* Connected to localhost (127.0.0.1) port 80 (#0)
> GET / HTTP/1.1
> Host: localhost
> User-Agent: curl/7.74.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: gunicorn
< Date: Tue, 27 Dec 2022 19:27:56 GMT
< Connection: close
< Content-Type: text/html
< Content-Length: 0
<
* Closing connection 0
But it’s empty? Ok, so check its socket directly:
root@ip-172-31-42-1:~# curl -v --unix-socket /run/gunicorn.sock http://localhost/
* Trying /run/gunicorn.sock:0...
* Connected to localhost (/run/gunicorn.sock) port 80 (#0)
> GET / HTTP/1.1
> Host: localhost
> User-Agent: curl/7.74.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: gunicorn
< Date: Tue, 27 Dec 2022 19:27:56 GMT
< Connection: close
< Content-Type: text/html
< Content-Length: 0
<
* Closing connection 0
Hm. Content-Length: 0, so no body arrives, even straight from the socket. What’s going on? We need to look at the source code:
def application(environ, start_response):
start_response('200 OK', [('Content-Type', 'text/html'), ('Content-Length', 0)])
return [b'Hello, world!']
The application should correctly set the payload’s content length – or leave it out to use chunked transfer encoding. Change the source, Luke:
def application(environ, start_response):
start_response('200 OK', [('Content-Type', 'text/html')])
return [b'Hello, world!']
Restart gunicorn and test:
root@ip-172-31-42-1:~# systemctl restart gunicorn
root@ip-172-31-42-1:~# curl -v --unix-socket /run/gunicorn.sock http://localhost/
* Trying /run/gunicorn.sock:0...
* Connected to localhost (/run/gunicorn.sock) port 80 (#0)
> GET / HTTP/1.1
> Host: localhost
> User-Agent: curl/7.74.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: gunicorn
< Date: Tue, 27 Dec 2022 19:32:32 GMT
< Connection: close
< Transfer-Encoding: chunked
< Content-Type: text/html
<
* Closing connection 0
Hello, world!
Excellent, we’re done.
As a short script:
cat > /etc/nginx/sites-enabled/default << EOF
upstream gunicorn {
server unix:/run/gunicorn.sock;
}
server {
listen 80;
location / {
include proxy_params;
proxy_pass http://gunicorn;
}
}
EOF
cat > /home/admin/wsgi.py << EOF
def application(environ, start_response):
start_response('200 OK', [('Content-Type', 'text/html')])
return [b'Hello, world!']
EOF
systemctl restart gunicorn
systemctl enable --now nginx
Hong-Kong
(Similar to “Manhattan” scenario but harder). Your objective is to be able to insert a row in an existing Postgres database. The issue is not specific to Postgres and you don’t need to know details about it (although it may help).
Postgres information: it’s a service that listens to a port (:5432) and writes to disk in a data directory, the location of which is defined in the data_directory parameter of the configuration file /etc/postgresql/14/main/postgresql.conf. In our case Postgres is managed by systemd as a unit with name postgresql.
Let’s have a looksie:
root@ip-172-31-25-11:/# systemctl status postgresql
● postgresql.service - PostgreSQL RDBMS
Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; vendor preset: enabled)
Active: active (exited) since Tue 2022-12-27 19:50:14 UTC; 2min 13s ago
Process: 595 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
Main PID: 595 (code=exited, status=0/SUCCESS)
Dec 27 19:50:14 ip-172-31-25-11 systemd[1]: Starting PostgreSQL RDBMS...
Dec 27 19:50:14 ip-172-31-25-11 systemd[1]: Started PostgreSQL RDBMS.
Great. According to Debian’s docs, we’re supposed to use the postgresql@.service instance unit, matching the /etc/postgresql/14/main tree as mentioned:
root@ip-172-31-25-11:/# systemctl status postgresql@14-main
● postgresql@14-main.service - PostgreSQL Cluster 14-main
Loaded: loaded (/lib/systemd/system/postgresql@.service; enabled-runtime; vendor preset: enabled)
Active: failed (Result: protocol) since Tue 2022-12-27 20:29:00 UTC; 47s ago
Process: 581 ExecStart=/usr/bin/pg_ctlcluster --skip-systemctl-redirect 14-main start (code=exited, sta
Dec 27 20:29:00 ip-172-31-25-11 systemd[1]: Starting PostgreSQL Cluster 14-main...
Dec 27 20:29:00 ip-172-31-25-11 postgresql@14-main[581]: Error: /opt/pgdata/main is not accessible or doe
Dec 27 20:29:00 ip-172-31-25-11 systemd[1]: postgresql@14-main.service: Can't open PID file /run/postgres
Dec 27 20:29:00 ip-172-31-25-11 systemd[1]: postgresql@14-main.service: Failed with result 'protocol'.
Dec 27 20:29:00 ip-172-31-25-11 systemd[1]: Failed to start PostgreSQL Cluster 14-main.
Ok, so the data is missing: /opt/pgdata is empty. Let’s look at /etc/fstab:
root@ip-172-31-25-11:/# cat /etc/fstab
# /etc/fstab: static file system information
UUID=5db68868-2d70-449f-8b1d-f3c769ec01c7 / ext4 rw,discard,errors=remount-ro,x-systemd.growfs 0 1
UUID=72C9-F191 /boot/efi vfat defaults 0 0
/dev/xvdb /opt/pgdata xfs defaults,nofail 0 0
Verify that the disk is present:
root@ip-172-31-25-11:/# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1 259:0 0 8G 0 disk
nvme1n1 259:1 0 8G 0 disk
├─nvme1n1p1 259:2 0 7.9G 0 part /
├─nvme1n1p14 259:3 0 3M 0 part
└─nvme1n1p15 259:4 0 124M 0 part /boot/efi
root@ip-172-31-25-11:/# blkid /dev/nvme0n1
/dev/nvme0n1: UUID="9a2e1bf9-50e8-41a9-9f8c-5dd837869c50" TYPE="xfs"
No, there is no /dev/xvdb. But /dev/nvme0n1 is a likely candidate: it’s unused and carries an xfs filesystem. Note that the /etc/fstab entry has the nofail flag, which is why the missing device didn’t break the boot. After changing /etc/fstab, run systemctl daemon-reload so systemd regenerates its mount units:
root@ip-172-31-25-11:/# cat /etc/fstab
# /etc/fstab: static file system information
UUID=5db68868-2d70-449f-8b1d-f3c769ec01c7 / ext4 rw,discard,errors=remount-ro,x-systemd.growfs 0 1
UUID=72C9-F191 /boot/efi vfat defaults 0 0
/dev/nvme0n1 /opt/pgdata xfs defaults,nofail 0 0
root@ip-172-31-25-11:/# systemctl daemon-reload
root@ip-172-31-25-11:/# mount /opt/pgdata
root@ip-172-31-25-11:/# ls /opt/pgdata
deleteme file1.bk file2.bk file3.bk main
root@ip-172-31-25-11:/# df -h
Filesystem Size Used Avail Use% Mounted on
udev 224M 0 224M 0% /dev
tmpfs 47M 1.5M 45M 4% /run
/dev/nvme1n1p1 7.7G 1.2G 6.1G 17% /
tmpfs 233M 0 233M 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 233M 0 233M 0% /sys/fs/cgroup
/dev/nvme1n1p15 124M 278K 124M 1% /boot/efi
/dev/nvme0n1 8.0G 8.0G 28K 100% /opt/pgdata
Mounting works, but the disk is full, just as it was in Manhattan. Purge the *.bk files from the disk, restart the service and we’re done:
root@ip-172-31-25-11:/# rm -f /opt/pgdata/*.bk
root@ip-172-31-25-11:/# systemctl start postgresql@14-main
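Pointing fstab at the filesystem UUID that blkid reported would be a bit more robust than the device name, since NVMe device names can change across reboots; a sketch of that alternative:
# replace the stale /dev/xvdb line with a UUID-based one (UUID taken from the blkid output above)
sed -i 's|^/dev/xvdb[[:space:]].*|UUID=9a2e1bf9-50e8-41a9-9f8c-5dd837869c50 /opt/pgdata xfs defaults,nofail 0 0|' /etc/fstab
systemctl daemon-reload
mount /opt/pgdata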
Lisbon
There’s an etcd server running on https://localhost:2379, get the value for the key “foo”, ie etcdctl get foo or curl https://localhost:2379/v2/keys/foo
The server’s current date:
Tue Jan 2 07:41:39 UTC 2024
admin@ip-10-0-0-196:/$ curl https://localhost:2379/v2/keys/foo
curl: (60) SSL certificate problem: certificate has expired
More details here: https://curl.se/docs/sslcerts.html
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
admin@ip-10-0-0-196:/$ etcdctl get foo
Error: client: etcd cluster is unavailable or misconfigured; error #0: x509: certificate has expired or is not yet valid: current time 2024-01-02T07:42:07Z is after 2023-01-30T00:02:48Z
error #0: x509: certificate has expired or is not yet valid: current time 2024-01-02T07:42:07Z is after 2023-01-30T00:02:48Z
This server’s clock is off by a year - reset it:
admin@ip-10-0-0-196:/$ sudo date -s "2023-01-02 08:30:00+0100"
admin@ip-10-0-0-196:/$ curl https://localhost:2379/v2/keys/foo
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.18.0</center>
</body>
</html>
Good, curl can access the host now. But that’s nginx replying, not etcd?
admin@ip-10-0-0-196:/$ systemctl status nginx
● nginx.service - A high performance web server and a reverse proxy server
Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2023-01-02 07:41:28 UTC; 6min ago
Docs: man:nginx(8)
Process: 581 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
Process: 601 ExecStart=/usr/sbin/nginx -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
Main PID: 610 (nginx)
Tasks: 3 (limit: 521)
Memory: 12.9M
CPU: 203ms
CGroup: /system.slice/nginx.service
├─610 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;
├─611 nginx: worker process
└─612 nginx: worker process
Jan 02 07:41:27 ip-10-0-0-196 systemd[1]: Starting A high performance web server and a reverse proxy server...
Jan 02 07:41:28 ip-10-0-0-196 systemd[1]: nginx.service: Failed to parse PID from file /run/nginx.pid: Invalid argument
Jan 02 07:41:28 ip-10-0-0-196 systemd[1]: Started A high performance web server and a reverse proxy server.
admin@ip-10-0-0-196:/$ ss -panetl
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 4096 127.0.0.1:2379 0.0.0.0:* uid:108 ino:12125 sk:1 cgroup:/system.slice/etcd.service <->
LISTEN 0 4096 127.0.0.1:2380 0.0.0.0:* uid:108 ino:12121 sk:2 cgroup:/system.slice/etcd.service <->
LISTEN 0 128 0.0.0.0:22 0.0.0.0:* ino:11458 sk:3 cgroup:/system.slice/ssh.service <->
LISTEN 0 511 0.0.0.0:443 0.0.0.0:* ino:11464 sk:4 cgroup:/system.slice/nginx.service <->
LISTEN 0 4096 *:6767 *:* users:(("sadagent",pid=563,fd=7)) uid:1000 ino:10633 sk:5 cgroup:/system.slice/sadagent.service v6only:0 <->
LISTEN 0 4096 *:8080 *:* users:(("gotty",pid=562,fd=6)) uid:1000 ino:10642 sk:6 cgroup:/system.slice/gotty.service v6only:0 <->
LISTEN 0 128 [::]:22 [::]:* ino:11460 sk:7 cgroup:/system.slice/ssh.service v6only:1 <->
admin@ip-10-0-0-196:/$ grep -ri 2379 /etc/nginx/
Nothing in the nginx config mentions 2379, so nginx isn’t knowingly fronting etcd. What does etcd have to say?
admin@ip-10-0-0-196:/$ systemctl status etcd
● etcd.service - etcd - highly-available key value store
Loaded: loaded (/lib/systemd/system/etcd.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2023-01-02 07:46:08 UTC; 4min 1s ago
Docs: https://etcd.io/docs
man:etcd
Main PID: 873 (etcd)
Tasks: 9 (limit: 521)
Memory: 7.8M
CPU: 2.050s
CGroup: /system.slice/etcd.service
└─873 /usr/bin/etcd --cert-file /etc/ssl/certs/localhost.crt --key-file /etc/ssl/certs/localhost.key --advertise-client-urls=https://localhost:2379 --listen-client-urls=https://localhost:2379
Jan 02 07:46:08 ip-10-0-0-196 etcd[873]: 8e9e05c52164694d became candidate at term 19
Jan 02 07:46:08 ip-10-0-0-196 etcd[873]: 8e9e05c52164694d received MsgVoteResp from 8e9e05c52164694d at term 19
Jan 02 07:46:08 ip-10-0-0-196 etcd[873]: 8e9e05c52164694d became leader at term 19
Jan 02 07:46:08 ip-10-0-0-196 etcd[873]: raft.node: 8e9e05c52164694d elected leader 8e9e05c52164694d at term 19
Jan 02 07:46:08 ip-10-0-0-196 etcd[873]: published {Name:ip-10-0-0-196 ClientURLs:[https://localhost:2379]} to cluster cdf818194e3a8c32
Jan 02 07:46:08 ip-10-0-0-196 etcd[873]: ready to serve client requests
Jan 02 07:46:08 ip-10-0-0-196 systemd[1]: Started etcd - highly-available key value store.
Jan 02 07:46:09 ip-10-0-0-196 etcd[873]: serving client requests on 127.0.0.1:2379
Jan 02 07:46:11 ip-10-0-0-196 etcd[873]: WARNING: 2023/01/02 07:46:11 grpc: addrConn.createTransport failed to connect to {localhost:2379 <nil> 0 <nil>}. Err :tls: use of closed connection. Reconnecting...
Jan 02 07:46:23 ip-10-0-0-196 etcd[873]: WARNING: 2023/01/02 07:46:23 grpc: addrConn.createTransport failed to connect to {localhost:2379 <nil> 0 <nil>}. Err :tls: use of closed connection. Reconnecting...
etcd itself is up and listening, but connections to it clearly don’t behave. So what’s interfering with the traffic? IPTables?
admin@ip-10-0-0-196:/$ sudo iptables-save
# Generated by iptables-save v1.8.7 on Mon Jan 2 07:52:37 2023
*nat
:PREROUTING ACCEPT [13:1812]
:INPUT ACCEPT [11:660]
:OUTPUT ACCEPT [13:988]
:POSTROUTING ACCEPT [50:3208]
:DOCKER - [0:0]
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT -o lo -p tcp -m tcp --dport 2379 -j REDIRECT --to-ports 443
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A DOCKER -i docker0 -j RETURN
COMMIT
# Completed on Mon Jan 2 07:52:37 2023
# Generated by iptables-save v1.8.7 on Mon Jan 2 07:52:37 2023
*filter
:INPUT ACCEPT [2199:212561]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [1644:453532]
:DOCKER - [0:0]
:DOCKER-ISOLATION-STAGE-1 - [0:0]
:DOCKER-ISOLATION-STAGE-2 - [0:0]
:DOCKER-USER - [0:0]
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN
COMMIT
# Completed on Mon Jan 2 07:52:37 2023
The -A OUTPUT -o lo -p tcp -m tcp --dport 2379 -j REDIRECT --to-ports 443 line is the smoking gun right there. sudo docker ps -a verifies that Docker isn’t relevant to this exercise, so purge the IPTables NAT table:
admin@ip-10-0-0-196:/$ sudo iptables -t nat -F
admin@ip-10-0-0-196:/$ etcdctl cluster-health
member 8e9e05c52164694d is healthy: got healthy result from https://localhost:2379
cluster is healthy
admin@ip-10-0-0-196:/$ etcdctl get foo
bar
And presto, all is well with etcd!
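Flushing the whole NAT table also wipes Docker’s rules; a more surgical alternative would have been to delete just the offending rule (a sketch):
sudo iptables -t nat -D OUTPUT -o lo -p tcp -m tcp --dport 2379 -j REDIRECT --to-ports 443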