App outbound connecting timed out because App selected a ephemeral port which collisions with the existing socket on 15001(outbound) listener#
Sidecar intercept and TCP proxying all outbound TCP connection by default:
(app --[conntrack DNAT]--> sidecar) -----> upstream-tcp-service
But in some scenarios, App just get a connect timed out error when connecting to the sidecar 15001(outbound) listener.
Scenarios:
When sidecar has a half-open connection to App. e.g:
$ ss tcp FIN-WAIT-2 0 0 127.0.0.1:15001 172.29.73.7:44410(POD_IP:ephemeral_port)
This can happen, eg: TCP Proxy half-closed connection leak for 1 hour in some scenarios #43297
There is no track entry in conntrack table because
nf_conntrack_tcp_timeout_close_wait
time out and expired.App invoke syscall
connect(sockfd, peer_addr)
, kernel allocation aephemeral port
(44410 in this case) , bind the new socket to thatephemeral port
and sentSYN
packet to peer.SYN
packet reach conntrack and it create atrack entry
inconntrack table
:
$ conntrack -L
tcp 6 108 SYN_SENT src=172.29.73.7 dst=172.21.206.198 sport=44410 dport=7777 src=127.0.0.1 dst=172.29.73.7 sport=15001 dport=44410
SYN
packet DNAT to127.0.0.1:15001
SYN
packet reach the already existingFIN-WAIT-2 127.0.0.1:15001 172.29.73.7:44410
socket, then sidecar reply aTCP Challenge ACK
(TCP seq-no is from the oldFIN-WAIT-2
) packet to AppApp reply the
TCP Challenge ACK
with aRST
(TCP seq-no is from theTCP Challenge ACK
)Conntrack
get theRST
packet and check it. In some kernel version,conntrack
justinvalid
theRST
packet because theseq-no
is out of thetrack entry
inconntrack table
which created in step 3.App will retransmit
SYN
but all without an expectedSYN/ACK
reply. Connect timed out will happen on App user space.
Different kernel version may have different packet validate rule in step 7:
RST packet mark as invalid:
SUSE Linux Enterprise Server 15 SP4:
5.14.21-150400.24.21-default
# cat /proc/sys/net/netfilter/nf_conntrack_tcp_ignore_invalid_rst
0
####################
RST packet passed check and NATed:
Ubuntu 20.04.2:
5.4.0-137-generic
# cat /proc/sys/net/netfilter/nf_conntrack_tcp_ignore_invalid_rst
cat: /proc/sys/net/netfilter/nf_conntrack_tcp_ignore_invalid_rst: No such file or directory
It seems related to kernel patch: Add tcp_ignore_invalid_rst sysctl to allow to disable out of segment RSTs which merge to kernel after kernel v5.14
Good news is that, someone will fix the problem at kernel v6.2-rc7: netfilter: conntrack: handle tcp challenge acks during connection reuse:
When a connection is re-used, following can happen:
[ connection starts to close, fin sent in either direction ]
> syn # initator quickly reuses connection
< ack # peer sends a challenge ack
> rst # rst, sequence number == ack_seq of previous challenge ack
> syn # this syn is expected to pass
Problem is that the rst will fail window validation, so it gets
tagged as invalid.
If ruleset drops such packets, we get repeated syn-retransmits until
initator gives up or peer starts responding with syn/ack.
But in some scenarios and kernel version, it will be an issue anyway.
Base knowledge#
TCP#
TCP Challenge ACK#
Conntrack table#
list conntrack table command and output explain:
conntrack -L tcp 6 54(entry expire seconds) CLOSE_WAIT(entry state) (local endpoint):src=172.29.73.7 dst=172.21.206.198 sport=44410 dport=7777 (remote endpoint)src=127.0.0.1 dst=172.29.73.7 sport=15001 dport=44410 [ASSURED] mark=0 use=1
Conntrack by default will not DNAT any invalid(Not in tcp connection window) tcp RST packet.
Environment#
$ ./istioctl version
client version: 1.16.2
control plane version: 1.16.2
data plane version: 1.16.2 (4 proxies)
$ kubectl version --short
Flag --short has been deprecated, and will be removed in the future. The --short output will become the default.
Client Version: v1.25.3
Kustomize Version: v4.5.7
Server Version: v1.25.6
CNI: Calico
Testing POD#
IP: 172.29.73.7
POD Name: fortio-server-l2-0
pod running at worker node: sle
Worker Node:
Kernel: 5.14.21-150400.24.21-default
Linux distros: SUSE Linux Enterprise Server 15 SP4
TCP service pod#
IP: 172.21.206.198
POD Name: netshoot-0
# listen on port 7777
kubectl -it exec netshoot-0 -- nc -l -k 7777
enable conntrack’s invalid packet log on Testing POD#
enable nf_log_all_netns
on worker node:
sle:/proc/sys # sudo su
sle:/proc/sys # echo 1 > /proc/sys/net/netfilter/nf_log_all_netns
add iptables rule on container:
fortio-server-l2-0:/ $
iptables -t mangle -N ISTIO_INBOUND_LOG
iptables -t mangle -A ISTIO_INBOUND_LOG -m conntrack --ctstate INVALID -j LOG --log-level debug --log-prefix 'ISTIO_INBOUND_INVALID: '
iptables -t mangle -A PREROUTING -p tcp -j ISTIO_INBOUND_LOG
iptables -t mangle -N ISTIO_OUTBOUND_LOG
iptables -t mangle -A ISTIO_OUTBOUND_LOG -m conntrack --ctstate INVALID -j LOG --log-level debug --log-prefix 'ISTIO_OUTBOUND_INVALID: '
iptables -t mangle -A OUTPUT -p tcp -j ISTIO_OUTBOUND_LOG
New connection timeout#
App build connection on the same ephemeral port#
Make a connection by specified ephemeral port. e.g:
nc -p 44410 172.21.206.198 7777
List sockets with timer(state/retry timeout) info. e.g:
ss -naoep tcp SYN-SENT 0 1 172.29.73.7:44410 172.21.206.198:7777 users:(("nc",pid=144426,fd=3)) timer:(on,1.684ms(timeout),2(retry counter)) ino:2024378629 sk:aa4e2 <->
Case 1: connect timed out by collision ephemeral port#
fortio-server-l2-0:/ # nc -p 44410 172.21.206.198 7777 #pid=144426
fortio-server-l2-0:/ # ss -naoep | egrep '44410|7777'
tcp SYN-SENT 0 1 172.29.73.7:44410 172.21.206.198:7777 users:(("nc",pid=144426,fd=3)) timer:(on,1.684ms,2) ino:2024378629 sk:aa4e2 <->
tcp FIN-WAIT-2 0 0 127.0.0.1:15001 172.29.73.7:44410 users:(("envoy",pid=51435,fd=121)) ino:2023763691 sk:aa125 <--
#tcp CLOSE-WAIT 0 0 172.29.73.7:38076 172.21.206.198:7777 users:(("envoy",pid=51435,fd=135)) ino:2023763692 sk:aa123 -->
#tcp CLOSE-WAIT 0 0 172.29.73.7:38078 172.21.206.198:7777 users:(("nc",pid=88235,fd=3)) ino:2024198076 sk:aa3d7 -->
fortio-server-l2-0:/ # conntrack -L 2>&1 | egrep "192.168.7.|10.96.94.44|172.21.206.198"
tcp 6 108 SYN_SENT src=172.29.73.7 dst=172.21.206.198 sport=44410 dport=7777 src=127.0.0.1 dst=172.29.73.7 sport=15001 dport=44410 mark=0 use=1
fortio-server-l2-0:/ # nstat -zas | egrep -i 'retr|cha'
TcpRetransSegs 165296 0.0
TcpExtTCPLostRetransmit 842 0.0
TcpExtTCPChallengeACK ↑ 140 0.0
TcpExtTCPSYNChallenge ↑ 140 0.0
TcpExtTCPSynRetrans ↑ 295 0.0
sle:/ $ dmesg --follow-new -T -l debug
[Sat Feb 4 16:46:40 2023] ISTIO_OUTBOUND_INVALID: IN= OUT=eth0 SRC=172.29.73.7 DST=172.21.206.198 LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=44410 DPT=7777 WINDOW=0 RES=0x00 RST URGP=0
[Sat Feb 4 16:46:41 2023] ISTIO_OUTBOUND_INVALID: IN= OUT=eth0 SRC=172.29.73.7 DST=172.21.206.198 LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=44410 DPT=7777 WINDOW=0 RES=0x00 RST URGP=0
[Sat Feb 4 16:46:43 2023] ISTIO_OUTBOUND_INVALID: IN= OUT=eth0 SRC=172.29.73.7 DST=172.21.206.198 LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=44410 DPT=7777 WINDOW=0 RES=0x00 RST URGP=0
[Sat Feb 4 16:46:47 2023] ISTIO_OUTBOUND_INVALID: IN= OUT=eth0 SRC=172.29.73.7 DST=172.21.206.198 LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=44410 DPT=7777 WINDOW=0 RES=0x00 RST URGP=0
[Sat Feb 4 16:46:55 2023] ISTIO_OUTBOUND_INVALID: IN= OUT=eth0 SRC=172.29.73.7 DST=172.21.206.198 LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=44410 DPT=7777 WINDOW=0 RES=0x00 RST URGP=0
[Sat Feb 4 16:47:11 2023] ISTIO_OUTBOUND_INVALID: IN= OUT=eth0 SRC=172.29.73.7 DST=172.21.206.198 LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=44410 DPT=7777 WINDOW=0 RES=0x00 RST URGP=0
[Sat Feb 4 16:47:43 2023] ISTIO_OUTBOUND_INVALID: IN= OUT=eth0 SRC=172.29.73.7 DST=172.21.206.198 LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=44410 DPT=7777 WINDOW=0 RES=0x00 RST URGP=0
fortio-server-l2-0:/ # sudo tcpdump -i lo -n -v "port 44410"
tcpdump: listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes
16:47:16.343525 IP (tos 0x0, ttl 64, id 20177, offset 0, flags [DF], proto TCP (6), length 60)
172.29.73.7.44410 > 127.0.0.1.15001: Flags [S], cksum 0x43fd (incorrect -> 0x3d28), seq 66001914, win 62720, options [mss 8960,sackOK,TS val 1250779694 ecr 0,nop,wscale 11], length 0
# TcpExtTCPChallengeACK: (from sidecar)
16:47:16.343547 IP (tos 0x0, ttl 64, id 40356, offset 0, flags [DF], proto TCP (6), length 52)
172.21.206.198.7777 > 172.29.73.7.44410: Flags [.], cksum 0x7c2c (incorrect -> 0x0bf4), ack 1382665639, win 32, options [nop,nop,TS val 2150755275 ecr 1250324922], length 0
16:47:17.356027 IP (tos 0x0, ttl 64, id 20178, offset 0, flags [DF], proto TCP (6), length 60)
172.29.73.7.44410 > 127.0.0.1.15001: Flags [S], cksum 0x43fd (incorrect -> 0x3934), seq 66001914, win 62720, options [mss 8960,sackOK,TS val 1250780706 ecr 0,nop,wscale 11], length 0
16:47:17.356051 IP (tos 0x0, ttl 64, id 40357, offset 0, flags [DF], proto TCP (6), length 52)
172.21.206.198.7777 > 172.29.73.7.44410: Flags [.], cksum 0x7c2c (incorrect -> 0x0800), ack 1, win 32, options [nop,nop,TS val 2150756287 ecr 1250324922], length 0
16:47:19.372023 IP (tos 0x0, ttl 64, id 20179, offset 0, flags [DF], proto TCP (6), length 60)
172.29.73.7.44410 > 127.0.0.1.15001: Flags [S], cksum 0x43fd (incorrect -> 0x3154), seq 66001914, win 62720, options [mss 8960,sackOK,TS val 1250782722 ecr 0,nop,wscale 11], length 0
16:47:19.372063 IP (tos 0x0, ttl 64, id 40358, offset 0, flags [DF], proto TCP (6), length 52)
172.21.206.198.7777 > 172.29.73.7.44410: Flags [.], cksum 0x7c2c (incorrect -> 0x0020), ack 1, win 32, options [nop,nop,TS val 2150758303 ecr 1250324922], length 0
16:47:23.440028 IP (tos 0x0, ttl 64, id 20180, offset 0, flags [DF], proto TCP (6), length 60)
172.29.73.7.44410 > 127.0.0.1.15001: Flags [S], cksum 0x43fd (incorrect -> 0x2170), seq 66001914, win 62720, options [mss 8960,sackOK,TS val 1250786790 ecr 0,nop,wscale 11], length 0
16:47:23.440068 IP (tos 0x0, ttl 64, id 40359, offset 0, flags [DF], proto TCP (6), length 52)
172.21.206.198.7777 > 172.29.73.7.44410: Flags [.], cksum 0x7c2c (incorrect -> 0xf03a), ack 1, win 32, options [nop,nop,TS val 2150762372 ecr 1250324922], length 0
16:47:31.628034 IP (tos 0x0, ttl 64, id 20181, offset 0, flags [DF], proto TCP (6), length 60)
172.29.73.7.44410 > 127.0.0.1.15001: Flags [S], cksum 0x43fd (incorrect -> 0x0174), seq 66001914, win 62720, options [mss 8960,sackOK,TS val 1250794978 ecr 0,nop,wscale 11], length 0
16:47:31.628066 IP (tos 0x0, ttl 64, id 40360, offset 0, flags [DF], proto TCP (6), length 52)
172.21.206.198.7777 > 172.29.73.7.44410: Flags [.], cksum 0x7c2c (incorrect -> 0xd03e), ack 1, win 32, options [nop,nop,TS val 2150770560 ecr 1250324922], length 0
16:47:47.756038 IP (tos 0x0, ttl 64, id 20182, offset 0, flags [DF], proto TCP (6), length 60)
172.29.73.7.44410 > 127.0.0.1.15001: Flags [S], cksum 0x43fd (incorrect -> 0xc273), seq 66001914, win 62720, options [mss 8960,sackOK,TS val 1250811106 ecr 0,nop,wscale 11], length 0
16:47:47.756096 IP (tos 0x0, ttl 64, id 40361, offset 0, flags [DF], proto TCP (6), length 52)
172.21.206.198.7777 > 172.29.73.7.44410: Flags [.], cksum 0x7c2c (incorrect -> 0x913e), ack 1, win 32, options [nop,nop,TS val 2150786688 ecr 1250324922], length 0
16:48:20.016036 IP (tos 0x0, ttl 64, id 20183, offset 0, flags [DF], proto TCP (6), length 60)
172.29.73.7.44410 > 127.0.0.1.15001: Flags [S], cksum 0x43fd (incorrect -> 0x446f), seq 66001914, win 62720, options [mss 8960,sackOK,TS val 1250843366 ecr 0,nop,wscale 11], length 0
16:48:20.016066 IP (tos 0x0, ttl 64, id 40362, offset 0, flags [DF], proto TCP (6), length 52)
172.21.206.198.7777 > 172.29.73.7.44410: Flags [.], cksum 0x7c2c (incorrect -> 0x133a), ack 1, win 32, options [nop,nop,TS val 2150818948 ecr 1250324922], length 0
fortio-server-l2-0:/ # sudo tcpdump -i eth0 -n -v "port 44410"
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
# RST from 172.29.73.7.44410 (Conntrack Invalid)
16:47:16.343564 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
172.29.73.7.44410 > 172.21.206.198.7777: Flags [R], cksum 0x47ee (correct), seq 1382665639, win 0, length 0
16:47:17.356068 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
172.29.73.7.44410 > 172.21.206.198.7777: Flags [R], cksum 0x47ee (correct), seq 1382665639, win 0, length 0
16:47:19.372085 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
172.29.73.7.44410 > 172.21.206.198.7777: Flags [R], cksum 0x47ee (correct), seq 1382665639, win 0, length 0
16:47:23.440094 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
172.29.73.7.44410 > 172.21.206.198.7777: Flags [R], cksum 0x47ee (correct), seq 1382665639, win 0, length 0
16:47:31.628082 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
172.29.73.7.44410 > 172.21.206.198.7777: Flags [R], cksum 0x47ee (correct), seq 1382665639, win 0, length 0
16:47:47.756113 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
172.29.73.7.44410 > 172.21.206.198.7777: Flags [R], cksum 0x47ee (correct), seq 1382665639, win 0, length 0
16:48:20.016082 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
172.29.73.7.44410 > 172.21.206.198.7777: Flags [R], cksum 0x47ee (correct), seq 1382665639, win 0, length 0
Case 2: connect by collision ephemeral port but seq-no happens to be in the TCP window of the old connection#
When connect by collision ephemeral port but seq-no happens to be in the TCP window of the old connection(FIN-WAIT-2 state). Connect successed.
fortio-server-l2-0:/home/eccd # sudo tcpdump -i lo -n -v "port 44410"
tcpdump: listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes
# SYN retrans
16:23:32.466286 IP (tos 0x0, ttl 64, id 6422, offset 0, flags [DF], proto TCP (6), length 60)
172.29.73.7.44410 > 127.0.0.1.15001: Flags [S], cksum 0x43fd (incorrect -> 0xef14), seq 3587723838, win 62720, options [mss 8960,sackOK,TS val 1249355817 ecr 0,nop,wscale 11], length 0
# TcpExtTCPChallengeACK: (from sidecar)
16:23:32.466309 IP (tos 0x0, ttl 64, id 19983, offset 0, flags [DF], proto TCP (6), length 52)
172.21.206.198.7777 > 172.29.73.7.44410: Flags [.], cksum 0x7c2c (incorrect -> 0x176f), ack 2082340714, win 32, options [nop,nop,TS val 2149331398 ecr 1248029014], length 0
# RST from 172.29.73.7.44410 (Conntrack Valid)
16:23:32.466320 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
172.29.73.7.44410 > 127.0.0.1.15001: Flags [R], cksum 0x0876 (correct), seq 2082340714, win 0, length 0
# SYN retrans
16:23:33.484037 IP (tos 0x0, ttl 64, id 6423, offset 0, flags [DF], proto TCP (6), length 60)
172.29.73.7.44410 > 127.0.0.1.15001: Flags [S], cksum 0x43fd (incorrect -> 0xeb1b), seq 3587723838, win 62720, options [mss 8960,sackOK,TS val 1249356834 ecr 0,nop,wscale 11], length 0
# syn/ack from 172.21.206.198.7777(sidecar)
16:23:33.484060 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
172.21.206.198.7777 > 172.29.73.7.44410: Flags [S.], cksum 0x7c34 (incorrect -> 0xa6d3), seq 3731074132, ack 3587723839, win 65483, options [mss 65495,sackOK,TS val 2149332416 ecr 1249356834,nop,wscale 11], length 0
# ack from 172.29.73.7.44410
16:23:33.484072 IP (tos 0x0, ttl 64, id 6424, offset 0, flags [DF], proto TCP (6), length 52)
172.29.73.7.44410 > 127.0.0.1.15001: Flags [.], cksum 0x43f5 (incorrect -> 0xeb72), ack 3731074133, win 31, options [nop,nop,TS val 1249356835 ecr 2149332416], length 0
# fin from 172.21.206.198.7777(sidecar, because upstream pod(172.21.206.198) not listen on 7777)
16:23:33.484498 IP (tos 0x0, ttl 64, id 49188, offset 0, flags [DF], proto TCP (6), length 52)
172.21.206.198.7777 > 172.29.73.7.44410: Flags [F.], cksum 0x7c2c (incorrect -> 0xcf71), seq 1, ack 1, win 32, options [nop,nop,TS val 2149332416 ecr 1249356835], length 0
16:23:33.488011 IP (tos 0x0, ttl 64, id 6425, offset 0, flags [DF], proto TCP (6), length 52)
172.29.73.7.44410 > 127.0.0.1.15001: Flags [.], cksum 0x43f5 (incorrect -> 0xeb6e), ack 2, win 31, options [nop,nop,TS val 1249356838 ecr 2149332416], length 0
Skills#
ssh $WORKER_NODE_OF_POD
export POD="fortio-server-0"
ENVOY_PIDS=$(pgrep envoy)
while IFS= read -r ENVOY_PID; do
HN=$(sudo nsenter -u -t $ENVOY_PID hostname)
if [[ "$HN" = "$POD" ]]; then # space between = is important
sudo nsenter -u -t $ENVOY_PID hostname
export POD_PID=$ENVOY_PID
fi
done <<< "$ENVOY_PIDS"
echo $POD_PID
export PID=$POD_PID
sudo nsenter -t $PID -n -u
ssh $WORKER_NODE_OF_POD
export POD="fortio-server-l2-0"
ENVOY_PIDS=$(pgrep envoy)
while IFS= read -r ENVOY_PID; do
HN=$(sudo nsenter -u -t $ENVOY_PID hostname)
if [[ "$HN" = "$POD" ]]; then # space between = is important
sudo nsenter -u -t $ENVOY_PID hostname
export POD_PID=$ENVOY_PID
fi
done <<< "$ENVOY_PIDS"
echo $POD_PID
export PID=$POD_PID
sudo nsenter -t $PID -n -u