HTTP Connection Lifecycle Management#
Upstream/Downstream connection uncoupling#
The HTTP/1.1 specification has this design: HTTP Proxy is a L7 proxy and should be separate from the L3/L4 connection lifecycle. Therefore, headers like
Connection: Close
andConnection: Keepalive
from Downstream will not be forwarded to Upstream by the Envoy. The lifecycle of the Downstream connection is of course controlled by theConnection: xyz
directive. However, the connection lifecycle of the Upstream connection is not affected by the connection lifecycle of the Downstream connection. That is, there are two separate connection lifecycles.
Github Issue: HTTP filter before and after evaluation of Connection: Close header sent by upstream#15788 说明了这个问题: This doesn’t make sense in the context of Envoy, where downstream and upstream are decoupled and can use different protocols. I’m still not completely understanding the actual problem you are trying to solve?
Connection timeout related configuration parameters#
idle_timeout#
(Duration) The idle timeout for connections. The idle timeout is defined as the period in which there are no active requests. When the idle timeout is reached the connection will be closed. If the connection is an HTTP/2 downstream connection a drain sequence will occur prior to closing the connection, see drain_timeout. Note that request based timeouts mean that HTTP/2 PINGs will not keep the connection alive. If not specified, this defaults to 1 hour. To disable idle timeouts explicitly set this to 0.
Warning
Disabling this timeout has a highly likelihood of yielding connection leaks due to lost TCP FIN packets, etc.
If the overload action “envoy.overload_actions.reduce_timeouts” is configured, this timeout is scaled for downstream connections according to the value for HTTP_DOWNSTREAM_CONNECTION_IDLE.
max_connection_duration#
(Duration) The maximum duration of a connection. The duration is defined as a period since a connection was established. If not set, there is no max duration. When max_connection_duration
is reached and if there are no active streams, the connection will be closed. If the connection is a downstream connection and there are any active streams, the drain sequence
will kick-in, and the connection will be force-closed after the drain period. See drain_timeout.
Github Issue: Forward Connection:Close header to downstream#14910 For HTTP/1, Envoy will send a
Connection: close
header aftermax_connection_duration
if another request comes in. If not, after some period of time, it will just close the connection.https://github.com/envoyproxy/envoy/issues/14910#issuecomment-773434342
Note that
max_requests_per_connection
isn’t (yet) implemented/supported for downstream connections.For HTTP/1, Envoy will send a
Connection: close
header aftermax_connection_duration
(and beforedrain_timeout
) if another request comes in. If not, after some period of time, it will just close the connection.I don’t know what your downstream LB is going to do, but note that according to the spec, the
Connection
header is hop-by-hop for HTTP proxies.
max_requests_per_connection#
(UInt32Value) Optional maximum requests for both upstream and downstream connections. If not specified, there is no limit. Setting this parameter to 1 will effectively disable keep alive. For HTTP/2 and HTTP/3, due to concurrent stream processing, the limit is approximate.
Github Issue: Forward Connection:Close header to downstream#14910
We are having this same issue when using istio (istio/istio#32516). We are migrating to use istio with envoy sidecars frontend be an AWS ELB. We see that connections from ELB -> envoy stay open even when our application is sending
Connection: Close
.max_connection_duration
works but does not seem to be the best option. Our applications are smart enough to know when they are overloaded from a client and sendConnection: Close
to shard load.I tried writing an envoy filter to get around this but the filter gets applied before the stripping. Did anyone discover a way to forward the connection close header?
drain_timeout - for downstream only#
(Duration) The time that Envoy will wait between sending an HTTP/2 “shutdown notification” (GOAWAY frame with max stream ID) and a final GOAWAY frame. This is used so that Envoy provides a grace period for new streams that race with the final GOAWAY frame. During this grace period, Envoy will continue to accept new streams.
After the grace period, a final GOAWAY frame is sent and Envoy will start refusing new streams. Draining occurs both when:
a connection hits the
idle timeout
i.e., a connection that hits the
idle_timeout
ormax_connection_duration
starts thedraining
state and thedrain_timeout
timer. For HTTP/1.1, in thedraining
state. If a downstream request comes in, Envoy adds aConnection: close
header to the response.So the
draining
state and thedrain_timeout
timer will only be entered if the connection has anidle_timeout
ormax_connection_duration
.
or during general server draining.
The default grace period is 5000 milliseconds (5 seconds) if this option is not specified.
https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/operations/draining
By default, the
HTTP connection manager filter
will add “Connection: close
” to HTTP1 requests(Author’s Notes: By HTTP Response), send HTTP2 GOAWAY, and terminate connections on request completion (after the delayed close period).
I used to think that drain was only triggered when the Envoy was going to shutdown. Now it seems that whenever there is a planned shutdown of a connection (after the connection reaches idle_timeout
or max_connection_duration
), the drain should be triggered.
delayed_close_timeout - for downstream only#
(Duration) The delayed close timeout is for downstream connections managed by the HTTP connection manager. It is defined as a grace period after connection close processing has been locally initiated during which Envoy will wait for the peer to close (i.e., a TCP FIN/RST is received by Envoy from the downstream connection) prior to Envoy closing the socket associated with that connection。
That is, in some scenarios, Envoy will write back an HTTP Response before it has finished reading the HTTP Request and wants to close the connection. This is called Server Prematurely/Early Closes Connection
. There are several possible scenarios:
The downstream is still sending the HTTP Request (socket write).
Or there is a
socket recv buffer
in the Envoy kernel that has not been accessed by the Envoy user-space. Typically, the HTTP Content-Length sized body is still in the kernel’ssocket recv buffer
and has not been fully loaded into the Envoy user-space.
In both cases, if the Envoy calls close(fd)
to close the connection, the downstream may receive an RST
from the Envoy kernel. Eventually the downstream may not read the HTTP Response in the socket and just assume that the connection is abnormal and report an exception to the upper layers: Peer connection rest
.
See: Racing conditions after an Envoy connection is closed for details.
To mitigate this, Envoy provides a configuration that delays connection closure. Which wants to wait for the downstream to complete the socket write process. Let the kernel socket recv buffer
be loaded into user space
. Then call close(fd)
.
NOTE: This timeout is enforced even when the socket associated with the downstream connection is pending a flush of the write buffer. However, any progress made writing data to the socket will restart the timer associated with this timeout. This means that the total grace period for a socket in this state will be
<total_time_waiting_for_write_buffer_flushes>+<delayed_close_timeout>
.
That is, every time the write socket succeeds, the timer will be rested.
Delaying Envoy’s connection close and giving the peer the opportunity to initiate the close sequence mitigates a race condition that exists when downstream clients do not drain/process data in a connection’s receive buffer after a remote close has been detected via a socket write().
That is, it can alleviate the situation where downsteam does not read the socket to get a Response after the write socket fails.
This race leads to such clients failing to process the response code sent by Envoy, which could result in erroneous downstream processing. If the timeout triggers, Envoy will close the connection’s socket. The default timeout is 1000 ms if this option is not specified.
Note:
To be useful in avoiding the race condition described above, this timeout must be set to at least
+<100ms to account for a reasonable “worst” case processing time for a full iteration of Envoy’s event loop>.
Warning:
A value of 0 will completely disable delayed close processing. When disabled, the downstream connection’s socket will be closed immediately after the write flush is completed or will never close if the write flush does not complete.
Note that delayed_close_timeout
will not take effect in many cases in order not to impact performance:
Github PR: http: reduce delay-close issues for HTTP/1.1 and below #19863
Skipping delay close for:
HTTP/1.0 framed by connection close (as it simply reduces time to end-framing)
as well as HTTP/1.1 if the request is fully read (so there’s no FIN-RST race)。即系如果
Addresses the Envoy-specific parts of #19821 Runtime guard:
envoy.reloadable_features.skip_delay_close
Also appears in Release Note for Envoy 1.22.0:
http: avoiding
delay-close
for:
HTTP/1.0 responses framed by
connection: close
as well as HTTP/1.1 if the request is fully read.
This means for responses to such requests, the FIN will be sent immediately after the response. This behavior can be temporarily reverted by setting
envoy.reloadable_features.skip_delay_close
to false. If clients are seen to be receiving sporadic partial responses and flipping this flag fixes it, please notify the project immediately.