HTTP Connection Lifecycle Management

Decoupling Upstream and Downstream Connections

HTTP/1.1 is designed this way: an HTTP proxy is an L7 proxy, and it should be decoupled from the L3/L4 connection lifecycle.

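So headers such as Connection: Close or Connection: Keep-Alive arriving from downstream are not forwarded upstream by Envoy. The downstream connection's lifecycle does, of course, follow the Connection: xyz directive, but the upstream connection's lifecycle is not affected by the downstream connection's lifecycle. In other words, the two connection lifecycles are managed independently.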

GitHub Issue: HTTP filter before and after evaluation of Connection: Close header sent by upstream #15788 explains this: This doesn’t make sense in the context of Envoy, where downstream and upstream are decoupled and can use different protocols. I’m still not completely understanding the actual problem you are trying to solve?
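Not forwarding Connection (and similar headers) follows the general rule that such headers are hop-by-hop for HTTP/1.1 proxies. The Python sketch below illustrates that rule generically; it is not Envoy's code. The helper name strip_hop_by_hop is hypothetical, and the header list is assumed from the hop-by-hop headers in RFC 2616 section 13.5.1, plus any header named inside Connection (RFC 7230 section 6.1).

```python
from typing import List, Tuple

# Hop-by-hop headers, based on RFC 2616 section 13.5.1 (lower-cased for comparison).
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade",
}


def strip_hop_by_hop(headers: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the headers a proxy would forward to the next hop (hypothetical helper)."""
    # Any header listed in the Connection header is also hop-by-hop (RFC 7230 section 6.1).
    named = set()
    for name, value in headers:
        if name.lower() == "connection":
            named.update(tok.strip().lower() for tok in value.split(",") if tok.strip())
    drop = HOP_BY_HOP | named
    return [(name, value) for name, value in headers if name.lower() not in drop]


downstream = [
    ("Host", "example.com"),
    ("Connection", "close, X-Debug"),
    ("X-Debug", "1"),
    ("Keep-Alive", "timeout=5"),
]
print(strip_hop_by_hop(downstream))  # [('Host', 'example.com')]
```

Whether the upstream connection is reused is then decided by the proxy's own upstream settings, independently of what downstream asked for.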

Connection Timeout Configuration Parameters

Figure: Envoy connection timeout timeline

idle_timeout

(Duration) The idle timeout for connections. The idle timeout is defined as the period in which there are no active requests. When the idle timeout is reached the connection will be closed. If the connection is an HTTP/2 downstream connection a drain sequence will occur prior to closing the connection, see drain_timeout. Note that request based timeouts mean that HTTP/2 PINGs will not keep the connection alive. If not specified, this defaults to 1 hour. To disable idle timeouts explicitly set this to 0.

Warning

Disabling this timeout has a high likelihood of yielding connection leaks due to lost TCP FIN packets, etc.

If the overload action “envoy.overload_actions.reduce_timeouts” is configured, this timeout is scaled for downstream connections according to the value for HTTP_DOWNSTREAM_CONNECTION_IDLE.
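To make the definition concrete, here is a small Python model of when idle_timeout would close a connection, given the start and end times of its requests. The function name is hypothetical; the model assumes the connection is idle from establishment (t=0) until its first request, and it ignores HTTP/2 draining and overload-based scaling.

```python
from typing import Optional, Sequence, Tuple


def idle_close_time(
    requests: Sequence[Tuple[float, float]], idle_timeout: float = 3600.0
) -> Optional[float]:
    """Time at which idle_timeout would close a connection opened at t=0.

    requests: (start, end) of every request on the connection.
    idle_timeout: seconds; 0 disables the timeout, so None is returned.
    """
    if idle_timeout == 0:
        return None
    idle_since = 0.0  # assumption: idle from establishment until the first request
    active = 0
    # At equal timestamps, ends (-1) sort before starts (+1).
    events = sorted([(s, 1) for s, _ in requests] + [(e, -1) for _, e in requests])
    for t, delta in events:
        if active == 0 and delta == 1 and t - idle_since >= idle_timeout:
            return idle_since + idle_timeout  # timer fired before this request arrived
        active += delta
        if active == 0:
            idle_since = t  # no active requests: the idle period starts now
    return idle_since + idle_timeout


print(idle_close_time([(10, 20), (25, 30)], idle_timeout=60))   # 90
print(idle_close_time([(10, 20), (200, 210)], idle_timeout=60))  # 80
```

Overlapping requests show why the definition says "no active requests": with (0, 100) and (50, 70), the idle period only begins at t=100, so the connection would close at 160.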

max_connection_duration

(Duration) The maximum duration of a connection. The duration is defined as a period since a connection was established. If not set, there is no max duration. When max_connection_duration is reached and if there are no active streams, the connection will be closed. If the connection is a downstream connection and there are any active streams, the drain sequence will kick-in, and the connection will be force-closed after the drain period. See drain_timeout.

GitHub Issue: Forward Connection:Close header to downstream #14910, from the discussion:

https://github.com/envoyproxy/envoy/issues/14910#issuecomment-773434342

Note that max_requests_per_connection isn’t (yet) implemented/supported for downstream connections.

For HTTP/1, Envoy will send a Connection: close header after max_connection_duration (author's note: and before drain_timeout expires) if another request comes in. If not, after some period of time, it will just close the connection.

I don’t know what your downstream LB is going to do, but note that according to the spec, the Connection header is hop-by-hop for HTTP proxies.
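The documented downstream behaviour reduces to a small plan: close immediately if nothing is in flight, otherwise drain and force-close after drain_timeout. This Python sketch encodes only those documented rules; the function name and event labels are hypothetical, and it does not model an in-flight HTTP/1 request finishing early, which can end the connection before the drain period is over.

```python
from typing import List, Optional, Tuple


def downstream_close_plan(
    max_connection_duration: Optional[float],
    active_streams_at_expiry: int,
    drain_timeout: float = 5.0,
) -> List[Tuple[float, str]]:
    """Events (seconds since connection establishment, action) for a downstream connection."""
    if max_connection_duration is None:
        return []  # not set: no max duration
    if active_streams_at_expiry == 0:
        return [(max_connection_duration, "close")]
    return [
        # e.g. for HTTP/1, the next response carries Connection: close
        (max_connection_duration, "start draining"),
        (max_connection_duration + drain_timeout, "force close"),
    ]


print(downstream_close_plan(60.0, active_streams_at_expiry=2))
# [(60.0, 'start draining'), (65.0, 'force close')]
```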

max_requests_per_connection

(UInt32Value) Optional maximum requests for both upstream and downstream connections. If not specified, there is no limit. Setting this parameter to 1 will effectively disable keep alive. For HTTP/2 and HTTP/3, due to concurrent stream processing, the limit is approximate.
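For a sequential HTTP/1.1 connection the limit is exact, and a tiny counter shows why a value of 1 disables keep-alive. The class below is hypothetical (not Envoy's implementation) and does not model the approximate behaviour with concurrent HTTP/2 and HTTP/3 streams.

```python
from typing import Optional


class RequestBudget:
    """Hypothetical per-connection model of max_requests_per_connection (HTTP/1.1 only)."""

    def __init__(self, max_requests: Optional[int] = None) -> None:
        self.max_requests = max_requests  # None: no limit, the documented default
        self.served = 0

    def on_response(self) -> bool:
        """Count one completed request; return True if the connection should close after it."""
        self.served += 1
        return self.max_requests is not None and self.served >= self.max_requests


one_shot = RequestBudget(max_requests=1)
print(one_shot.on_response())  # True: close after the first response, i.e. no keep-alive
```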

GitHub Issue: Forward Connection:Close header to downstream #14910

We are having this same issue when using istio (istio/istio#32516). We are migrating to use istio with envoy sidecars fronted by an AWS ELB. We see that connections from ELB -> envoy stay open even when our application is sending Connection: Close. max_connection_duration works but does not seem to be the best option. Our applications are smart enough to know when they are overloaded from a client and send Connection: Close to shed load.

I tried writing an envoy filter to get around this but the filter gets applied before the stripping. Did anyone discover a way to forward the connection close header?

drain_timeout - for downstream only

(Duration) The time that Envoy will wait between sending an HTTP/2 “shutdown notification” (GOAWAY frame with max stream ID) and a final GOAWAY frame. This is used so that Envoy provides a grace period for new streams that race with the final GOAWAY frame. During this grace period, Envoy will continue to accept new streams.

After the grace period, a final GOAWAY frame is sent and Envoy will start refusing new streams. Draining occurs both when:

  • a connection hits the idle timeout

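    • That is, once a connection reaches idle_timeout or max_connection_duration, it enters the draining state and the drain_timeout timer starts. For HTTP/1.1, while the connection is draining, Envoy adds a Connection: close header to the response of any request that arrives from downstream.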

    • So, for an individual connection, the draining state and the drain_timeout timer start only after idle_timeout or max_connection_duration fires.

  • or during general server draining.

The default grace period is 5000 milliseconds (5 seconds) if this option is not specified.
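The two-phase HTTP/2 shutdown described above can be modelled as follows. This Python sketch is a simplification, not Envoy code: the class and method names are hypothetical, time is passed in explicitly, and the final GOAWAY is assumed to carry the highest stream ID accepted during the grace period.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

MAX_STREAM_ID = 2**31 - 1  # largest HTTP/2 stream identifier


@dataclass
class DrainingH2Connection:
    drain_timeout: float = 5.0  # seconds; the documented default is 5000 ms
    frames_sent: List[Tuple[str, int]] = field(default_factory=list)
    last_stream_id: int = 0
    drain_started_at: Optional[float] = None
    final_goaway_sent: bool = False

    def start_drain(self, now: float) -> None:
        """Shutdown notification: GOAWAY with the max stream ID; new streams still accepted."""
        self.drain_started_at = now
        self.frames_sent.append(("GOAWAY", MAX_STREAM_ID))

    def tick(self, now: float) -> None:
        """After the grace period, send the final GOAWAY with the last accepted stream ID."""
        if (
            self.drain_started_at is not None
            and not self.final_goaway_sent
            and now - self.drain_started_at >= self.drain_timeout
        ):
            self.final_goaway_sent = True
            self.frames_sent.append(("GOAWAY", self.last_stream_id))

    def on_new_stream(self, stream_id: int, now: float) -> bool:
        """Return True if the stream is accepted, False if it is refused."""
        self.tick(now)
        if self.final_goaway_sent:
            return False
        self.last_stream_id = max(self.last_stream_id, stream_id)
        return True


conn = DrainingH2Connection()
conn.on_new_stream(1, now=0.0)
conn.start_drain(now=10.0)
print(conn.on_new_stream(3, now=12.0))  # True: races with the drain but is inside the grace period
print(conn.on_new_stream(5, now=16.0))  # False: the final GOAWAY has been sent by now
print(conn.frames_sent)  # [('GOAWAY', 2147483647), ('GOAWAY', 3)]
```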

https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/operations/draining

By default, the HTTP connection manager filter will add “Connection: close” to HTTP1 requests (author's note: in the HTTP response), send HTTP2 GOAWAY, and terminate connections on request completion (after the delayed close period).
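HTTP/1.1 has no GOAWAY frame, so draining is signalled in-band on the next response. A minimal hypothetical helper (not an Envoy API) that captures this rule:

```python
from typing import Dict, Tuple


def finish_http1_response(
    headers: Dict[str, str], draining: bool
) -> Tuple[Dict[str, str], bool]:
    """Return (headers to send, close_after_response) for a downstream HTTP/1.1 response."""
    out = dict(headers)  # do not mutate the caller's headers
    if draining:
        out["Connection"] = "close"  # tell the client not to reuse this connection
        return out, True
    return out, False


print(finish_http1_response({"Content-Length": "2"}, draining=True))
# ({'Content-Length': '2', 'Connection': 'close'}, True)
```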

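I used to think draining was triggered only when Envoy was about to shut down. It now appears that any planned connection close (after the connection reaches idle_timeout or max_connection_duration) goes through the drain process.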

delayed_close_timeout - for downstream only

(Duration) The delayed close timeout is for downstream connections managed by the HTTP connection manager. It is defined as a grace period after connection close processing has been locally initiated during which Envoy will wait for the peer to close (i.e., a TCP FIN/RST is received by Envoy from the downstream connection) prior to Envoy closing the socket associated with that connection.

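In other words, in some scenarios Envoy writes the HTTP response and wants to close the connection before it has finished reading the HTTP request. This is called the server prematurely (early) closing the connection (Server Prematurely/Early Closes Connection). At that point, the following situations are possible: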

  • downstream is still in the middle of sending the HTTP request (socket write).

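  • or the kernel socket receive buffer on Envoy's side still holds data that Envoy's user space has not yet read. Typically this is the request body (of Content-Length size), which is still in the kernel's socket receive buffer and has not been fully loaded into Envoy's user space.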

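In both cases, if Envoy calls close(fd) to close the connection, downstream may receive an RST from Envoy's kernel. Downstream may then treat the connection as broken without ever reading the HTTP response from the socket, and report an error to the layer above: connection reset by peer.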

For details, see: Race Conditions After Envoy Closes a Connection

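To mitigate this, Envoy provides a setting that delays closing the connection. The idea is to wait for downstream to finish its socket writes and for all data in the kernel socket receive buffer to be read into user space, and only then call close(fd).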
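The mitigation amounts to: after writing the response, keep reading (and discarding) whatever the peer still sends until it closes its side or a timer fires, and only then close the socket. The Python sketch below is a simplified analogue, not Envoy's implementation; the function name delayed_close is hypothetical, a local socket pair stands in for the Envoy/downstream TCP connection, and the timer restart on write progress is not modelled.

```python
import select
import socket
import time


def delayed_close(sock: socket.socket, delay_s: float) -> int:
    """Wait up to delay_s for the peer to close, discarding unread input, then close.

    Returns the number of bytes drained from the receive buffer.
    """
    drained = 0
    deadline = time.monotonic() + delay_s
    try:
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break  # timer fired: close anyway
            readable, _, _ = select.select([sock], [], [], remaining)
            if not readable:
                break  # peer did not close in time
            chunk = sock.recv(65536)
            if not chunk:
                break  # peer sent FIN and the buffer is empty: closing now sends no RST for unread data
            drained += len(chunk)
    except ConnectionResetError:
        pass  # peer reset the connection itself
    finally:
        sock.close()
    return drained


# Demo: a local socket pair standing in for Envoy <-> downstream.
envoy_side, client_side = socket.socketpair()
client_side.sendall(b"x" * 1000)       # request body Envoy never read
client_side.shutdown(socket.SHUT_WR)   # client finishes writing and closes its side
print(delayed_close(envoy_side, delay_s=1.0))  # 1000
client_side.close()
```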

NOTE: This timeout is enforced even when the socket associated with the downstream connection is pending a flush of the write buffer. However, any progress made writing data to the socket will restart the timer associated with this timeout. This means that the total grace period for a socket in this state will be <total_time_waiting_for_write_buffer_flushes>+<delayed_close_timeout>.

That is, every time a write to the socket makes progress, this timer is reset.
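The restart rule means the close deadline keeps moving as long as writes make progress within each timeout window. A hypothetical helper, in integer milliseconds, that computes when the socket would finally be closed:

```python
from typing import Iterable


def delayed_close_deadline(
    close_initiated_ms: int, write_progress_ms: Iterable[int], timeout_ms: int = 1000
) -> int:
    """When the delayed-close timer would fire, given times at which writes made progress.

    Assumes every progress time is at or after close_initiated_ms.
    """
    deadline = close_initiated_ms + timeout_ms
    for t in sorted(write_progress_ms):
        if t >= deadline:
            break  # the timer already fired; later progress cannot restart it
        deadline = t + timeout_ms  # progress restarts the timer
    return deadline


print(delayed_close_deadline(0, [500, 1200]))  # 2200
```

With the last write progress at 1200 ms, the socket stays open until 2200 ms: the time spent flushing plus delayed_close_timeout, as the NOTE describes.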

Delaying Envoy’s connection close and giving the peer the opportunity to initiate the close sequence mitigates a race condition that exists when downstream clients do not drain/process data in a connection’s receive buffer after a remote close has been detected via a socket write(). That is, it mitigates the case where downstream, after a socket write fails, never goes on to read the response from the socket.

This race leads to such clients failing to process the response code sent by Envoy, which could result in erroneous downstream processing.

If the timeout triggers, Envoy will close the connection’s socket.

The default timeout is 1000 ms if this option is not specified.

Note:

To be useful in avoiding the race condition described above, this timeout must be set to at least <max round trip time expected between clients and Envoy>+<100ms to account for a reasonable “worst” case processing time for a full iteration of Envoy’s event loop>.
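Envoy's documentation defines this lower bound as the maximum expected round-trip time between clients and Envoy plus 100 ms. A quick worked check, written as a hypothetical Python helper: with the default 1000 ms, the bound holds only while the maximum client round-trip time is at most 900 ms.

```python
EVENT_LOOP_ALLOWANCE_MS = 100  # "worst case" event-loop iteration from the note
DEFAULT_DELAYED_CLOSE_TIMEOUT_MS = 1000  # documented default


def min_delayed_close_timeout_ms(max_rtt_ms: int) -> int:
    """Lower bound: max client round-trip time + 100 ms."""
    return max_rtt_ms + EVENT_LOOP_ALLOWANCE_MS


for rtt in (50, 900, 1200):
    needed = min_delayed_close_timeout_ms(rtt)
    # prints: rtt, required minimum, whether the 1000 ms default is enough
    print(rtt, needed, DEFAULT_DELAYED_CLOSE_TIMEOUT_MS >= needed)
```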

Warning:

A value of 0 will completely disable delayed close processing. When disabled, the downstream connection’s socket will be closed immediately after the write flush is completed or will never close if the write flush does not complete.

Note that, to avoid hurting performance, delayed_close_timeout does not take effect in many cases:

GitHub PR: http: reduce delay-close issues for HTTP/1.1 and below #19863

Skipping delay close for:

  • HTTP/1.0 framed by connection close (as it simply reduces time to end-framing)

  • as well as HTTP/1.1 if the request is fully read (so there’s no FIN-RST race).

Addresses the Envoy-specific parts of #19821.

Runtime guard: envoy.reloadable_features.skip_delay_close

This change also appears in the Envoy 1.22.0 release notes:

http: avoiding delay-close for:

  • HTTP/1.0 responses framed by connection: close

  • as well as HTTP/1.1 if the request is fully read.

This means for responses to such requests, the FIN will be sent immediately after the response. This behavior can be temporarily reverted by setting envoy.reloadable_features.skip_delay_close to false. If clients are seen to be receiving sporadic partial responses and flipping this flag fixes it, please notify the project immediately.
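The skip rules from the PR and the release note can be condensed into one decision function. This hypothetical Python sketch encodes only what is stated above, plus the documented fact that a timeout of 0 disables delayed close; treating every other case as "delay" is a simplifying assumption.

```python
def should_delay_close(
    protocol: str,
    response_framed_by_close: bool,
    request_fully_read: bool,
    delayed_close_timeout_ms: int = 1000,
    skip_delay_close: bool = True,  # envoy.reloadable_features.skip_delay_close
) -> bool:
    """Whether a downstream connection would go through delayed close (simplified)."""
    if delayed_close_timeout_ms == 0:
        return False  # 0 disables delayed close processing entirely
    if not skip_delay_close:
        return True  # runtime guard flipped to false: skip rules reverted
    if protocol == "HTTP/1.0" and response_framed_by_close:
        return False  # closing the connection is the end-of-response framing
    if protocol == "HTTP/1.1" and request_fully_read:
        return False  # nothing left unread, so no FIN-RST race
    return True


print(should_delay_close("HTTP/1.1", False, request_fully_read=True))  # False: FIN right after the response
```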

Race Conditions After Envoy Closes a Connection