-
Notifications
You must be signed in to change notification settings - Fork 18.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
don't use canceled context to send KILL signal to healthcheck process#43739
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
do we want a test for this? Or too complicated to set one up? |
I don't think the failure was related, so I kicked CI again, but posting here in case there's a flaky test at hand that we're not tracking yet;
edit: already known as flaky; #23626 |
f529426
to
557b602
Compare
557b602
to
b4c051c
Compare
0100d64
to
25a6430
Compare
25a6430
to
e975155
Compare
Something failing;
|
f937722
to
250d005
Compare
5d3979e
to
291dcc4
Compare
This is weird; it's as if Windows is running an older commit before the quote was fixed;
At least, I don't see anything wrong looking at this 🤔; Test: []string{"CMD-SHELL", `sh -c 'sleep 60'`}, |
I guess |
7a34314
to
a50a635
Compare
failure on Windows
|
a50a635
to
cfb06cd
Compare
integration/container/health_test.go
Outdated
err := apiClient.ContainerKill(ctx, cID, "SIGKILL") | ||
assert.NilError(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's try if this helps with the cleanup step on Windows 🤞
Unfortunately; that didn't help, only shifting / repeating the problem;
So it looks like on Windows, the health check causes an issue with stopping the container (?). Of course the question is; was that already the case (but not tested before?), or is there an issue in the implementation here? (Perhaps an issue with signal handling / incorrect signals for Windows, idk) |
cfb06cd
to
5976cb3
Compare
integration/container/health_test.go
Outdated
|
||
cID := container.Run(ctx, t, apiClient, func(c *container.TestContainerConfig) { | ||
c.Config.Healthcheck = &containertypes.HealthConfig{ | ||
Test: []string{"CMD", "sleep", "60"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other attempt; trying CMD ["sleep", "60"]
instead of starting sleep
in a shell.
Getting more interesting now; running the command without shell looks to have fixed some instances of Windows. Possibly because Windows may be running a different version of BusyBox (ISTR older versions of BusyBox didn't propagate signals, and newer versions do). However, on Windows 2019 and Windows 2022 running with containerd,
|
5976cb3
to
e419012
Compare
integration/container/health_test.go
Outdated
|
||
cID := container.Run(ctx, t, apiClient, func(c *container.TestContainerConfig) { | ||
c.Config.Healthcheck = &containertypes.HealthConfig{ | ||
Test: []string{"CMD-SHELL", `sh -c "sleep 60"`}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
grmbl. back to the original issue on Windows on this one;
health_test.go:111: polling check failed: expected "Health check exceeded timeout (50ms)", got "60\": line 0: syntax error: unterminated quoted string\n"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relevant CI logs: builtin, containerd-shim-runhcs-v1
In both cases, the CommandLine
string ultimately passed into the Win32API CreateProcess
is
cmd /S /C "sh -c \"sleep 60\""
After lots of research and experimentation on a Windows box... my head hurts. Here's what I know so far:
- A Windows process' command and arguments are passed in as a single string, which is the string specified in the
CommandLine
passed intoCreateProcess
. - Each process parses its own command line. Not all processes use the same algorithm.
- As a corollary,
cmd.exe
does not tokenize its subprocess' command lines for them.
- As a corollary,
cmd.exe /S /C
follows some arcane rules, but the gist is that everything following the/C
is taken literally as a command line, with one exception: if the first non-whitespace character following the/C
is a double-quote character, the command line is modified by dropping that character and the last double-quote character in the remaining command-line.
Putting that all together:cmd /S /C "sh -c \"sleep 60\""
is equivalent to interactively running
C:\> sh -c \"sleep 60\"
Executing that command results in CreateProcess()
being called with CommandLine
set to sh -c \"sleep 60\"
.
The Windows C Runtime parses the CommandLine
sh -c \"sleep 60\"
into the argv
argv[0] = sh
argv[1] = -c
argv[2] = \"sleep
argv[3] = 60\"
and busybox ash gets very confused.
The ultimate issue is that cmdProbe
is incorrectly escaping the shell command line in a manner which the Windows C runtime would parse as a single argument, expecting cmd.exe /S /C
to act like sh -c
and consume the next argv as its argument. But cmd.exe
does not unescape the argument.
containerd/containerd#5067 is very related to the above, but I don't think it has to do with the timeouts you saw. (My guess is that the timeouts have something to do with sleep.exe
not existing.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recall I tested various combinations; any of them would fail on at least one combination 😞
I tried running sleep
as main process itself;
[]string{"CMD", "sleep", "60"},
Which would be the equivalent of the default command we set for "sleep" containers;
moby/integration-cli/test_vars_test.go
Lines 6 to 9 in 7b9275c
func sleepCommandForDaemonPlatform() []string { | |
if testEnv.OSType == "windows" { | |
return []string{"sleep", "240"} | |
} |
Tried running it as child process of sh
;
[]string{"CMD", "sh", "-c", "sleep 60"},
Tried passing it as single string, but I think Linux wasn't a fan of that
[]string{"CMD-SHELL", `sh -c "sleep 60`},
And the original one (which (like above), and if I'm not mistaken, would run sh
as child process of cmd
, so that didn't feel right);
[]string{"CMD-SHELL", "sh", "-c", "sleep 60"},
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I wonder is if the Busybox image should also be updated to set the SHELL
to /bin/sh
; currently it doesn't, which means that (in Dockerfiles), the default shell would still be cmd
or powershell
(depending on the base image)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also wondering if exec.Config
needs to have a CommandLine
property added; it just occurred to me that we have that for the container's main process, but don't have it for exec
🤔
moby/vendor/github.com/opencontainers/runtime-spec/specs-go/config.go
Lines 41 to 43 in 2646bea
Args []string `json:"args,omitempty"` | |
// CommandLine specifies the full command line for the application to execute on Windows. | |
CommandLine string `json:"commandLine,omitempty" platform:"windows"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I wonder is if the Busybox image should also be updated to set the
SHELL
to/bin/sh
Also wondering if
exec.Config
needs to have aCommandLine
property added
Both excellent ideas I wholeheartedly support!
64c7eab
to
eb75fb8
Compare
windows 2022 / containerd;
|
Terminating the exec process when the context is canceled has been broken since Docker v17.11 so nobody has been able to depend upon that behaviour in five years of releases. We are thus free from backwards- compatibility constraints. Co-authored-by: Nicolas De Loof <nicolas.deloof@gmail.com> Co-authored-by: Sebastiaan van Stijn <github@gone.nl> Signed-off-by: Nicolas De Loof <nicolas.deloof@gmail.com> Signed-off-by: Cory Snider <csnider@mirantis.com> Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
eb75fb8
to
4b84a33
Compare
@@ -93,6 +94,42 @@ while true; do sleep 1; done | |||
poll.WaitOn(t, pollForHealthStatus(ctxPoll, client, id, "healthy"), poll.WithDelay(100*time.Millisecond)) | |||
} | |||
|
|||
// TestHealthCheckProcessKilled verifies that health-checks exec get killed on time-out. | |||
func TestHealthCheckProcessKilled(t *testing.T) { | |||
skip.If(t, testEnv.RuntimeIsWindowsContainerd(), "FIXME: Broken on Windows + containerd combination") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a skip
for now.
Some unrelated failures; not sure I've seen this one before, but doesn't look related at least (kicked CI once more)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's greeeen!
LGTM
Alright, let's get this one in |
Looks like this was addressed, so dismissing this one.
- What I did
send KILL signal to healthcheck process when the context is canceled (timeout)
- How I did it
Use a background context to send KILL signal
- How to verify it
see reproduction scenario on #43737
- Description for the changelog
fix healthcheck process termination on timeout
- A picture of a cute animal (not mandatory but encouraged)