listeners: clean up stale Unix socket files on Windows (#7676)
Some checks failed
Tests / test (./cmd/caddy/caddy, ~1.26.0, macos-14, 0, 1.26, mac) (push) Waiting to run
Tests / test (./cmd/caddy/caddy.exe, ~1.26.0, windows-latest, True, 1.26, windows) (push) Waiting to run
Lint / lint (macos-14, mac) (push) Waiting to run
Lint / lint (windows-latest, windows) (push) Waiting to run
Tests / test (s390x on IBM Z) (push) Has been skipped
Tests / goreleaser-check (push) Has been skipped
Cross-Build / build (~1.26.0, 1.26, darwin) (push) Successful in 1m35s
Cross-Build / build (~1.26.0, 1.26, dragonfly) (push) Failing after 1m52s
Cross-Build / build (~1.26.0, 1.26, freebsd) (push) Successful in 3m33s
Cross-Build / build (~1.26.0, 1.26, aix) (push) Successful in 3m49s
Tests / test (./cmd/caddy/caddy, ~1.26.0, ubuntu-latest, 0, 1.26, linux) (push) Failing after 4m0s
Cross-Build / build (~1.26.0, 1.26, linux) (push) Successful in 2m6s
Cross-Build / build (~1.26.0, 1.26, illumos) (push) Failing after 2m36s
Lint / lint (ubuntu-latest, linux) (push) Failing after 2m5s
Cross-Build / build (~1.26.0, 1.26, netbsd) (push) Successful in 2m41s
Cross-Build / build (~1.26.0, 1.26, openbsd) (push) Successful in 2m31s
Cross-Build / build (~1.26.0, 1.26, windows) (push) Successful in 2m26s
Cross-Build / build (~1.26.0, 1.26, solaris) (push) Successful in 2m31s
Lint / dependency-review (push) Failing after 1m2s
Lint / govulncheck (push) Failing after 2m35s
OpenSSF Scorecard supply-chain security / Scorecard analysis (push) Failing after 6m54s

* Delete old unix domain socket files on Windows

While Windows doesn't have the need to reuse a socket file descriptor
by dup()ing it on config reloads, there still is a valid need for an
equivalent to the `syscall.Unlink()` call in listen_unix.go (also in
`reuseUnixSocket`).

If a previous Caddy instance didn't terminate properly, the chances it
will leave behind a socket file are very high, breaking all subsequent
starting attempts. Other than for regular files, Windows seemingly has
no way for a process to flag a UNIX domain socket file with `FILE_DELETE_ON_CLOSE`,
which means this scenario can never be avoided entirely (e.g. in the case
of crashes).

For the long comment on `isAbstractUnixSocket`: the logic itself is likely
of dubious value, but I thought it better to explicitly reference the
issue, as I have just spent half an hour searching the web to figure out
whether abstract names will work or not on Windows. At least, the logic
as-is should now do the sensible thing if these are ever implemented
properly (and it matches what the Golang standard library does internally).

* Add a dial attempt to check for active server processes

As @steadytao pointed out (thanks!), the previous code didn't have
solid proof that an existing unix socket file had really been orphaned,
as it's also possible that there's another server process (still running).

This would still give the Windows implementation parity with the unix
one (as that one also unlinks the socket file without further checks),
but I've performed a couple of small tests and found this way of handling
socket files still problematic at least problematic if Caddy is used as
a reverse proxy in real world scenarios.

In tests with a simple Caddyfile that only declares an admin socket,
starting two caddy instances with the same Caddyfile works and behaves
like one would expect: the second instance removes the first instance's
socket file and "wins" the race.

When Caddy is used as a reverse proxy, though, what'll happen is more
complicated: While the second instance wins the race for the admin
socket, as long as the Caddyfile specifies a TCP downstream socket,
the second process will not be able to take this one over from the first
(also to be expected, that's how socket binding usually works).

This results in a rather broken state: The first process still holds on
to its TCP listening sockets, the second process fails to start because
of the error in its listening attempt, leaving an orphaned admin socket
file in the file system. Afterwards, the second process won't be running
and the first _will_ be running but unable to be controlled because its
admin socket has been replaced. This leaves the system in another state
that is bad from an ops perspective.

With this new change, we try first to connect to any unix socket that
isn't already covered by our current process (with a very low timeout)
and can easily decide if the socket is still in use by another process:

- If the connection is accepted, there's obviously a server process.

- If Windows returns WSACONNREFUSED [^1], there is either no active
  server process for the socket file anymore, or the socket file does
  not exist.

- Any other errors are likely a sign that there still is a server process
  (e.g. a timeout would indicate that it's just slow in accepting new
  connection attempts).

[^1]: https://learn.microsoft.com/en-us/windows/win32/winsock/windows-sockets-error-codes-2#wsaeconnrefused

* chore: tidy Windows unix socket reuse helper

---------

Co-authored-by: Zen Dodd <mail@steadytao.com>
This commit is contained in:
mfrischknecht 2026-04-29 13:52:04 +02:00 committed by GitHub
parent 4d6945769d
commit 6a64bb2ce5
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
3 changed files with 110 additions and 4 deletions

View File

@ -30,10 +30,6 @@ import (
"go.uber.org/zap"
)
func reuseUnixSocket(_, _ string) (any, error) {
return nil, nil
}
func listenReusable(ctx context.Context, lnKey string, network, address string, config net.ListenConfig) (any, error) {
var socketFile *os.File

21
listen_reuseUnixSocket.go Normal file
View File

@ -0,0 +1,21 @@
// Copyright 2015 Matthew Holt and The Caddy Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//go:build (!unix || solaris) && !windows
package caddy
func reuseUnixSocket(_, _ string) (any, error) {
return nil, nil
}

View File

@ -0,0 +1,89 @@
// Copyright 2015 Matthew Holt and The Caddy Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//go:build windows
package caddy
import (
"errors"
"fmt"
"io/fs"
"net"
"os"
"strings"
"syscall"
"time"
)
var errUnixSocketAlreadyInUse = errors.New("unix socket is already in use by another process")
func reuseUnixSocket(network, addr string) (any, error) {
if !IsUnixNetwork(network) {
return nil, nil
}
// Note: This is here mainly for proper compatibility, because Unix sockets with abstract names are in an interesting limbo state on Windows:
// Go already translates `@` characters to `\0` for Windows: https://github.com/golang/go/blob/65d5c5f6dd8aa7b221cff6ec3f5101ea2e5f3efa/src/syscall/syscall_windows.go#L910
// ...but there still is an open issue about the fact that this is not properly supported: https://github.com/microsoft/WSL/issues/4240#issuecomment-620805115
// The main issue is that the original announcement proclaimed support for this feature, but it was (apparently) never implemented: https://devblogs.microsoft.com/commandline/af_unix-comes-to-windows/
isAbstractUnixSocket := strings.HasPrefix(addr, "@")
if isAbstractUnixSocket {
// Abstract Unix sockets do not require us to remove stale socket files.
return nil, nil
}
// On Windows, we're using the `fakeCloseListener` wrappers around a single, ever-living listener.
// So, if there's an active listener entry in the pool, we're the current owner of the Unix socket file.
_, socketBelongsToCurrentProcess := listenerPool.References(listenerKey(network, addr))
if socketBelongsToCurrentProcess {
// Reuse/cleanup is entirely handled by the refcounting mechanism in `listenerPool`.
return nil, nil
}
// If the socket file does not exist or has no backing server process, this will fail instantly.
connection, err := net.DialTimeout("unix", addr, 10*time.Millisecond)
if err == nil {
connection.Close()
return nil, fmt.Errorf("cannot reuse socket %v: %w", addr, errUnixSocketAlreadyInUse)
}
// Windows returns this error code both if the socket file does not exist and if it isn't backed by a server process anymore.
// See: https://learn.microsoft.com/en-us/windows/win32/winsock/windows-sockets-error-codes-2#wsaeconnrefused
const WSAECONNREFUSED syscall.Errno = 10061
var errno syscall.Errno
hasNoListeningServerProcess := errors.As(err, &errno) && errno == WSAECONNREFUSED
if !hasNoListeningServerProcess {
return nil, fmt.Errorf("cannot reuse socket %v: %w", addr, errUnixSocketAlreadyInUse)
}
// If the socket file exists, it hasn't been created by our process, and it seemingly
// isn't backed by a server process anymore. Try to delete it so we can bind to it later.
err = os.Remove(addr)
if err == nil {
return nil, nil
} else if errors.Is(err, fs.ErrNotExist) {
// Either the file didn't exist in the first place, or it was deleted before we were able to.
return nil, nil
} else {
// We failed to delete the file. Likely, it belongs to another (active) process.
return nil, err
}
}