SSH Frequently Asked Questions
Sometimes my SSH connection hangs when exiting: the shell (or remote
command) exits, but the connection remains open, doing nothing.
Quick Fix
You're probably using the OpenSSH server, and started a background process
on the server which you intended to continue after logging out of the SSH
session. Fix: redirect the background process's stdin, stdout, and stderr streams
(e.g. to files, or /dev/null if you don't care about them). For example,
this hangs:
client% ssh server
server% xterm &
server% logout
hangs...
but this behaves as expected:
client% ssh server
server% xterm < /dev/null >& /dev/null &
server% logout
SSH session terminates
client%
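The `>&` shorthand in the example above is understood by csh-style shells, bash, and zsh; in a plain POSIX sh the same redirection is spelled `> file 2>&1`. As a minimal sketch of that portable form, the helper below starts a command in the background with all three standard streams redirected. The name `run_detached` and its argument order are assumptions made for this illustration; they are not part of OpenSSH or of any standard tool.

```shell
# run_detached is a hypothetical helper name, used here for illustration only.
# Usage: run_detached LOGFILE COMMAND [ARGS...]
run_detached() {
    log=$1
    shift
    # stdin comes from /dev/null; stdout and stderr both go to the log file.
    # The background job therefore holds no reference to the session's
    # standard streams, so it cannot keep them open after logout.
    "$@" < /dev/null > "$log" 2>&1 &
}
```

On the server, `run_detached /dev/null xterm` would then behave roughly like the redirected xterm command shown above, and passing a real file name instead of /dev/null keeps the output for later inspection.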
Short Explanation
This problem is usually due to a feature of the OpenSSH server. When
writing an SSH server, you have to answer the question, "When should the
server close the SSH connection?" The obvious answer might seem to be:
close it when the server-side user program started by client request
(shell or remote command) exits. However, it's actually a bit more
complicated; this simple strategy allows a race condition which can cause
data loss (see the explanation below). To avoid this problem,
sshd instead waits until it encounters end-of-file (eof) on the
pipes connecting to the stdout and stderr of the user program.
This strategy, however, can have unexpected consequences. In Unix, an
open file does not return eof until all references to it have
been closed. When you start a background process from the shell on the
server, it inherits references to the shell's standard streams. Unless
you prevent this by redirecting these, or the process closes them itself
(daemons will generally do this), the existence of the new process will
cause sshd to wait indefinitely, since it will never see eof on
the pipe connecting it to the (now defunct) shell process because
that pipe also connects it to your background process.
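The eof behavior described above can be observed locally, without an SSH server. In the sketch below, `cat` stands in for sshd (it simply reads a pipe until eof) and a subshell stands in for the login shell; this is an analogy under those assumptions, not a model of sshd's actual code. The variable names are chosen for this illustration.

```shell
# Local sketch: cat plays the role of sshd, reading a pipe until eof.

# Case 1: the background sleep inherits the write end of the pipe as its
# stdout, so cat does not see eof until sleep itself exits.
start=$(date +%s)
( sleep 2 & ) | cat
inherited_secs=$(( $(date +%s) - start ))

# Case 2: the background sleep has its streams redirected, so the last
# reference to the pipe is closed as soon as the subshell exits.
start=$(date +%s)
( sleep 2 < /dev/null > /dev/null 2>&1 & ) | cat
redirected_secs=$(( $(date +%s) - start ))

echo "inherited: ${inherited_secs}s, redirected: ${redirected_secs}s"
```

The first pipeline should take about as long as the sleep, even though the subshell that started it exits at once; the second should return almost immediately.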
This design choice has changed over time. Early versions of OpenSSH
behaved as described here. For some time, it was changed to exit
immediately upon exit of the user program; then, it was changed back when
the possibility of data loss was discovered.
Race Condition Details
As an example, let's take the simple case of:
ssh server cat foo.txt
This should result in the entire contents of the file foo.txt
coming back to the client, but in fact it may not. Consider the
following sequence of events:
- The SSH connection is set up; sshd starts the target
account's shell as shell -c "cat foo.txt" in a child
process, reading the shell's stdout and sending the data over the SSH
connection. sshd is waiting for the shell to exit.
- The shell, in turn, starts cat foo.txt in a child process,
and waits for it to exit. The file data from foo.txt which
cat writes to its stdout, however, does not pass through the shell
process on its way to sshd. cat inherits its stdout
file descriptor (fd) from its parent process, the shell; that fd is a
direct reference to the pipe connecting the shell's stdout to
sshd.
- cat writes the last chunk of data from foo.txt, and
exits; the data is passed to the kernel via the write system
call, and is waiting in the pipe buffer to be read by sshd. The
shell, which was waiting on the cat process, exits, and then
sshd in turn exits, closing the SSH connection. However, there
is a race condition here: through the vagaries of process scheduling, it
is possible that sshd will receive and act on the SIGCHLD
notifying it of the shell's exit, before it reads the last chunk
of data from the pipe. If so, then it misses that data.
This sequence of events can, for example, cause file truncation when using
scp.
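The key fact in the sequence above, that data can still be sitting in the pipe buffer after the process that wrote it has exited, can be seen in a small local sketch. Here `printf` stands in for cat and the delayed `cat` stands in for sshd; the one-second delay and the variable name `received` are assumptions chosen for the illustration. It does not reproduce the race itself, only the reason why reading until eof avoids losing the last chunk.

```shell
# The writer (printf) exits at once; its output stays in the pipe buffer.
# The reader only starts a second later, yet still receives everything,
# because cat reads until eof rather than stopping when the writer exits.
received=$(printf 'last chunk\n' | { sleep 1; cat; })
echo "reader got: $received"
```

A reader that stopped as soon as it learned of the writer's exit, instead of draining the pipe to eof, would be the one at risk of missing that buffered data.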