Notes on consoling Sun machines with our products.
The Problem
When a serial device such as a terminal server, PC, or even a dumb
terminal is used to "console" a Sun machine, the Sun machine will
"halt" (stop running the operating system) when the consoling device is
power cycled.
So far, we've seen this problem only on Sun workstations and servers.
Further, it happens not only with our products but also with Cisco Access
Servers, Specialix Jetstream servers, the serial ports on most PCs, dumb
terminals, etc. In other words, the problem appears to lie in what the Sun
system interprets as an RS-232 "break condition", rather than in a break
deliberately sent by the consoling device.
Once the system is halted, you can restore service by typing "c" to
continue (or "go" at an OpenBoot "ok" prompt), but the downtime is
unacceptable, and halting UNIX so abruptly risks data loss and other problems.
Background
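A BREAK is defined as a "space" condition held on an RS-232/V.24 line
for longer than the time it takes to send one character. Deliberately sent
breaks typically last about 125 ms to 500 ms; POSIX specifies that
tcsendbreak() with a duration of zero transmits zero-valued bits for at
least 250 ms and no more than 500 ms. The normal (idle) condition on the
line is a "mark".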
A space = logical 0 = positive voltage, between +3 V and +15 V on RS-232/V.24 (drivers typically output about +12 V)
A mark = logical 1 = negative voltage, between -3 V and -15 V on RS-232/V.24 (drivers typically output about -12 V)
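The voltage-to-logic mapping above can be sketched as a small classifier. This is an illustration only: the function name line_level is hypothetical, and the +/-3 V thresholds are the ones given above. Voltages between -3 V and +3 V fall in the RS-232 transition region, where the receiver's output is not guaranteed; that is the zone a powerless, "floating" line can wander through.

```python
def line_level(volts: float) -> str:
    """Classify an RS-232/V.24 receiver input voltage (hypothetical helper)."""
    if volts >= 3.0:
        return "space"      # logical 0
    if volts <= -3.0:
        return "mark"       # logical 1, the idle state of the line
    return "undefined"      # transition region: receiver output not guaranteed


print(line_level(12.0), line_level(-12.0), line_level(0.0))
```

A driver output of +12 V reads as a space, -12 V as a mark, and 0 V (for example, a driver that has lost power) is undefined.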
A normal async character consists of:
- a start bit (space)
- 7 or 8 data bits (marks or spaces)
- an optional parity bit
- 1, 1.5 or 2 stop bits (mark)
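The frame layout above determines how long one character takes to send, which is the yardstick a receiver uses to recognize a BREAK. A minimal sketch, assuming the hypothetical helper names frame_bits and char_time_ms:

```python
def frame_bits(data_bits: int, parity: bool, stop_bits: float) -> float:
    """Total bit times in one async character: start + data + optional parity + stop."""
    return 1 + data_bits + (1 if parity else 0) + stop_bits


def char_time_ms(baud: int, data_bits: int = 8, parity: bool = False,
                 stop_bits: float = 1) -> float:
    """Duration of one character frame in milliseconds at the given baud rate."""
    return frame_bits(data_bits, parity, stop_bits) * 1000.0 / baud


print(frame_bits(8, False, 1))   # 8N1 frame
print(char_time_ms(9600))        # one 8N1 character at 9600 baud, in ms
```

At 9600 baud with 8 data bits, no parity and 1 stop bit, a frame is 10 bit times, or about 1.04 ms. A 250 ms break at that speed therefore lasts as long as 240 back-to-back characters.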
A BREAK starts off like a start bit (space), followed by continuous
spaces (all zeros) lasting longer than the transmission time of a complete
"normal" asynchronous character. Because no stop bit (mark) arrives where one
is expected, the receiving side can tell this apart from any valid character
and detects it as a BREAK condition requiring attention.
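The detection rule above can be sketched on a sampled line. This is a simplified model, not how any particular UART is built: it assumes one sample per bit time (real UARTs oversample, commonly 16x), and the function name detect_break is hypothetical. A NUL character (all data bits zero) produces a long run of spaces too, so the test is whether the spaces also cover the slot where the stop bit belongs.

```python
from typing import Sequence


def detect_break(samples: Sequence[int], data_bits: int = 8,
                 parity: bool = False) -> bool:
    """Return True if the sampled line (1 = mark, 0 = space) shows a BREAK.

    Assumes one sample per bit time. A valid NUL character is at most
    start + data + parity spaces followed by a mark stop bit, so a run of
    spaces that also covers the first stop-bit slot cannot be a character.
    Exact thresholds vary between UARTs; some wait for the full stop time.
    """
    needed = 1 + data_bits + (1 if parity else 0) + 1  # start + data + parity + stop slot
    run = 0
    for bit in samples:
        run = run + 1 if bit == 0 else 0
        if run >= needed:
            return True
    return False


nul_char = [1, 1] + [0] * 9 + [1]     # idle, then a valid 8N1 NUL character
break_seq = [1, 1] + [0] * 10 + [1]   # spaces persist through the stop-bit slot
print(detect_break(nul_char), detect_break(break_seq))
```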
When RS-232 line drivers lose or regain power, the RS-232/V.24
signal can "float" and produce false signaling that the Sun system
interprets as a BREAK. This is why many vendors advise disabling BREAK on
console ports by default if the system is to be consoled remotely.
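Because a BREAK only has to outlast one character, even a brief power-related glitch can qualify. The sketch below compares the duration of a continuous space against one character time at a given baud rate. The function name glitch_reads_as_break is hypothetical, and the model is simplified: it assumes 1 stop bit and ignores that real receivers decide at mid-bit sample points.

```python
def glitch_reads_as_break(space_ms: float, baud: int, data_bits: int = 8,
                          parity: bool = False) -> bool:
    """Would a continuous space lasting space_ms register as a BREAK?

    Simplified: the space must outlast the start bit, the data bits, the
    optional parity bit and the slot where the (single) stop bit belongs.
    """
    bit_ms = 1000.0 / baud
    threshold_ms = (1 + data_bits + (1 if parity else 0) + 1) * bit_ms
    return space_ms > threshold_ms


for baud in (9600, 115200):
    print(baud, glitch_reads_as_break(0.5, baud), glitch_reads_as_break(5.0, baud))
```

At 9600 baud the threshold is about 1.04 ms, so a 5 ms glitch already reads as a BREAK while a 0.5 ms one does not; at 115200 baud the threshold drops to under 0.1 ms, so both do.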
Most (multi)serial port devices from vendors such as us send a small
"glitch" of energy at one or more of these events:
- The access server is powered on.
- The access server is powered off.
- The serial controller hardware of the access server is reset.
This "glitch" looks like a BREAK signal to the Sun system. By default,
Sun Microsystems computers halt the operating system and drop into the ROM
monitor when they receive a BREAK. This behavior is intended to help debug
system hangs and serious performance problems. Other UNIX systems may
behave the same way, but so far we've seen it only on Suns.
Digi Product Solutions/Workarounds:
Because the spurious BREAK signal is an artifact of physical-layer
issues, the solution must either prevent the BREAK signal from reaching the
Sun or cause the Sun not to interpret the BREAK signal as a halt command.
- Digi One RealPort, PortServer TS2, PortServer TS4, PortServer TS16, and
the PortServer CM products are Sun break safe.
- PortServer II has been re-engineered (as of 1/1/2001) to prevent this
problem from occurring. Units built after this date are Sun break safe.
- Power Control Modules for Etherlite products: ASP Technologies,
(800) 516-0841, sells the PWR-001, which can be used to prevent the problem
on Etherlite 2, 8, and 16 port units.
- A simple workaround on Solaris 2.6 and later is to edit the
/etc/default/kbd file to disable halt on break. The file is self-documenting.
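As we understand the Solaris file, the setting involved is the KEYBOARD_ABORT variable: an uncommented line KEYBOARD_ABORT=disable turns off halt on break, and the file's comments give "enable" as the default. Check the comments in your own copy of the file and the kbd(1) manual page for how and when the change takes effect. The sketch below is a hypothetical audit helper (keyboard_abort_setting is our name, not a Solaris tool) that reports the effective value from the file's text, assuming simple shell-style NAME=value lines.

```python
def keyboard_abort_setting(text: str) -> str:
    """Return the effective KEYBOARD_ABORT value from /etc/default/kbd text.

    Assumes shell-style NAME=value lines, with '#' starting a comment line.
    Returns "enable" (the documented default) if the variable is never set.
    """
    value = "enable"
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, rhs = line.partition("=")
        if name.strip() == "KEYBOARD_ABORT":
            # Later assignments override earlier ones; drop optional quotes.
            value = rhs.strip().strip('"').strip("'")
    return value


sample = "#KEYBOARD_ABORT=enable\nKEYBOARD_ABORT=disable\n"
print(keyboard_abort_setting(sample))
```

On a Solaris host, the same check would read the real file, for example with open("/etc/default/kbd") as f: print(keyboard_abort_setting(f.read())).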