How CAN Handles Errors
Error handling is built into the CAN protocol and is of great importance for the
performance of a CAN system. The error handling aims at detecting errors in messages
appearing on the CAN bus, so that the transmitter can retransmit an erroneous message.
Every CAN controller along a bus will try to detect errors within a message. If an error
is found, the discovering node will transmit an Error Flag, thus destroying the bus
traffic. The other nodes will detect the error caused by the Error Flag (if they haven't
already detected the original error) and take appropriate action, i.e. discard the current
message. Each node maintains two error counters: the Transmit Error Counter and the
Receive Error Counter. There are several rules governing how these counters are
incremented and/or decremented. In essence, a transmitter detecting a fault increments its
Transmit Error Counter faster than the listening nodes will increment their Receive Error
Counter. This is because there is a good chance that it is the transmitter that is at
fault! When either error counter rises above a certain value, the node will first become
"error passive", that is, it will no longer actively destroy the bus traffic when it
detects an error. If its Transmit Error Counter keeps rising, the node then goes
"bus off", which means that it doesn't participate in the bus traffic at all.
Using the error counters, a CAN node can not only detect faults but also perform error
confinement.
Error Detection Mechanisms
The CAN protocol defines no fewer than five different ways of detecting errors. Two
of these work at the bit level, and the other three at the message level.
- Bit Monitoring.
- Bit Stuffing.
- Frame Check.
- Acknowledgement Check.
- Cyclic Redundancy Check.
Bit Monitoring
Each transmitter on the CAN bus monitors (i.e. reads back) the transmitted signal
level. If the bit level actually read differs from the one transmitted, a Bit Error
is signaled. (No bit error is raised during the arbitration process.)
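As a minimal sketch of the read-back check described above, the following Python snippet models the bus as a wired-AND line on which a dominant level (0) overrides a recessive level (1). The names `bus_level` and `bit_error` and the `in_arbitration` parameter are illustrative assumptions, not part of any CAN controller API. The arbitration exception is modeled narrowly: a node that sends recessive and reads back dominant during arbitration has merely lost arbitration.

```python
DOMINANT, RECESSIVE = 0, 1  # dominant overrides recessive on the bus


def bus_level(driven_levels):
    """Resulting bus level when several nodes drive the bus at once (wired-AND)."""
    return DOMINANT if DOMINANT in driven_levels else RECESSIVE


def bit_error(sent, read_back, in_arbitration=False):
    """Return True if a transmitter should signal a Bit Error (hypothetical helper)."""
    # Sending recessive and reading dominant during arbitration means the
    # node lost arbitration; this is not treated as an error.
    if in_arbitration and sent == RECESSIVE and read_back == DOMINANT:
        return False
    return sent != read_back
```

For example, `bit_error(RECESSIVE, bus_level([RECESSIVE, DOMINANT]))` reports an error outside arbitration, while the same call with `in_arbitration=True` does not.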
Bit Stuffing
When a node has transmitted five consecutive bits of the same level, it will
add a sixth bit of the opposite level to the outgoing bit stream. The receivers will
remove this extra bit. This is done to avoid excessive DC components on the bus and to
guarantee enough signal edges for the receivers to stay synchronized, but it
also gives the receivers an extra opportunity to detect errors: if more than five
consecutive bits of the same level occur on the bus, a Stuff Error is
signaled.
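The stuffing rule can be sketched in a few lines of Python. This is an illustration only, operating on plain lists of 0/1 values. The function names `stuff` and `destuff` are assumptions made for this example, and a Stuff Error is represented here by a `ValueError`. Note that an inserted stuff bit itself counts as the first bit of the next run.

```python
def stuff(bits):
    """Insert a bit of opposite level after every run of five equal bits."""
    out = []
    run_level, run_len = None, 0
    for b in bits:
        out.append(b)
        if b == run_level:
            run_len += 1
        else:
            run_level, run_len = b, 1
        if run_len == 5:
            stuffed = 1 - b
            out.append(stuffed)
            # The stuff bit starts a new run of its own.
            run_level, run_len = stuffed, 1
    return out


def destuff(bits):
    """Remove stuff bits; raise ValueError on six equal consecutive bits."""
    out = []
    run_level, run_len = None, 0
    expect_stuff_bit = False
    for b in bits:
        if expect_stuff_bit:
            if b == run_level:
                raise ValueError("Stuff Error: six consecutive bits of the same level")
            run_level, run_len = b, 1  # drop the stuff bit, but count it in the run
            expect_stuff_bit = False
            continue
        out.append(b)
        if b == run_level:
            run_len += 1
        else:
            run_level, run_len = b, 1
        if run_len == 5:
            expect_stuff_bit = True
    return out
```

With this sketch, `destuff(stuff(data))` returns the original `data` for any list of bits, and `destuff([0] * 6)` raises the error that stands in for a Stuff Error.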
Frame Check
Some parts of the CAN message have a fixed format, i.e. the standard defines exactly
what levels must occur and when. (Those parts are the CRC Delimiter, the ACK Delimiter,
the End of Frame, and also the Intermission, although some additional error checking
rules apply to the latter.) If a CAN controller detects an invalid value in one of these
fixed fields, a Form Error is signaled.
Acknowledgement Check
All nodes on the bus that correctly receive a message (regardless of whether they are
"interested" in its contents or not) are expected to send a dominant level in
the so-called Acknowledgement Slot in the message. The transmitter will transmit a
recessive level here. If the transmitter can't detect a dominant level in the ACK slot, an
Acknowledgement Error is signaled.
Cyclic Redundancy Check
Each message features a 15-bit Cyclic Redundancy Checksum (CRC), and any node that
detects a different CRC in the message than what it has calculated itself will signal a CRC
Error.
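As a rough illustration of the checksum calculation, the sketch below implements a bitwise shift-register CRC using the CAN CRC-15 generator polynomial, x^15 + x^14 + x^10 + x^8 + x^7 + x^4 + x^3 + 1 (0x4599 without the leading term), with the register starting at zero. The function names `crc15` and `crc_bits` are assumptions made for this example, and the input is assumed to be the destuffed bit sequence as a list of 0/1 values.

```python
CAN_CRC15_POLY = 0x4599  # generator polynomial without the x^15 term


def crc15(bits):
    """Compute a 15-bit CRC over a sequence of 0/1 values, MSB first."""
    crc = 0
    for bit in bits:
        crc_next = bit ^ ((crc >> 14) & 1)  # incoming bit XOR register MSB
        crc = (crc << 1) & 0x7FFF           # shift left, keep 15 bits
        if crc_next:
            crc ^= CAN_CRC15_POLY
    return crc


def crc_bits(crc):
    """Expand a 15-bit CRC value into a list of bits, MSB first."""
    return [(crc >> i) & 1 for i in range(14, -1, -1)]
```

A receiver can either compare its own result with the received CRC field, or run the calculation over the data followed by the received CRC bits: in this scheme the result is zero when no error is detected.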
Error Confinement Mechanisms
Every CAN controller along a bus will try to detect the errors outlined above within
each message. If an error is found, the discovering node will transmit an
Error Flag, thus destroying the bus traffic. The
other nodes will detect the error caused by the Error Flag (if they haven't already
detected the original error) and take appropriate action, i.e. discard the current
message. Each node maintains two error counters: the Transmit Error Counter and the
Receive Error Counter. There are several rules governing how these counters are
incremented and/or decremented. In essence, a transmitter detecting a fault increments its
Transmit Error Counter faster than the listening nodes will increment their Receive Error
Counter. This is because there is a good chance that it is the transmitter who is at
fault!
A node starts out in Error Active mode. When either of the two error counters
rises above 127, the node will enter a state known as Error Passive, and when the
Transmit Error Counter rises above 255, the node will enter the Bus Off state.
- An Error Active node will transmit Active Error Flags when it detects
errors.
- An Error Passive node will transmit Passive Error Flags when it
detects errors.
- A node which is Bus Off will not transmit anything on the bus at all.
The rules for increasing and decreasing the error counters are somewhat complex, but
the principle is simple: transmit errors give 8 error points, and receive errors give 1
error point. Correctly transmitted and/or received messages cause the counter(s) to
decrease.
Example (slightly simplified): Let's assume that node A on a bus has a
bad day. Whenever A tries to transmit a message, it fails (for whatever reason). Each time
this happens, it increases its Transmit Error Counter by 8 and transmits an Active Error
Flag. Then it will attempt to retransmit the message, and the same thing happens.
When the Transmit Error Counter rises above 127 (i.e. after 16
attempts), node A goes Error Passive. The difference is that it will now transmit Passive
Error Flags on the bus. A Passive Error Flag comprises 6 recessive bits, and will
not destroy other bus traffic, so the other nodes will not hear A complaining about bus
errors. However, A continues to increase its Transmit Error Counter. When it rises above
255 (i.e. after 32 attempts), node A finally gives in and goes Bus Off.
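The arithmetic in this example can be checked with a small simulation. The sketch below is deliberately as simplified as the example itself: it only applies the thresholds (127 and 255) and the 8-point transmit penalty stated above, and ignores the finer counter rules. The names `node_state` and `failed_transmissions_until` are hypothetical and do not correspond to any controller's register interface.

```python
def node_state(tec, rec):
    """Map the Transmit/Receive Error Counters to the node's error state."""
    if tec > 255:
        return "bus off"
    if tec > 127 or rec > 127:
        return "error passive"
    return "error active"


def failed_transmissions_until(target_state):
    """Count consecutive failed transmissions until the node reaches target_state."""
    if target_state not in ("error active", "error passive", "bus off"):
        raise ValueError("unknown state: " + target_state)
    tec, rec, attempts = 0, 0, 0
    while node_state(tec, rec) != target_state:
        tec += 8  # each failed transmission costs 8 error points
        attempts += 1
    return attempts, tec
```

Under these assumptions, `failed_transmissions_until("error passive")` gives 16 attempts with a counter value of 128, and `failed_transmissions_until("bus off")` gives 32 attempts with a counter value of 256.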
What do the other nodes think about node A? For every Active Error
Flag that A transmitted, the other nodes will increase their Receive Error Counters by 1.
By the time that A goes Bus Off, the other nodes will have a count in their Receive Error
Counters that is well below the limit for Error Passive, i.e. 127. This count will
decrease by one for every correctly received message. However, node A will stay Bus Off.
Most CAN controllers will provide status bits (and corresponding
interrupts) for two states:
- "Error Warning" - one or both error counters are above 96
- Bus Off, as described above.
Some - but not all! - controllers also provide a bit for the Error
Passive state. A few controllers also provide direct access to the error counters.
The CAN controller's habit of automatically retransmitting messages when
errors have occurred can be annoying at times. There is at least one controller on the
market (the SJA1000 from Philips) that allows for full manual control of the error
handling.
Bus Failure Modes
The ISO 11898 standard enumerates several failure modes of the CAN bus cable:
1. CAN_H interrupted
2. CAN_L interrupted
3. CAN_H shorted to battery voltage
4. CAN_L shorted to ground
5. CAN_H shorted to ground
6. CAN_L shorted to battery voltage
7. CAN_L shorted to CAN_H wire
8. CAN_H and CAN_L interrupted at the same location
9. Loss of connection to termination network
For failures 1-6 and 9, it is "recommended" that the bus survives with a
reduced S/N ratio, and in case of failure 8, that the resulting subsystem survives. For
failure 7, it is "optional" to survive with a reduced S/N ratio.
In practice, a CAN system using 82C250-type transceivers will not survive failures 1-7,
and may or may not survive failures 8-9.
There are, however, "fault-tolerant" drivers, like the TJA1053, that can handle all of
these failures. Normally you pay for this fault tolerance with a restricted maximum
speed; for the TJA1053 it is 125 kbit/s.