
Linux FreeS/WAN Troubleshooting

This is a collection of notes on various aspects of debugging FreeS/WAN setup and connections. Other sources of information are:

Problem Reporting

For how to report problems, see the file doc/prob.report.

Logs used

Error messages generated by KLIPS during the boot sequence are accessible with the dmesg command.

Pluto logs to /var/log/secure and /var/log/messages.

Check both places to get full information. If you find nothing, check your syslogd.conf(5) to see where your system is putting things.
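
If the Pluto messages are not where you expect, a small helper along these lines can show which rules in the syslog configuration mention a given facility. This is only a sketch: the function name log_rules is made up for illustration, and it assumes a classic syslog.conf layout with one rule per line.

```shell
# Print the non-comment rules in a syslog.conf-style file that mention
# a given facility (for example authpriv).
# Usage: log_rules FACILITY [CONF_FILE]
log_rules() {
    conf=${2:-/etc/syslog.conf}
    # Drop comment lines first, then keep rules naming the facility
    grep -v '^[[:space:]]*#' "$conf" | grep -- "$1"
}
```

Calling log_rules authpriv prints each matching rule; the last field of a rule is the file the messages go to.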

Information available on your system

man pages provided

ipsec.conf(5)
Manual page for IPSEC configuration file.
ipsec(8)
Primary man page for ipsec utilities.

Other man pages are on

this list and in

Status information

/proc/net/ipsec*
Various files reporting the status of IPSEC.
ipsec auto --status
Command to get status report from running system. Displays Pluto's state: the list of "added" conns and the list of state objects reflecting ISAKMP and IPsec SAs being negotiated or installed.
ipsec look
Brief status info.
ipsec barf
Copious debugging info.
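
When preparing a problem report it can be handy to capture all of the above in one go. The following is a minimal sketch, not part of FreeS/WAN: the function name, the output file names and the IPSEC_CMD override (there only so the function can be tried on a machine without FreeS/WAN) are assumptions made for illustration.

```shell
# Save the status reports listed above into one directory.
# Usage: collect_ipsec_status [OUTPUT_DIR]
collect_ipsec_status() {
    outdir=${1:-./ipsec-status}
    ipsec_cmd=${IPSEC_CMD:-ipsec}    # hypothetical override, defaults to ipsec
    mkdir -p "$outdir" || return 1

    "$ipsec_cmd" auto --status > "$outdir/auto-status.txt" 2>&1
    "$ipsec_cmd" look          > "$outdir/look.txt" 2>&1
    "$ipsec_cmd" barf          > "$outdir/barf.txt" 2>&1

    # The /proc/net/ipsec* files are readable only while KLIPS is loaded
    for f in /proc/net/ipsec*; do
        if [ -r "$f" ]; then
            echo "== $f"
            cat "$f"
        fi
    done > "$outdir/proc-net-ipsec.txt" 2>&1

    echo "$outdir"
}
```

Run as root, collect_ipsec_status /tmp/report leaves four text files in /tmp/report and prints the directory name.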

Pluto problem hints

From a message posted to the mailing list Jan 14 2000 by Pluto developer Hugh Redelmeier:

Until ipsec auto and whack/pluto get fixed:

        When puzzled by Pluto behaviour, always look in
        /var/log/secure -- that's the unadulterated story.

        To get the whole whack output (almost a subset of
        the story from Pluto), give auto the --verbose flag
        on each invocation.  Eg:
                ipsec auto --verbose --up sadaisy


Bonus hint: problems snowball.  So look for the first problem first,
it is likely to be the cause of later problems.

And a final hint: If one side keeps retrying to no avail, it may be
because the other is unhappy about something and won't reply.  Go look
at the other side to figure out what it doesn't like.

Various error messages from Pluto are discussed in the FAQ and the ipsec_pluto(8) man page.

Using GDB on Pluto

You may need to use the GNU debugger, gdb(1), on Pluto. This should be necessary only in unusual cases, for example if you encounter a problem which the Pluto developer cannot readily reproduce or if you are modifying Pluto.

Here are the Pluto developer's suggestions for doing this:

Can you get a core dump and use gdb to find out what Pluto was doing
when it died?

To get a core dump, you will have to set dumpdir to point to a
suitable directory (see ipsec.conf(5)).

To get gdb to tell you interesting stuff:
        $ script
        $ cd dump-directory-you-chose
        $ gdb /usr/local/lib/ipsec/pluto core
        (gdb) where
        (gdb) quit
        $ exit

The resulting output will have been captured by the script command in
a file called "typescript".  Send it to the list.

Do not delete the core file.  I may need to ask you to print out some
more relevant stuff.

Note that the dumpdir parameter takes effect only when the IPsec subsystem is restarted, either by a reboot or by ipsec setup restart.
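
As an illustration, the setting might look like this in ipsec.conf. The directory /var/tmp is only an example chosen here, not a recommendation from the developers; any directory with enough free space that root can write to should do. See ipsec.conf(5) for the authoritative description.

```
# /etc/ipsec.conf (fragment): let Pluto dump core into /var/tmp
config setup
	dumpdir=/var/tmp
```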

ifconfig reports for KLIPS debugging

From a mail message from our KLIPS developer:

Here is a catalogue of the types of errors that can occur for which
statistics are kept when transmitting and receiving packets via klips.
I notice that they are not necessarily logged in the right counter.
. . .

Sources of ifconfig statistics for ipsec devices

rx_errors:
- packet handed to ipsec_rcv that is not an ipsec packet.
- ipsec packet with payload length not modulo 4.
- ipsec packet with bad authenticator length.
- incoming packet with no SA.
- replayed packet.
- incoming authentication failed.
- got esp packet with length not modulo 8.

tx_dropped:
- cannot process ip_options.
- packet ttl expired.
- packet with no eroute.
- eroute with no SA.
- cannot allocate sk_buff.
- cannot allocate kernel memory.
- sk_buff internal error.


The standard counters are:

struct enet_statistics
{
        int        rx_packets;                /* total packets received */
        int        tx_packets;                /* total packets transmitted */
        int        rx_errors;                /* bad packets received */
        int        tx_errors;                /* packet transmit problems */
        int        rx_dropped;                /* no space in linux buffers */
        int        tx_dropped;                /* no space available in linux */
        int        multicast;                /* multicast packets received */
        int        collisions;

        /* detailed rx_errors: */
        int        rx_length_errors;
        int        rx_over_errors;                /* receiver ring buff overflow */
        int        rx_crc_errors;                /* recved pkt with crc error */
        int        rx_frame_errors;        /* recv'd frame alignment error */
        int        rx_fifo_errors;                /* recv'r fifo overrun */
        int        rx_missed_errors;        /* receiver missed packet */

        /* detailed tx_errors */
        int        tx_aborted_errors;
        int        tx_carrier_errors;
        int        tx_fifo_errors;
        int        tx_heartbeat_errors;
        int        tx_window_errors;
};

of which I think only the first 6 are useful.
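
To watch those six counters for an ipsec device without reading the full ifconfig output, something like the following can be used. It is a sketch under the assumption that /proc/net/dev has the column layout of 2.2 and later kernels (interface name, then eight receive fields, then eight transmit fields); the function name iface_counters is invented for this example.

```shell
# Print the six basic counters for one interface from /proc/net/dev-style input.
# Usage: iface_counters IFACE [FILE]
iface_counters() {
    awk -v ifc="$1" '
        {
            sub(/:/, " ")    # some kernels print "eth0:1234" with no space
            if ($1 == ifc)
                printf "rx_packets=%s tx_packets=%s rx_errors=%s tx_errors=%s rx_dropped=%s tx_dropped=%s\n",
                       $3, $11, $4, $12, $5, $13
        }
    ' "${2:-/proc/net/dev}"
}
```

For example, iface_counters ipsec0 prints one line for ipsec0. Running it before and after a test and comparing the two lines shows which counter moved.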

Testing between security gateways

Sometimes you need to test the tunnel between two security gateways. This can be done by having a machine behind one gateway ping a machine behind the other gateway, but this is not always convenient or even possible.

Simply pinging one gateway from the other is not useful. Such a ping does not normally go through the tunnel. The tunnel handles traffic between the two protected subnets, not between the gateways. Depending on the routing in place, a ping might either succeed by finding an unencrypted route or fail by finding no route.

Neither event tells you anything about the tunnel. You can explicitly create an eroute to force such packets through the tunnel, or you can create additional tunnels as described in our configuration document, but those may be unnecessary complications in your situation.

The trick is to explicitly use an IP address for the subnet-side interface of one gateway machine, either as the target of a ping or as the origin of a traceroute. Since that interface is on the protected subnet, the resulting packets do go via the tunnel.

From the mailing list:

> > I have two gateways, SG1 and SG2, with I/Fs i and e (for internal and
> > external), and two hosts, H1 and H2 set up as:
> >
> >      H1-----(i)SG1(e)===========(e)SG2(i)------H2
> >
> > And I want to test a tunnel set up between the H1 subnet and the H2
> > subnet, but the H2 host may not exist yet, or may not be responding.
> >
> > If I ping SG2i from H1, all traffic in both directions is encrypted,
> > testing the tunnel.
.....
> > If I understand correctly, this could be accomplished by the 'ping -I'
> > feature of which you spoke earlier or 'traceroute -i'?
>
> Indeed,
>   traceroute -i eth0 -f 20 otherSG
> appears to give me a solution using only N machines, the SGs themselves.
> This is very nice.  Note that in this example, eth0 is the *private* (i)
> interface.  If you try it with the (e) interface or the ipsec0 interface,
> you won't get the desired result.  If you leave off the -f 20, the trace
> will hang in some totally bizarre way.

Some older Linux distributions did not support ping -I, according to mailing list comments. More recent comments indicate that this does now work. For example, you can do:

	ping -I 192.168.10.250 192.168.0.11
to test between the interfaces on the two protected subnets.
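
The same test can be wrapped in a small function for repeated use. This is a sketch only: the name tunnel_ping and the PING_CMD override (present so the function can be tried without a live tunnel) are assumptions for illustration. A reply shows only that packets make the round trip; it does not by itself prove they were encrypted.

```shell
# Ping an address on the far protected subnet from this gateway's
# subnet-side address, and report the result in one line.
# Usage: tunnel_ping LOCAL_INTERNAL_ADDR REMOTE_SUBNET_ADDR
tunnel_ping() {
    src=$1    # this gateway's internal (subnet-side) address
    dst=$2    # an address on the remote protected subnet
    if "${PING_CMD:-ping}" -c 3 -I "$src" "$dst" > /dev/null 2>&1; then
        echo "reply from $dst via $src"
    else
        echo "no reply from $dst via $src"
        return 1
    fi
}
```

With the addresses used above, the call would be tunnel_ping 192.168.10.250 192.168.0.11.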

Claudia's guide

FreeS/WAN "listress" (mailing list tech support person) Claudia Schmeing posted this guide to trouble-shooting in early March 2000. It may be worth checking list archives for a more recent version.

Your mail has inspired me to write a little trouble shooting 
guide to supplement and connect the existing docs on the subject.
Here's v. 1. Comments are welcome.


Steps in Troubleshooting Linux FreeS/WAN:
-----------------------------------------

Finding the Error
-----------------

First, try to find verbose text that describes how things are going wrong 
or creating unexpected results. Here's how:

While the dialog from ipsec auto --up myconn (or whatever) will tell
you where the process fails, it is often not very specific. And for
errors that have to do with the use of a conn, you may not even have
this.

More information can be gleaned from the log files, usually 
/var/log/messages or /var/log/secure. On some systems, the logfiles
are differently named. To find your error messages, check where your 
/etc/syslog.conf or equivalent is directing authpriv.

The amount of your error's description in your logs depends on your debug 
settings, klipsdebug= and plutodebug=, in ipsec.conf. See man ipsec.conf for 
details. Note that usually, either 'none' or 'all' will be what you want; 
you don't need to worry about the nuances of the debug options.

If you're having a negotiation problem (as you are, above) plutodebug 
is most relevant. If you have a connection established but the
packets aren't doing what you think they should, play with klipsdebug.
See also /doc/ipsec.html#parts for the division of duties within
Linux FreeS/WAN.

After raising your debug levels, restart Linux FreeS/WAN to ensure that
the conf file is re-read, then re-create the error to generate 
verbose logs. Proceed to the failure point in the logs and find 
the handful of lines which succinctly describe how things are going 
wrong or contrary to your expectation.


Interpreting the Error
----------------------

To interpret this text, use the following resources:

* the FAQ, doc/faq.html. Since the FAQ is constantly being updated,
the snapshot may have a new entry relevant to your problem. For example,
the faq in today's snapshot, addresses several more questions than the 
version on the site.

* doc/config.html. Instructions for some configurations you can 
make with Linux FreeS/WAN. See especially doc/config.html#multitunnel,
which is useful in a large proportion of the questions we see on the list.

* doc/trouble.html.  Debugging instructions and notes. Note that 
most people now test automatic keying only if that's what they're using 
in the field, and only revert to manual testing to test unexpected 
behaviour that seems to be occurring at a very basic level.

* the list archives. There are three: sandelman and nexial, as listed
at mail.html, and the archive for the filtered list at exim.org:

http://www.exim.org/pipermail/linux-ipsec/
(also listed in the upcoming docs).

Each of them works differently, so it's worth checking each.

Take a snippet of the text of your error which doesn't include anything
site specific, ex. "No connection is known for", and search on this.
It's likely you'll find the same answer to someone else's question 
this way, and it's faster than asking real-time humans ;-)

* Sometimes a quick peek into the code where the error is being generated
can be helpful. The pluto code is pretty well documented with comments
and meaningful variable names.


Asking for Help
---------------

A combination of the freeswan.org pages mentioned above and an archive 
search will address nearly every problem. But for those times when 
you've found something unusual, or your forehead is sore from banging 
it on your monitor, there's always the mailing list ;-)

When writing the list, remember that more is more -- While sometimes 
an initial query with a quick description of your intent and error 
will twig someone's memory of a similar problem, it's often necessary to 
send a second mail with a complete problem report. See doc/prob.report
for details. Lastly, as a kindness to other list members, you might post 
a link to a website where you've published your barf file rather than 
the entire file, if that option's available to you.

Happy trouble shooting,

Claudia
