Known issues with Opportunistic Encryption                     Claudia Schmeing
------------------------------------------


This is an overview of known issues with OE, current as of March 2003.


This document supplements:


FreeS/WAN Quickstart Guide        doc/quickstart.html

Opportunism HOWTO                 doc/opportunism.howto

Opportunism spec                  doc/opportunism.spec

Internet Draft                    doc/draft-richardson-ipsec-opportunistic.txt


* Use the most recent Linux FreeS/WAN release (2.00) from ftp.xs4all.nl 
  to try OE.


DESIGN LIMITATIONS


* Because Opportunistic Encryption relies on DNS:
  - to authenticate one FreeS/WAN to another, and
  - to prove that we have the right to protect traffic for a given IP,
  this authentication/authorization is only as strong as your DNS is 
  secure.

  Without secure DNS, OE protects against passive snooping only.
  Because the public key and gateway information that FreeS/WAN gets from 
  DNS is not authenticated, a man-in-the-middle attack is still possible.  
  We hope that as DNSsec is widely adopted, OE with strong authentication
  will become more widespread.

  However, our software does not yet distinguish between strongly and weakly 
  authenticated OE. This information might be useful for defining local 
  security policy.


* Denial of service attacks are possible against OE. If you rely on OE rather 
  than a VPN to connect several offices, a determined attacker could prevent 
  you from communicating securely.


* OE challenges the notion that all IPsec peers are "friends". Strangers
  can now tunnel _through_ your regular public interface, to get packets
  to another (ipsecN) interface. This calls for a re-visit to firewall policy.


* FreeS/WAN only creates OE connections when it traps an outgoing packet.
  Since most traffic is two-way, FreeS/WAN 2.x will usually trap an
  outgoing packet before long and create an IPsec connection to
  protect both incoming and outgoing traffic. However, if a local 
  FreeS/WAN box accepts inbound traffic from a remote peer but 
  generates no outbound traffic in response, the local FreeS/WAN will not 
  attempt to initiate OE. Of course, the peer may also initiate OE upon
  trapping its own outbound traffic.
  
* OE is only as reliable as your DNS is.
  
  If your DNS service is flaky, you will not be able to reliably establish 
  OE connections to known OE-capable peers.
  
  If you ping a peer, but your FreeS/WAN does not find a TXT record (signifying 
  the peer's ability to respond to OE negotiation), FreeS/WAN will not try to 
  initiate opportunistically, and communication will fall back to clear.

  For more secure and reliable DNS, we recommend that you run DNS
  within your security perimeter, either on your security gateway, or
  on a machine to which you have a VPN connection. It is also possible
  to have your DNS server located elsewhere on your LAN, though this may
  cause lag on startup.
  
  This mailing list message explains how to run a local caching name server:
  http://lists.freeswan.org/pipermail/design/2003-January/004205.html
 
  See also "Getting DNS through" in opportunism.howto
  http://lists.freeswan.org/pipermail/design/2002-April/002285.html .
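
  As a rough illustration of the TXT lookup described above, the shell
  sketch below builds the reverse name for an IPv4 address and tests
  whether some TXT data contains an OE advertisement. The function names
  are hypothetical, and the "X-IPsec-Server" marker is our assumption
  about what an OE TXT record contains; check the Opportunism HOWTO for
  the authoritative format. The commented example also assumes that the
  dig utility is installed.

```shell
#!/bin/sh
# reverse_name ADDR
# Turn a dotted-quad IPv4 address into its in-addr.arpa name.
reverse_name() {
    # Split on dots and print the four octets in reverse order.
    echo "$1" | awk -F. 'NF == 4 {
        printf "%s.%s.%s.%s.in-addr.arpa.\n", $4, $3, $2, $1
    }'
}

# has_oe_txt TXT_DATA
# Succeed if TXT_DATA looks like an OE advertisement.
# ASSUMPTION: OE TXT records carry the string "X-IPsec-Server".
has_oe_txt() {
    case "$1" in
        *X-IPsec-Server*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (needs network access and dig; not run automatically):
#   txt=$(dig +short TXT "$(reverse_name 193.110.157.18)")
#   if has_oe_txt "$txt"; then
#       echo "peer advertises OE"
#   else
#       echo "no OE TXT record found; expect cleartext"
#   fi
```

  If the lookup times out or returns nothing, the result is the same as
  for a peer without OE, which is why a flaky DNS service shows up as
  unexpected cleartext.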


CURRENT ISSUES


* In some situations, the local FreeS/WAN reports that you have a 
  connection but your peer has failed to clear. While an initial negotiation 
  achieved a tunnel that both sides knew about, if one side later goes down, 
  the other may not know.

  The result is a one-way connection; highly undesirable, since it 
  gives a false sense of security. Only by snooping packets will you see 
  that the return packets are not encrypted. Workaround: Alter the 
  _updown script to block cleartext packets from a host we expect to send 
  encrypted. This is not specific to OE; a similar problem can occur
  for VPNs.

  We have encountered two examples of this behaviour: on rekey, and on peer 
  restart. 

  The restart problem: If A and B have an OE connection, but A is rebooted, 
  normally A will try to re-connect to B and (if it has no DNS-related 
  failures) it will succeed. But, if A is set up for responder-only OE, you 
  will have a one-way connection until B notices that its original tunnel 
  has expired. For details see:

      http://lists.freeswan.org/pipermail/design/2002-May/002582.html
      http://lists.freeswan.org/pipermail/design/2002-June/002610.html

  TIP:  If an OE connection isn't behaving, you can recreate it with

      ipsec whack --oppohere sourceIPaddress --oppothere targetIPaddress

  The rekey problem is now an old issue, and is summarized under OLD ISSUES,
  below.
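
  The _updown workaround mentioned above (blocking cleartext from a host
  we expect to send encrypted traffic) might look roughly like the sketch
  below. It is not the stock _updown script. It is a hypothetical helper
  that only prints iptables commands, so you can inspect them before
  running anything. Assumptions: a host-to-host connection, KLIPS
  delivering decrypted packets on an ipsecN interface (so they are not
  matched by a rule on the physical interface), and Pluto passing the
  event and peer address to the script as PLUTO_VERB and PLUTO_PEER.
  The interface name eth0 is only an example.

```shell
#!/bin/sh
# block_cleartext_rules VERB PEER PHYS_IF
# Print (do not run) iptables commands that drop cleartext from PEER
# arriving on the physical interface, while still allowing IKE (UDP 500)
# and ESP from that peer.
block_cleartext_rules() {
    verb=$1
    peer=$2
    phys=$3
    case "$verb" in
        up-host)   op=-I ;;   # tunnel came up: insert the rules
        down-host) op=-D ;;   # tunnel went down: delete them again
        *) return 0 ;;        # other events: nothing to do
    esac
    # With -I every rule goes to the top of the chain, so the DROP is
    # printed first and ends up below the two ACCEPT rules.
    echo "iptables $op INPUT -i $phys -s $peer -j DROP"
    echo "iptables $op INPUT -i $phys -s $peer -p udp --dport 500 -j ACCEPT"
    echo "iptables $op INPUT -i $phys -s $peer -p esp -j ACCEPT"
}

# Possible use from a custom updown script (assumption: eth0 is the
# public interface):
#   block_cleartext_rules "$PLUTO_VERB" "$PLUTO_PEER" eth0 | sh
```

  Because the rules stay in place for as long as the local side believes
  the tunnel is up, cleartext return packets from a peer that has lost
  its half of the connection are dropped rather than silently accepted.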


* There is no good clean facility to delete OE connections.
  Available are:

      ipsec auto --status to list connections
      ipsec whack --deletestate to delete by state#.
   

* You may experience seeming gaps at rekey time. Once you generate traffic,
  you will find that the OE connection returns.

  By default, OE connections are not rekeyed; if they were we'd have a 
  mountain of useless connections. As a consequence, if your OE connection is
  idle at rekey time, it will go down until you generate further traffic.
  To ensure prompt rekeying, you can run a ping through the OE tunnel.

  There is an old issue with an Assertion failure at rekey time.


* At the moment, you can only run active OE on one physical interface.

  Active means --routed, to trap outbound packets.  It is this route
  that is a problem.

  Untested theory: you can have multiple active OE conns, for different 
  source addresses, but they all have to point their traffic out the single 
  interface.

  When responding: you can only define one OE connection (per host or subnet) 
  in ipsec.conf, and that conn will apply to one interface. Normally this 
  will be the public interface which your default route uses; it is, however, 
  configurable.

  Theoretically, it might make sense to select between multiple OE conns 
  based on some criterion, such as address ranges. This might be useful for 
  local OE, or in a complex routing scenario.

  Currently, Pluto expects only one OE connection. If you add another,
  Pluto may choose randomly between them, producing unpredictable results.


* Building OE conns between nodes on a LAN is not possible.

  This is a side effect of conflicts about ARP entries 
  in the rt_cache and our "stupid routing tricks".
  There is no known workaround at this time.

  "Stupid routing tricks" are an ongoing issue, and should
  go away in KLIPS 3, planned for summer 2003.

  See these explanations:
      http://lists.freeswan.org/pipermail/design/2002-April/002285.html
      http://lists.freeswan.org/pipermail/design/2002-August/003249.html


* CNAME lookups can fail.

  A common method of reverse delegation is to use CNAMEs.
  Therefore, in some instances, Opportunistic Encryption does not work 
  with reverse delegation.

  This problem could once be a show stopper.
  It no longer is, when we have Bind snap-pre9.3 running locally.
  See "old issues" for the old diagnosis.

  The new resolver follows a CNAME trail when looking for a TXT record, 
  but not when seeking a KEY. This transforms our former CNAME problems 
  into a non-issue.
 
  Why? FreeS/WAN makes many queries looking for TXT records, to see if 
  we can OE with potential peers. Therefore failed CNAME->TXT lookups 
  were costly in terms of time-outs on potential OE peers.

  FreeS/WAN makes few queries looking for KEY records, and when 
  it does, there is normally one there. Thus the continuing CNAME 
  lookup problem here does not impede FreeS/WAN much.
  
  More detail:

  In the past, the resolver would first query for a TXT record.
  If it did not find a TXT record, it would query a CNAME.
  This required two attempts at resolving, and happened with great
  frequency. Now, TXT requires one attempt only.

  KEY still requires two attempts, but as mentioned above, this is rarer.
  We need a KEY only when an OE peer is trying to negotiate, or we've found 
  an OE peer's TXT record which should point to a KEY.

  Backgrounder:

  TXT is a traditional record, and thus has "implied helper support",
  so resolvers are more likely to follow the CNAME trail.
  By contrast, KEY is non-traditional. At some point the DNS folks decided 
  no such help behaviour would be defined for new records, whereas support 
  for old helper behaviour would not be dropped.


* To make OE operation smoother, we may need a script that runs and warns 
  if we have the reverse DNS records, but not the software running.
  The reverse records advertise that we can do OE, but when the software is
  not running this is false advertising.
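
  No such script exists yet. As a minimal sketch of the idea, the shell
  function below takes the TXT data found for our own reverse name and a
  flag saying whether Pluto is running, and warns on a mismatch. The
  function name is hypothetical, the "X-IPsec-Server" marker is our
  assumption about what an OE TXT record contains, and the pid file path
  in the commented example is likewise an assumption.

```shell
#!/bin/sh
# oe_advert_status TXT_DATA PLUTO_RUNNING
#   TXT_DATA      : text of the TXT records found for our reverse name
#   PLUTO_RUNNING : "yes" if the IPsec daemon is up, anything else if not
# Prints a one-line verdict; returns 1 on false advertising, else 0.
oe_advert_status() {
    # ASSUMPTION: OE TXT records carry the string "X-IPsec-Server".
    case "$1" in
        *X-IPsec-Server*) advertised=yes ;;
        *) advertised=no ;;
    esac
    if [ "$advertised" = yes ] && [ "$2" != yes ]; then
        echo "WARNING: DNS advertises OE but FreeS/WAN is not running"
        return 1
    fi
    echo "OK"
    return 0
}

# Example wiring (assumptions: dig is installed and Pluto writes its
# pid file to /var/run/pluto.pid); could be run from cron:
#   txt=$(dig +short TXT 18.157.110.193.in-addr.arpa.)
#   if [ -f /var/run/pluto.pid ]; then running=yes; else running=no; fi
#   oe_advert_status "$txt" "$running"
```

  The sketch only checks for the one-sided case named above (records
  present, software absent); the reverse case is harmless, since a box
  without records simply never gets asked to do OE.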


* Coterminal OE doesn't work in practice. This includes OE-in-WAVEsec.

  If you have coterminal OE connections (two OE connections which share
  one endpoint), you should have use of one of the encrypted links, but it 
  is not clear which one KLIPS will prefer. In particular, the behaviour 
  may not be symmetrical.

  Worse yet, it just seems to trip over itself and be generally
  unworkable.

  Weird but predictable:

  If you have both a gateway and a host that advertise (via DNS) an 
  ability to do OE, you need to be serious about doing host-based
  OE, or you will be stuck in initiator-only mode. If your host 
  advertises but does not run OE, then when a peer tries to connect to 
  your host, it will fail to clear. The peer will then not try to encrypt 
  traffic bound for that host as it travels to the gateway. To remedy 
  the situation, restart ipsec on the peer (or otherwise flush out 
  the %pass eroute), and ping the peer from your host to initiate 
  OE.


OLD ISSUES

* One-way connection was created on rekey. Solved in 2.0

  If one side (A) has a shorter _keylife_ than the other,
  and that side also has _rekey=no_, then when the keylife has 
  expired, it will expect that its peer (B) will make a new conn to replace 
  the existing one. Unfortunately, B has no idea. 

  B continues to send out encrypted packets on the original connection,
  while A passes the return packets along in the clear.

  There is a proposed patch for (A) here:
      http://lists.freeswan.org/pipermail/design/2002-July/003114.html


* Failure to look up own host name is a show stopper.
  Solved in 1.98 and 1.98b.

  Solution: new setting %dnsondemand. Usage:

      leftrsasigkey=%dnsondemand   # now in sample ipsec.conf
      rightrsasigkey=%dnsondemand

  From man ipsec.conf:

      The value %dnsondemand means the key is to be fetched from DNS 
      at the time it is needed.

  If Linux FreeS/WAN can't get the key for your public interface from 
  DNS, it will not keep trying, and you will not be able to do OE.

  The error message is:

  May 14 09:40:24 road Pluto[21210]: failure to fetch key for 193.110.157.18 
  from DNS: failure querying DNS for KEY of 18.157.110.193.in-addr.arpa.: 
  Host name lookup failure

  Workaround: do one of the following.
  1. Supply a key in the conn: leftrsasigkey=0s...
  2. Fix the KEY lookup failure and try again.


* Assertion failure at OE rekey time. Fixed in 2.0pre0. Patch for 1.98b posted 
  at http://lists.freeswan.org/pipermail/design/2002-August/003347.html


* 1.91 to 1.94 have serious problems with %trap and %hold bugs. These bugs, 
  introduced while coding the support structure for OE, affect both OE and VPN 
  connections.


* CNAME lookups can fail.

  In some instances, Opportunistic Encryption does not work 
  with reverse delegation.

  This has changed if you have Bind snap-pre9.3 running locally.
  See the "current issues" list for the new situation. Below is the 
  old diagnosis.

  A common method of reverse delegation is to use CNAMEs.
  When delegating DNS, a provider uses a CNAME to indicate
  the domain which is now authoritative for the reverse lookup.

  Because of errors in BIND implementations, Linux FreeS/WAN 
  may not properly follow a well constructed CNAME trail 
  ending in a KEY. If you were to follow the same trail manually, 
  you would find the correct KEY.

  When a DNS server queried by Linux FreeS/WAN follows a CNAME, 
  it seems to forget what record type it is looking for, and it 
  returns a PTR, despite the fact that a KEY was requested.

  We have seen that this affects KEY lookups. We suspect that
  it affects all non-PTR records, including KEY and TXT.

  Fix (untested):

  Set up a DNS locally (within your security perimeter)
  which includes a fix. We believe the problem is fixed in 
  Bind 9, and if not, we can create a patch to fix a Bind 9 
  implementation, and submit it to the maintainers.

  Workaround:

  Meanwhile, if you are experiencing this bug, you can send your 
  provider KEY and TXT records for direct insertion into the reverse 
  ZONE files, rather than asking your provider to delegate authority 
  using CNAME.

  People who own IP blocks, rather than leasing them, may not 
  experience this problem. If you were assigned IPs more than 
  five years ago, you may own your IPs.