
hide-eid
========

A suite of tools for hiding Endpoint IDs (IPv4/IPv6 addresses) from intermediate
participants in the Internet.

Overview
--------
As the Locator/ID Separation Protocol (LISP) people have noted, IPv4 and IPv6
both use IP addresses (Endpoint IDs, in the lingo) to make routing decisions.
Focusing on this from the point of view of routing table efficiency / features,
they note that this is sub-optimal. Intermediate routers do not need the EID to
make routing decisions if an alternative (routing locator, or RLOC) is present
in the packet. More information about this idea can be found in
[RFC 6830](http://tools.ietf.org/html/rfc6830).

What seems to have gone unnoticed is that there are privacy implications here.
Recent PRISM, XKeyscore and other disclosures have shown that intermediaries
can be complicit in, or at least vulnerable to, attacks by national security
agencies, among others, who take advantage of the visibility of these EIDs to
construct logs of who is talking to whom. Even if the content of the message
is encrypted, the simple fact that an individual has communicated something
to an identifiable destination may be enough for these agencies to justify
taking further action against them - targeted surveillance, for instance.

Since the EID is not needed by these intermediaries, it makes sense to stop
giving it to them as quickly as is possible. These tools aim to implement a
simple scheme that achieves this goal, with speed / ease of implementation, and
the ability to scale to traffic levels of around 40Gbps (for small to medium
ISPs), in mind.

If the source and destination EIDs are not sent in the clear, and the payload
is encrypted, then the only identifying information intermediaries have is
the source and destination RLOCs. For an HTTPS session between an individual and
a website, this could be a few tens of thousands of internet subscribers on the
one side, and a few thousand servers running websites on the other.

To remove these EIDs, we need to start by creating an EID-to-RLOC map. A first
pass for this is an /etc/hosts equivalent; a second pass could be a DNS zone
(like ip6.arpa); and a third pass might be using BGP transitive attributes or a
proper LISP system. This allows access and hosting ISPs to discover which RLOC
they should use for any given destination EID.

The registry also contains a public key that is to be used to encrypt the parts
of any passed packet that are sensitive. Assuming the protocol being run over IP
is encrypted (HTTPS or SSL SMTP, for instance), this may just be the IP+TCP/UDP
header, or it may be the whole packet.

When the access ISP receives a packet from its subscriber with a destination IP
that is present in this registry, it encrypts the relevant portion of the packet
with the public key, then encapsulates the packet in an IP header of its own.
This IP header has the RLOC for the wrapping ISP as the source IP, and the RLOC
obtained from the registry as the destination IP. The wrapped packet is then
forwarded for routing to the destination.
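
The wrapping step can be sketched in a few lines of Python. This is only an illustration, not the code in this repository: the function names are hypothetical, the payload is assumed to be already encrypted, and protocol number 99 is the one the pass-1 implementation is described as using further down.

```python
import socket
import struct

PROTO_PRIVATE_ENCRYPTION = 99  # IANA: "any private encryption scheme"


def ipv4_checksum(header: bytes) -> int:
    """Standard one's-complement sum over 16-bit words of an IPv4 header."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def wrap(encrypted: bytes, src_rloc: str, dst_rloc: str, ttl: int = 64) -> bytes:
    """Prepend an outer IPv4 header carrying only the two RLOCs."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5,              # version 4, header length of 5 32-bit words
        0,                         # DSCP / ECN
        20 + len(encrypted),       # total length of the wrapped packet
        0,                         # identification
        0,                         # flags and fragment offset
        ttl,
        PROTO_PRIVATE_ENCRYPTION,
        0,                         # checksum placeholder, filled in below
        socket.inet_aton(src_rloc),
        socket.inet_aton(dst_rloc),
    )
    checksum = struct.pack("!H", ipv4_checksum(header))
    return header[:10] + checksum + header[12:] + encrypted
```

An intermediary inspecting the result sees a 20-byte header with the two RLOCs and an opaque payload; the original addresses and ports are inside the encrypted part.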

Since the RLOC is just an IP address, and one controlled by the destination ISP
at that, the route the wrapped packet takes through the internet will be about
the same as if the packet had never been wrapped. This is a large advantage of
the scheme over onion routing, such as Tor; no significant latency is added.

When the destination ISP receives the wrapped packet, it uses its private key
to decrypt the encapsulated packet, and sends that decrypted packet on to its
final hop. Return traffic undergoes the same treatment, of course.


Usage
-----
Pass 1 now exists, in a rudimentary form. Here's how to put together a couple of
hide-eid endpoints that can talk to each other.

First, you need two machines - one is the source, the other the destination. Both
should have an IPv4 address routed to them that is not claimed on the machines
themselves. These will be your RLOCs. They should be globally routable! Public
IPs, in other words.

On each machine, you'll also need a range of IPs. These will be your EIDs. They
need to be globally unique only within the context of the EID-to-RLOC registry
maintained by this project, for now - they can even be RFC1918 space, as long as
there are no overlaps within this registry. Remember, EIDs aren't used to make
routing decisions across the Internet.

Generate some ECC private keys, and their public components, in PEM format:

    $ openssl ecparam -genkey -out rloc1.private.pem -name secp160r2
    $ openssl ec -in rloc1.private.pem -pubout -out rloc1.public.pem
    $ openssl ecparam -genkey -out rloc2.private.pem -name secp160r2
    $ openssl ec -in rloc2.private.pem -pubout -out rloc2.public.pem

Add entries to the rloc-registry.json file to reflect your mappings. You need to
add an entry (a JSON object) to the "eid_rloc_map" array, like this:

    { "family":"ipv4", "network":"10.0.0.0", "netmask":8, "rloc":"1.2.3.4"}

IPv6 support isn't in yet. Once it is, IPv4-in-IPv6 and vice-versa mappings will
be permitted.

You also need to add an rloc:pubkey mapping to the "keys" object. Make sure
it's not the private key! Also, remember to add all the EID mappings and RLOCs
you want, not just one.
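
To make the shape of the registry concrete, here is a rough Python sketch of the lookup a wrapping router has to do. It is a guess at the behaviour, not the pass-1 code: the `lookup` function is hypothetical, the addresses are documentation ranges, and the values in "keys" are placeholders because the exact format of that object isn't spelled out here.

```python
import ipaddress
import json

# Example registry; the key strings are placeholders, not real PEM data.
REGISTRY_TEXT = """
{
  "eid_rloc_map": [
    { "family": "ipv4", "network": "10.0.0.0",   "netmask": 8,  "rloc": "192.0.2.1" },
    { "family": "ipv4", "network": "172.16.0.0", "netmask": 12, "rloc": "198.51.100.7" }
  ],
  "keys": {
    "192.0.2.1": "<public key for 192.0.2.1>",
    "198.51.100.7": "<public key for 198.51.100.7>"
  }
}
"""


def lookup(registry: dict, eid: str):
    """Return (rloc, public_key) for an EID, or None if it isn't registered."""
    address = ipaddress.ip_address(eid)
    for entry in registry["eid_rloc_map"]:
        network = ipaddress.ip_network(f'{entry["network"]}/{entry["netmask"]}')
        # Ranges in the registry must not overlap, so the first hit is the only hit.
        if address.version == network.version and address in network:
            rloc = entry["rloc"]
            return rloc, registry["keys"].get(rloc)
    return None


registry = json.loads(REGISTRY_TEXT)
```

With this registry, `lookup(registry, "10.2.3.4")` resolves to the RLOC `192.0.2.1`, while an unregistered destination such as `8.8.8.8` returns `None` and would be forwarded unwrapped.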

Then, on each machine:

    $ cd pass-1
    $ make all
    host1$ ./hide-eid rloc-registry.json eid0 eid0 <rloc1> <rloc1>.private.pem
    host2$ ./hide-eid rloc-registry.json eid0 eid0 <rloc2> <rloc2>.private.pem

You'll notice quite a lot of uninteresting output; it's wordy for all the wrong
reasons at the moment. Of particular note are a wide range of TODOs.

One of those TODOs is BGP (or similar) support for route injection. Since it's
not done yet, you need to add the routes yourself:

    host1$ ip route add <eid-range-for-rloc-2> dev eid0
    host2$ ip route add <eid-range-for-rloc-1> dev eid0

Also, make sure that an EID from each range is routable on the respective machines.
For testing, I just did:

    host1$ ip addr add <eid-ip-for-rloc-1> dev eid0
    host2$ ip addr add <eid-ip-for-rloc-2> dev eid0

The short version is that traffic to and from those EIDs must go into the TUN
device controlled by hide-eid for it to do the magic.

At this point, you should be able to ping `<eid-ip-for-rloc-2>` from host1, and
vice-versa, and get an ICMP echo reply back. You can also run TCP or UDP
servers on one of the IPs, and connect to them from the other IP. If you run
wireshark or tcpdump on an intermediate machine (or just one of the hosts, if
you focus on the egress/ingress traffic) you'll see obscure IP packets with
just the RLOC addresses as source and destination, and no visible UDP/TCP
headers. The IP protocol number is set to 99 - "any private encryption scheme".


Encryption
----------
The encryption scheme is really the only novel portion of this project; the rest
is covered in the LISP RFCs. This code is all about slapping together a basic
LISP router (badly), and implementing cryptography for the encapsulated IP
packets, for the sake of experimenting. Crypto is hard, and experimentation is
key (ha ha).

### Current scheme

This seems less stupid.

* EC public keys in central repository
* Each participant knows only their private key
* Generate ECDH secret for each peer using their public + your private key
* pseudo-random 128-bit IV per-packet, put at the start of encrypted data
* Use AES-256-GCM symmetric encryption, keyed with sha256( ecdh ), to encrypt / decrypt

The main point is that routers don't need to communicate with each other to
negotiate a shared key - they can independently derive the same symmetric key
as long as they share some common assumptions, have their own private key, and
the peer's public key.
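
The key-agreement part can be demonstrated with nothing but the Python standard library. Treat this as a toy: it does the elliptic curve arithmetic by hand, is not constant-time, and uses secp256k1 rather than the secp160r2 keys generated above, purely because its parameters are compact to write down. All the function names are hypothetical. The AES-256-GCM step is left out because the standard library has no AES.

```python
import hashlib
import secrets

# secp256k1 domain parameters: y^2 = x^3 + 7 over the prime field F_P.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def add(p1, p2):
    """Add two curve points; None stands for the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        slope = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P   # tangent (a = 0)
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P  # chord
    x3 = (slope * slope - x1 - x2) % P
    return x3, (slope * (x1 - x3) - y1) % P


def multiply(k, point):
    """Scalar multiplication by double-and-add."""
    result = None
    while k:
        if k & 1:
            result = add(result, point)
        point = add(point, point)
        k >>= 1
    return result


def keypair():
    """Private scalar in [1, N-1] and the matching public point."""
    private = secrets.randbelow(N - 1) + 1
    return private, multiply(private, G)


def packet_key(my_private, peer_public):
    """sha256 of the ECDH shared x-coordinate: a 32-byte symmetric key."""
    shared_x, _ = multiply(my_private, peer_public)
    return hashlib.sha256(shared_x.to_bytes(32, "big")).digest()
```

Given `priv1, pub1 = keypair()` on one router and `priv2, pub2 = keypair()` on the other, `packet_key(priv1, pub2)` and `packet_key(priv2, pub1)` come out identical without the two routers ever exchanging a message. Only the public points need to be in the registry.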

EC keys are smaller than the RSA keys of scheme 0, and we're moving to a
symmetric cipher for the actual packet encryption, so hopefully this will be
much faster than scheme 0.

Which curve should we be using? No clue. What size of key should we be using?
No clue. Is this kind of shared key appropriate when we're passing considerable
traffic? No clue.

### Scheme 0

This was stupid.

  * RSA public keys in central repository
  * Just use public key to directly encrypt packet data
  * Use private key to decrypt packets addressed to you.

This is slow, and a single RSA operation can only encrypt data that's smaller
than the key modulus (less any padding overhead), so larger packets don't fit.
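
To put rough numbers on that size limit, here is a small calculation. The key size and padding modes are assumptions for illustration; this README doesn't record which ones scheme 0 actually used. The overheads are the standard ones: 11 bytes for PKCS#1 v1.5 padding, and twice the hash length plus 2 bytes for OAEP.

```python
import math


def rsa_max_plaintext(modulus_bits: int, padding: str = "pkcs1v15",
                      hash_len: int = 20) -> int:
    """Largest plaintext, in bytes, that fits in one RSA encryption."""
    modulus_bytes = (modulus_bits + 7) // 8
    if padding == "pkcs1v15":
        return modulus_bytes - 11
    if padding == "oaep":
        # hash_len defaults to 20 bytes, i.e. OAEP with SHA-1.
        return modulus_bytes - 2 * hash_len - 2
    raise ValueError(f"unknown padding: {padding}")


def rsa_operations_needed(packet_len: int, modulus_bits: int,
                          padding: str = "pkcs1v15") -> int:
    """How many separate RSA encryptions a packet of this size would take."""
    return math.ceil(packet_len / rsa_max_plaintext(modulus_bits, padding))
```

For a 2048-bit key with PKCS#1 v1.5 padding, each operation takes at most 245 bytes, so a full 1500-byte packet would need 7 RSA encryptions and grow to 7 x 256 = 1792 bytes of ciphertext.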

First result: RTT increased from 37ms to 80ms.

For access<->hosting, that kind of latency increase is bad, but bearable. For
hosting<->hosting, it's completely unacceptable.

Not all of it may be crypto-related - worth implementing a no-op branch that
just encapsulates, and checking the difference.


Limitations
-----------
You have to trust two ISPs.

Certainly for access ISPs, even with the best will in the world, the
infrastructure between them and their layer 1/2 service providers may be bugged.
This is not protected against by this scheme; if you suspect this is happening
to your ISP without their knowledge, you can run IPsec over the link and allow
them to terminate it just before (or on) the box that wraps the packets. If you
suspect it is happening with their knowledge, the best you can do is change ISP.
If we run out of good ISPs, this scheme adds nothing. You can always start a VPN
ISP.

If the other side of the link is complicit, this scheme does nothing. It isn't
going to stop Facebook from handing all their records of your accesses to them
over to the NSA. Stop using Facebook.

There are four cryptographic operations in each round trip - encrypt outgoing
packet, decrypt outgoing packet, encrypt return packet, decrypt return packet.
This is going to be slower than no crypto. Too slow?

May break ICMP and other responses from intermediate ISPs. Path MTU discovery
breaks, for instance, with a naive implementation of this scheme, as does
ICMP tracerouting (this can be fixed, especially in IPv6 - see _ICMP_).


Selling points
--------------
Uptake can be low (but not zero) and significant benefits are still seen. Even
if just two ISPs take up the scheme, one access and one hosting, everyone
who uses the access ISP is now anonymous for any of their traffic that goes to
the hosting ISP. Privacy-conscious individuals can take note of that and move
to those ISPs, or tunnel their traffic to them, to regain their anonymity.

Faster than Tor. Especially in the latency stakes.

Requires no CPE changes. Needing them held back IPv6 uptake for a decade - end
users are not easy to upgrade. LISP schemes typically require the holder of the
EID to be in charge of looking up and using RLOCs; this scheme does not need that.

Probably stateless. Putting the encrypted EIDs into the packet we send means
that the source and destination ISPs don't need to perform connection tracking.
This isn't NAT in the traditional sense.

Since both source and destination enjoy a large anonymity set, this scheme is
resilient to correlation attacks. An earlier revision only encrypted the
source EID, which was vulnerable to trivial attacks of that nature.


ICMP
----
As noted, a naive implementation breaks ICMP responses by intermediaries. This
is a result of the design; as they no longer know the EID, they can't send an
ICMP response of any sort to it. As IPv6, in particular, relies on ICMP for
protocol features such as path MTU discovery, this is something of a problem.

One solution to this is to have a wide range of RLOC IP addresses, and use them
in a round-robin manner to maintain a local map of RLOC -> real source IP, which
would be retained for a short period (some seconds, say). If an ICMP reply is
received, directed to one of these RLOCs, the EID can be looked up from the map
and the packet can be rewritten with it and forwarded appropriately. Of course,
this is state for the ISP to keep track of, and monopolises a segment of the IP
address space. The latter is not a problem in IPv6, but will prevent its use in
almost all IPv4 deployments. Fortunately, IPv4 is legacy, and doesn't strictly
require ICMP in the way that IPv6 does.
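
A minimal sketch of that short-lived map, in Python, might look like the following. Nothing like this exists in pass 1; the class name, the five-second lifetime and the injectable clock are all made up for illustration.

```python
import itertools
import time


class RlocPool:
    """Hands out source RLOCs round-robin and remembers who used each one."""

    def __init__(self, rlocs, ttl=5.0, clock=time.monotonic):
        self._cycle = itertools.cycle(rlocs)
        self._ttl = ttl
        self._clock = clock
        self._owners = {}  # rloc -> (source EID, expiry time)

    def assign(self, source_eid):
        """Pick the next RLOC for an outgoing packet and record its real source."""
        rloc = next(self._cycle)
        self._owners[rloc] = (source_eid, self._clock() + self._ttl)
        return rloc

    def resolve(self, rloc):
        """Map an ICMP reply addressed to an RLOC back to the EID, if still known."""
        entry = self._owners.get(rloc)
        if entry is None or entry[1] < self._clock():
            self._owners.pop(rloc, None)
            return None
        return entry[0]
```

Note that an RLOC is overwritten as soon as the round-robin wraps around, so the pool has to be large enough that this takes longer than the time an ICMP reply needs to come back. That is where the address space cost comes from.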


Why?
----
It's my position that anonymity is only necessary in the presence of oppression.
In the absence of oppression, anonymity primarily facilitates crime and
wrongdoing. When present, it continues to do that, but also provides a means of
escaping oppression, and defeating oppressors.

Are we in an oppressive society? Do oppressive societies exist? I believe the
answer to both of those questions is yes. I wish it were otherwise.


Author
------
    Name   : Nick Thomas
    Handle : lupine
    Web    : lupine.me.uk
    Comms  : nick@lupine.me.uk