hide-eid
========

A suite of tools for hiding Endpoint IDs (IPv4/IPv6 addresses) from intermediate
participants in the Internet.

Overview
--------
As the Locator/ID Separation Protocol (LISP) people have noted, IPv4 and IPv6
both use IP addresses (Endpoint IDs, in the lingo) to make routing decisions.
Focusing on this from the point of view of routing table efficiency / features,
they note that this is sub-optimal, and that intermediate routers do not need
the EID to make routing decisions if an alternative (a routing locator, or RLOC)
is present in the packet.

What seems to have gone unnoticed is that there are privacy implications here.
As recent PRISM, XKeyscore and other disclosures have shown, these intermediaries
are complicit in, or at least vulnerable to, attacks by national security
agencies, who take advantage of the visibility of these EIDs to construct
comprehensive logs of who is talking to whom; even if the content of the
communication is encrypted, the simple fact that an individual has communicated
something somewhere may be enough for these agencies to justify taking further
action against them.

Since the EID is not needed by these intermediaries, it makes sense to stop
giving it to them as early in the path as possible. These tools aim to implement
a simple scheme that achieves this goal, prioritising speed, ease of
implementation, and the ability to scale to traffic levels of around 40Gbps
(for small to medium ISPs).

Removing knowledge of the EID from the intermediaries means that any affected
traffic sits in an anonymity set that is as large as the number of people who
share the same RLOC. In this scheme, I assume that the RLOC belongs to an access
ISP on one end of the path, and to a hosting ISP on the other end. This provides
typical anonymity sets of between a few hundred and a few million individuals.

To remove these EIDs, we need to start by creating an EID-to-RLOC map. A first
pass for this is an /etc/hosts equivalent; a second pass could be a DNS zone
(like ip6.arpa); and a third pass might be using BGP transitive attributes or a
proper LISP system. This allows access and hosting ISPs to discover which RLOC
they should use for any given destination EID.

The registry also contains a public key that is to be used to encrypt the parts
of any passed packet that are sensitive. Assuming the protocol being run over IP 
is encrypted (HTTPS or SSL SMTP, for instance), this may just be the IP+TCP/UDP
header, or it may be the whole packet. 

When the access ISP receives a packet from its subscriber with a destination IP
that is present in this registry, it encrypts the relevant portion of the packet
with the public key, then encapsulates the packet in an IP header of its own.
This IP header has the RLOC for the wrapping ISP as the source IP, and the RLOC
obtained from the registry as the destination IP. The wrapped packet is then
forwarded for routing to the destination.
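
As a rough illustration of the wrapping step, here is a minimal Python sketch
that prefixes an already-encrypted payload with an outer IPv4 header carrying
the two RLOCs. It is not the hide-eid code: the function names, the TTL of 64
and the documentation-range addresses are assumptions made for the example. IP
protocol 99 is the value the Usage section reports seeing on the wire.

```python
import socket
import struct

IP_PROTO_PRIVATE_ENCRYPTION = 99  # "any private encryption scheme"
HEADER_FORMAT = "!BBHHHBBH4s4s"   # 20-byte IPv4 header without options


def ipv4_checksum(header):
    """Ones' complement sum of the header's 16-bit words, inverted."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def encapsulate(encrypted_payload, src_rloc, dst_rloc, ttl=64):
    """Wrap an encrypted payload in an outer IPv4 header (hypothetical helper)."""
    total_length = 20 + len(encrypted_payload)
    if total_length > 0xFFFF:
        raise ValueError("payload too large for a single IPv4 packet")
    fields = [
        0x45,                         # version 4, header length 5 words
        0,                            # type of service
        total_length,
        0,                            # identification
        0,                            # flags and fragment offset
        ttl,
        IP_PROTO_PRIVATE_ENCRYPTION,
        0,                            # checksum placeholder
        socket.inet_aton(src_rloc),   # RLOC of the wrapping ISP
        socket.inet_aton(dst_rloc),   # RLOC found in the registry
    ]
    fields[7] = ipv4_checksum(struct.pack(HEADER_FORMAT, *fields))
    return struct.pack(HEADER_FORMAT, *fields) + encrypted_payload


# 32 zero bytes stand in for the encrypted inner packet.
packet = encapsulate(bytes(32), "192.0.2.1", "198.51.100.1")
print(len(packet), packet[9])
```

An intermediate router sees only the outer header, so the only addresses it can
log are the two RLOCs.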

Since the RLOC is just an IP address, and one controlled by the destination ISP
at that, the route the wrapped packet takes through the Internet will be about
the same as if the packet had never been wrapped. This is a large advantage of
the scheme over onion routing, such as Tor; no significant latency is added.

When the destination ISP receives the wrapped packet, it uses its private key to
decrypt the encapsulated packet, and sends the decrypted packet on to its final
hop.


Usage
-----
Pass 1 now exists, in a rudimentary form. Here's how to put together a couple of
hide-eid endpoints that can talk to each other.

First, you need two machines - one is the source, the other the destination. Both
should have an IPv4 address routed to them that is not claimed on the machines
themselves. These will be your RLOCs. They should be globally routable! Public
IPs, in other words.

On each machine, you'll also need a range of IPs. These will be your EIDs. They
need to be globally unique only within the context of the EID-to-RLOC registry
maintained by this project - they can even be RFC1918 space, as long as there
are no overlaps within this registry. Remember, EIDs aren't used to make routing
decisions across the Internet.

Generate some ECC private keys, and their public components, in PEM format:

  $ openssl ecparam -genkey -out rloc1.private.pem -name secp160r2
  $ openssl ec -in rloc1.private.pem -pubout -out rloc1.public.pem
  $ openssl ecparam -genkey -out rloc2.private.pem -name secp160r2
  $ openssl ec -in rloc2.private.pem -pubout -out rloc2.public.pem

Add entries to the rloc-registry.json file to reflect your mappings. You need to
add an entry (a JSON object) to the "eid_rloc_map" array, like this:

  { "family":"ipv4", "network":"10.0.0.0", "netmask":8, "rloc":"1.2.3.4"}

(IPv6 support isn't in yet)

You also need to add an rloc:pubkey mapping to the "keys" object. Make sure 
it's not the private key! Also, remember to add all the EID mappings and RLOCs, 
not just one.
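
To make the registry's role concrete, here is a minimal Python sketch of an
EID-to-RLOC lookup over a file in that shape. It is an illustration under
assumptions, not the hide-eid implementation: the function names are made up,
the RLOCs are documentation addresses, and the values in "keys" are
placeholders, since the exact encoding of the public keys isn't described here.

```python
import ipaddress
import json

# Hypothetical registry contents, following the entry format shown above.
REGISTRY_JSON = """
{
  "eid_rloc_map": [
    {"family": "ipv4", "network": "10.0.0.0",   "netmask": 8,  "rloc": "192.0.2.1"},
    {"family": "ipv4", "network": "172.16.0.0", "netmask": 12, "rloc": "198.51.100.1"}
  ],
  "keys": {
    "192.0.2.1": "<public key for rloc1>",
    "198.51.100.1": "<public key for rloc2>"
  }
}
"""


def load_map(text):
    """Parse the registry into a list of (EID network, RLOC) pairs."""
    registry = json.loads(text)
    entries = []
    for entry in registry["eid_rloc_map"]:
        if entry["family"] != "ipv4":
            continue  # IPv6 support isn't in yet
        network = ipaddress.ip_network(f'{entry["network"]}/{entry["netmask"]}')
        entries.append((network, entry["rloc"]))
    return entries


def lookup_rloc(entries, eid):
    """Return the RLOC serving this EID, or None if it isn't registered."""
    address = ipaddress.ip_address(eid)
    for network, rloc in entries:
        if address in network:
            return rloc
    return None


entries = load_map(REGISTRY_JSON)
print(lookup_rloc(entries, "10.1.2.3"))
print(lookup_rloc(entries, "8.8.8.8"))
```

Returning the first match is enough here because EID ranges in the registry
must not overlap. A packet whose destination gives None would simply not be
wrapped.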

Then, on each machine:

  $ cd pass-1
  $ make all
  host1$ ./hide-eid rloc-registry.json eid0 eid0 <rloc1> <rloc1>.private.pem
  host2$ ./hide-eid rloc-registry.json eid0 eid0 <rloc2> <rloc2>.private.pem

You'll notice quite a lot of uninteresting output; it's wordy for all the wrong
reasons at the moment. Of particular note are a wide range of TODOs.

One of those TODOs is BGP support for route injection. Since it's not done yet,
you need to add the routes yourself:

  host1$ ip route add <eid-range-for-rloc-2> dev eid0
  host2$ ip route add <eid-range-for-rloc-1> dev eid0

Also, make sure that an EID from each range is routable on the respective machines.
For testing, I just did:

  host1$ ip addr add <eid-ip-for-rloc-1> dev eid0
  host2$ ip addr add <eid-ip-for-rloc-2> dev eid0

The short version is that traffic to and from those EIDs must go into the TUN
device controlled by hide-eid for it to do the magic.

At this point, you should be able to ping <eid-ip-for-rloc-2> from host1, and
vice-versa, and get an ICMP echo reply back. You can also run TCP or UDP
servers on one of the IPs, and connect to them from the other IP. If you run
wireshark or tcpdump on an intermediate machine (or just one of the hosts, if
you focus on the egress/ingress traffic) you'll see obscure IP packets with
just the RLOC addresses as source and destination, and no visible UDP/TCP
headers. IP Protocol is set to 99 - "any private encryption scheme".


Encryption
----------
The encryption scheme is really the only novel portion of this project; the rest
is covered in the LISP RFCs. This code is all about slapping together a basic
LISP router (badly), and implementing cryptography for the encapsulated IP
packets, for the sake of experimenting. Crypto is hard, and experimentation is
key (ha ha).

Current scheme:
~~~~~~~~~~~~~~~
This seems less stupid.
 
  * EC public keys in central repository
  * Each participant knows only their private key
  * Generate ECDH secret for each peer using their public + your private key
  * Pseudo-random 128-bit IV per packet, put at the start of the encrypted data
  * Use AES-256 symmetric encryption, keyed with sha256( ecdh ), to encrypt / decrypt

The main point is that routers don't need to communicate with each other to
negotiate a shared key - they can independently derive the same symmetric key
as long as they share some common assumptions, and each has its own private key
and the peer's public key.
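
The key derivation and per-packet framing from the list above can be sketched
in a few lines of Python. This only shows the plumbing: the ECDH exchange and
the AES-256 step themselves are left out, the function names are hypothetical,
and the 20-byte secret is a placeholder standing in for real ECDH output.

```python
import hashlib
import os

IV_LENGTH = 16  # 128 bits


def derive_key(ecdh_secret):
    """sha256(ecdh): 32 bytes, the right size for an AES-256 key."""
    return hashlib.sha256(ecdh_secret).digest()


def frame(iv, ciphertext):
    """Put the per-packet IV at the start of the encrypted data."""
    if len(iv) != IV_LENGTH:
        raise ValueError("IV must be 128 bits")
    return iv + ciphertext


def unframe(data):
    """Split received data back into (iv, ciphertext)."""
    if len(data) < IV_LENGTH:
        raise ValueError("packet too short to contain an IV")
    return data[:IV_LENGTH], data[IV_LENGTH:]


# Placeholder secret; both routers would compute the same bytes via ECDH.
shared_secret = bytes(20)
key = derive_key(shared_secret)

iv = os.urandom(IV_LENGTH)        # fresh IV for every packet
wire = frame(iv, b"ciphertext")   # the AES-256 output would go here
print(len(key), unframe(wire)[1])
```

Because both ends feed the same ECDH secret into the same hash, they end up
with the same key without exchanging anything beyond what is in the registry.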

Asymmetric key size is smaller, and we're moving to a symmetric cipher for the
actual packet encryption, so hopefully this will be much faster than scheme 0.

Which curve should we be using? No clue. What size of key should we be using?
No clue. Is this kind of shared key appropriate when we're passing considerable
traffic? No clue.

Scheme 0: 
~~~~~~~~~
This was stupid.

  * RSA public keys in central repository
  * Just use public key to directly encrypt packet data
  * Use private key to decrypt packets addressed to you.

This is slow, and RSA can only directly encrypt data that's smaller than the key
modulus (less any padding overhead).
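
To put numbers on that size limit, here is a small Python calculation of the
largest plaintext a single RSA operation can carry under the two standard
PKCS #1 padding schemes. Which padding (if any) scheme 0 actually used isn't
recorded here, so treat the choice of schemes and key sizes as assumptions.

```python
def max_rsa_plaintext(modulus_bits, padding="pkcs1v15", hash_len=20):
    """Largest message, in bytes, that fits in one RSA encryption.

    hash_len is the digest size used by OAEP (20 bytes for SHA-1).
    """
    modulus_bytes = (modulus_bits + 7) // 8
    if padding == "pkcs1v15":
        return modulus_bytes - 11
    if padding == "oaep":
        return modulus_bytes - 2 * hash_len - 2
    raise ValueError("unknown padding scheme")


print(max_rsa_plaintext(2048))          # 245
print(max_rsa_plaintext(2048, "oaep"))  # 214
print(max_rsa_plaintext(1024))          # 117
```

Even a 2048-bit key holds far less than a 1500-byte packet, so encrypting whole
packets this way would need several RSA operations per packet.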

First result: RTT increased from 37ms to 80ms.

For access<->hosting, that kind of latency increase is bad, but bearable. For
hosting<->hosting, it's completely unacceptable.

Not all of it may be crypto-related - worth implementing a no-op branch that
just encapsulates, and checking the difference.


Limitations
-----------
You have to trust two ISPs.

Certainly for access ISPs, even with the best will in the world, the
infrastructure between them and their layer 1/2 service providers may be bugged. 
This is not protected against by this scheme; if you suspect this is happening
to your ISP without their knowledge, you can run IPSec over the link and allow
them to terminate it just before (or on) the box that wraps the packets. If you
suspect it is happening with their knowledge, the best you can do is change ISP.
If we run out of good ISPs, this scheme adds nothing. You can always start a VPN
ISP.

If the other side of the link is complicit, this scheme does nothing. It isn't
going to stop Facebook from handing all their records of your accesses to them
over to the NSA. Stop using Facebook. 

There are four cryptographic operations in each round trip - encrypt outgoing
packet, decrypt outgoing packet, encrypt return packet, decrypt return packet.
This is going to be slower than no crypto. Too slow?

May break ICMP and other responses from intermediate ISPs. Path MTU discovery
breaks, for instance, with a naive implementation of this scheme, as does
ICMP tracerouting (this can be fixed, especially in IPv6 - see _ICMP_).


Selling points
--------------
Uptake can be low (but not zero) and significant benefits are still seen. Even
if just two ISPs take up the scheme, one access and one hosting, everyone
who uses the access ISP is now anonymous for any of their traffic that goes to
the hosting ISP. Privacy-conscious individuals can take note of that and move
to those ISPs, or tunnel their traffic to them, to regain their anonymity.

Faster than Tor. Especially in the latency stakes.

Requires no CPE changes. Needing them killed IPv6 uptake for a decade - end
users are not easy to upgrade. LISP schemes typically require the holder of the
EID to be in charge of looking up and using RLOCs; this scheme does not need that.

Probably stateless. Putting the encrypted EIDs into the packet we send means
that the source and destination ISPs don't need to perform connection tracking.
This isn't NAT in the traditional sense. 

Since both source and destination enjoy a large anonymity set, this scheme is
resilient to correlation attacks. An earlier revision only encrypted the
source EID, which was vulnerable to trivial attacks of that nature.


ICMP
----
As noted, a naive implementation breaks ICMP responses by intermediaries. This
is a result of the design; as they no longer know the EID, they can't send an
ICMP response of any sort to it. As IPv6, in particular, relies on ICMP for 
protocol features such as path MTU discovery, this is something of a problem.

One solution to this is to have a wide range of RLOC IP addresses, and use them
in a round-robin manner to maintain a local map of RLOC -> real source IP, which
would be retained for a short period (some seconds, say). If an ICMP reply is
received, directed to one of these RLOCs, the EID can be looked up from the map
and the packet can be rewritten with it and forwarded appropriately. Of course,
this is state for the ISP to keep track of, and monopolises a segment of the IP
address space. That last is not a problem in IPv6, but will prevent its use in 
almost all IPv4 deployments. Fortunately, IPv4 is legacy, and doesn't strictly
require ICMP in the way that IPv6 does.
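
A minimal Python sketch of that short-lived map, assuming a pool of RLOCs
handed out round-robin and entries that expire after a few seconds. The class
and method names are hypothetical, and the addresses come from the IPv6
documentation range.

```python
import itertools
import time


class RlocPool:
    """Round-robin pool of RLOCs with a short-lived RLOC -> EID map."""

    def __init__(self, rlocs, ttl_seconds=5.0, clock=time.monotonic):
        self._cycle = itertools.cycle(rlocs)
        self._ttl = ttl_seconds
        self._clock = clock
        self._map = {}  # rloc -> (source EID, expiry time)

    def assign(self, source_eid):
        """Pick the next RLOC for an outgoing packet and remember its EID."""
        rloc = next(self._cycle)
        # Reusing an RLOC overwrites whatever mapping it held before.
        self._map[rloc] = (source_eid, self._clock() + self._ttl)
        return rloc

    def resolve(self, rloc):
        """Return the EID an ICMP reply to this RLOC belongs to, if any."""
        entry = self._map.get(rloc)
        if entry is None:
            return None
        eid, expiry = entry
        if self._clock() >= expiry:
            del self._map[rloc]
            return None
        return eid


pool = RlocPool(["2001:db8::1", "2001:db8::2", "2001:db8::3"])
outer_source = pool.assign("2001:db8:ffff::5")
print(outer_source, pool.resolve(outer_source))
```

The pool needs to be large enough that an RLOC isn't reused before ICMP replies
for its previous packet could plausibly arrive; otherwise a reply would be
forwarded to the wrong EID.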


Why?
----
It's my position that anonymity is only necessary in the presence of oppression.
In the absence of oppression, anonymity primarily facilitates crime and
wrongdoing. When oppression is present, it continues to do that, but also
provides a means of escaping oppression, and defeating oppressors.

Are we in an oppressive society? Do oppressive societies exist? I believe the
answer to both of those questions is yes. I wish it were otherwise.


Author
------
  Name   : Nick Thomas
  Handle : lupine
  Web    : lupine.me.uk
  Comms  : nick@lupine.me.uk