
A different take on routing domains

One thing that has always amazed people is the setup of my home network, a setup which has helped me land at least 2 jobs. After letting it fall into disrepair I finally found the energy to rebuild it and reassess some of the decisions I made 8 years ago. This, coupled with the container knowledge from maintaining doger.io, allowed me to uncover and use various Linux features in a way you may not have thought of before and may not have known existed.

The Brief

My home network deals with traffic from different sources that must be treated differently: everything from completely isolated, to isolated until the final destination, to normal routed traffic.

This setup can be distilled into 3 simple requirements:

  • Route traffic for multiple separate domains (e.g. internet, private and hostile)
  • Provide services to all networks with differing levels of accessibility
  • Prevent traffic leaking from one domain to another except through well defined rules

This may look familiar to some as being similar to Multi-Category Security (MCS) from SELinux, and in fact I briefly spoke about this at LCA2015 (video, youtube, slides). In Multi-Category Security you must have all the required labels in order to access the object in question, be it a packet, a file or some other thing the kernel provides. Extending this to networking, you would tag a packet on ingress and prevent it from interacting with system processes or egressing if the rules did not allow it. While I did not use SELinux due to its immaturity at the time (my network was designed when SELinux was new), I did learn some lessons from it and it has driven the design of this solution.
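To make the MCS idea concrete, here is a sketch with invented labels (s0:c0,c1) under an MCS-enforcing policy: an object carries a set of categories, and a subject must hold all of them to gain access.

chcon -l s0:c0,c1 secret.txt        # file requires categories c0 and c1
runcon -l s0:c0 cat secret.txt      # denied: the process lacks c1
runcon -l s0:c0,c1 cat secret.txt   # allowed: all required categories held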

Pushing packets

The predominant way that I move traffic around my network is via a series of OpenVPN tunnels in tun mode (no L2 traffic here, and IPv6 is enabled). This 'network' is a series of machines spread around the world. To distribute and maintain routing information a BGP daemon is used.

BGP may seem like an odd choice, however I have been fortunate enough to gain experience with it (helped along by my own home deployment). As such I tend to reach for it very early, as I can generally have it up quicker and with less hassle than a statically routed network.

Back in 2006 I mainly maintained this network via careful routing policy, firewalls and naming of the interfaces. This is a somewhat brittle approach that requires the full cooperation of all parts: a mistake in any one of the above can cause any of my machines to fall off the internet, and a mistaken route change has the possibility of taking all machines offline. As a single admin with a bit of network tinkering under my belt, this was less of an issue. In a team environment, however, this approach is highly risky.

This sounds like a job for alternate routing tables

When describing this environment, one alternative solution that network admins are quick to point out is the use of alternate routing tables. This is a solution whereby different traffic is handled by the router in different ways depending on a set of rules, and is part of a complex topic known as 'Policy Routing'.
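For the unfamiliar, a minimal sketch of the idea using iproute2 (the table number and addresses are invented for illustration): traffic matching a rule is looked up in its own routing table, with its own default route.

/sbin/ip route add default via 198.51.100.1 table 100   # a second table with a separate default route
/sbin/ip rule add from 192.0.2.0/24 lookup 100          # this source range consults table 100, not 'main'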

When I originally set this network up I discarded alternate routing tables as an option for a couple of reasons. The primary reason was that my routing daemon of choice for BGP (quagga) did not support alternate tables. Alternate routing tables and Policy Routing are a topic unto themselves, and this complexity and (at the time) lack of documentation was another reason for discarding them.

If it were not for these technology choices and requirements then this may be a valid solution for you, and one you may already have in place on larger networks. The solution I have below is perhaps a little more user friendly, at the cost of reduced efficiency (both CPU and allocation of resources such as IP addresses).

Cards on the table

So what am I trying to sell you on? In this case it is the use of network namespaces (for container people: a container that only deals with networking, but more lightweight) for each 'context' on my network, setting up network links between them as required by policy so that you can filter and scrub traffic before it goes to the next 'context'.

In this particular case the network namespace handling the network 'context' is what I call an 'rdomain' or routing domain, and is a copy of the Linux networking stack that deals with traffic in that instance only. It has its own localhost, firewall, routing table and other network features that do not overlap with the host system.

Each context is like an implicit 'tag' in the SELinux example above, with the routes and firewalls, as well as the interfaces between the rdomains themselves, forming the ruleset. Hopefully a ruleset you are more familiar with.

How does this work?

The iproute2 command has the ability to set up and maintain network namespaces, as well as set up interfaces between them. It can also move interfaces from one namespace to another, and can be told to re-execute itself in a particular namespace to make changes such as setting an IP address or adding a route. Changes made in each of these namespaces do not affect other namespaces, effectively isolating any change you make from the rest of the system. Creating them is as simple as issuing the command below:

/sbin/ip netns add {name}
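You can confirm the namespace exists, and peek at its (rather bare) networking state, with:

/sbin/ip netns list
/sbin/ip netns exec {name} /sbin/ip addr show   # shows only the namespace's lone loopback interface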

Sidetracked by systemd

The creation of routing domains at boot time can be automated significantly via the use of systemd. If you are using another init system, I will assume this is for good reason and that you are capable of creating the relevant files to have routing domains created on boot.

rdomain@.service

[Unit]
Description=Isolated Routing Domain
After=network.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/ip netns add %i
ExecStop=/sbin/ip netns delete %i

[Install]
WantedBy=network.target

This is a template unit file that supports multiple instances, so an instance needs to be enabled as below. You will need to replace {name} with the name of your routing domain.

# systemctl enable rdomain@{name}

This will cause the routing domain to be created on boot, but it will not start it immediately; to do that, a separate command must be issued:

# systemctl start rdomain@{name}

Let's poke things!

You should now have an isolated routing domain with no connectivity to the outside world (it will only have a loopback interface, and even that starts out down). You can jump into it with the following command to play with it a bit and do some quick diagnostics.

# /sbin/ip netns exec {name} /bin/bash

To 'bridge the gap' between this isolated domain and the networking you normally use on the machine, a 'pipe' of some sort needs to be built between the two contexts. The tool of the trade in Linux for this is the VETH pair: one end lives in your normal networking setup and the other end of the pipe is visible in the isolated domain. Having an interface like this means you can implement routing policy and firewalling at either end and treat the routing domain like a directly attached network interface. This allows you to reuse whatever tools you were already using on the machine to manage the new connection.
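Created by hand with iproute2, such a pipe looks like the command below, which builds the pair with one end (veth0) pushed straight into the routing domain. The systemd drop-in later in this post automates exactly this step:

/sbin/ip link add veth-{name} type veth peer name veth0 netns {name}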

Stop... it's systemd time (again)

Your rdomain may require some customization beyond what the base unit file provides, and this can be done by creating a drop-in directory named after the unit in /etc/systemd/system (if using systemd).

# mkdir /etc/systemd/system/rdomain@{name}.service.d

Files ending in ".conf" will be read from this directory and applied one by one, allowing you to splice in some networking setup such as that shown in the example below.

This example sets up a link between your normal networking environment and your rdomain, but does not set an IP address (this is left as an exercise to the reader). It also ensures that the link is brought up at both ends so that it may pass traffic. Care has been taken to avoid hard coding the instance name, through the use of '%i'.

rdomain@{name}.service.d/10-networking.conf

[Unit]
Description=Inter domain routing for rdomain

[Service]
ExecStartPost=/sbin/ip link add veth-%i type veth peer name veth0 netns %i
ExecStartPost=/sbin/ip link set veth-%i up
ExecStartPost=/sbin/ip netns exec %i /sbin/ip link set veth0 up
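If you would rather not do the exercise, addresses can be spliced in the same way by adding lines like these to the same file (a sketch using documentation addresses; substitute ranges to suit your network):

ExecStartPost=/sbin/ip addr add 192.0.2.1/30 dev veth-%i
ExecStartPost=/sbin/ip netns exec %i /sbin/ip addr add 192.0.2.2/30 dev veth0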

Now remember how I said I used OpenVPN? What if I want my VPN to traverse the internet at large but carry private traffic? Would OpenVPN have to exist in both network contexts at once? In this case, no. OpenVPN generates a network interface that can be placed into a separate context from the one OpenVPN transmits the encrypted traffic in. In fact OpenVPN has hooks to run scripts when the interface becomes available, and with careful use of the /sbin/ip command these can place the new interface in your private routing domain for you.

/sbin/ip link set {iface} netns {name}

OpenVPN also allows you to override the command used to set up the network (normally iproute2). If you set the 'iproute' config option to the command below, OpenVPN will set up the interface in the routing domain for you.

/sbin/ip netns exec {name} /sbin/ip

You will also see this pattern in the systemd customization file above. It is a common pattern for executing commands in the right context, and will likely be something you embed in a lot of config files that touch the routing domains directly.
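Put together, the relevant fragment of an OpenVPN client config might look like the sketch below ('private' being the rdomain from earlier, with the rest of the config omitted):

dev tun0
iproute "/sbin/ip netns exec private /sbin/ip"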

The astute reader may have noticed that if I have multiple cooperating sites all carrying 3 types of traffic, then I have tripled the number of VPN tunnels I need to create. This is correct. An alternative would be to have a single OpenVPN tunnel between the two sites and run a GRE or VXLAN tunnel to the other endpoint to multiplex multiple isolated networks over the one connection. The GRE or VXLAN endpoint can then have its interface moved from the main routing container (aka 'the host') to the dedicated routing domain, 'private' in the above example. This can be automated via the OpenVPN 'route-up' and 'route-pre-down' config options, which will invoke a script of your choosing (why not a systemd unit file via systemctl start {tunnel}).
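A sketch of the multiplexing idea with VXLAN (the VNI and remote address are invented; GRE works much the same way): the endpoint is created in the host context and the interface is then pushed into the rdomain, while the encapsulated traffic continues to flow via the context the interface was created in.

/sbin/ip link add vxlan-private type vxlan id 42 remote 192.0.2.2 dstport 4789
/sbin/ip link set vxlan-private netns private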

As my routing daemon of choice is still quagga these days, the last step in this entire setup is to run multiple instances of quagga on a single box in different contexts. As I am running systemd on all my routing instances, it is a fairly simple operation to make quagga multi-instance. Once multiple instances are running I would have each routing domain peer with the others (paying attention to ingress route filtering) rather than route statically; however, to reduce the amount of peering/setup work you can drop this requirement and instead statically route between each namespace. BGP per rdomain does mean you will have 3 BGP sessions (in my case) between each endpoint, however the filtering policies should be simpler as a result and easier to generate automatically.

/sbin/ip netns exec private /usr/sbin/bgpd -f /etc/quagga/instances/private/bgpd.conf

The above is built up of lots of small parts. While the use of multiple namespaces is still obscure, I feel it is more 'user visible' than the alternate routing tables hack. The use of VETH in this particular case is the thread leading down the rabbit hole that provides discoverability, and while a rocky road, it is better than relying on someone having obscure knowledge.

There are some things this new solution does not solve well. The provision of services is diminished: for me this was mainly HTTP via nginx, and nginx is unable to exist in all the contexts at once (the same applies to apache). As such it is likely that I would have to look at a multi-instance nginx as well, or route/firewall traffic in such a way that it gets to the appropriate domain.
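A multi-instance nginx would reuse the now familiar pattern; a sketch, with the per-instance config path being an assumption:

/sbin/ip netns exec private /usr/sbin/nginx -c /etc/nginx/instances/private.conf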

You may note that the above setup supports multiple overlapping address ranges: you could have 192.168.0.0/16 in 2 separate contexts without conflict, until you try to route between the two. By using firewall remapping (e.g. NAT) or by keeping the 2 contexts isolated, you can have both coexist on the system.
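For the remapping option, iptables has a NETMAP target that statically maps one prefix onto another of the same size; a sketch run inside one of the contexts (the replacement range is invented):

/sbin/ip netns exec private /sbin/iptables -t nat -A POSTROUTING -s 192.168.0.0/16 -j NETMAP --to 10.1.0.0/16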

The overlapping range feature, however, is part of a bigger concept: full traffic isolation. As each instance of the network stack exists in separation, there is a greatly reduced chance of an accident causing traffic leakage between domains. This property is one I am deeply interested in: robustness against accidental mistakes is highly desirable, allowing experimentation without fear of rendering the machine inaccessible to repair, and confining any damage to the rdomain concerned.

The technique could also be applied to core or edge routers in a workplace, providing you use Linux as your main router and are willing to take some chances (if you break it, you get to keep both halves of the network). It is not going to be anywhere near as fast as commercial hardware (that VETH pipe causes the networking stack to see the packet multiple times), however it may come in handy one day if you have to deal with multiple customers or environments. One example is dealing with overlapping network ranges (e.g. from a merger, or due to the requirements of being a service provider).

Not always about routing

It is possible to take this all a step further. If you are interested in further traffic isolation while still allowing traffic to intermingle, then some trickery can be done via IPsec and authenticated traffic mixed with SELinux.

Another use case is split routing protocol/routing daemons, where the injection of routes into the kernel is a separate process from the speaker of the protocol. Trickery can be done to run the protocol in one domain while the routes get injected into another, as UNIX file sockets on disk are treated differently from other network connections (they belong to the filesystem namespace, not the network namespace).
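Quagga is a handy example of this, as bgpd talks to zebra over a UNIX socket on disk. A sketch (paths and flags as per a typical install, so treat them as assumptions): zebra injects routes into the host stack while bgpd peers from inside the rdomain.

/usr/sbin/zebra -d -f /etc/quagga/zebra.conf
/sbin/ip netns exec private /usr/sbin/bgpd -d -f /etc/quagga/bgpd.conf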

One other place where this would be appreciated is when you need to ensure that a process and all its children have their traffic routed via a captive interface with no possibility of it leaking (e.g. forking off a child process that is not proxy aware and makes a DNS lookup or TCP connection). Using the above techniques you can create an environment where there is no way for a subset of processes on your system to connect to an unsafe network.

An example of the above is confining your VoIP softphone to a VPN. This can be done to ensure that the phone binds to the correct interface and does not try to send packets out a primary interface that may be behind a NAT (NAT likes to break VoIP in entertaining ways).
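Launching such a confined softphone might look like the sketch below (the rdomain name, user and program are placeholders); because the process starts inside the namespace, every child it forks is captive too.

# /sbin/ip netns exec voip sudo -u {user} /usr/bin/linphone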

Another nice trick: if you have a work VPN that uses the same IP address range as your home network, you can spin up an rdomain for work and use the normal networking for home, retaining access to work resources when working from home while still being able to stream music from your fileserver.

In a similar vein, if you look after multiple networks or customers with overlapping ranges, then instead of 'switching' between them with something like a VPN you can stay connected to all of them all the time and execute specific commands in different rdomains/contexts. Combined with virtual desktops this can be a powerful workflow, especially in interrupt driven/context heavy environments (such as talking to different customers via phone calls), due to the reduced burden of context switching (and with less devastating results for running processes).

Keep routin' routin' routin' routin'

Now you may (hopefully) get to this point and ask yourself why I started doing it this way 10 years ago, and what motivated me to write up the above. I can't really give a good reason, except to refer you to the first paragraph and mention 'crazy' a bunch of times.

It was interesting looking back at the decisions I made years ago and applying a fresh set of eyeballs to the problem. The new setup is more robust, provides better isolation, and was an interesting application of container technology. It is my hope that this is useful to someone else and gets used out in the real world to solve larger problems than mine. If you want to know more or just say hi, feel free to drop me a line at 'blitz' at this domain, or 'dablitz' at this domain for jabber/xmpp.

Links

For more information on the above subjects, have a look at the following: