Technical Overview

How Deflect fits together

NB: this flow should not be a list... watch this space for nice process diagram tbd

  1. user enters http://www.somesite.com/page into user agent
  2. UA (or proxy server) asks DNS
  3. DNS returns an edge server IP
  4. UA requests www.somesite.com/page from the edge:
       GET /page HTTP/1.1
       Host: www.somesite.com
  5. edge checks for a local copy.
       • If found and fresh, the edge returns it to the client. End.
       • If no local copy is found, the edge requests the page from origin:
           GET /page HTTP/1.1
           Host: www.somesite.com
       • If found in cache but not fresh, the edge attempts to revalidate with an If-Modified-Since (IMS) request to origin:
           GET /page HTTP/1.0
           Host: www.somesite.com
           If-Modified-Since: Sat, 26 Oct 1985 01:24:00 GMT
  6. origin webserver gets the request, and builds the page if it is dynamic.
       • If we sent If-Modified-Since, and the page has not been modified since the moment Marty arrived back in 1985, then the server might return “304 Not Modified”. In that case the edge updates the time on the cached object and serves from cache.
       • Otherwise, whether IMS or not, the server will respond with the appropriate HTTP code and data: probably 200 OK (though it could be any code).
       • If the origin cannot be reached within origin-timeout, we choose to serve from cache no matter how old.
  7. edge returns response to UA
  8. UA optionally caches response locally
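
The edge's choices in steps 5 and 6 can be summarised as a small decision function. This is only a sketch of the flow described above, not how Apache Traffic Server implements it; the function name, its arguments and the result labels are made up for illustration.

```shell
# Sketch only: mirrors the flow described above, not ATS internals.
# edge_decision CACHED AGE TTL ORIGIN_UP
#   CACHED    "yes" if the edge holds a copy of the object, otherwise "no"
#   AGE, TTL  age of the cached copy and its freshness lifetime, in seconds
#   ORIGIN_UP "yes" if the origin answered within origin-timeout, otherwise "no"
edge_decision() {
    cached=$1
    age=$2
    ttl=$3
    origin_up=$4

    if [ "$cached" != "yes" ]; then
        echo "fetch-from-origin"        # no local copy: plain GET to origin
    elif [ "$age" -lt "$ttl" ]; then
        echo "serve-from-cache"         # found and fresh
    elif [ "$origin_up" = "yes" ]; then
        echo "revalidate-with-ims"      # stale: If-Modified-Since request
    else
        echo "serve-stale-from-cache"   # origin unreachable: serve however old
    fi
}
```

For example, edge_decision yes 7200 3600 no covers the last case in step 6: the copy is stale and the origin is down, so the edge serves from cache anyway.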

fin.

How to build Deflect from nothing

Preconfigure DNS

host names for edges (optional)

edge pool- edge.deflect.ca (multiple A records)

origin.site.com

note: do not reconfigure www.site.com at this point
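
Once the records exist, it may be worth confirming that the pool name really returns several addresses. A minimal sketch, assuming dig is installed on the machine you check from; count_ipv4 is a hypothetical helper, not one of the Deflect scripts.

```shell
# count_ipv4: read one answer per line on stdin (the format "dig +short" prints)
# and print how many lines look like IPv4 addresses. Lines that are names,
# such as CNAME targets, are not counted.
count_ipv4() {
    grep -E -c '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

# Example, not run here because it needs network access:
#   dig +short A edge.deflect.ca | count_ipv4
```

A result of 0 or 1 for the pool name would suggest the multiple A records are not in place yet.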

Configure a master and secondary master node

debian ver

apt-get install prereqs

download trafficserver source

unpack in /usr/local/src/

./configure --layout=apache
make && make install
mkdir /usr/local/deflect
cp -rp /usr/local/trafficserver /usr/local/deflect/trafficserver-app
cd /usr/local/deflect
cp -rp trafficserver-app/conf trafficserver-conf
mkdir trafficserver-confstg

install packages

copy custom scripts

build ats and place in repository

configure nagios server

secondary master instructions: tbd

Configure some edge nodes

automated node build: see Daily tasks - create an edge node

add to monitoring: see Daily tasks - Add a host to monitoring

Configure ATS

Follow the instructions for editing a config file in Daily tasks - Apache Traffic Server configuration. It’s not necessary to push each file separately: it’s a trade-off between validating the configuration as you go along, and the reality that some changes cannot be tested piecemeal. At the time of writing (15 Feb 2011) there are five files we run that are different from stock. Look at the files on disk, under /usr/local/deflect/trafficserver-conf/, for specifics, but here’s an overview:

  • remap.config

This is the file that controls what HTTP Host header maps to what origin server.

  • records.config

This file is the main configuration of the ATS application behaviour - for example, this file determines whether to check remap.config at all! Some caching behaviour is configured here- for example, whether to over-ride the Cache-control: headers we receive from origin servers.

  • cache.config

A lot less config here than you might expect- mostly just the TTL in cache per domain, and the catch-all default TTL.

  • plugin.config

We use the conf_remap and the stats_over_http plugins.

  • logs_xml.config

Our custom log format is defined here. It is basically Netscape “common” format, with some additional caching information - each entry logs one code like TCP_HIT, TCP_MISS or TCP_IMS_HIT (that last one means ATS sent an If-Modified-Since request and received a 304 Not Modified response).
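
To make two of these files more concrete, here is a small sketch with one helper per file: origin_for_host looks up where remap.config sends a given Host header, and cache_result_counts tallies the cache result codes in a log. Both helper names are made up for this example. The sketch only understands the simple "map <from-url> <to-url>" rule form, and it assumes the cache code appears as a separate word somewhere on each log line, since the exact field position depends on our custom format.

```shell
# origin_for_host HOST FILE
# Print the target URL of the first "map http://HOST/ ..." rule in FILE.
# Example rules (hostnames are illustrative):
#   map http://www.somesite.com/ http://origin.somesite.com/
#   map http://www.site.com/     http://origin.site.com/
origin_for_host() {
    awk -v from="http://$1/" '$1 == "map" && $2 == from { print $3; exit }' "$2"
}

# cache_result_counts LOGFILE
# Print one "CODE COUNT" line for each cache result code of interest.
cache_result_counts() {
    for code in TCP_HIT TCP_MISS TCP_IMS_HIT; do
        printf '%s %s\n' "$code" "$(grep -c -w -- "$code" "$1")"
    done
}
```

Typical use would be origin_for_host www.site.com /usr/local/deflect/trafficserver-conf/remap.config, and cache_result_counts pointed at whichever log file the edge is currently writing.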

Test

See Testing

  • Most important is functional testing.
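
One simple functional check is to ask each edge directly for a page, sending the public site's Host header, and look at the status code that comes back. A minimal sketch, assuming curl is available; edge_status and the CURL override variable are hypothetical and exist only to keep the example testable.

```shell
# edge_status EDGE HOST PATH
# Ask one edge for PATH with the given Host header and print the HTTP status.
# CURL can be overridden for testing; it defaults to the real curl.
edge_status() {
    "${CURL:-curl}" -s -o /dev/null -w '%{http_code}' -H "Host: $2" "http://$1$3"
}

# Example, not run here because it needs network access:
#   for n in 1 2 3 4; do
#       echo "edge$n: $(edge_status "edge$n.deflect.ca" www.site.com /)"
#   done
```

Because the request goes to the edge by name, this works before go-live, while the public URLs still point at the origin.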

Operational acceptance

master is backed up

dns is accurate and complete

logs are being collected from all edges

monitoring is active and alerting for all nodes (master and edges)

functional testing passed for all edges

performance has been tested and is adequate

Go-live

point the public URLs (eg site.com, www.site.com) at the pool of edge servers

fin

Notes on the sysadmin choices

Good Practices

  • make it easy for yourself and others

Sometimes you’re not the one fixing a problem - if standards are adhered to, it’s a lot easier to troubleshoot. Other times, you might not remember what you did last time. Therefore: standardise, document, take copious but orderly backups, and broadcast the work you’re doing for informational and peer review purposes. This means fewer headaches for everyone. Consider documenting work before you start, or work from existing documentation - and be very wary of any deviation from the document. If you have to deviate, then capture it!

  • infrastructure grade

Deflect is an infrastructure service- downtime = bad.

  • that’s all

Local Practices

Here are some of the ways we try to implement the principles of stress-free administration.

The bvi wrapper

Take a look at:

:/usr/local/deflect/scripts/bvi

It’s a simple script, for now, that creates a temporary copy of the file you want to edit and opens that for editing. After your editor (vi) quits, it compares the new file to the old one. If there’s a difference, it copies the old one to a dated backup of itself in the old/ subdirectory of the current directory, moves the new one into place, and emails a diff to sysadmin@
In the future I hope to smarten it up a bit. It’s not meant to provide security or strictly enforce policy - it’s just there to make good practices easy for the well-intentioned sysadmin. Near-future improvements include editor choices and alerting the watch list to files that have been edited without backups being taken.
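
The behaviour described above can be sketched in a few lines of shell. This is not the real script under /usr/local/deflect/scripts/, just an illustration of the same steps. The EDITOR fallback and the BVI_MAILTO variable are assumptions of this sketch (the real script uses vi and a fixed recipient), and mail is assumed to be a mailx-style command.

```shell
# bvi_sketch FILE
# Edit a temporary copy of FILE; if it changed, back up the original under
# ./old/ with a timestamp, move the edited copy into place, and report the diff.
bvi_sketch() {
    file=$1
    base=$(basename "$file")
    tmp=$(mktemp "$file.XXXXXX") || return 1
    cp -p "$file" "$tmp" || return 1

    ${EDITOR:-vi} "$tmp"

    if cmp -s "$file" "$tmp"; then
        rm -f "$tmp"                      # nothing changed: discard the copy
        return 0
    fi

    # diff exits 1 when the files differ, which is the expected case here
    changes=$(diff -u "$file" "$tmp" || true)
    mkdir -p old || return 1
    cp -p "$file" "old/$base.$(date +%Y%m%d-%H%M%S)" || return 1
    mv "$tmp" "$file" || return 1

    if [ -n "${BVI_MAILTO:-}" ]; then
        printf '%s\n' "$changes" | mail -s "bvi: $file changed" "$BVI_MAILTO"
    else
        printf '%s\n' "$changes"          # no recipient set: print the diff
    fi
}
```

Editing a copy rather than the live file means a half-finished edit never sits in place, and comparing before moving keeps old/ free of backups for edits that changed nothing.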

Capturing your session with screen

A simple .screenrc file exists at ~root/.screenrc - if you run screen as root, screen will do two important things:

  1. keep a log of your session to aid review, documentation, troubleshooting and collaboration later
  2. allow you to reconnect to a session if your login session is interrupted for any reason.

Even better practice is to copy .screenrc to your own homedir, create the directory $HOME/screenlogs and run screen as yourself, before you su to an administrative user account - this makes it even easier to enjoy the benefits of a logged session after the fact.
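
If you cannot copy root's file, a minimal stand-in can be generated instead. The sketch below creates the screenlogs directory and, only if no .screenrc exists yet, writes one that turns logging on for every window. The two screen settings used (deflog and logfile) are standard screen commands, but the contents of the real ~root/.screenrc may well differ, and setup_screen_logging is a made-up helper name.

```shell
# setup_screen_logging [HOMEDIR]
# Create HOMEDIR/screenlogs and, if HOMEDIR/.screenrc does not exist yet,
# write a minimal one that logs every window. Defaults to $HOME.
setup_screen_logging() {
    home=${1:-$HOME}
    mkdir -p "$home/screenlogs" || return 1
    if [ ! -e "$home/.screenrc" ]; then
        {
            echo "deflog on"
            echo "logfile $home/screenlogs/screenlog.%n"
        } > "$home/.screenrc"
    fi
}
```

After running it once as yourself, start screen first and su inside the session, so the whole administrative session ends up in the log.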

/usr/local/deflect/Makefile

This is a kludge, but it’s doing its job pretty well at the moment. Have a look inside the file - or run make in that directory with no arguments for useful help text. This is the main interface for controlling the edges, and any work that needs to be done on edges (plural) should be captured in the makefile.

In time this may be rolled into a shell or perl script, or the makefile cleaned up.

It currently calls some of the scripts in /usr/local/deflect/scripts/ - not a bad place to have a poke around to see what does what.

/usr/local/<packagename>

I like to keep everything where I can see it- so I build to /usr/local/. I think it makes upgrades easier, to name but one advantage.

SSH and passwords

As described in https://xkcd.com/936/, a long passphrase can beat a short, complex password. Compare the “search space” for two different passwords - that is, how many guesses an attacker might have to try before being certain of brute-forcing the password.

gJ823$%aK\
= 6.05 x 10^19 or 60,510,648,114,517,017,120
vs
ifyou'rehappyandyouknowitclapyourhands
= 1.99 x 10^67 or 19,943,457,888,530,122,458,259,355,763,514,562,458,206,830,362,647,183,073,840,370,234,460

That’s sixty quintillion for a hard-to-remember password that you have to write down or store (bad) vs twenty unvigintillion (I looked it up :) for a really easy to remember password - that’s a factor of over 100 tredecillion!

SSH keys are like the PGP of login security. The properly guarded private key is never exposed. The public key is used to generate a challenge that can only be met by the private key. The response is then verified using the public key; if it matches, and the target host’s sshd is configured to allow the owner of that key pair to log on as the requested user, access is granted. The future is ssh keys encrypted with strong passphrases - much better. I will make this section better in future, maybe using the number unvigintillion again.
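
The search-space figures can be reproduced with a little arithmetic: for an alphabet of a characters and a maximum length of n, the number of candidate passwords is a + a^2 + ... + a^n. The sketch below approximates that sum with awk. The alphabet sizes are assumptions (95 for all printable ASCII, 59 for lowercase letters plus symbols); they are chosen because they reproduce the two scientific-notation figures quoted above. search_space is a made-up helper name.

```shell
# search_space ALPHABET_SIZE LENGTH
# Approximate the number of passwords of up to LENGTH characters drawn from
# ALPHABET_SIZE symbols. Works in logarithms so large results do not overflow.
search_space() {
    awk -v a="$1" -v n="$2" 'BEGIN {
        # The exact sum is a * (a^n - 1) / (a - 1); for large n this is
        # very close to a^n * a / (a - 1), which is what is computed here.
        l = (n * log(a) + log(a / (a - 1))) / log(10)
        e = int(l)
        printf "%.2f x 10^%d\n", exp((l - e) * log(10)), e
    }'
}
```

Running search_space 95 10 gives 6.05 x 10^19 for the ten-character password, and search_space 59 38 gives 1.99 x 10^67 for the 38-character phrase.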

Naming conventions

The master and its online backup are named for their roles. The edge hosts are named by the VPS provider for each, in the form providerN.deflect.ca, where N indicates whether this is the first, second, or Nth VPS at that provider, eg

edge1.deflect.ca
edge2.deflect.ca
edge3.deflect.ca
edge4.deflect.ca