
Previously I talked about using custom authentication for a web application, supporting two methods of authentication simultaneously. Browser-based users would be challenged with a typical styled login form, while applications integrating with the service would be challenged with HTTP Basic Authentication. The choice of which authentication to use would be based on how clients are classified. Taking advantage of content negotiation, clients preferring a HTML response would be classified as a "browser" and would be challenged with a login form; any other clients would be classified as an "app" and would be challenged with HTTP Basic Auth.

Implementing this custom authentication and classification scheme for Pylons was quite simple using repoze.who. Here I describe how I implemented it.

I'll start from a skeleton Pylons app.
$ paster create -t pylons CustomAuth

The defaults of 'mako' for templates and no SQLAlchemy are fine for this example.

I based my repoze.who configuration off this recipe from the Pylons cookbook but I'll quickly repeat the necessary steps here, so that the example is complete.

If you haven't done so already, install the repoze.who package with:
$ easy_install repoze.who

The next step is to add repoze.who to the WSGI middleware of your Pylons app. Edit config/middleware.py and add an import:
from repoze.who.config import make_middleware_with_config as make_who_with_config

Then after the comment "CUSTOM MIDDLEWARE HERE" add the following line:
app = make_who_with_config(app, global_conf, app_conf['who.config_file'], app_conf['who.log_file'], app_conf['who.log_level'])

Now edit development.ini and add to the [app:main] section:
who.config_file = %(here)s/who.ini
who.log_level = debug
who.log_file = stdout

Now create a who.ini file in the same location as development.ini containing:
[plugin:form]
use = repoze.who.plugins.form:make_redirecting_plugin
login_form_url = /account/login
login_handler_path = /account/dologin
logout_handler_path = /account/logout
rememberer_name = auth_tkt

[plugin:auth_tkt]
use = repoze.who.plugins.auth_tkt:make_plugin
secret = yoursecret

# identification and challenge
[plugin:basicauth]
use = repoze.who.plugins.basicauth:make_plugin
realm = CustomAuth

[general]
request_classifier = customauth.lib.auth:custom_request_classifier
challenge_decider = repoze.who.classifiers:default_challenge_decider

[identifiers]
plugins =
    form;browser
    auth_tkt
    basicauth

[authenticators]
plugins =
    customauth.lib.auth:UserModelPlugin

[challengers]
plugins =
    form;browser
    basicauth

[mdproviders]
plugins =
    customauth.lib.auth:UserModelPlugin

You would replace "customauth" with the package name of your Pylons app.

Take note of the request_classifier in [general]. It specifies a custom classifier function "custom_request_classifier" located in the lib.auth module of your application. This function is called for each request and returns a classification that, for this application, will be either "browser" or "app" (some other classifications are possible, like "dav", but we're not worrying about them in this application; they'll be treated like "app").

You can see that in the [identifiers] and [challengers] sections there are multiple plugins listed. The choice of plugin to use in each case is based on the value returned by the classifier. If the classifier is "browser" then the "form" challenger will be used, otherwise the "basicauth" challenger is chosen. This is the key to the custom authentication, and as you can see it is all handled by repoze.who and extremely simple to configure.

Create an auth.py file in the lib directory of the Pylons app containing:
from webob import Request

import zope.interface
from repoze.who.classifiers import default_request_classifier
from repoze.who.interfaces import IRequestClassifier

class UserModelPlugin(object):

    def authenticate(self, environ, identity):
        """Return the username if the credentials are valid, else None."""
        try:
            username = identity['login']
            password = identity['password']
        except KeyError:
            return None
        # Stub authenticator: compare against a hard-coded username/password.
        if (username, password) == ('foo', 'bar'):
            return username
        return None

    def add_metadata(self, environ, identity):
        username = identity.get('repoze.who.userid')
        if username is not None:
            identity['user'] = dict(
                username=username,
                name='Mr Foo',
            )

def custom_request_classifier(environ):
    """Return one of the classifications 'app' or 'browser', or any
    other classification returned by default_request_classifier.
    """
    classifier = default_request_classifier(environ)
    if classifier == 'browser':
        # Decide if the client is a (user-driven) browser or an application
        request = Request(environ)
        if not request.accept.best_match(['application/xhtml+xml', 'text/html']):
            # In our view, any client who doesn't support HTML/XHTML is an
            # "app", not a (user-driven) "browser".
            classifier = 'app'
    return classifier
zope.interface.directlyProvides(custom_request_classifier, IRequestClassifier)

This is where the custom_request_classifier function is defined. It first calls the default_request_classifier provided by repoze.who, which attempts to classify the request as one of a few basic types: 'dav', 'xmlpost', or 'browser'. If the default classification results in 'browser' then we try to classify it further based on content negotiation. If the client prefers a HTML or XHTML response then we leave the classification as 'browser', otherwise we classify it as 'app'.
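The decision can be sketched in plain Python. The helper below is a simplified stand-in for webob's `request.accept.best_match` (it ignores q-values, which the real matcher honours) and assumes the default classifier has already said 'browser':

```python
# Simplified sketch of the classification decision; the real code uses
# webob's request.accept.best_match and repoze.who's
# default_request_classifier. Names here are illustrative only.

def prefers_html(accept_header):
    """Return True if the Accept header admits an HTML/XHTML response.

    A missing header means the client accepts all media types,
    including HTML.
    """
    if not accept_header:
        return True
    offered = {'text/html', 'application/xhtml+xml'}
    for part in accept_header.split(','):
        media = part.split(';')[0].strip().lower()
        if media in offered or media in ('*/*', 'text/*'):
            return True
    return False

def classify(environ):
    # Assume default_request_classifier already said 'browser'; refine it.
    if prefers_html(environ.get('HTTP_ACCEPT')):
        return 'browser'
    return 'app'

print(classify({'HTTP_ACCEPT': 'text/html,application/xhtml+xml'}))  # browser
print(classify({'HTTP_ACCEPT': 'application/json'}))                 # app
print(classify({}))                                                  # browser
```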

The other part of the auth module is the UserModelPlugin class. This class provides "authenticator" and "mdprovider" plugins. The job of the authenticate method is to authenticate the request, typically by verifying the username and password provided, but of course that depends on the type of authentication used. In this example, we simply provide a stub authenticator that compares authentication details against a hard-coded username/password pair. In a real app you would authenticate against data in a database or LDAP service, or whatever you decided to use.
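As a sketch of what a database-backed authenticator might look like, here is one possible variant. The user store, hashing scheme, and class name are illustrative assumptions, not part of this example app (and a real deployment would use a salted scheme like bcrypt rather than bare SHA-256):

```python
# Hypothetical sketch: the same authenticator interface backed by a user
# store instead of a hard-coded pair.
import hashlib

USERS = {  # stand-in for a database table or LDAP directory
    'foo': hashlib.sha256(b'bar').hexdigest(),
}

class DatabaseAuthenticator(object):
    def authenticate(self, environ, identity):
        try:
            username = identity['login']
            password = identity['password']
        except KeyError:
            return None
        stored = USERS.get(username)
        if stored is None:
            return None
        # Compare hashes; in production use a salted scheme (e.g. bcrypt).
        if hashlib.sha256(password.encode('utf-8')).hexdigest() == stored:
            return username
        return None

auth = DatabaseAuthenticator()
print(auth.authenticate({}, {'login': 'foo', 'password': 'bar'}))    # foo
print(auth.authenticate({}, {'login': 'foo', 'password': 'wrong'}))  # None
```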

The add_metadata method of UserModelPlugin is called to supply metadata about the authenticated user. In this example we simply supply a hard-coded name, but in a real app you would fetch details from a database or LDAP or whatever.

The final bit of code needed is the login form. Create an account controller:
$ paster controller account

Then edit controllers/account.py and add a login method to AccountController:
    def login(self):
        identity = request.environ.get('repoze.who.identity')
        if identity is not None:
            # Already authenticated; send the user back where they came from
            came_from = request.params.get('came_from', None)
            if came_from:
                redirect_to(str(came_from))
        return render('/login.mako')

Also add a test method to the same controller so that we can verify authentication works:
    def test(self):
        identity = request.environ.get('repoze.who.identity')
        if identity is None:
            # Force skip the StatusCodeRedirect middleware; it was stripping
            #   the WWW-Authenticate header from the 401 response
            request.environ['pylons.status_code_redirect'] = True
            # Return a 401 (Unauthorized) response and signal the repoze.who
            #   basicauth plugin to set the WWW-Authenticate header.
            abort(401, 'You are not authenticated')
        return """
Hello %(name)s, you are logged in as %(username)s.
<a href="/account/logout">logout</a>
""" % identity['user']

The test action checks whether a user has been authenticated for the current request. If not, it forces a 401 response which will have a different effect depending on which classification was chosen. If the request was classified as "browser" then, due to the repoze.who config specifying "form" as the challenger plugin for this classification, the repoze.who middleware will intercept the 401 response and replace it with a 302 redirect to the login form page. For any other classification, the "basicauth" challenger will be chosen which will return the 401 response with an appropriate "WWW-Authenticate" header.

Note that we needed to suppress the StatusCodeRedirect middleware for the 401 response to prevent Pylons from returning a custom error document and messing with our 401 error.

In a real application you may want to move the identity check into the __before__ method of the controller (or BaseController class) or into a custom decorator. Or you could use repoze.what.
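A decorator version of the identity check might look something like this framework-agnostic sketch. The names are hypothetical, and the exception is a stand-in for Pylons' abort(401):

```python
# Hypothetical sketch of an identity-check decorator. In a real Pylons
# controller you would use the actual request object and abort(401)
# instead of the stand-ins below.
from functools import wraps

class NotAuthenticated(Exception):
    """Stand-in for Pylons' abort(401)."""

def require_identity(action):
    @wraps(action)
    def wrapper(self, environ, *args, **kwargs):
        if environ.get('repoze.who.identity') is None:
            # A 401 here lets repoze.who's challenge machinery take over.
            raise NotAuthenticated('You are not authenticated')
        return action(self, environ, *args, **kwargs)
    return wrapper

class AccountController(object):
    @require_identity
    def test(self, environ):
        identity = environ['repoze.who.identity']
        return 'Hello %s' % identity['repoze.who.userid']
```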

In the templates directory create login.mako containing a simple form such as:
    <form action="/account/dologin" method="POST">
      Username: <input type="text" name="login" value="" />
      <br />
      Password: <input type="password" name="password" value="" />
      <br />
      <input type="submit" value="Login" />
    </form>

Now you should be ready to run the application and test authentication.
$ paster serve --reload development.ini

Using your favourite web browser, go to http://127.0.0.1:5000/account/test

You should immediately be redirected to /account/login (with a came_from parameter) with your login form displayed. Enter bogus details and you shouldn't make it past the form. Now enter the hard-coded login details ("foo", "bar") and you should be authenticated and see the text from /account/test.

Now we can test whether basic auth works. Using curl, try to fetch /account/test
$ curl -i http://127.0.0.1:5000/account/test
HTTP/1.0 302 Found
Server: PasteWSGIServer/0.5 Python/2.5.1
Date: Tue, 03 Mar 2009 08:57:59 GMT
Location: /account/login?came_from=http%3A%2F%2F127.0.0.1%3A5000%2Faccount%2Ftest
content-type: text/html
Connection: close

    <p>The resource was found at <a href="/account/login?came_from=http%3A%2F%2F127.0.0.1%3A5000%2Faccount%2Ftest">/account/login?came_from=http%3A%2F%2F127.0.0.1%3A5000%2Faccount%2Ftest</a>;
you should be redirected automatically.
<!--  --></p>
    <hr noshade>
    <div align="right">WSGI Server</div>

You can see that, by default, the request is classified as 'browser' and so a 302 redirect to the login form was returned. Note that if no Accept header field is present, then it is assumed that the client accepts all media types, which is why the request was classified as "browser".

Now let's specify a preference for 'application/json' (using the Accept header) and see what we get.
$ curl -i -H "Accept:application/json" http://127.0.0.1:5000/account/test
HTTP/1.0 401 Unauthorized
Server: PasteWSGIServer/0.5 Python/2.5.1
Date: Tue, 03 Mar 2009 09:21:09 GMT
WWW-Authenticate: Basic realm="CustomAuth"
content-type: text/plain; charset=utf8
Connection: close

401 Unauthorized
This server could not verify that you are authorized to
access the document you requested.  Either you supplied the
wrong credentials (e.g., bad password), or your browser
does not understand how to supply the credentials required.

Perfect. We get a 401 response with a WWW-Authenticate header specifying "Basic" authentication is required. (Note that ideally we should return a JSON response body as that is what the client requested.)

Now we can repeat the request, including our authentication details.
$ curl -i -H "Accept:application/json" -u foo:bar http://127.0.0.1:5000/account/test
HTTP/1.0 200 OK
Server: PasteWSGIServer/0.5 Python/2.5.1
Date: Tue, 03 Mar 2009 11:39:43 GMT
Content-Type: text/html; charset=utf-8
Pragma: no-cache
Cache-Control: no-cache
Content-Length: 107

Hello Mr Foo, you are logged in as foo.
<a href="/account/logout">logout</a>

And there we have it. Dual authentication on the same controller.

RESTful HTTP with Dual Authentication

For a recent web service project I wanted to make it as RESTful as possible. It needed to provide both a user interface (for interactive users) as well as exposing an API for programmatic integration. So I implemented both, but the two are not separate. Every applicable resource is exposed under only one URI each, usable by both interactive users (with web browsers) and by applications.

The "magic" of HTTP content negotiation is what makes this work. Clients that prefer HTML will get a rich HTML UI to interact with the application and data. Clients that prefer JSON will get back a JSON representation of the resource and, similarly, those that prefer XML will get back an XML representation. So most URIs provide 3 representations of themselves: HTML, JSON and XML.
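The dispatch could be sketched like this. The renderer functions and the naive Accept parsing are illustrative assumptions (a real app would use a proper matcher that honours q-values, and templates or serializers instead of these toy renderers):

```python
# Sketch of per-representation dispatch driven by the Accept header.
# Helper names are hypothetical; parsing is simplified (ignores q-values
# and preference order).
import json

def render_html(data):
    return '<h1>%s</h1>' % data['title']

def render_json(data):
    return json.dumps(data)

def render_xml(data):
    return '<resource><title>%s</title></resource>' % data['title']

RENDERERS = [
    ('text/html', render_html),
    ('application/json', render_json),
    ('application/xml', render_xml),
]

def represent(accept_header, data):
    """Return (content_type, body); default to HTML for */* or no header."""
    accepted = [p.split(';')[0].strip().lower()
                for p in (accept_header or '*/*').split(',')]
    for content_type, renderer in RENDERERS:
        if content_type in accepted or '*/*' in accepted:
            return content_type, renderer(data)
    return 'text/html', render_html(data)

print(represent('application/json', {'title': 'Hi'}))
```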

When web browsers make a HTTP request they send an "Accept" header indicating their preference for HTML, so interactive users get the rich HTML UI, all styled and pretty looking. However, they are still viewing exactly the same resource as those fetching the JSON or XML representation, just that it is pleasing to the eye and is surrounded by navigation and other UI niceties.

All this should be pretty familiar to those who already play with RESTful HTTP. The part of the implementation that may not be familiar is how I handled authentication.

To keep with the typical "web experience" for interactive users, I wanted to provide the conventional login form/cookies method of authentication. This method is all but useless for applications, so I wanted to provide HTTP Basic Auth for them.

Now, given that a resource lives on a single URI, how do we support both types of authentication at once? Or perhaps the question is: should we? I decided the answer was "yes", as I didn't want to force interactive users to use HTTP Auth (browser HTTP Auth prompts are intrusive and unstyled [1]; most users aren't used to them; and, perhaps worst of all, you can't log out with most browsers without plugins or hackery [2]).

So how did I support two forms of authentication simultaneously without separating web UI URIs from "API" URIs? I relied on our old friend, content negotiation. I decided that: any client who negotiates to receive a HTML representation is classified as a "browser" and will be challenged for authentication with a login form (redirected to the login page) and remembered with cookies. Any other client will be classified as an "app" and will be challenged with HTTP Basic Auth (with a 401 response).

I tossed this idea around for a while, deciding if it was too much hackery, but decided to implement it and see how it fared in practice. My conclusion is that it does the job well, allowing a resource not only to provide multiple representations of itself, but also to offer the authentication method that best fits the client.

I share this because I am interested in comments from the RESTful community as to how others tackle this kind of problem. Is this a suitable use of content negotiation or am I pushing the whole RESTful ideology too far?

Is it better practice to separate the "UI" from the "API", in effect exposing a resource in two places (doesn't sound very RESTful to me)? Is it better practice to enforce only one type of authentication, making users accept the awkward way that browsers handle HTTP Auth?

On a final (implementation-related) note, I built the application in question using Pylons and for the custom authentication I used repoze.who which ended up being the perfect tool for the job. repoze.who is very pluggable and so with minimal code I was able to configure it to handle authentication in exactly the way I wanted. If I get a chance later I'll write about how I configured repoze.who with Pylons to handle dual authentication.

[1] When will the W3C improve HTTP authentication so that it can be optionally styled, doing away with the need for custom form/cookie auth for most web sites?

[2] When will browser makers add a simple logout option for HTTP Auth?

Mirrored swap with zfs on OpenSolaris

I recently installed OpenSolaris 2008.11 on my development server (highly recommended, btw). Out of the box it installs with zfs root filesystems (a relatively new feature in the Solaris/OpenSolaris world), which makes many administrative tasks much easier, such as taking filesystem snapshots, performing safe upgrades (upgrades are applied to a snapshot/clone of the live root, which can then be booted from; falling back to the previous root is the easy backout method), and mirroring the root filesystem onto a second disk.

After installing a second disk, mirroring the root filesystem was as easy as a zpool attach command (after partitioning & labelling the disk for Solaris use).

The install didn't, however, configure a swap partition on top of zfs. Just a plain old standard swap slice. Very boring!

Pre-Solaris 10 days I would configure mirrored swap (and root) using Disksuite. In these modern times I wanted to see how difficult it would be to setup a mirrored swap on top of zfs. Not too difficult at all, it turns out. This is how to do it.

Choose a slice that exists on both disks with the same size. In my case, the OpenSolaris install had configured a 2GB slice to use for swap. I disabled swap on that slice with:
$ pfexec swap -d /dev/dsk/c3d1s1

Then create a new mirrored zfs pool across the two disks (if you only have one disk, just create a standard zpool on the one slice):
$ pfexec zpool create -m legacy -f swap mirror c3d0s1 c3d1s1

Specify "-m legacy" to prevent zpool from creating and mounting a zfs filesystem at /swap automatically. We don't want to use this zfs pool for normal filesystems, and "legacy" tells zfs to leave it alone.

Next, create a zfs volume that can be accessed as a block device (like "/dev/{dsk,rdsk}/path"). This type of zfs volume is called a "zvol" and comes with block devices at "/dev/zvol/{dsk,rdsk}/path". It seems that zvols must be created with a fixed size (probably reasonable, given the confusion that growing and shrinking such devices could cause) so we use "-V" to specify the size of the volume. The only gotcha is that the size must be a multiple of the volume block size, so I chose the largest multiple of 512KB below the size of the slice (1.95GB in my case):
$ pfexec zfs create -V 1945600k swap/swap0
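The size calculation above can be sketched as follows; the exact usable slice size is an assumption for illustration, the point being to round down to the 512 KB granularity:

```python
# Round the available slice size down to a multiple of the 512 KB
# granularity. slice_kb is a hypothetical usable size for the ~2 GB slice.
BLOCK_KB = 512
slice_kb = 1946000  # assumed usable slice size, in KB

vol_kb = (slice_kb // BLOCK_KB) * BLOCK_KB
print('%dk' % vol_kb)  # 1945600k, the value passed to zfs create -V
```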

We can verify that worked by checking for a block device:
$ ls -l /dev/zvol/dsk/swap/swap0 
lrwxrwxrwx   1 root     root          35 Feb 13 18:48 /dev/zvol/dsk/swap/swap0 -> ../../../../devices/pseudo/zfs@0:2c

Finally, tell Solaris to start using it for swap and we are done:
$ pfexec swap -a /dev/zvol/dsk/swap/swap0
$ swap -l
swapfile                  dev    swaplo   blocks     free
/dev/zvol/dsk/swap/swap0 182,2         8  3891192  3891192
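The swap -l figures can be cross-checked: 'blocks' are 512-byte units, so the 1945600 KB zvol corresponds to twice that many blocks, less the 8-block swaplo offset:

```python
# Cross-check of the swap -l output shown above.
vol_kb = 1945600
total_blocks = vol_kb * 1024 // 512   # KB -> 512-byte blocks
usable_blocks = total_blocks - 8      # minus the swaplo offset
print(usable_blocks)  # 3891192, matching the 'blocks' column
```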

Lastly, check the status of the zfs pool, make sure it is healthy (usually worth doing this sooner!):
$ zpool status swap
  pool: swap
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        swap        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c3d0s1  ONLINE       0     0     0
            c3d1s1  ONLINE       0     0     0

errors: No known data errors

Update: One last step (that I forgot in the original write-up) is to make the swap setting persistent. This is done with an entry in /etc/vfstab:
/dev/zvol/dsk/swap/swap0        -               -               swap    -       no      -

Make sure to test it with a reboot.

I find SMF (in Solaris and OpenSolaris) to be the best thing to happen to service management since someone decided that runlevels and symlinks were a handy way to control services at startup & shutdown. No more init.d scripts ... win.

An arguable drawback with SMF is that you have to define your service configuration with an XML file, called a service manifest. I think it would be fair to say that most people do what I used to do: copy an existing manifest and change the relevant bits. A simple but practical method, admittedly, but I decided it could be improved upon.

For that reason I recently put together a little tool called Manifold. It is a simple command-line tool, written in Python, that creates the SMF manifest for you after asking you some questions about the service.

The best way to explain what it does is with a demonstration. Here I will use Manifold to create an SMF manifest for memcached, showing how to validate the result and create the service with it.

Using manifold to create an SMF manifest for memcached is easy. Give it an output filename, then it will prompt for all the answers it needs to create the manifest.
$ manifold memcached.xml

The service category (example: 'site' or '/application/database') [site] 

The name of the service, which follows the service category
   (example: 'myapp') [] memcached

The version of the service manifest (example: '1') [1] 

The human readable name of the service
   (example: 'My service.') [] Memcached

Can this service run multiple instances (yes/no) [no] ? yes

Enter value for instance_name (example: default) [default] 

Full path to a config file; leave blank if no config file
  required (example: '/etc/myservice.conf') [] 

The full command to start the service; may contain
  '%{config_file}' to substitute the configuration file
   (example: '/usr/bin/myservice %{config_file}') [] /opt/memcached/bin/memcached -d

The full command to stop the service; may specify ':kill' to let
  SMF kill the service processes automatically
   (example: '/usr/bin/myservice_ctl stop' or ':kill' to let SMF kill
  the service processes automatically) [:kill] 

Choose a process management model:
  'wait'      : long-running process that runs in the foreground (default)
  'contract'  : long-running process that daemonizes or forks itself
                (i.e. start command returns immediately)
  'transient' : short-lived process, performs an action and ends quickly
   [wait] contract

Does this service depend on the network being ready (yes/no) [yes] ? 

Should the service be enabled by default (yes/no) [no] ? 

The user to change to when executing the
  start/stop/refresh methods (example: 'webservd') [] webservd

The group to change to when executing the
  start/stop/refresh methods (example: 'webservd') [] webservd

Manifest written to memcached.xml
You can validate the XML file with "svccfg validate memcached.xml"
And create the SMF service with "svccfg import memcached.xml"

View the resulting manifest:
$ cat memcached.xml 
<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<!--
        Created by Manifold
--><service_bundle type="manifest" name="memcached">

    <service name="site/memcached" type="service" version="1">

        <dependency name="network" grouping="require_all" restart_on="error" type="service">
            <service_fmri value="svc:/milestone/network:default"/>
        </dependency>

        <instance name="default" enabled="false">

            <method_context>
                <method_credential user="webservd" group="webservd"/>
            </method_context>

            <exec_method type="method" name="start" exec="/opt/memcached/bin/memcached -d" timeout_seconds="60"/>

            <exec_method type="method" name="stop" exec=":kill" timeout_seconds="60"/>

            <property_group name="startd" type="framework">
                <propval name="duration" type="astring" value="contract"/>
                <propval name="ignore_error" type="astring" value="core,signal"/>
            </property_group>

            <property_group name="application" type="application">
            </property_group>

        </instance>

        <stability value="Evolving"/>

        <template>
            <common_name>
                <loctext xml:lang="C">
                    Memcached
                </loctext>
            </common_name>
        </template>

    </service>

</service_bundle>

Now validate the manifest and use it to create the SMF service:
$ svccfg validate memcached.xml
$ sudo svccfg import memcached.xml 
$ svcs memcached
STATE          STIME    FMRI
disabled        9:52:18 svc:/site/memcached:default

The service can be started and controlled using svcadm:
$ sudo svcadm enable memcached
$ svcs memcached
STATE          STIME    FMRI
online          9:52:53 svc:/site/memcached:default
$ ps auxw | grep memcached
webservd 16098  0.0  0.1 2528 1248 ?        S 09:52:53  0:00 /opt/memcached/bin/memcached -d

Find more information at the Manifold project page or download Manifold from pypi.

I've been working with Pylons quite a lot lately and have been very impressed. Today I discovered a handy tool for debugging Pylons (and any WSGI/Paster-served apps) that provides a web interface to enumerate, poke at, and even kill currently active request threads.

It is called "egg:Paste#watch_threads" (if you can call that a name) and obviously it is a feature of Paster (so if you serve your Pylons app via mod_wsgi, for example, you wouldn't be able to use it; not that it is recommended to enable it in a production environment given the information/power it exposes).

Enabling it for a Pylons app is simply a matter of modifying the config file (development.ini). It took me a bit of scanning of the Paster docs to work out how to get the config correct, so I'll share the simple magic here.

You need to replace this part of the Pylons config (e.g. development.ini):
[app:main]
use = egg:Myapp
full_stack = true

with this:
[composite:main]
use = egg:Paste#urlmap
/ = myapp
/.tracker = watch_threads

[app:watch_threads]
use = egg:Paste#watch_threads
allow_kill = true

[app:myapp]
use = egg:Myapp
full_stack = true
#... rest of app config ...

What we are doing is replacing the main app with a composite app. The composite app uses "egg:Paste#urlmap" to mount the Pylons app at "/" while also mounting the "watch_threads" app at "/.tracker" (use whatever path you like; I borrowed from the examples I found).
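The dispatch that urlmap performs can be sketched roughly like this. This is a simplified stand-in for what egg:Paste#urlmap actually does (the real implementation also adjusts SCRIPT_NAME and PATH_INFO for the mounted app):

```python
# Simplified sketch of urlmap-style dispatch: pick the app mounted at the
# longest prefix matching the request path. App values here are just
# placeholder strings.
def dispatch(mounts, path):
    """Return the app mounted at the longest matching prefix of `path`."""
    best = None
    for prefix, app in mounts.items():
        if path == prefix or path.startswith(prefix.rstrip('/') + '/'):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, app)
    return best[1] if best else None

mounts = {'/': 'pylons_app', '/.tracker': 'watch_threads'}
print(dispatch(mounts, '/account/test'))  # pylons_app
print(dispatch(mounts, '/.tracker'))      # watch_threads
```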

So now if you fire up the Pylons application it should behave as normal, but you should also be able to browse to "/.tracker" (e.g. http://127.0.0.1:5000/.tracker) to see the active request thread debugger.

Below is a screenshot demonstrating watch_threads examining a Pylons app I was working on. Two threads are active; the request/WSGI environment is being shown for one of them.

Python 3.0 on Mac OS X with readline

Python 3.0 is out now and even though an OS X package isn't available yet, it is easy to build from source on a Mac. However, without some tweaking, you usually end up with a Python interpreter that lacks line-editing capabilities. You know, using cursor keys to edit the command-line and access history. The problem is that Apple doesn't provide a readline library (due to licensing issues they offer a functionally similar but different library called editline) so by default Python builds without readline support and hence no editing/history support. This always frustrates me.

Luckily, this is easily fixed so keep reading.

You can tell when readline isn't going to be included by examining the end of the make output. You will see something like this:
Failed to find the necessary bits to build these modules:
_gdbm              ossaudiodev        readline        
To find the necessary bits, look in setup.py in detect_modules() for the module's name.

The steps below detail my method for adding readline (and gdbm which you can skip if you don't want it) support to Python 3.0 (this probably works with other Python versions too).

Firstly, install the readline and gdbm libraries. One of the easiest ways to do that is to use MacPorts (aka DarwinPorts). If you don't have it already you can download the MacPorts installer to set things up. Once that is done then open Terminal/iTerm and enter:
$ sudo port install readline
$ sudo port install gdbm

If that works, then you are ready to build Python. Get the Python 3.0 source code and unpack it. You need to tell setup.py where to find the libraries you installed. MacPorts (usually) installs all of the software it manages in /opt/local/ so in setup.py find the two lines:
add_dir_to_list(self.compiler.library_dirs, '/usr/local/lib')
add_dir_to_list(self.compiler.include_dirs, '/usr/local/include')

and add two similar lines before them that point to /opt/local/lib and /opt/local/include, like:
add_dir_to_list(self.compiler.library_dirs, '/opt/local/lib')
add_dir_to_list(self.compiler.include_dirs, '/opt/local/include')
add_dir_to_list(self.compiler.library_dirs, '/usr/local/lib')
add_dir_to_list(self.compiler.include_dirs, '/usr/local/include')

Now you can configure and build Python.
$ ./configure --enable-framework MACOSX_DEPLOYMENT_TARGET=10.5 --with-universal-archs=all
$ make
$ make test
$ sudo make frameworkinstall

Note that if you've got any other non-Apple distributed versions of Python installed and want to keep the default version as it was, use (for example, to revert default back to 2.5):
$ cd /Library/Frameworks/Python.framework/Versions/
$ sudo rm Current && sudo ln -s 2.5 Current

Finally, so that the command "python3.0" works from the command-line, you need to either add /Library/Frameworks/Python.framework/Versions/3.0/bin/ to your PATH; or symlink /Library/Frameworks/Python.framework/Versions/3.0/bin/python3.0 to a standard directory in your PATH, like /usr/bin or /usr/local/bin . On my box, I install custom stuff into /usr/local/ and so I added these symlinks:
$ sudo ln -s /Library/Frameworks/Python.framework/Versions/3.0/bin/python3.0 /usr/local/bin/
$ sudo ln -s /Library/Frameworks/Python.framework/Versions/3.0/bin/2to3 /usr/local/bin/

Building ffmpeg on Solaris 10

Building some software projects on Solaris can often be challenging, usually when the project has mainly Linux-centric developers. I've had plenty of experience coercing such software to build on Solaris and today I'll provide a recipe for building ffmpeg on Solaris 10.

This recipe describes building ffmpeg from SVN trunk, which was at revision 15797 at the time of writing. I mention this because ffmpeg is a surprisingly agile moving target. There are no actual releases; everyone must work from SVN, and the developers are certainly not shy about making major incompatible changes between SVN revisions. Sometimes the changes affect the build process (configure options, etc.) and sometimes they affect the actual ffmpeg args. So what I describe here may not work next week, but it should at least provide a good starting point.

Solaris supports a number of POSIX standards (see standards(5)), so it is important to make sure that PATH is set correctly and the correct commands are used. This does affect the build process. The PATH below is recommended, and includes /usr/ucb in the right place. Solaris is fun, eh.

The recommended PATH is:
  $ export PATH=/usr/xpg6/bin:/usr/xpg4/bin:/usr/ccs/bin:/usr/ucb:/usr/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/sfw/bin:/opt/sfw/bin

GNU make is required. Solaris ships with GNU make and calls it gmake, which usually works fine. However, this version of ffmpeg creates a Makefile that causes gmake (3.80) to crash with an error "gmake: *** virtual memory exhausted. Stop.". I had to install GNU make 3.81 and use that instead. I installed it in /opt/make-3.81/bin/ and added it to the front of the PATH:
  $ export PATH=/opt/make-3.81/bin:$PATH

SVN checkout a copy of the latest ffmpeg source (I used r15797 for this)
 $ svn co svn://svn.mplayerhq.hu/ffmpeg/trunk ffmpeg-svn-trunk
 $ cd ffmpeg-svn-trunk

Write the following diff to a file "solaris_10_patch.diff":
Index: libavcodec/eval.c
===================================================================
--- libavcodec/eval.c   (revision 15797)
+++ libavcodec/eval.c   (working copy)
@@ -36,7 +36,8 @@
 #include <string.h>
 #include <math.h>
 
-#ifndef NAN
+#if !defined(NAN) || defined(__sun__)
+  #undef NAN
   #define NAN 0.0/0.0
 #endif
 
Index: libavcodec/avcodec.h
===================================================================
--- libavcodec/avcodec.h        (revision 15797)
+++ libavcodec/avcodec.h        (working copy)
@@ -3015,4 +3015,9 @@
 #define AVERROR_NOENT       AVERROR(ENOENT)  /**< No such file or directory. */
 #define AVERROR_PATCHWELCOME    -MKTAG('P','A','W','E') /**< Not yet implemented in FFmpeg. Patches welcome. */
 
+#ifdef __sun__
+#undef isnan
+#define isnan(x)        __extension__( { __typeof(x) __x_n = (x); __builtin_isunordered(__x_n, __x_n); })
+#endif
+
 #endif /* AVCODEC_AVCODEC_H */

Patch the ffmpeg source with it:
 $ patch -p0 < solaris_10_patch.diff

Run configure with the options I've specified below (use whatever prefix you like). You'll notice I have had to disable some features/protocols which were causing build difficulties (and I didn't need them). Also notice you have to explicitly specify "bash".
  $ bash ./configure --prefix=/opt/ffmpeg-SVN-r15797 --extra-cflags="-fPIC" --disable-mmx --disable-protocol=udp --disable-encoder=nellymoser

Then you should be ready to build and install (as root most likely).
  $ make
  # make install

Hope that helps.


Eddie 0.37.2 released

Eddie 0.37.2 has been released. The big change is that Eddie is now a properly installable Python package. This allows it to be distributed in package format and very easily installed using "easy_install EDDIE-Tool". Other bugfixes and minor improvements are also included; see the CHANGELOG for details.

Download Eddie

If you haven't heard of it before, Eddie is a multi-platform monitoring tool developed in Python.

Aug. 20th, 2008

When designing the FLVio RESTful HTTP API I ended up choosing XHTML as the data representation format. My natural instinct was to use XML and invent my own schema, but RESTful Web Services convinced me otherwise.

While explaining to a customer today about simply using a web browser to help debug the API I said,

"It is no coincidence that we use XHTML to represent data as it is not only a well-understood XML format but also makes life much easier when debugging."

Which has proven itself true so far. Any browser becomes a debugging tool for the API. However, until browsers support all the HTTP verbs (or XHTML5 / Web Forms 2.0), you'll need an addon like Poster for Firefox to test commands like PUT and DELETE.
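Scripting the requests is another option when a browser can't send the verb you need. A minimal sketch in Python follows; the endpoint and payload are made up for illustration, not part of the real API:

```python
import urllib.request

# Build a PUT request against a hypothetical resource; browsers typically
# only offer GET/POST, but urllib has no such restriction.
req = urllib.request.Request(
    "http://api.example.com/videos/42",
    data=b'<div class="video"><span class="title">demo</span></div>',
    headers={"Content-Type": "application/xhtml+xml"},
    method="PUT",
)
print(req.get_method())  # PUT
# urllib.request.urlopen(req) would actually send it
```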


Zoner - DNS management UI

A couple of years ago, while learning TurboGears, I wrote a web application to simplify management of DNS zone files. Fast forward to today and I finally found a few minutes to clean it up a bit and make a release.

It is called Zoner and differs from many DNS management interfaces in that it works directly with live zone files. The zone files remain the master copy of domain details and can still be edited manually without affecting Zoner, as opposed to storing the domain structure in a database and generating zone files when needed (or reconfiguring bind to read directly from SQL). It also stores an audit trail for all changes (made through Zoner) and zones can be rolled back to any previous version.

Zoner might also be a useful reference app for anyone learning TurboGears 1.0. It is relatively simple, uses SQLAlchemy and Kid with Paginate and Form widgets.

We have just pushed live a new version of the FLVio video web service that gives clients the option to encode (Flash-compatible) H.264 video.

Many people probably already know that Adobe added support for H.264 video (in an mp4 container) to Flash Player late last year. This was welcome news to many people as H.264 is an open standard and provides much higher quality video (at lower bandwidths) than standard "FLV" video.

The only gotcha is that end users need to have a recent version of Flash Player installed (Flash Player 9 Update 3 or newer) to play back H.264 video.

However, many popular Flash media players can be configured to attempt to playback H.264 video within the browser and automatically fallback to the FLV alternative if the version of Flash Player is too old. FLVio has been designed to support this by providing the option to encode both FLV and H.264 videos automatically for the client, providing easy access to the best of both worlds: high quality video playback and backwards compatibility.

One of my web applications is a CherryPy server that serves large files. I wanted to enable HTTP 1.1 byte range requests, so I expected to have to get my hands dirty, modifying my app to look for the right headers and serve the requested byte ranges itself.

Not so! I was already taking advantage of CherryPy's built-in helper function serveFile (cherrypy.lib.cptools.serveFile in CP 2) to efficiently serve static files back to the client. Glancing at the code for serveFile revealed that HTTP 1.1 byte ranges were already supported. But why were HTTP 1.1 range requests being ignored by my app?

The answer was simply that I had to tell CherryPy to enable HTTP 1.1 features. A quick change to the application config file to add:
server.protocol_version = "HTTP/1.1"

and a restart and success!
$ telnet media.serve.flvio.com 80
Connected to media.serve.flvio.com.
Escape character is '^]'.
GET /media/mediakit/thumb/moovoob/2.jpg HTTP/1.1
Range: bytes=10-20

HTTP/1.1 206 Partial Content
Date: Thu, 03 Jul 2008 10:10:15 GMT
Server: CherryPy/2.3.0
Accept-Ranges: bytes
Content-Length: 11
Content-Range: bytes 10-20/12438
Content-Type: image/jpeg
Last-Modified: Wed, 02 Jul 2008 05:47:28 GMT

telnet> cl
Connection closed.
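The arithmetic behind that 206 response is simple. Here is a minimal sketch of single-range header handling (my own simplification, not CherryPy's actual implementation):

```python
def content_range(range_header, total_size):
    """Resolve a single-range "bytes=START-END" header (simplified sketch).

    Returns (content_length, content_range_value) as a 206 response
    would report them. Byte ranges are inclusive at both ends.
    """
    units, _, spec = range_header.partition("=")
    if units != "bytes":
        raise ValueError("unsupported range units: %r" % units)
    start_s, _, end_s = spec.partition("-")
    start = int(start_s)
    # an open-ended range ("bytes=10-") runs to the last byte
    end = min(int(end_s) if end_s else total_size - 1, total_size - 1)
    length = end - start + 1
    return length, "bytes %d-%d/%d" % (start, end, total_size)

# Matches the telnet session: 11 bytes, "bytes 10-20/12438"
print(content_range("bytes=10-20", 12438))
```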


FLVio - Video Web Service

I have spent most of this year, so far, designing and building a video web service, which has been branded as FLVio. We have just announced the launch of FLVio with our first live customer, one of a few who helped us with beta testing.

The idea behind FLVio is to solve all the problems behind adding video content (especially UGC) to a web site. Every second web site that launches nowadays seems to be some kind of social network, and many of them want all the bells & whistles that the big guys have, including user-generated video content. FLVio helps small (and large) businesses integrate video content without the pain and upfront expense, by solving these key problems:

  • storage

  • encoding

  • delivery

Videos are relatively large, so you need reliable storage, and plenty of it. Simple as that.

Videos (especially UGC) can be uploaded in any of a huge variety of video formats and codecs, all of which need to be re-encoded into a format that is playable within the browser and optimised for efficient web delivery. FLVio encodes almost all non-proprietary formats into Flash-compatible video (FLV and H.264), which also solves the other problem with re-encoding: CPU resources. The last thing you want is your web application server grinding away re-encoding user-uploaded videos into FLV. Offloading that workload to FLVio leaves your server resources available for serving web applications, as they should be.

FLVio delivers video via progressive HTTP download, the favoured method these days for serving Flash-based video. Videos are served directly from the FLVio web servers to the web browser, avoiding the need to scale up your own web farm to handle the multitude of long-lived requests that media delivery typically requires, not to mention the unknown bandwidth costs that media delivery can add. FLVio has partnered with a Content Delivery Network (CDN) provider so that we can also accelerate media delivery for the best possible user experience.

FLVio integrates with a web application by means of a RESTful API. All interaction with FLVio is behind the scenes, at the API level, so web applications keep full control over the user experience, including upload forms and video playback. The fact that video management and delivery has been "outsourced" is transparent to users of the web application. I won't go into detail about the API here; for more details you can read a brief technical overview here. For the curious, the whole service was built with Python and is running on a farm of Solaris servers.

We've got a simple demonstration of a FLVio-based application where you can upload a video and see the results of the re-encoding process.

Any questions or comments, feel free to contact FLVio or myself directly.


gcc pre-defined macros

gcc defines some macros based on the platform, architecture, etc that it is running on. I always forget the gcc arguments that make it display all these macro definitions, so here's a reminder for myself.
gcc -E -dM foo.c

foo.c can be anything, even an empty file (gcc only pre-processes the file).

Here's an ultra-simple mini script that takes care of the temp file creation (the temp file path is arbitrary).

#!/bin/sh
tmpfile=/tmp/gcc_macros_$$.c
touch $tmpfile
gcc -E -dM $tmpfile
rm $tmpfile

If I run this script on my Mac I get a large list of macro definitions, e.g.:
$ ./gcc_macros.sh
#define __DBL_MIN_EXP__ (-1021)
#define __FLT_MIN__ 1.17549435e-38F
#define __CHAR_BIT__ 8
#define __WCHAR_MAX__ 2147483647
#define __DBL_DENORM_MIN__ 4.9406564584124654e-324
#define __FLT_EVAL_METHOD__ 0
#define __DBL_MIN_10_EXP__ (-307)
#define __FINITE_MATH_ONLY__ 0
#define __SHRT_MAX__ 32767
#define __LDBL_MAX__ 1.18973149535723176502e+4932L
#define __APPLE_CC__ 5465
#define __UINTMAX_TYPE__ long long unsigned int
#define __SCHAR_MAX__ 127
#define __USER_LABEL_PREFIX__ _
#define __STDC_HOSTED__ 1
#define __DBL_DIG__ 15
#define __FLT_EPSILON__ 1.19209290e-7F
#define __LDBL_MIN__ 3.36210314311209350626e-4932L
#define __strong 
#define __APPLE__ 1
#define __DECIMAL_DIG__ 21
#define __LDBL_HAS_QUIET_NAN__ 1
#define __DYNAMIC__ 1
#define __GNUC__ 4
#define __MMX__ 1

and so on.
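The same trick is easy to wrap in Python if you want the macros programmatically. A sketch (the function names are my own, and gcc_predefined_macros assumes gcc is on the PATH):

```python
import os
import subprocess
import tempfile

def parse_macros(text):
    """Parse "#define NAME VALUE" lines into a {name: value} dict."""
    macros = {}
    for line in text.splitlines():
        parts = line.split(" ", 2)
        if len(parts) >= 2 and parts[0] == "#define":
            # macros like __strong may have an empty value
            macros[parts[1]] = parts[2] if len(parts) > 2 else ""
    return macros

def gcc_predefined_macros():
    """Run gcc -E -dM over an empty temp file and parse the output."""
    fd, tmpfile = tempfile.mkstemp(suffix=".c")
    os.close(fd)  # an empty file is enough; gcc only pre-processes it
    try:
        out = subprocess.check_output(["gcc", "-E", "-dM", tmpfile])
        return parse_macros(out.decode())
    finally:
        os.remove(tmpfile)

print(parse_macros("#define __CHAR_BIT__ 8"))  # {'__CHAR_BIT__': '8'}
```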


Packaging a Twisted application

At work I've created a number of Twisted applications for handling various internal services. Unlike my TurboGears applications, which I package as eggs to install using easy_install (provided by setuptools) I have no nice way to deploy my Twisted apps.

Until now.

Twisted provides a nice plugin system that allows an application to plug itself into the "twistd" command-line application starter. When properly packaged a Twisted application can be automatically plugged into the Twisted world at installation time and started by using twistd.

The only trouble is that there is no documentation for how to package a Twisted application so it can be deployed in this way.

Here I try to provide some documentation by showing an example of what is required to package a simple Twisted application. In fact, I will take the Twisted finger tutorial and write what I consider to be Step 12: "How to package the finger service as an installable Twisted application plugin for twistd" (aka "The missing step").

Step 12: How to package the finger service as an installable Twisted application plugin for twistd

Create a directory structure like this:

finger/
    __init__.py
    finger.py
twisted/
    plugins/
        finger_plugin.py
setup.py
MANIFEST.in

finger/finger.py is the finger application from http://twistedmatrix.com/projects/core/documentation/howto/tutorial/index.html packaged as finger.

twisted/plugins is a directory structure containing the finger_plugin.py file that will be described below. Note that there must be no __init__.py files within twisted and twisted/plugins.

finger_plugin.py provides a class implementing the IServiceMaker and IPlugin interfaces. Basically, this is the plugin point that defines the services the application will provide and any command-line options that it supports.
# ==== twisted/plugins/finger_plugin.py ====
# - Zope modules -
from zope.interface import implements

# - Twisted modules -
from twisted.python import usage
from twisted.application.service import IServiceMaker
from twisted.plugin import IPlugin

# - Finger modules -
from finger import finger

class Options(usage.Options):
    synopsis = "[options]"
    longdesc = "Make a finger server."
    optParameters = [
        ['file', 'f', '/etc/users'],
        ['templates', 't', '/usr/share/finger/templates'],
        ['ircnick', 'n', 'fingerbot'],
        ['ircserver', None, 'irc.freenode.net'],
        ['pbport', 'p', 8889],
    ]
    optFlags = [['ssl', 's']]

class MyServiceMaker(object):
    implements(IServiceMaker, IPlugin)
    tapname = "finger"
    description = "Finger server."
    options = Options
    def makeService(self, config):
        return finger.makeService(config)

serviceMaker = MyServiceMaker()

setup.py is the standard distutils setup.py file. Take note of the "packages" and "package_data" arguments to setup(). Also note the refresh_plugin_cache() function which is called after setup() completes. This forces a refresh of the Twisted plugins cache (twisted/plugins/dropin.cache).
# ==== setup.py ====
'''setup.py for finger.

This is an extension of the Twisted finger tutorial demonstrating how
to package the Twisted application as an installable Python package and
twistd plugin (consider it "Step 12" if you like).

Uses distutils.core.setup() to make this package installable as
a Twisted Application Plugin.

After installation the application should be manageable as a twistd
command.

For example, to start it in the foreground enter:
$ twistd -n finger

To view the options for finger enter:
$ twistd finger --help
'''

__author__ = 'Chris Miles'

import sys

try:
    import twisted
except ImportError:
    raise SystemExit("twisted not found.  Make sure you "
                     "have installed the Twisted core package.")

from distutils.core import setup

def refresh_plugin_cache():
    from twisted.plugin import IPlugin, getPlugins
    list(getPlugins(IPlugin))

if __name__ == '__main__':
    if sys.version_info[:2] >= (2, 4):
        extraMeta = dict(
            classifiers=[
                "Development Status :: 4 - Beta",
                "Environment :: No Input/Output (Daemon)",
                "Programming Language :: Python",
            ],
        )
    else:
        extraMeta = {}

    setup(
        name="finger",
        version='0.1',
        description="Finger server.",
        author=__author__,
        packages=['finger'],
        package_data={
            'twisted': ['plugins/finger_plugin.py'],
        },
        **extraMeta
    )

    refresh_plugin_cache()

MANIFEST.in contains one line, which tells distutils to include everything under the twisted directory (i.e. twisted/plugins/finger_plugin.py) when building a distribution:
graft twisted

With all that in place you can install the package the usual way,
$ python setup.py install

Then you should be able to run twistd to see and control the application. See the twistd options and installed Twisted applications with:
$ twistd --help
Usage: twistd [options]
    athena-widget      Create a service which starts a NevowSite with a single
                       page with a single widget.
    ftp                An FTP server.
    telnet             A simple, telnet-based remote debugging service.
    socks              A SOCKSv4 proxy service.
    manhole-old        An interactive remote debugger service.
    portforward        A simple port-forwarder.
    web                A general-purpose web server which can serve from a
                       filesystem or application resource.
    inetd              An inetd(8) replacement.
    vencoderd          Locayta Media Farm vencoderd video encoding server.
    news               A news server.
    words              A modern words server
    toc                An AIM TOC service.
    finger             Finger server.
    dns                A domain name server.
    mail               An email service
    manhole            An interactive remote debugger service accessible via
                       telnet and ssh and providing syntax coloring and basic
                       line editing functionality.
    conch              A Conch SSH service.

View the options specific to the finger server:
$ twistd finger --help
Usage: twistd [options] finger [options]
  -s, --ssl         
  -f, --file=       [default: /etc/users]
  -t, --templates=  [default: /usr/share/finger/templates]
  -n, --ircnick=    [default: fingerbot]
      --ircserver=  [default: irc.freenode.net]
  -p, --pbport=     [default: 8889]
      --help        Display this help and exit.

Make a finger server.

Start the finger server (in the foreground) with:
$ sudo twistd -n finger --file=users
2007/12/23 22:12 +1100 [-] Log opened.
2007/12/23 22:12 +1100 [-] twistd 2.5.0 (/Library/Frameworks/Python.framework/
Versions/2.5/Resources/Python.app/Contents/MacOS/Python 2.5.0) starting up
2007/12/23 22:12 +1100 [-] reactor class: <class 'twisted.internet.selectreactor.SelectReactor'>
2007/12/23 22:12 +1100 [-] finger.finger.FingerFactoryFromService starting on 79
2007/12/23 22:12 +1100 [-] Starting factory <finger.finger.FingerFactoryFromService instance at 0x1d0a4e0>
2007/12/23 22:12 +1100 [-] twisted.web.server.Site starting on 8000
2007/12/23 22:12 +1100 [-] Starting factory <twisted.web.server.Site instance at 0x1d0a558>
2007/12/23 22:12 +1100 [-] twisted.spread.pb.PBServerFactory starting on 8889
2007/12/23 22:12 +1100 [-] Starting factory <twisted.spread.pb.PBServerFactory instance at 0x1d0a670>
2007/12/23 22:12 +1100 [-] Starting factory <finger.finger.IRCClientFactoryFromService instance at 0x1d0a5f8>

twistd provides many useful options, such as daemonizing the application, specifying the logfile and pidfile locations, etc.

Unfortunately Twisted and setuptools don't play nicely together, so I'm not able to package my Twisted app as an egg, take advantage of the setuptools package dependency resolution system, or install it using easy_install.


Eddie 0.36 Released.

Eddie is a system monitoring agent, written entirely in Python, that I've been working on for many more years than I can remember. I finally got a chance to make a new release. You can get it here http://eddie-tool.net/

This version has been a long time coming, but has been well tested over that time. This version features many enhancements and bugfixes, some of them listed below. A special thanks to Zac Stevens and Mark Taylor for their contributions.

  • Added support for Spread messaging as an alternative to Elvin.
  • Implemented a DiskStatistics data collector for Linux.
  • More command-line options and support for running as daemon.
  • Added a "log" action. Use it to append to a log file, log via syslog, or print on the eddie tty.
  • Variables can be set in directives, which can then be used in rule evaluation. For example, if the directive has "maxcpu=30", then the rule can address this as "rule='pcpu > _maxcpu'".
  • HTTP checks support cookie persistence.
  • Added "DBI" directive, for database query checking.
  • Added Solaris SMF method/manifest files to contrib.
  • Many more enhancements and bugfixes - see http://dev.eddie-tool.net/trac/browser/eddie/trunk/doc/CHANGES.txt
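The directive-variable feature can be sketched generically. This is just an illustration of the idea (underscore-prefixed directive variables visible to rule evaluation), not Eddie's actual implementation:

```python
def eval_rule(rule, directive_vars, metrics):
    """Evaluate a rule string with directive variables exposed as
    underscore-prefixed names, alongside collected metric values."""
    namespace = dict(metrics)
    namespace.update(("_" + name, value) for name, value in directive_vars.items())
    # restrict builtins so the rule can only see metrics and variables
    return eval(rule, {"__builtins__": {}}, namespace)

# a directive setting maxcpu=30 whose rule checks process CPU usage
print(eval_rule("pcpu > _maxcpu", {"maxcpu": 30}, {"pcpu": 45.0}))  # True
```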



$ ssh root@
root@'s password: 
Last login: Fri Sep 21 21:53:30 2007 from
# uname -a
Darwin CM iPhone 9.0.0d1 Darwin Kernel Version 9.0.0d1: Fri Jun 22 00:38:56 PDT 2007; root:xnu-933.0.1.178.obj~1/RELEASE_ARM_S5L8900XRB iPhone1,1 Darwin
# python
Python 2.5.1 (r251:54863, Jul 27 2007, 12:05:57) 
[GCC 4.0.1 LLVM (Apple Computer, Inc. build 2.0)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.uname()
('Darwin', 'CM iPhone', '9.0.0d1', 'Darwin Kernel Version 9.0.0d1: Fri Jun 22 00:38:56 PDT 2007; root:xnu-933.0.1.178.obj~1/RELEASE_ARM_S5L8900XRB', 'iPhone1,1')


PyCon UK 2007 Thumbs Up

I spent the weekend in Birmingham at the first ever PyCon UK conference. Everyone agreed it was an outstanding success - I went to EuroPython a few months ago and I must admit that PyCon UK had the edge on it for fun and value.

Like I did at EuroPython, I gave a lightning talk on PSI, although this time I was better prepared with real slides, instead of using vim as a presentation tool and attempting to give a real-time demo (which ran me out of time too quickly).

I have even made the slides available, for anyone who may be curious.

PSI 0.2a1 released

Today I finally released the first alpha version of PSI - the Python System Information package. Just ahead of this weekend's PyCon UK, where you'll find me.

PSI is a C extension that gives Python direct access to run-time system information by querying the relevant system calls. This version provides information about run-time process details. A Python program can take a snapshot of a process or all currently active processes on a system and inspect process details to its heart's content. PSI provides a consistent interface across all supported architectures, so programs written for one should (mostly) work on others. Where a particular architecture cannot supply the requested information that others can it will raise an appropriate exception.

This release supports 3 popular architectures: Solaris, Mac OS X and Linux. Hopefully more are on the way if I can round up volunteers.

If you want to have a play just: download it; svn checkout the source; or easy_install psi.

Here's some examples of it in action:
>>> import psi

>>> a = psi.arch.arch_type()
>>> a
<psi.arch.ArchMacOSX object type='Darwin'>
>>> isinstance(a, psi.arch.ArchMacOSX)
>>> isinstance(a, psi.arch.ArchDarwin)
>>> a.sysname
>>> a.nodename
>>> a.release
>>> a.version
'Darwin Kernel Version 8.9.1: Thu Feb 22 20:55:00 PST
2007; root:xnu-792.18.15~1/RELEASE_I386'
>>> a.machine

>>> psi.loadavg()
(0.705078125, 0.73046875, 0.7626953125)

>>> import os
>>> mypid = os.getpid()
>>> mypid
>>> p = psi.process.Process(mypid)
>>> p.command
>>> p.command_path
>>> p.user
>>> p.start_datetime
datetime.datetime(2007, 9, 1, 10, 58, 51)
>>> p.parent
<psi.process.Process object pid=13860>
>>> p.parent.command
>>> "%0.1f MB" % (p.resident_size/1024.0/1024.0)
'9.7 MB'
>>> "%0.1f MB" % (p.virtual_size/1024.0/1024.0)
'43.5 MB'

>>> ps = psi.process.ProcessTable()
>>> ps.count
>>> ps.pids
(0, 1, 27, 31, 39, 40, 41, 42, 43, 44, 45, 46, 47, 49, 50,
 51, 56, 59, 63, 66, 67, 69, 71, 72, 89, 117, 122, 134,
 136, 149, 155, 156, 159, 162, 172, 175, 176, 177, 179,
 180, 182, 183, 190, 194, 214, 229, 238, 242, 245, 246, 248,
 251, 256, 257, 264, 265, 267, 268, 270, 271, 272, 273, 274,
 286, 392, 401, 402, 403, 1135, 1258, 1442, 1589, 1703,
 1704, 1705, 1706, 1707, 1708, 1709, 1710, 1712, 1713,
 1714, 1715, 1716, 1717, 1718, 1719, 1720, 1721, 2575,
 2577, 2578, 2616, 2631, 2632, 9118, 9903, 10159, 10990,
 12444, 12596, 13122, 13582, 13840, 13904, 13973, 13974,
 13976, 14404, 14579, 14580, 14587, 14627, 14719)
>>> p = ps.processes[114]
>>> p.command


Private PYPI

Recently at work we streamlined the way we internally manage & deploy our Python packages & applications. Taking advantage of setuptools, we release all our packages as eggs and host our own "Private PYPI" (as we call it) as a central repository for our private packages. With a simple setuptools configuration tweak, our developers & sysadmins can install & deploy internal Python packages & applications using good old easy_install. easy_install will first look for packages in our Private PYPI repository and then fall back to the public PYPI (aka Cheeseshop) if necessary.

Our Private PYPI is a simple TurboGears application that I threw together in literally 5 minutes. It exposes a directory of packages (eggs and tarballs) as downloadable links on a web page, which is all easy_install needs to find and retrieve them.
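The core of such an app is tiny. A minimal sketch of the index page generation (the function and markup are illustrative, not our actual code):

```python
import html
import os

def package_index(directory):
    """Render a directory of eggs/tarballs as an HTML page of links.

    A page of plain <a href> links is all easy_install needs to
    discover and retrieve package files.
    """
    links = []
    for name in sorted(os.listdir(directory)):
        if name.endswith((".egg", ".tar.gz")):
            escaped = html.escape(name, quote=True)
            links.append('<a href="%s">%s</a>' % (escaped, escaped))
    return "<html><body>\n%s\n</body></html>" % "\n".join(links)
```

Serving the package directory with Python's built-in SimpleHTTPServer would achieve much the same thing, since its auto-generated directory listing is also just a page of links.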

Configuring easy_install to look for packages in the private PYPI before the public PYPI is simply a matter of creating a ~/.pydistutils.cfg file containing a find_links option pointing at the PYPI URL, in an [easy_install] section. For example:
[easy_install]
find_links = http://internal.server/pypi/

Gotta love simple yet powerful package management.

Zipped python eggs are evil

I was recently trying to deploy a TurboGears app as a non-privileged user, configured with no home directory (home directory was just "/"). The app failed to start with a bunch of import errors, even though it worked fine when run as my user. The reason for import failure ended up being the method that is used to support importing zipped eggs. It appears that when a zipped egg is imported, it is actually unzipped to a directory in $HOME/.python-eggs/ where the package is then referenced.

So, if a user does not have write access to their $HOME directory then the temporary unzip will fail and so will the import. Very disappointing.

This whole zipped egg thing feels too much like a hack. At the very least shouldn't it attempt to unzip to the system tmp directory so it can still import the package and continue?
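One mitigation worth knowing about: setuptools honours the PYTHON_EGG_CACHE environment variable, so a daemon user with an unwritable $HOME can at least be pointed at a writable extraction directory. The path below is just an example:

```python
import os
import tempfile

# Must be set before any zipped egg is imported; pkg_resources reads it
# to decide where to extract zipped eggs instead of $HOME/.python-eggs.
# (In real code you'd respect an existing setting rather than overwrite it.)
cache_dir = os.path.join(tempfile.gettempdir(), "python-eggs")
os.environ["PYTHON_EGG_CACHE"] = cache_dir
print(os.environ["PYTHON_EGG_CACHE"])
```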

Anyway, the lesson to learn is always install eggs unzipped (which I was starting to do anyway, as I often need to examine the insides of an installed package when debugging and having to unzip the eggs first is a bit of a pain).


Variable requests for Apache Bench (ab)

I've been using ab (ApacheBench - comes with Apache httpd) lately to do some performance benchmarking of our internal web services at work. It is nice & simple to use, but unfortunately it is limited to only requesting the same URL over and over. For some services, such as a search engine that normally receives different query parameters with every request, this does not really represent reality.

I have created a patch for ab that gives it a new option (-R). This allows you to specify a file and ab will append lines from the file to the base URL for every request, in the order they are read from the file. If ab reaches the end of the file before the test is finished it will return to the first line and repeat them all.

An example explains this better.

Out of the box you may use ab to benchmark the speed of your site's search:
$ ab -n 5000 http://www.something/search?q=ipod

This will cause ab to send 5000 requests to the specified URL. Handy, but it is testing the same query over & over, which is not what the site would see in practice.

Instead, you could use the -R patch, by first creating a file (let's call it requests.txt) containing one query term per line, something like:

ipod
nokia
macbook
and running ab with:
$ ab -n 5000 -R requests.txt http://www.something/search?q=

As ab constructs a query it will fetch the next line from requests.txt and append it to the base URL, and that becomes the query to use for that request. In this example it would query the URLs:

http://www.something/search?q=ipod
http://www.something/search?q=nokia
http://www.something/search?q=macbook

and so on.
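In pseudocode terms, the patched ab behaves like this. A Python sketch of the request-URL generation (not the actual C patch):

```python
import itertools

def request_urls(base_url, request_lines, count):
    """Build `count` request URLs, appending each line of the requests
    file to the base URL in turn, wrapping back to the first line at EOF."""
    suffixes = itertools.cycle(line.strip() for line in request_lines)
    return [base_url + next(suffixes) for _ in range(count)]

print(request_urls("http://www.something/search?q=", ["ipod", "nokia"], 3))
# ['http://www.something/search?q=ipod', 'http://www.something/search?q=nokia', 'http://www.something/search?q=ipod']
```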

This is much more useful, at least for the types of benchmarks I want to do.

You can find the ab patch here.

mod_proxy_balancer gets a thumbs up

At work we run a bunch of web applications (mostly TurboGears, CherryPy & Twisted apps) and host them behind Apache, using mod_proxy (and sometimes mod_rewrite) to present a clean URL to the outside world, but allowing each of the apps to run on their own private ports behind the scenes. Different people manage different web apps.

In front of our web farms we use hardware load balancers to handle request arbitration, which provides nice protection from servers or Apache instances going down.

The biggest problem I've had with this configuration until now is that when we need to perform maintenance on a particular web application, bringing that application down causes Apache to return an unhelpful message like "Service unavailable" to the client, as its attempt to reverse proxy the connection to the internal service fails.

For a long while I've wanted mod_proxy to be smarter, where I could tell it "hey, if the normal service you are forwarding to is not available, forward to this one instead". And "this one" would simply be the same service running on a different peer server.

Well, that is exactly what mod_proxy_balancer in Apache 2.2 allows you to do. It goes beyond that and can provide weighted load balancing of internal services, but it also allows you to define "hot spares" which are only used if the normal service(s) are unavailable. This is what I'm using, with a config like:

# Reverse Proxy /myapp to an internal web service, with fail-over to a hot standby
# (hostnames and ports are illustrative)
<Proxy balancer://myappcluster>
    # the normal service on this server
    BalancerMember http://localhost:8001
    # the hot standby on server2
    BalancerMember http://server2:8001 status=+H
</Proxy>
<Location /myapp>
    ProxyPass           balancer://myappcluster
    ProxyPassReverse    http://localhost:8001
    ProxyPassReverse    http://server2:8001
</Location>

This config tells Apache to proxy requests for /myapp to a web service running on the local server.

If that service becomes unavailable (ie: you take it down for maintenance) then it will automatically send requests to the hot standby on server2 instead. The "status=+H" defines that member as a Hot Standby. When the default service is back on-line mod_proxy_balancer will pick that up within about 60 seconds or so and revert back to forwarding all requests to it.

The ProxyPassReverse directives are unrelated to the proxy balancing smarts, but are usually required if you want to handle redirects/etc properly.

You can also get real load balancing if you define some BalancerMember entries that aren't hot standbys. mod_proxy_balancer will balance requests across them and hot standby members won't be used until all normal members become unavailable. You can control the weighting of members and the balancing method too, if you like. See proxypass and mod_proxy_balancer docs.

EuroPython 2007 photos

Another week, another EuroPython. Good fun all round. Cheers to Google for paying for everyone's beer on Monday night :-)

Here are my photos.

In Vilnius for EuroPython

Here I am in Vilnius, Lithuania, for another EuroPython conference. The city is very nice, from the small amount I've seen so far, although it hasn't stopped raining, so sightseeing isn't easy.

I am impressed by their offering of free wifi. The hotel (where the conference is also located) offers free wifi throughout, and I've just sat down at a coffee shop in a big shopping centre and was surprised to find another free wifi signal. Given the low costs of wifi infrastructure and broadband, more cities should encourage free wifi. I can't really see London doing it though... (nothing is free, or even cheap, in London).

Anyway, with any luck I'll walk around the "old town" today, which dates back to the 13th century and try and see more of the culture than shopping centres and wifi hotspots.

The conference starts tomorrow, so not much time for seeing sights after that. No doubt that will be when the rain stops and the sun comes out.

EuroPython 2007 booked

I'm all booked in for EuroPython 2007 now. If you're going to be there, drop me a comment so I know to look out for you. Perhaps we can meetup and try out some of the Lithuanian beers.

Introspecting Python objects within gdb

I had to debug a Python C extension recently. Using gdb, it was easier than I thought to walk through the source and introspect Python objects. Here's how to do it.

The first step is to make sure you've got a Python build that contains debugging symbols. Build Python manually using "make OPT=-g".

The nice Python guys have even supplied some handy gdb macros. Grab the Misc/gdbinit file from the Python source tree and make it your ~/.gdbinit file.

$ cd Python-2.5/Misc
$ cp gdbinit ~/.gdbinit

Now let's play with gdb. Fire it up and point it at the interpreter.

$ gdb
(gdb) file /opt/python-2.4.4-debug/bin/python
Reading symbols for shared libraries .... done
Reading symbols from /opt/python-2.4.4-debug/bin/python...done.

A very useful feature with gdb is the ability to set breakpoints on files that haven't been loaded yet, such as shared libraries. Let's set one in the source of a module I've been playing with. The shared library won't be loaded until Python processes the import statement, but gdb will still let us set it.

(gdb) b processtable.c:654
No source file named processtable.c.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (processtable.c:654) pending.

Now let's fire up the unit tests, to get something happening. You can see the pending breakpoint is automatically resolved when the relevant library is loaded.

(gdb) run setup.py test
Starting program: /opt/python-2.4.4-debug/bin/python setup.py test
Reading symbols for shared libraries . done
Reading symbols for shared libraries . done
Reading symbols for shared libraries . done
running test
Reading symbols for shared libraries . done
Reading symbols for shared libraries . done
Breakpoint 1 at 0x627338: file processtable.c, line 654.
Pending breakpoint 1 - "processtable.c:654" resolved
test_args (tests.process_test.ProcessCommandTest) ... ok
test_command (tests.process_test.ProcessCommandTest) ... ok
test_command_path (tests.process_test.ProcessCommandTest) ... ok
test_env (tests.process_test.ProcessCommandTest) ... ok
test_nice (tests.process_test.ProcessPriorityTest) ... ok
test_priority (tests.process_test.ProcessPriorityTest) ... ok
test_resident_size (tests.process_test.ProcessSizeTest) ... ok
test_virtual_size (tests.process_test.ProcessSizeTest) ... ok
test_flags (tests.process_test.ProcessTimeTest) ... ok
test_parent_pid (tests.process_test.ProcessTimeTest) ... ok
test_status (tests.process_test.ProcessTimeTest) ... ok
test_terminal (tests.process_test.ProcessTimeTest) ... ok
test_threads (tests.process_test.ProcessTimeTest) ... ok
test_current_gid (tests.process_test.ProcessUserTest) ... ok
test_current_group (tests.process_test.ProcessUserTest) ... ok
test_current_uid (tests.process_test.ProcessUserTest) ... ok
test_current_user (tests.process_test.ProcessUserTest) ... ok
test_real_gid (tests.process_test.ProcessUserTest) ... ok
test_real_group (tests.process_test.ProcessUserTest) ... ok
test_real_uid (tests.process_test.ProcessUserTest) ... ok
test_real_user (tests.process_test.ProcessUserTest) ... ok
test_bad_arg (tests.process_test.SimplestProcessTest) ... ok
test_pid (tests.process_test.SimplestProcessTest) ... ok
test_type (tests.process_test.SimplestProcessTest) ... ok
test_args (tests.processtable_test.ProcessTableProcessTests) ...
Breakpoint 1, ProcessTable_init (self=0x4410e0, args=0x405030, kwds=0x0) at processtable.c:654
654 if (PyList_Insert(self->processes, 0, (PyObject*)proc_obj)) {

Python ran some tests until it hit our breakpoint, inside the C extension module. We can view the source, of course.

(gdb) list
651 /* Add processes to list in reverse order, which ends up ordering
652 * them by ascending PID value.
653 */
654 if (PyList_Insert(self->processes, 0, (PyObject*)proc_obj)) {
655 return -1; /* failure */
656 }
657 Py_DECREF(proc_obj);
658 }
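The comment in the listing is worth unpacking: the loop sees processes highest-PID-first, and inserting each one at index 0 reverses that, so the finished list ends up in ascending PID order. A plain-Python sketch of the same trick, using the two PIDs that turn up later in this session:

```python
# Mimic the C loop: PyList_Insert(self->processes, 0, proc_obj)
# puts each new item at the front of the list.
pids_as_walked = [16543, 16536]   # order the loop sees them (highest first)

processes = []
for pid in pids_as_walked:
    processes.insert(0, pid)      # C: PyList_Insert(list, 0, item)

print(processes)                  # [16536, 16543] -- ascending PID order
```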

We are inside the __init__ function of a class. So there's the usual Python self object. In C extension modules, self is a pointer to a struct representing the internal attributes of the class. Let's take a look at self->processes.

(gdb) p self
$1 = (ProcessTableObject *) 0x4410e0
(gdb) p self->processes
$2 = (PyObject *) 0x4b5940

In this case, self is a pointer to our custom class. self->processes is a pointer to a PyObject, which could be any Python object type. The .gdbinit we borrowed from the Python source defines a very useful macro for inspecting the target of PyObject pointers.
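For reference, the pyo macro from the Python source's Misc/gdbinit is essentially just a wrapper around the interpreter's _PyObject_Dump() helper (quoting from memory here, so check your copy of the file):

```
define pyo
print _PyObject_Dump($arg0)
end
```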

(gdb) pyo self->processes
object : []
type : list
refcount: 1
address : 0x4b5940
$3 = void

Cool, so self->processes is a list type, and its current value is an empty list. Our breakpoint is located within a loop, so let's iterate around and get an object added to this list.

(gdb) cont

Breakpoint 1, ProcessTable_init (self=0x4410e0, args=0x405030, kwds=0x0) at processtable.c:654
654 if (PyList_Insert(self->processes, 0, (PyObject*)proc_obj)) {
(gdb) pyo self->processes
object : [<psi.process.Process object pid=16543>]
type : list
refcount: 1
address : 0x4b5940
$4 = void

Cool, the list now contains an object. Let's add another by looping again.

(gdb) cont

Breakpoint 1, ProcessTable_init (self=0x4410e0, args=0x405030, kwds=0x0) at processtable.c:654
654 if (PyList_Insert(self->processes, 0, (PyObject*)proc_obj)) {
(gdb) pyo self->processes
object : [<psi.process.Process object pid=16536>, <psi.process.Process object pid=16543>]
type : list
refcount: 1
address : 0x4b5940
$5 = void

So, self->processes is a list and currently contains 2 objects. Is it possible to fetch an element from the list and examine it? Sure is. We need to call the Python C functions that know how to deal with Python objects. gdb will allow us to do this.

(gdb) pyo PyObject_GetItem(self->processes,Py_BuildValue("i",0))
object : <psi.process.Process object pid=16536>
type : psi.process.Process
refcount: 3
address : 0x4dbf28
$6 = void

PyObject_GetItem(obj, y) is the C equivalent of obj[y] (or obj.__getitem__(y)). The "y" must itself be a Python object; you cannot just give it a C int, so we use Py_BuildValue() to build a Python integer object. The above is the equivalent of self.processes[0]. (Note that you cannot have any spaces within the argument given to pyo, as arguments to gdb macros are split on whitespace and pyo only uses the first one ($arg0).)
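In Python terms, the gdb expression above is just subscripting; the C call and the __getitem__ protocol are two spellings of the same operation. A sketch with a plain list of strings standing in for self->processes:

```python
processes = ["proc-16536", "proc-16543"]   # stand-ins for Process objects

# PyObject_GetItem(obj, key) is the C spelling of obj[key],
# which in turn is sugar for obj.__getitem__(key)
a = processes[0]
b = processes.__getitem__(0)

print(a == b)   # True -- both are 'proc-16536'
```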

So, how do we examine the Process object itself? We can easily look at an attribute of the object, which might be handy. Let's look at the "command" attribute of the Process object.

(gdb) pyo PyObject_GetAttr(PyObject_GetItem(self->processes,Py_BuildValue("i",0)),Py_BuildValue("s","command"))
object : 'gdb-i386-apple-d'
type : str
refcount: 3
address : 0x640bb0
$7 = void

And the same for the other object in the list:

(gdb) pyo PyObject_GetAttr(PyObject_GetItem(self->processes,Py_BuildValue("i",1)),Py_BuildValue("s","command"))
object : 'python'
type : str
refcount: 3
address : 0x63cf60
$8 = void

Cool, so even though we are deep within a C extension module, we can still introspect our objects with relative ease.
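Putting the two calls together, the nested gdb expression is just the C spelling of a one-line Python attribute lookup. A sketch using a hypothetical stand-in class (FakeProcess is made up for illustration; the real objects are psi.process.Process):

```python
class FakeProcess:
    """Stand-in for psi.process.Process (hypothetical, for illustration)."""
    def __init__(self, command):
        self.command = command

processes = [FakeProcess("gdb-i386-apple-d"), FakeProcess("python")]

# PyObject_GetAttr(PyObject_GetItem(processes, <0>), <"command">)
# is the C spelling of:
command = getattr(processes[0], "command")

print(command)   # 'gdb-i386-apple-d'
```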


Me == TurboGears Contributor

TurboGears 1.0.2 has just been released, and yours truly has been listed as a contributor in the CHANGELOG. Admittedly the patches I submitted were only a couple of minor enhancements to the paginate functionality, but it is nice to help out on the project, and 1.0.2 saves me from running my own patched branch of TG.


Solaris and readline

A lot of open source software these days expects a GNU environment. I think it's safe to say that well over 90% of these projects are developed in a Linux environment, and they assume such an environment for deployment too.

When building for Solaris you often run into issues, as Solaris is not a GNU environment. These days Sun does provide a lot of GNU software with Solaris, but most of it is not installed in standard locations (certainly not standard from a Linux point of view).

This is the case with GNU readline. It is an optional Solaris package (SFWrline), distributed by Sun on the "Companion" disc that can be downloaded along with the Solaris install discs. It gets installed into /opt/sfw (instead of /usr or /usr/local as in the Linux world) but is otherwise "normal".

Unfortunately I have to jump through hoops of various sizes to get readline linked with some of my favorite software. Fair enough, the configure/build tools of these projects do not automatically look in /opt/sfw for optional GNU libraries (it would be nice if they did, though, when configured on a Solaris box), but in some cases teaching the software that the libraries live in custom locations is harder than it should be. I'm mainly pointing the finger at Python here.
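For autoconf-based projects the hoop-jumping usually amounts to pointing the preprocessor and linker at /opt/sfw by hand, something along these lines (paths and flags here are the usual pattern, not taken from a specific project); the -R flag bakes the runtime library path into the binary so you don't need LD_LIBRARY_PATH later:

```shell
# Tell configure where the SFWrline headers and libraries live
CPPFLAGS="-I/opt/sfw/include" \
LDFLAGS="-L/opt/sfw/lib -R/opt/sfw/lib" \
./configure --prefix=/usr/local

make
```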

In the cases of Python and SQLite, I have documented how to build these with readline support on Solaris 10, follow the links below if you'd like to know how.

Building Python with readline on Solaris 10

Building SQLite with readline on Solaris 10

For my definition of "faster" anyway.

My favorite "performance test" is the build of a software package I use now & then. This ends up mainly being a test of CPU performance, but because make allows you to parallelize the build, it does allow you to compare multi-core performance.
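Concretely, the test amounts to timing the same build from a clean tree at a few make concurrency levels, roughly like this (package and clean target assumed):

```shell
make clean && time make -j1   # serial build
make clean && time make -j2   # two compile jobs in parallel
make clean && time make -j4   # one job per core on a 4-core box
```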

If you're a developer who compiles a lot, this comparison may be useful to help you choose a development platform. Otherwise, this is probably just good for a laugh.

On this Google spreadsheet I have tabulated all the systems that I have performed my test on so far. Of interest, for this discussion, are "thumper01" and "cmmbp".

thumper01 is a Sun Fire X4500, worth roughly £35k. It has 2x dual-core AMD Opteron 2.6 GHz CPUs, 16GB of RAM, and ~20TB of storage (spread over 48 500GB SATA drives). Admittedly, it is pretty overkill for this test, but as the test is mainly constrained by CPU, it is interesting to see how these Opterons perform.

cmmbp is my 15" MacBook Pro. Intel Core 2 Duo 2.33 GHz, 3GB RAM, rest as standard. Worth roughly £2k.


If we don't parallelize the build, my laptop outperforms the Sun beast by a decent margin. If we allow make to run 2 compilations concurrently, my baby still wins. With a concurrency of 4, obviously the 4-core beast will win. (I didn't bother trying a level of concurrency greater than the number of cores on any box; maybe I should?)

Anyway, if nothing else, it shows that the Intel Core 2 Duos do kick some serious ass.
