As a bunch of Unix systems used by students, CQUEST has always needed to make and delete Unix accounts/logins for students, and along the way deal with people who'd forgotten or lost their passwords. (This isn't unique to Unix systems, of course; pretty much anyone who needs to allow only authorized users access has had this problem.)
Recently we decided that our old system of managing student accounts had problems (primarily that it required far too much manual work by everyone involved) and that it was time to replace it. We decided that the core principle of the new system would be to get the students to do as much of the work as possible. Other people (instructors, departments, and system staff) would only be involved for exceptions, not for routine stuff.
Our account management system is based around the idea that things such as enrollment in courses or being a course tutor create entitlements. If you have an entitlement, you can ask the system for an account; in addition, specific entitlements give you other things, such as free printer pages for being in a specific course.
All CQUEST student accounts are the same, regardless of where their entitlements come from. In particular, we don't have special accounts for course tutors, with special additional privileges.
Accounts are sustained by continued entitlements. When an account stops having entitlements, it starts slowly decaying; after a while it will be suspended, and after another while it will be deleted entirely (but the login kept reserved). Eventually all information about the person and the account will be removed entirely.
The biggest source of entitlements is enrollment in courses. This information comes to us automatically from the Registrar.
Our experience to date has been that a decent number of students are accessing the system from off-campus IP addresses (possibly more from off-campus than from on). We hope this means that students appreciate the convenience involved, rather than that they simply can't find an on-campus web browser to use.
Our major design question was simple: how does a student prove their identity to us?
We couldn't set up a secret CQUEST password between the student and us, because that would have required manpower we didn't have. That left existing secrets shared between the student and the University, such as their ROSI PIN and in the future UTORAuth. Unfortunately at the time we wanted to deploy the system (the summer term of 2003), the secrets either did not exist in usable form (UTORAuth) or we were not able to use them (ROSI PINs).
Until a better alternative becomes available, we're relying on a pair of relatively uncommon pieces of information: the student authenticates using their student number and their birthday. As far as we know, neither piece of information is widely available; indeed, treating the student number as a relatively private piece of information is already somewhat common practice (eg, posting marks publicly only by student number, omitting the name).
This is not an ideal situation but it seems to be good enough for now. It makes the system workable, at any rate, and we feel that the security exposures (to us, if not necessarily to an individual student) are acceptably low.
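To make the mechanics concrete, the check itself amounts to a single database lookup. The following is only an illustrative sketch, not our actual code; the table and column names are invented, and it assumes a DB-API style PostgreSQL module.

    # Hypothetical sketch of the student number plus birthday check.
    # Table and column names are invented for illustration.
    def authenticate(db, studentnum, birthdate):
        """Return the person's internal id if the pair matches, else None."""
        cur = db.cursor()
        cur.execute("SELECT person_id FROM people"
                    " WHERE studentnum = %s AND birthdate = %s",
                    (studentnum, birthdate))
        row = cur.fetchone()
        if row is None:
            return None
        return row[0]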
The Arts & Science Registrar was quite cooperative with us. We were able to set up a regular automated transfer of current enrollment data without significant problems. In the future we understand that there's probably going to be a central (non-Registrar) point where people can obtain relatively current data without having to bog down the core databases that students et al update; when that becomes available we expect to move to using it, to reduce direct load on the Registrar.
We considered allowing (or forcing) students to pick strong initial passwords for their new accounts (and strong passwords to reset their account's password to when they requested a password reset), but decided it was ultimately going to be too frustrating for all parties involved.
Instead, the initial password for an account (and what the password is reset to on request) is the student's student number. We already have a reasonably robust infrastructure to force students to change their initial password on their first login. (This is also no worse than our old account management system.) In the future we may be able to move to a more robust initial password based on an overall better authentication method.
Students choose their own login names, or rather most of each one; to avoid collisions with existing and future system and staff accounts, student logins have a two-character prefix, t-, glued on the front. The student-chosen part is forced to be (lower case) alphanumeric.
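Enforcing that rule is only a couple of lines of code. A minimal sketch follows; the function name and the length limit here are assumptions for illustration, not what our code actually uses.

    import re

    # Hypothetical check of the student-chosen part of a login name;
    # the length limit is an invented example.
    _VALID = re.compile(r'^[a-z0-9]{1,6}$')

    def make_login(chosen):
        """Return the full login name, or None if the chosen part is invalid."""
        if not _VALID.match(chosen):
            return None
        return 't-' + chosen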
Currently we have made no attempt to exclude potentially offensive login names. We feel that this is mostly a social problem not really amenable to technical solutions; the space of rude words and other such things is vast. This is also consistent with our hands-off, low-manpower philosophy.
Experience to date is that most but not all students chose login names derived from their real name (first names, last names, initials, etc). Some have made more fanciful choices, presumably with personal meanings. Someone recently took the obvious-in-hindsight login name t-bone.
There are three important parts to the system: the web server, the core database management code, and the Unix backend which carries out login maintenance activities on CQUEST's master /etc/passwd server. The first two parts run on a dedicated accounts management machine; the third runs on our existing master machine, which is also our core fileserver.
acct.cquest is our dedicated accounts management machine. While it lives in our server subdomain it isn't part of the CQUEST machines: it doesn't share /etc/passwd, it doesn't NFS mount anything, and in general it is as utterly locked down as we can reasonably make it (however, it is backed up via the same Amanda-based system that backs up regular CQUEST servers). It does run our usual Linux distribution, but that was a pragmatic decision made on the basis of wanting low management overhead. Since it has a relatively undemanding life, it is a single-CPU PClone with 512M of memory and 18G of RAID-1 disk space in the form of two SCSI drives. (RAID-1 is standard on all of our important servers, as is SCSI.)
The web server is a standard Red Hat Apache installation, somewhat stripped down to avoid extraneous modules. We have both a http: (regular HTTP) and a https: (SSL HTTP) instance active, the latter using the standard mod_ssl module. The non-SSL virtual webserver just redirects all traffic to a warning page on the https instance, since we don't want any of the transactions going through unencrypted.
Although the university offers relatively cheap (a few hundred dollars a year) certificate signing through Verisign, we opted for a self-signed SSL certificate. One of the major reasons for this was that, as far as we could see, obtaining a Verisign certificate required reading and agreeing to a hundred pages or so of scary legal pseudo-contract documentation. Users do not seem put off by having to deal with the resulting warnings; we have had no complaints about it and indeed the warnings from IE are relatively mild and to the point. (Mozilla's are more scary, unfortunately.)
The web server content is a few static HTML files with instructions and web forms to fill out and a few CGI programs that process the forms. When invoked, the CGI programs check and process their input, contact the database management code, and present the results to users as more HTML. In other words, nothing complex; the most tedious part of the work was generating all of the various variants of the HTML output, especially on errors.
The CGI programs are deliberately quite paranoid, and they check their input as extensively as possible before submitting it to the backend web-access daemon. As good paranoid programmers we have carefully checked for even impossible errors that would take deliberate malice to introduce.
The CGI programs also devote a lot of code (in cooperation with the web backend) to generating good explanations of errors, with as much information as possible. Helpful error handling is important; if nothing else it saves the system staff getting anguished email from students about mysterious (to the student) problems they're having. We also avoid having students jump through extra hoops; if they're going to want a piece of information right after getting an error message, we might as well give them the information in the error itself.
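In outline, each CGI is a thin layer of parse, check, forward, and render. The following skeleton is illustrative only; the form field name, port number, and request word are invented, and the real programs do considerably more checking and produce much richer HTML.

    #!/usr/bin/python
    # Hypothetical skeleton of a form-handling CGI program.
    import cgi
    import socket

    def ask_backend(request):
        # One request line out, one reply line back, over a simple
        # text protocol to the local web-access daemon.
        conn = socket.create_connection(("localhost", 2001))
        conn.sendall((request + "\n").encode())
        reply = conn.makefile().readline().strip()
        conn.close()
        return reply

    def page(title, body):
        print("Content-Type: text/html\n")
        print("<html><head><title>%s</title></head>" % title)
        print("<body><p>%s</p></body></html>" % body)

    form = cgi.FieldStorage()
    snum = form.getfirst("studentnum", "").strip()
    if not snum.isdigit():
        page("Error", "That does not look like a student number.")
    else:
        page("Status", ask_backend("STATUS " + snum))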
We try to encourage students to get accounts by explicitly telling them in status inquiries if they're entitled to an account but don't already have one.
As part of the authentication process users need to supply their date of birth. However, they need to do it in a form that matches what the Registrar uses, and the Registrar uses YYYY-MM-DD, with month and day padded with a leading 0 as necessary. This opens up a lot of possibilities for users to get it wrong if we let them enter the full date raw.
The current form design asks users to pick their month and day of birth from option lists then fill in the year; the order is presented to them as Month Day, Year (more easily seen on one of the web pages on the web server). This seems likely to be suitably unambiguous to users.
Behind the scenes, the form generates zero-filled month and day digits as input to the CGI programs, which then glue the three fields together into one after doing some consistency checking. By not giving the user freeform choices, we've significantly reduced the chance of mistakes and errors.
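The gluing itself is trivial; a rough sketch of the sort of checking and assembly involved (not our actual code) looks like this:

    # Hypothetical sketch: assemble YYYY-MM-DD from the three form fields.
    # The form supplies month and day already zero-padded.
    def glue_birthdate(year, month, day):
        """Return a YYYY-MM-DD string, or None if the fields look wrong."""
        if not (year.isdigit() and len(year) == 4):
            return None
        if month not in ['%02d' % m for m in range(1, 13)]:
            return None
        if day not in ['%02d' % d for d in range(1, 32)]:
            return None
        return '%s-%s-%s' % (year, month, day)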
At the heart of the account management system is a database that records everything we want to know about people, student numbers, course enrollments, logins, and quotas (print and otherwise). The database is stored using a PostgreSQL instance that runs on acct.cquest.
The PostgreSQL database itself is an almost pure data storage and retrieval system; we don't use stored procedures or other methods of embedding application logic into the SQL database. The database does enforce certain consistency constraints — for example login names must be unique and we can't have the same student number map to two different people — but that's it.
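As an illustration of the sort of constraints involved, here is a cut-down and purely hypothetical fragment of table definitions; the names, column types, and UID range are invented, and the real schema is larger.

    # Illustrative schema fragment only; fed to PostgreSQL via psql or a
    # DB-API module.  Names, types, and the UID range are invented.
    SCHEMA = """
    CREATE TABLE people (
        person_id   SERIAL PRIMARY KEY,
        studentnum  CHAR(9) UNIQUE NOT NULL,
        birthdate   DATE NOT NULL
    );
    CREATE TABLE logins (
        login       VARCHAR(8) UNIQUE NOT NULL,
        uid         INTEGER NOT NULL CHECK (uid BETWEEN 20000 AND 29999),
        person_id   INTEGER NOT NULL REFERENCES people
    );
    """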
The database is the authoritative source of information for things that it contains, including logins. However, the database only contains student logins. There is no attempt to manage non-student logins through the database, or even have any knowledge about them there. This is one reason why student logins must have a prefix that guarantees they don't collide with non-student logins; there's no other mechanism for it.
We have decided not to trust PostgreSQL's own internal security systems; this is not because we know they are bad but because they are large and necessarily very general. The database is not network accessible at all; while IP access to SQL databases is useful sometimes, we don't want to put that much trust in secure buffer handling in a large C program. While we do use PostgreSQL's internal security features to restrict access to the database to a few local Unix accounts (and then restrict the privileges of most of those accounts fairly strongly), we consider this just a useful backup to the checks and restrictions in the programs doing the work.
All access and manipulation of the database is done through a set of command line programs and daemons. The daemons are used by outside programs (the Unix backend, the print quota backend, and the web server's CGI programs). The command line programs perform periodic operations (such as loading new data from the Registrar) and let the system staff perform manual tasks (such as adding entitlements for tutors).
The administrative tools and daemons are responsible for higher level issues of database correctness and user-friendliness, such as giving nice messages if you try to add a login name that already exists. The daemons communicate with their clients using simple text-based protocols. All authentication and validity checking is performed by the daemons, in particular checking student date of birth; we never assume that we can trust the other end.
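A hypothetical sketch of the shape of such a daemon follows; the request word, the reply format, and the port number are invented, and the real daemons naturally consult the database and do full checking.

    # Minimal sketch of a line-based request/reply daemon.
    import socketserver

    class Handler(socketserver.StreamRequestHandler):
        def handle(self):
            words = self.rfile.readline().decode().strip().split()
            if not words:
                self.reply("ERR empty request")
            elif words[0] == "STATUS" and len(words) == 2:
                # The real code would check the database here.
                self.reply("OK entitled, no account yet")
            else:
                self.reply("ERR unknown request")

        def reply(self, text):
            self.wfile.write((text + "\n").encode())

    if __name__ == "__main__":
        socketserver.TCPServer(("localhost", 2001), Handler).serve_forever()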
Although this may sound funny, CQUEST primarily keeps track of people (though a person is kind of minimal: a name, a date of birth, a record state, and a few dates). Other entities in the database, such as logins and student numbers, are necessarily associated with a particular person and cannot exist without one. (Internally, CQUEST makes up a unique identifier for each person we know about. We find out about people mostly through the Registrar's information.)
Things such as enrollments and entitlements are not directly associated with logins; they are instead associated with people. A person, even an entitled person, can have zero, one, or several logins (zero or one is the usual state; to get several requires manual intervention by the system staff).
A person also has a state. This is whether they're active (currently entitled to a CQUEST account), suspended or deleted (not entitled to a CQUEST account for various amounts of time), or should be purged entirely (because we haven't seen hide nor hair of them in enrollment data for long enough that we want to clean up the database).
Logins have a state too. Normally this state is inherited from the person's state; an active person will have their logins be active, a deleted person will have their logins be deleted. However, there's a manual interface to force a login to be suspended or deleted right now, no questions asked. Such manual states are sticky; they stay as is no matter what the overall person's state becomes.
The core idea behind the aging of accounts from active through suspended and into eventual deletion is that noticing when something goes away is difficult. Instead of a model where deleting the last entitlement of an account starts timers to suspend and delete the account, we operate in a model where the continued existence of entitlements forestalls these state changes.
At all times, every person has three dates associated with them: suspension, deletion, and purging (when we will remove their information from the database). Each entitlement gives a certain amount of time delay on these actions happening, and periodically we perform a database update that sets these times for each person to the largest of their current value or the current date plus the largest delay from a current entitlement. (In particular we have to perform this update before any operation that deletes entitlements, which usually happens by reloading the course enrollment data from the Registrar.)
While a person has an entitlement, every update will advance the dates further into the future, implicitly sustaining their accounts. When their last entitlement vanishes we don't need to do anything special; implicitly their dates stop updating and sooner or later we'll hit those dates. At that point, another update notices that their state is wrong for the relevant dates and they move from active to suspended or even deleted.
Yet another update stage then synchronizes the state of their login or logins with their overall state, unless the login has a manually set state.
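In SQL terms the heart of the date-advancing step is a single update per date. A very rough sketch for the suspension date follows; the table and column names are invented, the real operation is more involved, and the deletion and purge dates work the same way.

    # Hypothetical sketch only.  suspend_delay is assumed to be a number
    # of days; the same pattern applies to deletion and purge dates.
    ADVANCE_SUSPEND = """
    UPDATE people
       SET suspend_on = CURRENT_DATE +
             (SELECT MAX(suspend_delay) FROM entitlements e
               WHERE e.person_id = people.person_id)
     WHERE suspend_on < CURRENT_DATE +
             (SELECT MAX(suspend_delay) FROM entitlements e
               WHERE e.person_id = people.person_id);
    """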
The UID of Unix logins is recorded in the database and is assigned by the programs that manipulate the database to create accounts. The database is careful to constrain the numeric range that UIDs can assume, and contains facilities to track and automatically assign the next valid UID to new accounts. Duplicate UIDs are allowed in order to simplify certain scenarios, such as a student wanting a new and different login name instead of their old one.
Putting the UID in the database means that all of the vital information needed to accurately recreate a login is contained in the database. This simplifies rollback and roll-forward after Unix problems, and means only one place needs to be consulted to look up who an old UID belongs to. It also makes it possible to assign the same UID to two different accounts without having to explicitly communicate a desire for this to the Unix backend.
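The assignment itself can be as simple as bumping a counter stored in the database, inside a transaction. A hypothetical sketch, with the counter table and UID range invented for illustration:

    # Hypothetical next-UID assignment; the cursor is assumed to be inside
    # a transaction so concurrent account creations can't collide.
    UID_LOW, UID_HIGH = 20000, 29999

    def next_uid(cur):
        cur.execute("UPDATE uidcounter SET lastuid = lastuid + 1")
        cur.execute("SELECT lastuid FROM uidcounter")
        uid = cur.fetchone()[0]
        if not (UID_LOW <= uid <= UID_HIGH):
            raise ValueError("ran out of student UIDs")
        return uid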
Our traditional approach to this sort of problem would be to use files to store data and to parse and manipulate it entirely inside the programs we wrote. (This is how our previous account management system worked.) Our use of an SQL-based database in this project was an experiment that has been very successful and useful.
SQL has saved us writing a bunch of code. In case after case, a lot of program logic has been reduced to one SQL operation. This has naturally resulted in smaller and more rapidly and easily written programs. It also offloads much of the complex work to the PostgreSQL backend, which is much closer to the data and has had a lot of work put into it to optimize these things.
SQL was not difficult to learn. The author found good online resources that explained everything quite lucidly, and SQL itself is not a large and complicated language. Only a few things proved tricky or unexpectedly performed badly; in general, almost all of the necessary SQL was immediately obvious. Repeatedly, tasks that the author expected to be complex and require program processing and much SQL turned out to require only one not overly complex SQL command.
Our choice of PostgreSQL over the perhaps better known MySQL has also turned out to be fortunate and correct. While the choice was initially made because we felt PostgreSQL had more mature and well-proven transactional support (obviously important here), PostgreSQL's greater support for various elements of SQL, particularly database constraints and SQL subselects, has proven to be useful and reassuring.
The Unix backend pulls information from the database by talking to a query daemon running on acct.cquest via TCP/IP. The query daemon consults a special database table on behalf of the Unix backend, a table that records a dated journal of changes to logins.
The daemon has two operations: to get all journal entries since a given date, and to get a complete list of the database's logins. Operations that know what logins they've affected (such as creation, password resets, and explicit suspension or deletion of a specific login) record specific details about this in the journal. Operations with unspecific effects, such as the periodic mass account state changes, record in the journal a directive to pull the entire database login state.
Logins may be in several states: active; suspended, where no access is allowed but files are preserved; and deleted, where the account and files should be removed on the Unix side. Logins may also be removed entirely from the database, in which case they should also disappear on the Unix side. The dump of the database login state can include logins in all three states.
Logins are explicitly retained in the database in their deleted state for a while to prevent the login name from being immediately reused.
The Unix backend's job is to transfer changes to logins (including creation and deletion) from the database machine to the actual CQUEST /etc/passwd and fileservers, resynchronizing the actual CQUEST machines with the database. It records the last successfully transferred record that the daemon gave it for future use.
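In outline each run of the Unix backend is a pull, a sanity check, and then an apply. A hypothetical sketch of the pull step (the request word, the end-of-reply marker, and the port are invented):

    # Hypothetical sketch of pulling journal entries newer than lastdate
    # from the query daemon on acct.cquest.
    import socket

    def pull_since(host, lastdate):
        conn = socket.create_connection((host, 2002))
        conn.sendall(("JOURNAL-SINCE %s\n" % lastdate).encode())
        entries = []
        for line in conn.makefile():
            line = line.strip()
            if line == ".":          # invented end-of-reply marker
                break
            entries.append(line.split())
        conn.close()
        return entries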
The Unix backend only manipulates student accounts, just as the database only contains information for student logins. This means that it can afford to be obsessively distrustful of the data it is getting from the database and from /etc/passwd: it knows exactly what student login names look like, exactly what UID range they should be in, and exactly what their home directory name should be like. This guards against various forms of corruption.
The separation of powers ensures that database compromises can only have limited effects. An intruder who completely corrupts the database machine can only manipulate student accounts; system accounts are safe unless the Unix backend itself is broken, despite the Unix backend's vast powers (as it must run on our core fileserver as root).
The Unix backend never directly manipulates /etc/passwd or other aspects of the system. Instead it breaks down changes into little operations that will be performed by a collection of scripts, one per operation. Only when it has successfully dealt with everything from the network conversation and all is consistent does it actually start invoking the operations. Any script failing will immediately abort the operations at that point (this is accomplished by feeding the operations to a subordinate sh -e process).
As a final line of defense, the scripts for the operations themselves check that various constraints are true: for example, the rmhomedir script will refuse to remove a directory tree not owned by the user in question.
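A stripped-down sketch of the arrangement (the operation names are invented; the real backend builds its list of script invocations from the journal entries it pulled and checked):

    # Hypothetical sketch: accumulate operations as script invocations and
    # feed them all to 'sh -e', so the first failure aborts the rest.
    import shlex
    import subprocess

    def run_operations(ops):
        """ops is a list of argument lists, eg [["mkacct", "t-bone", "20123"], ...]."""
        script = "\n".join(" ".join(shlex.quote(arg) for arg in args)
                           for args in ops)
        done = subprocess.run(["sh", "-e"], input=script.encode())
        return done.returncode == 0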
The little scripts are by and large just small covers on top of existing Red Hat commands to manipulate /etc/passwd and /etc/shadow (for password resets). Even when this is not true, the small scope of their individual jobs made them easy and obvious to write, and made it easy to feel confident about their effects.
Structuring the operations as small scripts made them and the Unix backend easy to test. They could be invoked one by one to verify their effects, and the Unix backend could report the list of scripts to run instead of running them (from the same data structure that would run them in live mode). This in turn made it easier to verify the Unix backend itself.
Unlike UIDs, the Unix backend assigns home directories; they do not appear in the database's information about logins. It also assigns the initial shell and knows what the shell should be for suspended accounts. This was decided for fuzzy reasons, partly for flexibility in changing our minds about this and partly so that the database side of things would have to know relatively little about the Unix accounts.
With the exception of most of the Unix backend's little operation scripts, all of the programs were written in Python. This has been an immense boost to our productivity and to the quality of the end result.
Python is a real programming language, albeit a high-level one, not just a scripting language like sh. It also has a rich library of utility modules to do things like talk to PostgreSQL and decode strings from CGI invocations (and nearly complete access to Unix system calls and library functions). As a dynamic high-level language, a little code goes a long way.
At the same time its dynamic nature has freed us from having to worry about many low-level details: memory allocation, making sure bytes could never overrun assigned buffers, and so on. Its exception model has ensured that any failed operations that were not explicitly caught would abort the program. Both automatic handling of memory and enforced handling of errors are strong boosts for program security; Python programs are simply not susceptible to several classes of security exposures and failure modes of C or C++ programs. In an environment as sensitive as an account management system, this has been highly useful.
Not only do we think we have a great system, but we'd like you to have a copy too.
Most of the parts of our system are actually not specific to CQUEST. The author believes that it should be fairly easy to adapt to another similar environment with very few changes (many of them as simple as editing strings to talk about your systems instead of 'CQUEST').
If you are interested in getting a copy of the source code (which includes our database table definitions), contact the CQUEST system staff. You will really want to know Python, and probably SQL as well. Documentation is a bit scanty.
Scribbled by Chris Siebenmann