Managing the Ever-Growing To Do List

               Remy Evard - Northeastern University

                            ABSTRACT

     A system administrator's most important task is managing the
list of user requests, work assignments, and active problems.  If
these items aren't prioritized and handled, issues can be
forgotten or delayed, and important problems may go unsolved
while immediate yet trivial problems get all the attention.  Even
in the best case, one will spend too much time working on the
list of tasks instead of working on the tasks themselves.

     This paper is an account of our experiences with tackling
the problem of keeping track of tasks.  We present a software
system that we have developed and a methodology for using it to
stay on top of the growing list of things to accomplish.  We feel
that our experiences may be of use to other system administration
groups.

                          Introduction

     Sometimes the day of a system administrator goes something
like this: You come in bright and early, planning to finally
finish that program you've been working on sporadically for the
last month.  You make the mistake of checking your mail, and see
a pile of seemingly simple problems that have built up
overnight.  About an hour later, the truly simple ones have been
solved, and you've pushed the not-so-simple ones off until the
afternoon.  You pull out your program, but notice more mail has
come in.  You ignore it and start to code, only to be interrupted
by the phone.  You help out the poor confused user on the other
end while poking around your office looking for materials for an
upcoming meeting.  Your manager stops you, inquires about your
long-term projects, and asks you to check on a problem in the
machine room.  Fifteen minutes later, after fixing a jammed
printer, you make it to the machine room, reboot the server, and
check your mail while the server comes up.  And so the day goes.
Exhausted, you head home, knowing you got a lot done, but not
knowing exactly what it was.

     The point is that the system administrator's job consists of
hundreds of tasks from many sources.  Users have requests and
questions.  Managers assign projects and responsibilities.
Problems appear from all over.  And, perhaps most importantly,
you have your own ideas and goals to accomplish.

     It is critical to be able to organize all of these tasks.
If they aren't handled in some reasonable fashion, then the
simple things are taken care of first, while important (but
complex) tasks go undone.  Worse, some problems get completely
forgotten about.  The large blocks of time that are required to
concentrate on difficult problems become scarce as interrupts
become commonplace.  As the list of undone tasks grows (if there
is such a list), the overhead for keeping up with it takes more
and more time.  The problem only worsens as the number of users
and the number of administrators grows.

     This paper is an account of our experiences with tackling
the problem of keeping track of tasks.  We present a software
system that we have developed and a methodology for using it to
stay on top of the growing list of things to accomplish.

                        Site Information

     The Experimental Systems Group manages the computing
environment in the College of Computer Science of Northeastern
University, consisting of approximately 350 computers of various
types and around 1200 active users. The group is made up of both
full-time staff members and student volunteers, totaling an
average of 10 people each quarter.

                         A Contact Point

     The first step in a solution is to create a well-known
mechanism for the users to submit requests or problems.  Whether
or not such a mechanism exists, the problems will find their way
to you, one way or another.  It is to your advantage to choose
what that route is.  If you don't, you'll have some users
visiting your office, some calling you, some emailing you and
your manager, some paging you, and some calling you at home.
Regardless of the actual method for reporting problems, the user
population should be made aware of its existence and how to use
it.  By creating a method for reporting problems and telling
people to use it, you'll limit the number of sources for
problems, and you'll cut down on user confusion.

     We use a single email alias for user problems, as do many
sites.  Users are told, repeatedly, to mail requests to
``systems''.  Everyone on the Systems Group receives the mail.
When someone replies, they send a copy of the reply to the list,
so that everyone can read it and follow the conversation, should
they so desire.  Most people on the group filter mail to systems
into a specific mail folder, making it easier for them to
organize and track.

     We use ``systems'' only for request-related mail.  When we
send mail to the other members of the group for information or
discussion purposes, we use a different alias, which allows us to
prioritize the mail differently, and organize user requests in
one place.

     We've thought several times about having more than one
mailing list for the users to use.  We could have ``systems'' for
most problems, and ``macs'' for Macintosh-related problems.  We
have elected not to do this because, in our primarily student-
based environment, we don't believe the users will categorize the
mail correctly - if it's a network problem with Macs, where
should it go?  Furthermore, with just one alias, the instructions
are almost simple enough for our constantly changing user
population: ``send mail to systems if you have a problem.''

     Training the users to send mail to systems is an ongoing
effort.  When the user's home directory is first created, they're
given a README file that, among other things, tells them to send
mail to `systems' if there is a problem.  We state this on our
hardcopy documentation and on our news postings.  And we say it
to users in the hall and on the phone if their problem isn't an
emergency.  If they send mail to someone directly, we resend the
mail to `systems' and send them a canned response saying that
their mail should be sent to `systems'.  If they do it again,
their mail takes a little longer to be resent...

     We want mail to go to ``systems'' and not to individuals for
several reasons.  If the individual is gone, busy, or on
vacation, no one else will know about the problem, much less be
able to work on it.  It's useful for other members of the group
to know what's going on.  Perhaps most importantly, the users are
rarely correct in their guess as to who will work on their
problem.  When they send mail to systems, we can pick and choose
who will work on what, rather than letting the user target a
specific individual.

     We do handle requests from other sources when the need
arises.  The phone has an answering machine, and we run a help
desk during certain hours for people who don't have accounts or
can't use mail.  And when someone in the hall has a problem,
we'll try to solve it if it will take less than a minute or so.

     Using a shared email address has several benefits:
 o Users have one place to send things.
 o Everyone reads the mail, so people are aware of ongoing
   issues.
 o It's a great learning mechanism for junior members of the
   Systems Group.
 o Nearly everyone can send email, and it's pretty simple
   (unlike, for example, posting news or running a graphical
   interface), so it works for most users.
 o Logging the mail for record keeping is simple.
Unfortunately, it has some real problems as well:
 o Everyone reads the same mail, creating some serious overhead.
 o Sometimes two or more people try to answer the same problem,
   wasting time and sending two (potentially confusing) answers
   to the user.
 o Sometimes people assume someone else will answer, or forget
   to send a copy of their reply back to systems.
 o Everyone has their own mail queue, so everyone tracks
   problems to make sure they get solved.  This translates into
   duplicated effort and confusion.
 o Requests get lost or forgotten in the mail deluge.
Thus, a shared email address is a step in the right direction,
but isn't a complete solution.

                    Failed Interim Solutions

     The most critical failing of the simple email address was
that it didn't keep track of requests.  I, as manager of the
group, spent enormous amounts of time monitoring the email queue,
making sure that things got answered and that people were working
on the important problems.  I tried keeping lists of tasks that
needed to be done on paper tablets, on white boards, and in ASCII
files.  I was always updating the lists and rechecking them.
Those lists weren't easy to keep up-to-date, and weren't easily
modifiable by everyone in the group.

     In order to try to keep up with the incoming queue in real-
time while making progress on long-term projects, we developed
the concept of a ``hotseat'', which was intended to be occupied
by a person who read systems mail, handling incoming requests and
freeing time for others to work on problems requiring more
concentration.  This failed badly, because it wasn't possible to
assign problems to other people or to record the status of a
request.  While the person on the hotseat did manage to trap
interrupts, the next person on the hotseat spent much time
duplicating the effort of the previous person.

     For quite some time, we planned on developing a better
solution than a simple address to keep track of incoming
requests.  When it became obvious that manually-maintained lists
and hotseat organization weren't helping, and that the overhead
in trying to keep track of the requests was substantial, we
realized it was past the time to move to an automated tracking
system.

                    Problem Tracking Systems

     Many sites use automated tracking systems to keep track of
their tasks.  After assessing the way that we worked, we decided
we needed a system that:
 o Kept a database of unsolved requests.
 o Attached a user, a priority, a request date, and a due date
   to a request.
 o Let us assign an owner to a request.
 o Allowed us to see the list of requests based on varying
   criteria.

------------------------------------------------------------------

         Figure 1:  Example management interface screen
------------------------------------------------------------------
Essentially, it needed to be a tool to organize requests and help
us choose what to work on next.
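
     As an illustration only, the requirements above can be
sketched as a small record type plus a filter.  The field names,
the `select' helper, the sample data, and the convention that a
larger priority number means more urgent are all assumptions made
for this sketch; they are not taken from any of the systems
discussed here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Request:
    """One tracked request (hypothetical field names)."""
    number: int
    user: str                      # who made the request
    subject: str
    priority: int                  # assumption: larger means more urgent
    requested: date
    due: Optional[date] = None
    owner: Optional[str] = None    # None means nobody has taken it yet


def select(requests, owner=None, unowned=False, min_priority=None):
    """Return the requests matching every criterion that was supplied."""
    result = []
    for r in requests:
        if unowned and r.owner is not None:
            continue
        if owner is not None and r.owner != owner:
            continue
        if min_priority is not None and r.priority < min_priority:
            continue
        result.append(r)
    return result


# Made-up sample queue, for illustration only.
queue = [
    Request(1, "alice", "Printer jammed", 3, date(1994, 3, 1), owner="dave"),
    Request(2, "bob", "Install new compiler", 1, date(1994, 3, 2)),
    Request(3, "carol", "Cannot log in", 5, date(1994, 3, 3)),
]

urgent_and_unowned = select(queue, unowned=True, min_priority=4)
```

     Keeping the owner optional makes "unowned" a first-class
state, which is what lets someone scan for work nobody has picked
up yet.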

     Many possible solutions exist, but we were unable to locate
a system that fit our needs.  Commercial solutions tend to be
very expensive, as most require a powerful database engine.
Existing free implementations didn't quite work either.  Some,
like Queue-MH and PITS, required that the administrator use a
specific mail program or tracking system interface.  Others, most
of which were based on the UNIX dbm library routines, weren't
portable across all the UNIX platforms that we needed to run them
on.  And others, such as GNATS, just didn't fit our work model.
A list of tracking systems that we tried is given in Appendix A.
While they didn't fit our needs, they may well be appropriate for
yours.

     Since we couldn't locate a solution that we thought would
work for us, we developed our own.

                        Our Solution: Req

     Req, which is pronounced like ``wreck'', not ``reek''
(although neither is particularly complimentary), was designed to
integrate with the way that we already used mail.  It consists of
two main parts.

     The first part is an email filter.  All mail that is sent to
``systems'' is assigned a unique number and stored in a file.
The number is inserted into the subject line, and then the
message is passed on to the members of the Systems Group.

     Suppose a user sent mail with this subject line:
 Subject: Help! How do I send mail?
Everyone on the mailing list would receive this mail:
 Subject: [Req #1837] Help! How do I send mail?

     The filter checks the subject line before inserting a new
number.  If an old number already exists, the message is appended
to the previous file related to that request number.  In this
way, a log of all mail associated with a number (and therefore
with a particular problem) is created.  The request log is kept
in RFC-822 format, with special headers indicating the owner, the
priority, and so on.  Thus, the log looks very much like a
conventional Internet email message, and can be used as such if
necessary.
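
     The numbering step can be sketched roughly as follows.
This is not the actual req filter, which is built in C and perl;
it is a minimal Python illustration in which the function name
`file_message', the in-memory `logs' dictionary (standing in for
the per-request files), and the handling of unknown numbers are
all assumptions.

```python
import re

# Matches the tag format shown in the example subject line above.
TAG = re.compile(r"\[Req #(\d+)\]")


def file_message(subject, body, logs, next_number):
    """Tag a message and append it to the log for its request.

    logs maps a request number to the list of message bodies seen
    so far.  Returns (subject, request_number, next_number).
    """
    match = TAG.search(subject)
    if match and int(match.group(1)) in logs:
        # Follow-up mail: reuse the existing number and log.
        number = int(match.group(1))
    else:
        # New request.  Assumption: a tag naming an unknown number
        # is treated like no tag at all.
        number = next_number
        next_number += 1
        subject = f"[Req #{number}] {subject}"
        logs[number] = []
    logs[number].append(body)
    return subject, number, next_number


logs = {}
subject, number, next_number = file_message(
    "Help! How do I send mail?", "original question", logs, 1837)
reply_subject, _, next_number = file_message(
    "Re: " + subject, "a reply from the group", logs, next_number)
```

     Because a reply keeps the tag in its subject line, it lands
in the same log as the original message without anyone having to
file it by hand.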

     The second part of the system is a management interface.  It
currently may be accessed on a UNIX command line or via an X
Window tool; see Figure 1.  Interfaces for emacs and for
Macintoshes are under development.  The management interface
displays the following information:
 o The request number.
 o The priority as assigned by the Systems Group.
 o The person working on the request, or the ``owner''.
 o How old the request is.
 o How long it's been since the user received a reply about the
   problem.
 o The status: ``stalled'', ``open'', etc.
 o Who made the request.
 o What the request is about.

     While the primary role of the request system is to keep
track of requests, the major use of it by administrators is to
help one look at the queue of requests and decide what to work
on.  The interface allows one to scan the items that he or she
owns or that are currently unowned, choosing which request to
work on based on priority, length of time in the queue, or whim.
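
     One way to picture this scan is as a filter followed by a
sort.  The sketch below is hypothetical: the dictionary keys, the
`worklist' name, and the choice to rank by priority first and age
second are assumptions, and it again assumes that a larger
priority number means more urgent.

```python
from datetime import date


def worklist(requests, me):
    """List the requests that `me` owns, plus the unowned ones,
    highest priority first and oldest first within a priority."""
    candidates = [r for r in requests if r["owner"] in (me, None)]
    return sorted(candidates, key=lambda r: (-r["priority"], r["opened"]))


# Made-up sample data.
requests = [
    {"number": 10, "owner": "dave", "priority": 2, "opened": date(1994, 5, 1)},
    {"number": 11, "owner": None, "priority": 5, "opened": date(1994, 5, 3)},
    {"number": 12, "owner": "remy", "priority": 5, "opened": date(1994, 5, 2)},
    {"number": 13, "owner": None, "priority": 5, "opened": date(1994, 4, 20)},
]

for r in worklist(requests, "dave"):
    print(r["number"], r["priority"], r["opened"])
```

     Whim, of course, is left to the administrator.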

     Using the interface, an authorized user may:
 o Review a request.
 o Send mail to the user about the request.
 o Take ownership of problems, or assign them to others.
 o Prioritize items.
 o Merge related items.
 o Change their status.
 o Add comments to a file.
 o Resolve a request.
 o Browse the active requests or the resolved requests.

     Almost all of these functions can also be performed via
email, by sending a message with the request number in the
subject line and a command in the message header.  For example,
if the line:
 X-Request-Do: give dave
appears in the header, then the owner of the request will be
changed to ``dave'', and an entry will be made in the log of that
request noting the change and who made it.
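
     A rough sketch of how such a header might be handled is
given below, using Python's standard email parser.  Only the
``give'' command is shown because it is the only one described
here; the function name `apply_command', the `owners' dictionary,
and the wording of the log entry are assumptions rather than
req's actual behavior.

```python
import re
from email import message_from_string

TAG = re.compile(r"\[Req #(\d+)\]")


def apply_command(raw_message, owners, log):
    """Apply an X-Request-Do command found in a message header.

    owners maps request number -> owner name; log is a list of
    text entries.  Returns True if a command was carried out.
    """
    msg = message_from_string(raw_message)
    match = TAG.search(msg.get("Subject", ""))
    command = msg.get("X-Request-Do")
    if match is None or command is None:
        return False
    number = int(match.group(1))
    words = command.split()
    if len(words) == 2 and words[0] == "give":
        new_owner = words[1]
        owners[number] = new_owner
        sender = msg.get("From", "unknown")
        log.append(f"Req #{number}: given to {new_owner} by {sender}")
        return True
    return False  # unrecognized commands are ignored in this sketch


raw = (
    "From: remy\n"
    "Subject: [Req #1837] Help! How do I send mail?\n"
    "X-Request-Do: give dave\n"
    "\n"
    "Dave knows the mail setup best.\n"
)
owners, log = {}, []
apply_command(raw, owners, log)
```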

     We designed the system to be as free of policy as we could,
in order to provide maximum flexibility while we tested it.  Some
policy decisions, such as the number of priority levels, were
encoded by necessity.  Others, such as who may give a request to
whom, were left open.

                       Additional Features

     Req also has a few features that, while not essential to the
purpose, we have found to be quite useful.
 o Common answers.  The X interface to req has options to
   include and to make files of frequently asked questions.  We
   keep these comprehensive answers in a directory, and use them
   whenever someone asks one of those questions.
 o User access.  Users may run an interface to req that shows
   them the status of their own requests.  They can read the log
   of their own requests, seeing who has been working on it and
   checking on the current state of their request.
 o Statistical analysis.  We have programs that analyze the
   request queue, showing us trends in the requests, such as
   busy times of the quarter, average length to resolve a
   request, and number of requests per user.
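
     As a sketch of the kind of analysis meant by the last item,
the following computes requests per user and the average number
of days taken to resolve a request.  The tuple layout, the
`summarize' name, and the sample records are invented for
illustration; they are not drawn from our queue or from the
actual analysis programs.

```python
from collections import Counter
from datetime import date


def summarize(resolved):
    """resolved: list of (user, opened, closed) tuples.

    Returns (requests per user, average days to resolve)."""
    per_user = Counter(user for user, _, _ in resolved)
    days = [(closed - opened).days for _, opened, closed in resolved]
    average_days = sum(days) / len(days) if days else 0.0
    return per_user, average_days


# Made-up sample records.
resolved = [
    ("alice", date(1994, 1, 3), date(1994, 1, 5)),
    ("bob", date(1994, 1, 4), date(1994, 1, 4)),
    ("alice", date(1994, 1, 10), date(1994, 1, 14)),
]
per_user, average_days = summarize(resolved)
```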

                              Usage

     Over time, we've settled into a usage pattern with req.  We
still tell users to mail their requests to ``systems''.  Each
request is assigned a number, entered into the system, and sent
on to the group.

     We revived the ``hotseat'' idea once req was in place.  The
role of the person on the hotseat is to work with the req
interface, keeping an eye on incoming problems and acting as a
buffer for the other members of the group.  If a new problem can
be handled in 15 minutes or less, the person on the hotseat works
on it immediately.  If it can't be, the hotseat person gives the
request to some other member of the systems group, who is
notified by mail.  The person on hotseat also answers the phone
and sits at the help desk during help desk hours. Essentially,
this person acts as the only interface to user problems,
shielding the other members of the group from interruptions while
giving quick feedback to simple user requests.  When this person
isn't handling interruptions, he or she works on the request
queue, making sure that all the problems are owned and making
progress on simpler requests.  We trade off hotseat duty - each
group member is on the hotseat at most one day a week.

     The other members of the group work on longer-term projects.
They use the req interface to look at their queues, choosing what
to work on based on priority.  Because the hotseat person is
acting as a shield, others can often put in two or three hours of
solid work on a problem, rather than being continually
interrupted.

     As the manager, I occasionally take the time to look over
the whole queue, making sure that I agree with the priorities
assigned to items, ensuring that progress is being made on the
requests, and assigning some jobs to people who have less work
than others.

                          Observations

     Using a problem tracking system has made it considerably
easier to prioritize our time and to keep track of requests.
Much less time is spent keeping up with the queue, so more
time is spent making progress on the items themselves.

     We haven't lost any user requests since we installed req.
We do have problems that have been in the queue for four months,
but those are low priority items.  The associated users have been
made aware that we won't forget about the problem and we'll get
to it when the time is right.

     About one week into using req, we made the decision to use
it to keep track of everything we needed to do, not simply
limiting it to user-related requests.  So we put our own tasks in
the queue, adding, for example, the list of software we wanted to
install, a number of hardware repairs, and everything else that
we had been keeping track of on other lists.  We had thought
briefly about using req to manage another mailing list, but that
would have meant that we would have had two queues to look at and
choose from, not one.  This decision has had an extremely
positive effect: instead of trying to remember to do things, we
simply send mail to systems, and let req act as our memory
device.

     While the req system was designed to be relatively policy-
free, policies are definitely important.  For example, we made
the policy early on that anyone could give any request to anyone
else.  To give a request to someone means that you think they'll
do a better job than you will, not that you think they should do
it.  This has cut down on the political issues related to being
on the hotseat and giving incoming requests to your co-workers or
your boss.

     Our solution is a bit odd, in that group members see both
the mail sent to the systems mailing list and the request in the
req system.  For the most part, we use the mail as a way of
watching what's going on and a convenient way of quickly replying
to something, while we use req for keeping track of items and
deciding priorities.  People actively delete the mail in their
systems mailbox now, as opposed to keeping it around forever as
they did before req.  We will probably move to a system where no
one gets any systems mail, but we're reluctant to lose the
communication and education functions of the mailing list.

                          Problems Left

     The request system has, for the most part, solved the
problems that we had with incoming requests.  However, a few
problems related to time management still remain.

     We need to learn how to manage large-scale projects.  The
request system is great for keeping track of small items, like a
request to install software or fix a printer, but it's not quite
appropriate for planning next year's networking infrastructure.
When one is in charge of a big project and has lots of small
things to do, it's easy to get stuck on the small things while
ignoring the big thing.  Our current solution is to create a
request item for the project and use it to log progress on the
project.  This may or may not work - we don't know yet.

     While we have merged the systems lists into one location,
people still have individual to-do lists of their own, including
items like meeting schedules, phone calls to return, and so on.
We would like to integrate these types of lists into the overall
solution, so that it is possible to tell in one glance what one
should be focusing on next.

                  Critical Points in a Solution

     We have built a system that helps us keep track of user
requests and systems administrator tasks.  With it, we are able
to respond quickly to user requests while still putting
concentrated time into long-term projects.  The important points
of our solution include:
 o Letting the users know how to submit requests.
 o Keeping track of those requests.
 o Organizing the requests in ways that allow us to prioritize
   them.
 o Designing policies and procedures to minimize response time
   and maximize concentration time.

     This solution fits our work model, which is based on an
email paradigm and a relatively small group of administrators.
The solution that works for you may or may not be similar to
ours, and should fit your own work model.

                          Availability

     The req software was designed to be installed outside of our
environment.  Req is built in C and perl, while Tkreq, the X
interface to req, is written in Tk/Tcl.  These packages, as well
as documentation that goes into much more detail about req than
this paper, are available at ftp.ccs.neu.edu in the file
/pub/sysadmin.

                        Acknowledgements

     In a flash of overnight inspiration, Robert Leslie wrote
Tkreq, the X interface to the req system.  For weeks thereafter,
he was badgered into adding features to it.

     Other members of the Systems Group, including Lauren Burka,
Brian Dowling, Geoff Hulten, Ivan Judson, Shane Kilmon, Dave
Kormann, Ray Matthieu, Jim Mokwa, and Matthew Wojcik, were
essential in the design phase of Req, and coped with my endless
experimentation and questioning once it was operational.

                       Author Information

     Remy Evard is the leader of the Experimental Systems Group at
Northeastern University, where he has been for two busy years.
He received his M.S. in computer science from the University of
Oregon in 1992, where he worked as a graduate student systems
administrator.  His current research interests include
distributed virtual environments and automation of systems
administration.

                          Bibliography

Tinsley Galyean, Trent Hein, and Evi Nemeth, Trouble-MH, A Work-
     Queue Management Package for a >3 Ring Circus, in LISA IV,
     pp 93-96, Colorado Springs, Co, 1990.
William Howell, Managing In The 90s: Meeting The Challenge, July
     1992.  Presented at: SANS-I, Washington, D.C., July 1992,
     UNC-CAUSE, Asheville, NC, October 1992; International Help
     Desk Conference, Orlando, FL, February 1993; NC Help Desk
     Chapter, Greensboro, NC, March 1993
David Koblas & Paul M. Moriarty, PITS: A Request Management
     System, in LISA VI, pp 197-202, Long Beach, CA, 1992.
RFC 1297, NOC Internal Integrated Trouble Ticket System;
     Functional Specification Wishlist.
James M. Sharp, Request: A Tool For Training New Sys Admins and
     Managing Old Ones, in LISA VI, pp. 69-72, Long Beach, CA,
     1992.

                           Appendix A

     This appendix lists the non-commercial problem tracking
systems which we discovered or evaluated.  They are presented
here in the hopes that they may be useful to others.  No attempt
to compare them has been made, and there are systems that are not
on this list that we were unaware of or unable to locate.

     Most of these tools, as well as comments about them, may be
found at ftp.ccs.neu.edu in the file /pub/sysadmin/tracking.
 o GNATS, The Gnu Problem Report Management System, available on
   prep.ai.mit.edu as /pub/gnu/gnats-3.2.tar.gz.  GNATS is
   oriented towards bug report tracking, but can be used for
   systems administration.   GNATS has a Tk/Tcl based front end
   called tkgnats, which is available in the GNATS contrib
   directory.
 o The NEARnet Trouble Ticket System, available on ftp.near.net
   as /pub/nearnet-ticket-system-v1.3.tar.  It is built on an
   Informix Relational Database, and uses Embedded-SQL on top of
   MMDF.
 o NETLOG, the JvNCnet trouble ticketing system, is available
   via anonymous ftp from ftp.jvnc.net as /pub/netlog-tt.tar.Z.
   It runs on UNIX systems and does not use a database.
 o Queue MH, available on ftp.cs.colorado.edu as
   /pub/sysadmin/utilities/queuemh.tar.Z, is a set of scripts
   built around the UNIX MH mail system.
 o PTS/Xpts, on ftp.x.org as /contrib/pts* is an X Windows based
   problem tracking system for both users and administrators.
 o Request, a Task Tracking Tool, is on pearl.s1.gov as
   /pub/request/request2.1.1.tar.Z.  It is written in perl and
   uses dbm libraries, providing an ASCII interface for problem
   submission and management.
 o Requete, available on ftp.crim.ca in /pub/requete-*.tar.Z is
   still under development.  It has an X interface.