Sunday, October 31, 2010

How to Referee Sage Trac Tickets

The current workflow and tools for peer review in Sage are not perfect, and it's clear they can be improved in many ways. However, the current system is also stable and works well, and though it will surely evolve over time, it is well worth learning. This post is about the current Sage peer review workflow. I will restrict my discussion to tickets that involve the Sage library, not standard or optional packages or other scripts not in the Sage library.

Why Referee a Ticket?

There are a couple of reasons that you might referee a ticket:

  1. You want to contribute to Sage development, but don't know where to start. There's a list right here of about 200 tickets (sorted by component) that need review, and probably some require only background you know something about.

  2. You want a specific chunk of code to get into Sage that somebody else wrote. If you care about this code, you are probably qualified to review it.

  3. Somebody asked you to review a certain ticket, perhaps in exchange for them reviewing one of your tickets.
Refereeing trac tickets is different than refereeing papers for inclusion in a journal in several ways. There is no central editor that assigns tickets to reviewers -- anybody (you, right now!) can just jump in and start refereeing tickets. Also, everything about refereeing is completely open and archived (hopefully) forever.

What To Expect

The code attached to a trac ticket can do all kinds of things, including:
  1. Fix a little (or huge) bug.

  2. Add documentation to a function.

  3. Introduce a new feature to Sage (and possibly new bugs!).

  4. Change the name of an existing function.

  5. Raise (or lower) the "doctest coverage" of Sage.

How to Referee a Ticket

Refereeing a ticket involves several steps. The steps below are mainly aimed at tickets that add new capabilities to Sage, but I've included some remarks about other types of tickets in another section below.
  1. Use Mercurial queues to apply all the patches on the trac ticket to your own (very recent, usually alpha) version of Sage, then type "sage -br" to rebuild the Sage library. You download
    each foo.patch, then do "hg qimport foo.patch" followed by "hg qpush" (I personally use this little script). Note: this can be automated; type "sage -merge" to learn how. If you can't apply the patch, then the ticket may "need work", i.e., the author has to rebase the patches; note this and change the Action at the bottom of the ticket page to "needs work".

  2. Test out the code to make sure the doctests work, using "sage -t affected_files" and "make ptest" in the SAGE_ROOT directory. Doing "make ptest" is a good idea, since often a change in one part of Sage breaks something far, far away that was written by somebody else long, long ago. (Again, see "sage -merge" for automating this.) If any tests fail, the ticket probably "needs work"; note this and change the action to "needs work". I often do this testing on sage.math.washington.edu, since it has 24 processors.

  3. Make sure that every function introduced in the code, even ones that start with underscores (!), have examples that fully illustrate how they work. Make sure that all input parameters to these functions are tested in the examples, and ensure that the function is actually being used by the examples. If the patch is large you can do "sage --coverageall" both before and after applying the patches, and make sure the coverage score doesn't go down. Also, make sure the docstrings are laid out according to our standard template, e.g.,
    """
    Short description... (one sentence usually)
    
    Long description... (optional but sometimes good)
    
    INPUT:
        - param1 -- description of
          the first param
        - param2 -- description of
          the second param
    OUTPUT:
        - description of output
    
    AUTHORS: 
        - Sage Developer1
        - Sage Developer2
    
    EXAMPLES::
    
        sage: 2+2
        4
    
    TESTS::
    
        sage: further examples nobody should look at.
    """
    
    It is essential that there be two colons after "EXAMPLES" and a blank line!! Also, do not omit the INPUT/OUTPUT blocks, since I often get complaints from end users that these are missing.

  4. If you've got this far without marking the ticket "needs work", then every function in the new code should work and have solid docstrings, including INPUT/OUTPUT blocks and examples that illustrate how the function works. Much of what you did in 1-3 is pretty routine (and yes, there are people working on trying to automate some of this). In this step, you finally get to have more fun and be creative: think seriously about whether the code is "probably correct" and sufficiently well documented to read. If not, do not give the code a positive review. The Sage library is already huge, and we don't want new code that is just going to cause a lot of pain to maintain later (there are other places for such code, such as Purple Sage). Basically, at this point, pretend you are a playtester or hacker and try to break the functionality introduced by the trac ticket. Some techniques include:

    • Throw random input at the functions and see what happens.
    • If you know any theorems or conjectures that leads to consistency checks on the functions, try them out. For example, if a function factors something as a product, try factoring products, then multiplying the factors back together. If the function computes the Ramanujan tau function, try it on large inputs and test the standard congruences.
    • If the functions are implemented in other software (e.g., Magma or Mathematica) write a little program to systematically compare the output of Sage and the other system for some range of values. Also, compare timings; if code is 100 times slower in Sage than Magma, there better be a good reason for this.
    • Very often when reading code, your "devious mind" will think of ways to potentially break it. Make sure that you have a sage prompt up, so you can try out your ideas for breaking the code. If anything succeeds at breaking the code, put that in your report and set the trac Action to "Needs work". Also, if you think of good doctests to add, note these.

Special Cases

As mentioned above, most of the steps above assume that the code attached to the trac ticket is implementing new features.
  • If the code fixes a bug, then if at all possible, there should be a new doctest that illustrates that the bug is fixed. This doctest should mention the relevant trac ticket.

  • If a trac ticket mainly adds new documentation, pay special attention to the English grammar, writing, etc. In my experience, almost nobody (definitely not me!) ever writes a few paragraphs of English without making several mistakes, typos, etc.

  • If the code changes the name of an existing function, then this must happen in accordance with our deprecation policy. You have to leave the old function name in place and add a deprecation warning. This can be removed after ONE YEAR. The proper way to do this is to put the following at the top of the function:
    from sage.misc.misc import deprecation
    deprecation("function_name is deprecated...")
    
    Also, you may want to put in the Sage version number to give people a clue about when to remove the function (see the second argument to the deprecation function).
The core Sage library is meant to be mature, stable and highly polished, and function names, class names, etc., are not allowed to just be changed left and right. There are other places to post code that uses Sage, but which is not yet stable.

Posting Patches as the Referee

Often when going through the above steps, it is easiest to just make changes directly to the source code. E.g., if you see documentation misspelled, or think of examples to add. Just make a new patch that has all these changes and post a "referee patch". If you do this, then you should notify the original patch author(s), so they can referee your referee patch. This can get a little confusing, but usually sorts itself out.

Summary


I hope you can see from the above that refereeing tickets can be really fun. Think of it as a game to break whatever is posted on a trac ticket. It's much better if you find issues now when the author is still interested in the code, rather than some poor user finding bugs a few years later when the author of the code has moved on to other things and forgotten everything about the code. The core Sage library can only someday become high quality with your help as referees. And getting substantial code into Sage itself can only have any hope of attaining a similar status to publishing in a peer reviewed journal if this process of code refereeing itself gets refined, documented, and results in good quality mathematical software.

Remember, the goal is that only high quality code goes into Sage that satisfies the requirements outlined above; this is much more important than worrying about following "bureaucratic protocols".