Monday, September 2, 2013

Status report: integrating IPython into https://cloud.sagemath.com -- my approach

I'm still working on the IPython notebook integration into https://cloud.sagemath.com right now. This will be a valuable new feature for users, since there's a large amount of good content out there being developed as IPython notebooks, and the IPython notebook itself is fast and rock solid.

I spent the last few days (it took longer than expected) creating a generic way to *securely* proxy arbitrary HTTP services from cloud projects, which is now done. I haven't updated the page yet, but I implemented code so that
   

https://cloud.sagemath.com/[project-id]/port/[port number]/...

gets all HTTP requests automatically proxied to the given port at the indicated project. Only logged-in users with write access to that project can access this URL. After a lot of work, I think I've set things up so that one can safely create password-less non-SSL web services for a group of collaborators, with all the authentication piggybacking on cloud.sagemath accounts and projects. It's SSL-backed security (with a valid cert) almost for free, which solves what I know to be a big problem for users.
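To make the URL scheme concrete, here is a rough Python sketch of how such a path could be split into a project id, a port, and the rest of the path. The name `parse_proxy_path`, the regular expression, and the choice to return the remainder separately are all assumptions for illustration, not the code actually running on cloud.sagemath. The check that the user has write access to the project happens elsewhere and isn't shown.

```python
import re
import uuid

# Hypothetical pattern for /[project-id]/port/[port number]/...
_PROXY_PATH = re.compile(r"/([0-9a-fA-F-]{36})/port/(\d{1,5})(/.*)?")

def parse_proxy_path(path):
    """Return (project_id, port, rest), or None if path is not a proxy URL."""
    m = _PROXY_PATH.fullmatch(path)
    if m is None:
        return None
    try:
        # Reject strings that merely look like a UUID.
        project_id = str(uuid.UUID(m.group(1)))
    except ValueError:
        return None
    port = int(m.group(2))
    if not (1 <= port <= 65535):
        return None
    rest = m.group(3) or "/"
    return project_id, port, rest

print(parse_proxy_path(
    "/70a37ef3-4c3f-4bda-a81b-34b894c89701/port/9100/Listing 2.ipynb"))
```

Anything that doesn't match (an ordinary page, a bad port, a malformed project id) yields `None`, so the proxy can fall through to normal request handling.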

The above approach is also nice, since I can embed IPython notebooks via an iframe in cloud.sagemath pages, and the iframe URL has exactly the same host as cloud.sagemath itself, which avoids subtle issues with firewalls, the same-origin policy, etc. For comparison, here's what the iframe that contains a single ipynb worksheet looks like for wakari.io:
  

    <iframe class="notebookiframe" id="" src="https://prod-vz-10.wakari.io:9014/auto_login/acd84627972f91a0838e512f32e09c9823782ec0?next=/notebook_relative/Listing 2.ipynb"></iframe>

    
and here's what it's going to look like in cloud.sagemath:
  

    <iframe class="notebookiframe" id="" src="https://cloud.sagemath.com/70a37ef3-4c3f-4bda-a81b-34b894c89701/port/9100/Listing 2.ipynb"></iframe>

    
With the wakari.io approach, some users will find that notebooks just don't work, e.g., students at the University of Arizona, at least if their wifi still blocks connections to nonstandard ports, as it did when I tried to set up a Sage notebook server there once for a big conference. Because the way I set things up gives exactly the same page origin and no nonstandard ports, the parent page can also directly call JavaScript functions in the iframe (and vice versa), which is potentially very useful.
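For anyone unsure what "same origin" means here: browsers compare the scheme, host, and port of the two URLs, and all three have to match. Here's a small Python sketch of that comparison. The helper names `origin` and `same_origin` are made up for this post, and it only handles http and https default ports.

```python
from urllib.parse import urlsplit

# Default ports, so that https://host/ and https://host:443/ compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url):
    """Return the (scheme, host, port) triple that browsers compare."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)

def same_origin(a, b):
    return origin(a) == origin(b)

page = "https://cloud.sagemath.com/"
proxied = "https://cloud.sagemath.com/70a37ef3-4c3f-4bda-a81b-34b894c89701/port/9100/"
wakari = "https://prod-vz-10.wakari.io:9014/"

print(same_origin(page, proxied))  # True: same scheme, host, and port
print(same_origin(page, wakari))   # False: different host and port 9014
```

The port number inside the proxied path doesn't count, since it's just part of the path; the browser only ever talks to port 443.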

IPython notebook servers will be the first to use this framework; then I'll use something similar to serve static files directly out of projects. I'll likely also add the Sage cell server and the classic Sage notebook at some point, and maybe wikis, etc.

Having read and learned a lot about the IPython notebook, my main concern now is their approach to multiple browsers opening the same document. If you open a single worksheet with multiple browsers, there is absolutely no synchronization at all, since there is no server-side state. Either browser can and will silently overwrite the work of the other when you (auto-)save. It's worse than the Sage Notebook, where at least there is a sequence number and the browser that is behind gets a forced refresh (and a visible warning message about there being another viewer).

For running your own IPython notebook on your own computer, this probably isn't a problem (just like a desktop app). But for a long-running web service, where a single user may use a bunch of different computers (home laptop, tablet, office computer, another laptop, etc.) or there may be multiple people involved, I'm uncomfortable that it is so easy for all your work to just get overwritten, so I feel I must find some way to address this problem before releasing IPython support. With cloud.sagemath, a lot of people will likely quickly start running IPython notebook servers for groups of users, since it would take about 1 minute to set up a project with a few collaborators, and then they all get secure access to a collection of IPython notebooks (and other files).

So I'm trying to figure out what to do about this. I'll probably just implement a mechanism so that the last client to open an IPython notebook gets that notebook, and all older clients get closed or locked. Maybe in a year IPython will implement proper sync, and I can remove the lock. (On the other hand, maybe they won't. Having no sync has its advantages regarding simplicity and *speed*.)
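The "last client to open wins" idea could look roughly like the following Python sketch. This is only a thought experiment, not something I've implemented: the class `NotebookLocks`, its methods, and the client ids are all hypothetical, and a real version would also need to notify the displaced browser so it can lock its UI.

```python
class NotebookLocks:
    """Track which client currently owns each notebook (last opener wins)."""

    def __init__(self):
        self._owner = {}  # notebook path -> id of the client that owns it

    def open(self, notebook, client_id):
        """Give the notebook to client_id; return the displaced client, if any."""
        previous = self._owner.get(notebook)
        self._owner[notebook] = client_id
        return previous if previous != client_id else None

    def may_save(self, notebook, client_id):
        """Only the current owner is allowed to write the notebook."""
        return self._owner.get(notebook) == client_id

locks = NotebookLocks()
locks.open("Listing 2.ipynb", "office-laptop")
displaced = locks.open("Listing 2.ipynb", "tablet")
print(displaced)                                            # office-laptop
print(locks.may_save("Listing 2.ipynb", "office-laptop"))   # False
print(locks.may_save("Listing 2.ipynb", "tablet"))          # True
```

The point of checking `may_save` on the server side is that a stale browser tab can no longer silently overwrite newer work, even if it never receives the "you've been locked" message.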