Thursday, October 23, 2014

Update on python-glibc

My pure-python bindings to glibc are progressing at a nice rate. I've made some interesting changes today that I'd like to share.
  • First, there is a clear difference between the raw glibc functions (all in the glibc module) and anything else. You can use them directly just as you would have from C. There's no magic going on and it's all there.
  • Second, we now have a growing collection of python wrappers (in the new pyglibc package), that give low-level primitives nice, high-level, pythonic API. Some of those are straight out of Python 3.4 (but are not a code copy), those include selectors.EpollSelector and select.epoll, some are custom (there's nothing to based this on) like signalfd and pthread_sigmask. More are on the way.
  • Third, and this is pretty interesting. I've decided to build a PEP3156 compatible event loop API. This is paramount for how this code can be consumed. It should roughly work out of the box as a drop-in replacement for the Python 3.4 only asyncio module. Did I mention that it works on Python 2.7? A lot is still missing but I am making progress. This ultimately means that once my contraption makes it into plainbox it won't have to be supported forever (aka job security) and can be discarded once we can depend on Python 3.4. It also means there's a clear, well defined API, a reference implementation (and some others if you look hard enough.
All of that is coming in the 0.6 release that I plan to make later today. The API is stable as I don't like changing my examples over and over so if you want to give it a try, please do so.

My ultimate goal is to scratch my itch. I want to build a reliable test launcher that does monitoring and cleanup. My only constraint is support for Python 3.2 on Ubuntu 12.04 that I have to support. I'm doing a little bit more by supporting Python 2.7 (since it's not costing me anything) on anything that is running the recent enough glibc.

If you're interested in discussing this, using it, adding patches or the like, ping me please.

Wednesday, October 22, 2014

Launching a process to monitor stdout, stderr and exit code reliably

Recently I'm fixing a rather difficult bug that deals with doing one simple task reliably. Run a program and watch (i.e. intercept and process) stdout and stderr until the process terminates.

Doing this is surprisingly difficult and I was certainly caught in a few mistakes the first time I tried to do this. I recently posted a lengthy comment on the corresponding bug. It took me a few moments to carefully analyze and re-think the situation and how a reliable approach should work. Non the less I am only human and I certainly have made my set of mistakes.

Below is the reproduction for my current approach. The implementation is still in progress but it seems to work (I need to implement the termination phase of non-kill-able processes and switch to fully non-blocking I/O). So far I've used epoll(7) and signalfd(7). I'm still planning to use timerfd_create(2) for the timer, perhaps with CLOCK_RTC for hard wall-clock-time limit enforcement. I'll post the full, complete examples once I'm done with this but you can look at how it mostly looks like today in the python-glibc git tree's demos/ directory.

I'd like to ask everyone that has experience with this part of systems engineering to poke holes in my reasoning and show how this might fail and misbehave. Thanks.

The current approach, that so far works good on all the pathological cases is to do this.
The general idea is that we're in a I/O loop, using non-blocking I/O and a select-like mechanism to wait for wait for:
 - timeout (optional, new feature)
 - read side of the stdout pipe data
 - read side of the stdout pipe being closed
 - read side of the stderr pipe data
 - read side of the stderr pipe being closed
 - SIGCHLD being delivered with the intent to say that the process is dead
In general we keep looping and terminate only when the set of waited things (stdout depleted, stderr depleted, process terminated) is empty. This is not always true so see below. The action that we do on each is event is obviously different:
If the timeout has elapsed we proceed to send SIGTERM, reset the timer for shutdown period, followed by SIGQUIT and another timer reset. After that we send SIGKILL. This can fail as the process may have elevated itself beyond our capabilities. This is still undecided but perhaps, at this time, we should use an elevated process manager (see below). If we fail to terminate the process special provisions apply (see below).
If we have data to read we just do and process that (send to log files, process, send to .record.gz). This is a point where we can optimize the process and improve reliability in event of sudden system crash. Using more modern facilities we can implement tee in kernel space which lowers processing burden on python and, in general, makes it more likely that the log files will see actual output the process made just prior to its death.
We can also use pipes in O_DIRECT (aka packet mode) here to ensure that all writes() end up as individual records, which is the indented design of the I/O log record concept. This won't address the inherent buffering that is enabled in all programs that detect when they are redirected and no longer attached to a tty.
Whenever one of the pipes is depleted (which may *never* happen, lesson learned) we just close our side.
When the child dies, and this is the most important part and the actual bugfix, we do the following sequence of events:
 - if we still have stdout pipe open, read at most one PIPE_BUF. We cannot read more as the pipe may live on forever and we can just hang as we currently do. Reading one PIPE_BUF ensures that we catch the last moments of what the originally started process intended to tell us. Then we close the pipe. This will likely result in SIGPIPE in any processes that are still attached to it though we have no guarantee that it will rally kill them as that signal can be blocked.
 - if we still have stderr pipe open we follow the same logic as for stdout above.
 - we restore some signal handling that was blocked during the execution of the loop and terminate.
There's one more trick up our sleeve and that is PR_SET_CHILD_SUBREAPER but I'll describe that in a separate bug report that deals with runaway processes. Think dbus-launch or anything that double-forks and demonizes

If you have any comments or ideas please post them here (wherever you are reading this), on the launchpad bug report page or via email. Thanks a lot!