Doing this is surprisingly difficult and I was certainly caught in a few mistakes the first time I tried to do this. I recently posted a lengthy comment on the corresponding bug. It took me a few moments to carefully analyze and re-think the situation and how a reliable approach should work. Non the less I am only human and I certainly have made my set of mistakes.
Below is the reproduction for my current approach. The implementation is still in progress but it seems to work (I need to implement the termination phase of non-kill-able processes and switch to fully non-blocking I/O). So far I've used epoll(7) and signalfd(7). I'm still planning to use timerfd_create(2) for the timer, perhaps with CLOCK_RTC for hard wall-clock-time limit enforcement. I'll post the full, complete examples once I'm done with this but you can look at how it mostly looks like today in the python-glibc git tree's demos/ directory.
I'd like to ask everyone that has experience with this part of systems engineering to poke holes in my reasoning and show how this might fail and misbehave. Thanks.
- timeout (optional, new feature)
- read side of the stdout pipe data
- read side of the stdout pipe being closed
- read side of the stderr pipe data
- read side of the stderr pipe being closed
- SIGCHLD being delivered with the intent to say that the process is dead
- if we still have stdout pipe open, read at most one PIPE_BUF. We cannot read more as the pipe may live on forever and we can just hang as we currently do. Reading one PIPE_BUF ensures that we catch the last moments of what the originally started process intended to tell us. Then we close the pipe. This will likely result in SIGPIPE in any processes that are still attached to it though we have no guarantee that it will rally kill them as that signal can be blocked.
- if we still have stderr pipe open we follow the same logic as for stdout above.
- we restore some signal handling that was blocked during the execution of the loop and terminate.