Dan O'Reilly
2014-07-26 20:59:16 UTC
I think it would be helpful for folks using the asyncio module to be able
to make non-blocking calls to objects in the multiprocessing module more
easily. While some use-cases for using multiprocessing can be replaced with
ProcessPoolExecutor/run_in_executor, there are others that cannot; more
advanced usages of multiprocessing.Pool aren't supported by
ProcessPoolExecutor (initializer/initargs, contexts, etc.), and other
multiprocessing classes like Lock and Queue have blocking methods that
could be made into coroutines.
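As a rough illustration of turning a blocking method into a coroutine, one option is to push the blocking call onto a worker thread with run_in_executor. The sketch below does that for multiprocessing.Lock.acquire. The helper name coro_acquire is hypothetical (it is not part of any library), and the code uses the async/await syntax from later Python versions rather than @asyncio.coroutine/yield from.

```python
import asyncio
import multiprocessing


async def coro_acquire(lock):
    """Hypothetical helper: wait for a multiprocessing.Lock without
    blocking the event loop, by acquiring it in a worker thread."""
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, lock.acquire)


async def demo():
    lock = multiprocessing.Lock()
    first = await coro_acquire(lock)
    loop = asyncio.get_running_loop()
    # While the lock is held, a timed acquire (block=True, timeout=0.1)
    # gives up and returns False.
    second = await loop.run_in_executor(None, lock.acquire, True, 0.1)
    # A multiprocessing.Lock may be released by any thread or process.
    lock.release()
    return first, second


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

This costs one worker thread per pending call, so it is a convenience wrapper rather than true non-blocking I/O.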
Consider this (extremely contrived, but use your imagination) example of an
asyncio-friendly Queue:

import asyncio
import time
from concurrent.futures import ProcessPoolExecutor

def do_proc_work(q, val, val2):
    time.sleep(3)  # Imagine this is some expensive CPU work.
    ok = val + val2
    print("Passing {} to parent".format(ok))
    q.put(ok)  # The Queue can be used with the normal blocking API, too.
    item = q.get()
    print("got {} back from parent".format(item))

@asyncio.coroutine
def do_some_async_io_task():
    # Imagine there's some kind of asynchronous I/O
    # going on here that utilizes asyncio.
    yield from asyncio.sleep(5)

@asyncio.coroutine
def do_work(q):
    loop.run_in_executor(ProcessPoolExecutor(),
                         do_proc_work, q, 1, 2)
    # Schedule the I/O task so it runs alongside the queue calls.
    io_task = loop.create_task(do_some_async_io_task())
    # Non-blocking get that won't affect our io_task
    item = yield from q.coro_get()
    print("Got {} from worker".format(item))
    item = item + 25
    yield from q.coro_put(item)
    yield from io_task

if __name__ == "__main__":
    # This is our new asyncio-friendly version of multiprocessing.Queue
    q = AsyncProcessQueue()
    loop = asyncio.get_event_loop()
    loop.run_until_complete(do_work(q))

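For concreteness, here is a minimal sketch of how the AsyncProcessQueue used above might be built. This is only one possible approach, not an existing API: it assumes the coroutine methods simply delegate the blocking multiprocessing.Queue calls to a small thread pool. The class body and the demo function are illustrative, the sketch does not address how the queue is handed to the worker process, and it is written with async/await from later Python versions.

```python
import asyncio
import multiprocessing
from concurrent.futures import ThreadPoolExecutor


class AsyncProcessQueue:
    """Hypothetical wrapper around multiprocessing.Queue whose coro_*
    methods run the blocking calls in worker threads."""

    def __init__(self, maxsize=0):
        self._queue = multiprocessing.Queue(maxsize)
        # Two threads, so a pending get() cannot starve a put().
        self._executor = ThreadPoolExecutor(max_workers=2)

    # The normal blocking API stays available.
    def put(self, item):
        self._queue.put(item)

    def get(self):
        return self._queue.get()

    async def coro_put(self, item):
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(self._executor, self._queue.put, item)

    async def coro_get(self):
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(self._executor, self._queue.get)


async def demo():
    q = AsyncProcessQueue()
    await q.coro_put(3)
    return await q.coro_get()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The event loop stays free while a worker thread sits in the blocking get(), which is the behaviour the example relies on.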
I have seen some rumblings about a desire to do this kind of integration on
the bug tracker (http://bugs.python.org/issue10037#msg162497 and
http://bugs.python.org/issue9248#msg221963), though that discussion is
specifically tied to merging the enhancements from the Billiard library
into multiprocessing.Pool. Are there still plans to do that? If so, should
asyncio integration with multiprocessing be rolled into those plans, or
does it make sense to pursue it separately?
Even more generally, do people think this kind of integration is a good
idea to begin with? I know using asyncio is primarily about *avoiding* the
headaches of concurrent threads/processes, but there are always going to be
cases where CPU-intensive work is going to be required in a primarily
I/O-bound application. The easier it is for developers to handle those
use-cases, the better, IMO.
Note that the same sort of integration could be done with the threading
module, though I think there's a fairly limited use-case for that; in most
cases where you'd choose threads over processes, you could probably just use
non-blocking I/O instead.
Thanks,
Dan