
Using Dask Throws ImportError When SageMath Code Is Run in Python

This question is very similar to my earlier question and was prompted by one of the comments. Recently, I have been trying to parallelize some code using Dask. The code involves co…

Solution 1:

Unfortunately, you might be out of luck here (somewhat). It looks like sage is not developed with threaded execution driven by another language in mind - its root-level modules modify key elements of the Python environment and really try to take control of low-level functionality by default. For example, sage.__init__ modifies the way that both inspect and sqlite work (gross!).

The specific issue you're running into is that importing sage invokes the signal module, which cannot be run from a thread other than the main one. The issue isn't in sage operations, but simply the import statement:

In [8]: def hello_sage():
   ...:     from sage.all import Integer
   ...:     return 'Hello World'
   ...:

In [9]: futures = client.submit(hello_sage)

In [10]: distributed.worker - WARNING - Compute Failed
Function:  hello_sage
args:      ()
kwargs:    {}
Exception: ValueError('signal only works in main thread of the main interpreter')

Unfortunately, this is fairly incompatible with dask, which runs all delayed jobs within threads. It's not that dask can't import modules locally to a remote function (it definitely can); it's that those functions can't use signal to control execution.
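
To see the underlying restriction without sage at all, here is a minimal sketch (not from the original answer) that reproduces the same ValueError by installing a signal handler from a worker thread. It uses a plain ThreadPoolExecutor in place of a dask worker, and SIGALRM assumes a Unix-like system:

import signal
from concurrent.futures import ThreadPoolExecutor

def install_handler():
    # signal.signal() refuses to run outside the main thread of the main interpreter,
    # which is exactly what sage's import machinery trips over inside a dask worker thread.
    signal.signal(signal.SIGALRM, lambda signum, frame: None)
    return "handler installed"

with ThreadPoolExecutor(max_workers=1) as pool:
    try:
        print(pool.submit(install_handler).result())
    except ValueError as exc:
        print(f"Failed as expected: {exc}")
        # -> Failed as expected: signal only works in main thread of the main interpreter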

Because of the way sage is written, as far as multithreading goes I think your only choice is to go with the parallelization options their developers have provided. That said, you can trick sage into thinking it's in a world of its own by having threads start their own subprocesses:

In [1]: import dask.distributed as dd

In [2]: from subprocess import Popen, PIPE

In [3]: definvoke_sage_cli():
   ...:     cmd = ["sage", "-c", "print(factor(35))"]
   ...:     p = Popen(cmd, stdout=PIPE, stderr=PIPE, text=True)
   ...:     o, e = p.communicate()
   ...:
   ...:     if e:
   ...:         raise SystemError(e)
   ...:
   ...:     return o
   ...:

In [4]: client = dd.Client(n_workers=4)

In [5]: future = client.submit(invoke_sage_cli)

In [6]: print(future.result())
5 * 7

This is a pretty hacky way of getting around this issue, and I think it's unlikely to offer any performance benefits over the native sage parallelization options as long as you're working on a single machine. If you're using dask to scale up a Kubernetes cluster or work with nodes on an HPC or something, then you could definitely use this route to schedule distributed jobs and then have sage manage multithreading within each node.
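
As a rough illustration of that distributed route, the subprocess wrapper above can be fanned out over many independent inputs with client.map. This is a hedged sketch, not part of the original answer: the factor_with_sage helper and the list of integers are made up for illustration, and it assumes sage is on the PATH of every worker.

import dask.distributed as dd
from subprocess import Popen, PIPE

def factor_with_sage(n):
    # Each task launches its own sage process, so the signal restriction
    # inside dask's worker threads never comes into play.
    cmd = ["sage", "-c", f"print(factor({n}))"]
    p = Popen(cmd, stdout=PIPE, stderr=PIPE, text=True)
    o, e = p.communicate()
    if e:
        raise SystemError(e)
    return o.strip()

if __name__ == "__main__":
    client = dd.Client(n_workers=4)
    futures = client.map(factor_with_sage, [35, 77, 221, 1001])
    print(client.gather(futures))  # e.g. ['5 * 7', '7 * 11', '13 * 17', '7 * 11 * 13']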
