I measured the time spent in sqlite operations while running the mypy_parallel benchmark. With 8 workers, sqlite writes were over 10x slower than with a single worker, likely because only one worker can be writing at any time. This seems to be one of the main sources of limited parallel scaling, at least on macOS.
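For reference, the kind of per-worker write timing involved can be approximated like this (a hypothetical sketch, not the actual benchmark harness; the table schema and payload size are made up):

```python
import sqlite3
import time

def timed_writes(db_path: str, n: int = 1000) -> float:
    """Accumulate wall-clock time spent inside sqlite write calls.

    Running this concurrently from several processes against the same
    db_path shows the single-writer contention: each commit must wait
    for the database-level write lock.
    """
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS cache (k TEXT PRIMARY KEY, v BLOB)")
    total = 0.0
    for i in range(n):
        start = time.monotonic()
        conn.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?)", (str(i), b"x" * 256)
        )
        conn.commit()
        total += time.monotonic() - start
    conn.close()
    return total
```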
Here are some ideas that might help:
- Instead of writing cache files in each worker, send them via IPC and have the main process write everything.
- Use a separate extra cache file per worker, and have the main process copy per-worker cached data into the main database. Every worker reads from the main database.
- Use a separate extra cache file per worker, and somehow allow each worker to read from the right database but always write to their own database.
- Shard the database -- have 1 (non-exclusive) database per worker, and use a hash of the module name to choose the target database for both reading and writing.
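The last idea could look roughly like this (a minimal sketch under assumed names; `NUM_SHARDS`, the file naming, and the table schema are all hypothetical, and a real implementation would reuse connections rather than reopen per call):

```python
import sqlite3
import zlib

NUM_SHARDS = 8  # hypothetical: e.g. one shard per worker

def shard_path(module_name: str) -> str:
    # A stable hash of the module name picks the shard; zlib.crc32 is
    # deterministic across processes, unlike the builtin hash().
    shard = zlib.crc32(module_name.encode()) % NUM_SHARDS
    return f"cache_shard_{shard}.db"

def write_cache(module_name: str, data: bytes) -> None:
    # All writes for a given module go to the shard that owns it, so
    # writers targeting different shards never contend on one file lock.
    with sqlite3.connect(shard_path(module_name)) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (module TEXT PRIMARY KEY, data BLOB)"
        )
        conn.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?)", (module_name, data)
        )

def read_cache(module_name: str) -> "bytes | None":
    # Reads use the same hash, so any worker finds the right shard.
    with sqlite3.connect(shard_path(module_name)) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (module TEXT PRIMARY KEY, data BLOB)"
        )
        row = conn.execute(
            "SELECT data FROM cache WHERE module = ?", (module_name,)
        ).fetchone()
        return row[0] if row else None
```

Note that writes to the *same* shard would still serialize, so the win depends on module hashes spreading evenly across workers.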
Next steps:
- Measure on Linux as well
- Prototype some of the ideas and measure impact (at least the ones that are easy enough to implement)
cc @ilevkivskyi