gh-146073: Add fitness/exit quality mechanism for JIT trace frontend #148089
cocolato wants to merge 31 commits into python:main
Conversation
It appears that the current parameters do not yet guarantee runtime safety; I will continue to work on fixes and optimizations.
I've commented on the issue: #146073 (comment)
@markshannon @Fidget-Spinner gentle ping, I'd like to hear your suggestions about the current parameters.
markshannon left a comment:
I've a few comments, mostly broad ideas and suggestions, rather than anything that needs fixing.
I think we should merge this soon. We can tweak the parameters later as we refine our ideas.
Include/internal/pycore_optimizer.h
Outdated
/* Exit quality thresholds: trace stops when fitness < exit_quality.
 * Higher = trace is more willing to stop here. */
#define EXIT_QUALITY_CLOSE_LOOP (FITNESS_INITIAL / 2)
I think we want this higher, close to FITNESS_INITIAL*0.9. It is only super-short loops that we want to unroll.
 * Higher = trace is more willing to stop here. */
#define EXIT_QUALITY_CLOSE_LOOP (FITNESS_INITIAL / 2)
#define EXIT_QUALITY_ENTER_EXECUTOR (FITNESS_INITIAL * 3 / 8)
#define EXIT_QUALITY_DEFAULT (FITNESS_INITIAL / 8)
I'd increase this to make sure that the fitness cannot drop from above EXIT_QUALITY_DEFAULT to below EXIT_QUALITY_SPECIALIZABLE in a single uop.
Include/internal/pycore_optimizer.h
Outdated
 * N_BACKWARD_SLACK more bytecodes before reaching EXIT_QUALITY_CLOSE_LOOP,
 * based on AVG_SLOTS_PER_INSTRUCTION. */
#define N_BACKWARD_SLACK 50
#define FITNESS_BACKWARD_EDGE (FITNESS_INITIAL - EXIT_QUALITY_CLOSE_LOOP \
A backwards edge isn't necessarily bad, although too many suggests a poor trace.
Maybe don't penalize the first backwards edge much, but penalize subsequent ones a lot.
What we really want is to avoid unrolling a loop that doesn't include the trace start.
It would be easier to reason about if we used a simpler calculation for the value.
Include/internal/pycore_optimizer.h
Outdated
/* Backward edge penalty for JUMP_BACKWARD_NO_INTERRUPT (coroutines/yield-from).
 * Smaller than FITNESS_BACKWARD_EDGE since these loops are very short. */
#define FITNESS_BACKWARD_EDGE_COROUTINE (FITNESS_BACKWARD_EDGE / 4)
It is not the length of the loop that matters. We want to include yields in generators in the traces of the loop that calls them; whether that is a yield-from loop or a for loop doesn't much matter.
Python/optimizer.c
Outdated
assert(curr_instr->op.code == JUMP_BACKWARD_JIT || curr_instr->op.code == RESUME_CHECK_JIT || (exit != NULL));
tracer->initial_state.jump_backward_instr = curr_instr;

// Reduce side-trace fitness as chain depth grows, but clamp the reduction
This is probably fine for now, but reducing the fitness can skew some of our assumptions about back edges and such.
One other thing to note for the future, is that the fitness at an exit might help us pick a better warmup value for that exit.
Mark is right, we should remove this feature. Side exits are important and we shouldn't penalize them.
@markshannon Thanks for the review! I'm holding off on changing the fitness parameters for now, but I can run some benchmarks if you think it's necessary.
Still seeing a big slowdown in richards on https://github.com/colesbury/fastmark (timings for main vs. this branch elided). I'm going to check if this is affecting the optimizer somehow.
…ER_EXECUTOR for RESUME
Treat back edges as an exit, not a penalty; this way traces are more likely to end at a back edge instead of at random spots.
    OPT_STAT_INC(trace_too_long);
    goto done;
}
assert(uop_buffer_remaining_space(trace) > space_needed);
MAX_TARGET_LENGTH must be less than UOP_MAX_TRACE_LENGTH / OPTIMIZER_EFFECTIVENESS; otherwise, assert(uop_buffer_remaining_space(trace) > space_needed) will fail. Or should we revert the assert to a check?
We want to keep it as an assert.
Otherwise we need to handle an awkward failure mode.
Increasing the max trace length is only going to help if the trace is stopping too early.
I think we can safely reduce the max trace length; let me do that. There were two problems with the older code:

The new code has almost no slowdown on Richards, and a huge speedup on the telco benchmark (timings elided).
Reopen: #147966