watch cache: Deflake TestCacheLaggingWatcher #21130
Open
+83
−44
Closes issue #20852
I was able to reproduce the issue with Docker using the same resource constraints as in the CI Kubernetes specs.
Statistics (500 runs each)
It looks like the memory constraint has the biggest impact here.
There are actually two types of errors (example CI failure), and they have different root causes, although they both come from the same test case.
Error Type 1:
gotEvents=X, wantEvents<1
What the test expects:
buffer=0 (unbuffered channel) and window=10
What races:
generateEvents() writes 12 events to etcd
storeW.respCh
broadcastEventsLocked() sends events to active watchers
resyncLaggingWatchers() runs every 10 ms
collectAndAssertAtomicEvents() reads from the watch channel
The Race Sequence:
Why non-deterministic:
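As a rough illustration of why an unbuffered channel makes the outcome timing-dependent, here is a minimal Go sketch. It assumes delivery to a watcher is a non-blocking send, which this description does not state; `trySend` is a hypothetical helper, not a function from the cache package. With buffer=0, the send succeeds only if a reader happens to be waiting at that instant, so whether a watcher is treated as lagging depends on goroutine scheduling.

```go
package main

import "fmt"

// trySend is a hypothetical helper (not part of the cache package).
// It attempts a non-blocking send and reports whether v was delivered.
func trySend(ch chan int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		// No receiver is ready and there is no free buffer slot, so a
		// caller would have to treat this watcher as lagging.
		return false
	}
}

func main() {
	unbuffered := make(chan int) // buffer=0, as in the test
	buffered := make(chan int, 1)

	fmt.Println(trySend(unbuffered, 1)) // false: nobody is receiving
	fmt.Println(trySend(buffered, 1))   // true: the buffer absorbs it
	fmt.Println(trySend(buffered, 2))   // false: the buffer is full
}
```

In the sketch no goroutine is reading, so the unbuffered send always fails. In a real test a concurrent reader may or may not be parked on the channel when the send is attempted, which is where the non-determinism would come from under this assumption.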
Fixed in
tests/integration/cache_test.go
skipCloseCheck flag, since close status is non-deterministic with resync
Error Type 2:
cache: stale event batch (rev 32 < latest 42)
What races:
store.Apply()
The Race Sequence:
Another scenario (demux-level):
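For context on the error message itself, the wording suggests a revision monotonicity check when a batch is applied. The Go sketch below is only a guess at the shape of such a check; `revStore` and its field are hypothetical names, and the real `store.Apply()` may differ (for example in how it treats equal revisions). It shows how a batch that arrives after a newer one has been applied would be rejected.

```go
package main

import "fmt"

// revStore is a hypothetical stand-in for the cache store. It only
// tracks the latest revision it has applied.
type revStore struct {
	latest int64
}

// Apply accepts a batch at revision rev only if it does not go
// backwards relative to the latest applied revision.
func (s *revStore) Apply(rev int64) error {
	if rev < s.latest {
		return fmt.Errorf("cache: stale event batch (rev %d < latest %d)", rev, s.latest)
	}
	s.latest = rev
	return nil
}

func main() {
	s := &revStore{}
	fmt.Println(s.Apply(42)) // <nil>
	fmt.Println(s.Apply(32)) // cache: stale event batch (rev 32 < latest 42)
}
```

If two code paths can call such an Apply concurrently or out of order, the older batch loses the race and surfaces as this error, which matches the symptom reported above.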
Fixed in
cache/store.go and cache/demux.go
Results
Both error types are fully eliminated. I would like to verify this on CI as well.