
Conversation


@zhijun42 (Contributor) commented on Jan 11, 2026

Part of issues #16839 and #19806.

I'm making a couple of improvements to optimize memory and CPU usage in the server-side watch implementation. To enable that, I first need to simplify and clean up the existing implementation; otherwise, the system would be hard to reason about and maintain.

Once this pull request is merged, I will post the actual optimization PR.

Overview

The previous implementation had three places a watcher could exist:

  1. synced group - watchers caught up with current revision
  2. unsynced group - watchers that need to fetch events from DB
  3. victims slice - watchers with blocked send channels waiting to retry

This led to:

  • a system that is hard to reason about, with watcher state spread across three places;
  • complex logic in the cancelWatcher function;
  • a victims field typed as a slice of maps ([]watcherBatch), which is very cumbersome to iterate, add to, and delete from.

So I eliminated victims and significantly simplified the function cancelWatcher.

Now we keep all watchers in either the synced or unsynced group, using a non-nil eventBatch to track pending events.
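
For reference, here is a rough before/after sketch of the watchableStore state. The field names (synced, unsynced, victims, eventBatch) come from this discussion; the surrounding Go is paraphrased, not the exact etcd source:

  // Before: three places a watcher could live.
  type watchableStore struct {
      synced   watcherGroup   // caught up with the current revision
      unsynced watcherGroup   // needs to fetch events from the DB
      victims  []watcherBatch // blocked sends awaiting retry (a slice of maps)
  }

  // After: two groups only; pending events ride on the watcher itself.
  type watchableStore struct {
      synced   watcherGroup
      unsynced watcherGroup
      // each *watcher carries an eventBatch pointer; non-nil means
      // retryLoop still has events to resend for that watcher
  }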

Other changes:

  • With victims gone, the functions syncVictimsLoop / moveVictims are renamed to retryLoop / retryPendingEvents, which are more intuitive names.
  • Added comments to existing code to improve readability.

Testing

All existing tests pass, including:

  • TestNewWatcherCountGauge - validates gauge correctness for regular, compacted, and race-condition scenarios
  • Full mvcc test suite

No new tests are needed as this is a pure refactoring with no behavioral changes.

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: zhijun42
Once this PR has been reviewed and has the lgtm label, please assign serathius for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot

Hi @zhijun42. Thanks for your PR.

I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

  }

  // cancelWatcher removes references of the watcher from the watchableStore
  func (s *watchableStore) cancelWatcher(wa *watcher) {
Member

Please try to structure the code in a way that minimizes the diff. It's not clear to me why we drop the for loop and how it impacts the rest of the code.

Contributor Author

Sounds good. I rearranged the code of cancelWatcher and retryPendingEvents to play well with git diff, so it shows clear ---- and +++ sections.

  s.mu.Unlock()
  panic("watcher not victim but not in watch groups")
  }
  // Check if already canceled first to handle double-cancel scenarios
Member

Why was it not needed before? If you are adding a new mechanism, maybe introduce it first as a separate change.

Contributor Author

It was indeed needed before in the else if wa.ch == nil check of the for loop.
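
For context, the old for loop had roughly this shape (paraphrased from the previous implementation, not the verbatim source):

  for {
      s.mu.Lock()
      if s.unsynced.delete(wa) || s.synced.delete(wa) || wa.compacted {
          break // found in a group (or compacted); finish cancellation below
      } else if wa.ch == nil {
          break // already canceled by a concurrent call: the double-cancel case
      }
      // Not in any group and not canceled: it must sit in a victims
      // snapshot held by moveVictims, so unlock and retry shortly.
      s.mu.Unlock()
      time.Sleep(time.Millisecond)
  }

The wa.ch == nil branch is the double-cancel handling that the new code keeps as an explicit up-front check.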

  func (s *watchableStore) moveVictims() (moved int) {
  // retryPendingEvents tries to send events to watchers that have pending event batches.
  // Returns the number of watchers whose pending events were successfully sent.
  func (s *watchableStore) retryPendingEvents() (sent int) {
Member

I fail to see the "eliminating victims mechanism" part, as you are just rewriting the same mechanism under a different name, "retry pending events". What's the benefit here? Can you maybe focus on that?

Contributor Author

The benefit isn't just renaming - it's simplifying the state machine to only two states: synced and unsynced.

In the old implementation, when moveVictims() runs, it does:

  s.mu.Lock()
  victims := s.victims
  s.victims = nil
  s.mu.Unlock()

During this window, a watcher being processed isn't in unsynced, synced, or s.victims - it exists only in the local victims snapshot. If cancelWatcher runs at this moment, it can't find the watcher anywhere and has to retry; hence the complexity of the for loop.

With the new approach, a watcher is never "in flight" - it is always findable in either synced or unsynced.

Besides, the s.victims field was a slice of maps ([]watcherBatch), which was very cumbersome to iterate, add to, and delete from.
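
To make the contrast concrete, here is a minimal sketch of the new retry shape; helper names like pendingWatchers and trySend are placeholders for illustration, not the actual helpers in this PR:

  // retryPendingEvents walks the group in place under the lock, so a
  // watcher with a pending batch is always discoverable by cancelWatcher.
  s.mu.Lock()
  for _, wa := range s.unsynced.pendingWatchers() { // hypothetical helper
      if wa.trySend(wa.eventBatch) { // hypothetical non-blocking send
          wa.eventBatch = nil // nothing pending anymore
          sent++
      }
  }
  s.mu.Unlock()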

  // It skips watchers with pending events (non-nil eventBatch) as they are handled by retryLoop.
  // It also handles compacted watchers by sending compaction responses and removing them.
  // Returns the selected watchers and the minimum revision needed for DB fetch.
  func (s *watchableStore) selectForSync(maxWatchers int, curRev, compactRev int64) (*watcherGroup, int64) {
Member

While I appreciate the refactor of the choose and chooseAll functions, I don't think it should be in the same PR as the removal of victims. The cost of review grows exponentially with size/complexity. Please consider moving the extraction of the selectForSync function to a separate PR.

Contributor Author

Good call! Moved it to the new PR #21108.
