[WIP] feat(account-controller): Add resource suspend/resume with state restoration #6148
Whoa! Easy there, Partner! This PR is too big. Please break it up into smaller PRs.
…restored to their historical state
Pull Request Overview
This PR adds comprehensive suspend/resume functionality to the account controller for managing user resources during debt or network suspension scenarios. The implementation saves original state before suspension and restores it upon resume, handling various Kubernetes resources including Deployments, StatefulSets, ReplicaSets, CronJobs, Jobs, Devboxes, KubeBlocks Clusters, Certificates, and Ingresses.
Key changes:
- Implements state management for 10+ resource types with encode/decode functions for preserving original configurations
- Adds HPA (HorizontalPodAutoscaler) suspension/restoration logic for frontend-deployed applications
- Introduces concurrent deletion with wait mechanisms for backup resources
- Enhances error handling by collecting errors instead of failing fast
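The save-then-restore idea behind the encode/decode functions can be sketched in a few lines. This is only an illustration, not the PR's code: the `workload` type, the `workloadState` struct, the `stateAnnotation` key, and the choice to keep the encoded state in an annotation are all assumptions made to keep the example self-contained.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stateAnnotation is a hypothetical annotation key, not taken from the PR.
const stateAnnotation = "example.io/original-state"

// workloadState is a hypothetical record of the fields that suspension overwrites.
type workloadState struct {
	Replicas int32 `json:"replicas"`
}

// workload is a simplified stand-in for a Deployment-like object.
type workload struct {
	Annotations map[string]string
	Replicas    int32
}

// encodeState serializes the original state to a JSON string.
func encodeState(s workloadState) (string, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return "", fmt.Errorf("encode state: %w", err)
	}
	return string(b), nil
}

// decodeState parses a string produced by encodeState.
func decodeState(raw string) (workloadState, error) {
	var s workloadState
	if err := json.Unmarshal([]byte(raw), &s); err != nil {
		return workloadState{}, fmt.Errorf("decode state: %w", err)
	}
	return s, nil
}

// suspend records the original replica count once, then scales to zero.
func suspend(w *workload) error {
	if _, done := w.Annotations[stateAnnotation]; done {
		return nil // already suspended: keep the first saved state
	}
	raw, err := encodeState(workloadState{Replicas: w.Replicas})
	if err != nil {
		return err
	}
	if w.Annotations == nil {
		w.Annotations = map[string]string{}
	}
	w.Annotations[stateAnnotation] = raw
	w.Replicas = 0
	return nil
}

// resume restores the saved replica count and clears the saved state.
func resume(w *workload) error {
	raw, ok := w.Annotations[stateAnnotation]
	if !ok {
		return nil // nothing was saved, so there is nothing to restore
	}
	s, err := decodeState(raw)
	if err != nil {
		return err
	}
	w.Replicas = s.Replicas
	delete(w.Annotations, stateAnnotation)
	return nil
}

func main() {
	w := &workload{Replicas: 3}
	_ = suspend(w)
	fmt.Println("suspended replicas:", w.Replicas)
	_ = resume(w)
	fmt.Println("resumed replicas:", w.Replicas)
}
```

Making `suspend` a no-op when state is already saved matters in this sketch: a second suspend pass would otherwise overwrite the saved value with zero.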
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| controllers/account/main.go | Adds devbox API scheme registration for controller access |
| controllers/account/deploy/manifests/deploy.yaml | Expands RBAC permissions for devboxes, certificates, ingresses, and HPAs |
| controllers/account/controllers/suspend_state.go | Defines state structures and encode/decode functions for all resource types |
| controllers/account/controllers/suspend_state_test.go | Comprehensive unit tests for state encode/decode functions |
| controllers/account/controllers/namespace_controller_test.go | Integration tests covering suspend/resume workflows for all resource types |
| controllers/account/controllers/namespace_controller.go | Core implementation of suspend/resume logic with state management |
| // 列出所有资源 (list all resources) | ||
| list, err := dynamicClient.Resource(gvr).Namespace(namespace).List(ctx, v12.ListOptions{}) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to list %s in namespace %s: %w", gvr, namespace, err) | ||
| } | ||
|
|
||
| if len(list.Items) == 0 { | ||
| return nil // 无资源需要删除 (no resources to delete) | ||
| } | ||
|
|
||
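| // 并发删除:使用WaitGroup和error channel收集错误 (concurrent deletion: collect errors with a WaitGroup and an error channel) | ||
| var wg sync.WaitGroup | ||
| errCh := make(chan error, len(list.Items)) // 缓冲channel,避免阻塞 (buffered channel, avoids blocking) | ||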
| allErrors := []error{} | ||
|
|
||
| for _, item := range list.Items { | ||
| name := item.GetName() | ||
| wg.Add(1) | ||
| go func(resName string) { | ||
| defer wg.Done() | ||
| if deleteErr := deleteResourceAndWait(dynamicClient, gvr, namespace, resName); deleteErr != nil { | ||
| errCh <- fmt.Errorf("failed to delete %s/%s: %w", gvr, resName, deleteErr) | ||
| } | ||
| }(name) | ||
| } | ||
|
|
||
| // 等待所有Goroutine完成,并收集错误 (wait for all goroutines to finish and collect errors) | ||
| go func() { | ||
| wg.Wait() | ||
| close(errCh) | ||
| }() | ||
|
|
||
| for deleteErr := range errCh { | ||
| allErrors = append(allErrors, deleteErr) | ||
| } | ||
|
|
||
| if len(allErrors) > 0 { | ||
| return fmt.Errorf("failed to delete some %s resources: %v", gvr, allErrors) | ||
| } | ||
|
|
||
| return nil | ||
| } | ||
|
|
||
| func deleteResourceAndWait( | ||
| dynamicClient dynamic.Interface, | ||
| gvr schema.GroupVersionResource, | ||
| namespace, name string, | ||
| ) error { | ||
| ctx := context.Background() | ||
| deletePolicy := v12.DeletePropagationForeground // 前台删除,等待子资源 (foreground deletion: wait for child resources) | ||
|
|
||
| // 执行删除(针对单个资源) (delete a single resource) | ||
| err := dynamicClient.Resource(gvr).Namespace(namespace).Delete(ctx, name, v12.DeleteOptions{ | ||
| PropagationPolicy: &deletePolicy, | ||
| }) | ||
| if err != nil && !errors.IsNotFound(err) { | ||
| return fmt.Errorf("failed to delete %s/%s: %w", gvr, name, err) | ||
| } | ||
| if errors.IsNotFound(err) { | ||
| return nil // 已不存在,无需等待 (already gone, nothing to wait for) | ||
| } | ||
|
|
||
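| // 等待删除完成:轮询Get直到NotFound (wait for deletion to finish: poll Get until NotFound) | ||
| pollInterval := 5 * time.Second | ||
| timeout := 5 * time.Minute // 根据finalizer复杂度调整 (adjust to finalizer complexity) | ||
| err = wait.PollUntilContextTimeout(ctx, pollInterval, timeout, true, | ||
| func(ctx context.Context) (bool, error) { | ||
| // 使用retry.Backoff可选重试Get(处理临时错误) (optionally retry Get with retry.Backoff to handle transient errors) | ||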
| dErr := retry.OnError(wait.Backoff{ | ||
| Steps: 5, | ||
| Duration: 10 * time.Second, | ||
| Factor: 1.0, | ||
| Jitter: 0.1, | ||
| }, func(err error) bool { | ||
| return errors.IsServerTimeout(err) || errors.IsServiceUnavailable(err) | ||
| }, func() error { | ||
| _, getErr := dynamicClient.Resource(gvr). | ||
| Namespace(namespace). | ||
| Get(ctx, name, v12.GetOptions{}) | ||
| if errors.IsNotFound(getErr) { | ||
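| return nil // 成功:资源已删除 (success: the resource has been deleted) | ||
| } | ||
| if getErr != nil { | ||
| // 其它错误:继续轮询 (other errors: keep polling) | ||
| return getErr | ||
| } | ||
| // 资源仍存在:继续轮询 (resource still exists: keep polling) | ||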
| return errors2.New("resource still exists") | ||
| }) | ||
| return dErr == nil, dErr |
Copilot
AI
Nov 7, 2025
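Chinese comments should be translated to English for consistency with the rest of the codebase. Found comments like "列出所有资源" (list all resources), "无资源需要删除" (no resources to delete), "并发删除:使用WaitGroup和error channel收集错误" (concurrent deletion: collect errors with a WaitGroup and an error channel), "缓冲channel,避免阻塞" (buffered channel, avoids blocking), "等待所有Goroutine完成,并收集错误" (wait for all goroutines to finish and collect errors), "前台删除,等待子资源" (foreground deletion: wait for child resources), "执行删除(针对单个资源)" (delete a single resource), "已不存在,无需等待" (already gone, nothing to wait for), "等待删除完成:轮询Get直到NotFound" (wait for deletion to finish: poll Get until NotFound), "使用retry.Backoff可选重试Get(处理临时错误)" (optionally retry Get with retry.Backoff to handle transient errors), "成功:资源已删除" (success: the resource has been deleted), "其它错误:继续轮询" (other errors: keep polling), "资源仍存在:继续轮询" (resource still exists: keep polling), and "根据finalizer复杂度调整" (adjust to finalizer complexity).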
| gvr schema.GroupVersionResource, | ||
| namespace string, | ||
| ) error { | ||
| ctx := context.Background() |
Copilot
AI
Nov 7, 2025
Context cancellation is not propagated. The function accepts a ctx parameter but creates a new context.Background() instead of using it. This means any cancellation or timeout from the caller will be ignored, potentially causing operations to run longer than expected.
| dynamicClient dynamic.Interface, | ||
| gvr schema.GroupVersionResource, | ||
| namespace, name string, | ||
| ) error { | ||
| ctx := context.Background() |
Copilot
AI
Nov 7, 2025
Context cancellation is not propagated. The function creates a new context.Background() instead of accepting and using a context from the caller. This means cancellation or timeouts cannot be properly handled.
Suggested change
| - dynamicClient dynamic.Interface, | |
| - gvr schema.GroupVersionResource, | |
| - namespace, name string, | |
| - ) error { | |
| - ctx := context.Background() | |
| + ctx context.Context, | |
| + dynamicClient dynamic.Interface, | |
| + gvr schema.GroupVersionResource, | |
| + namespace, name string, | |
| + ) error { | |
This PR implements a resource suspension and restoration system that preserves original resource states when namespaces are suspended for overdue payments and restores them on resume.
Key Features
Supported Resources
Implementation