Improve error reporting in GitRepo status #4211

@weyfonk

Description

This is a follow-up to #4167, which laid the groundwork for more transparent and consistent reporting of errors in resource statuses, by:

  • distinguishing between retryable and non-retryable errors:
    • retryable errors are typically those related to interacting with the Kubernetes API server. Failures in those contexts may stem from transient network failures, timeouts, or resources not yet being in the expected state. Retries with exponential back-off are therefore suitable, and such errors need not be surfaced to users, who may not be able to do anything to mitigate them. When such an error occurs, Fleet should:
      • log it, to keep a trace of it, without triggering repeated status updates
      • return a controller Result with a non-zero RequeueAfter and a nil error, allowing a reconcile of the resource to be requeued
    • non-retryable errors may come from, e.g., configuration issues or invalid input data, and are not expected to be resolved without user action. This is why propagating such errors to a resource's status is particularly important, especially where users do not have access to Fleet logs. A non-retryable error should lead to:
      • the reconciler returning:
        • an empty controller Result
        • a TerminalError, instructing controller-runtime not to requeue the resource, as per the controller-runtime docs
      • the error being propagated to the resource's status, making it visible to users

This logic has already been implemented in #4167, where it is being used for bundles.

This should be implemented for GitRepos as well.

Acceptance criteria:
