This document outlines the development roadmap for the RustFS Kubernetes Operator. The roadmap is organized by release versions and includes features, improvements, and technical debt items.
Last Updated: 2026-03-28
Current Version: 0.1.0 (pre-release)
- Basic Tenant CRD with multi-pool support
- RBAC resource management (Role, ServiceAccount, RoleBinding)
- Service creation (IO, Console, Headless)
- StatefulSet generation per pool
- Persistent volume management with volume claim templates
- Per-pool scheduling configuration (nodeSelector, affinity, tolerations, resources)
- Automatic RUSTFS_VOLUMES configuration
- Required RustFS environment variables
- CRD validation rules (servers, volumes, credentials)
- Certificate and TLS utilities (RSA, ECDSA, Ed25519)
- Kubernetes events for reconciliation actions
- Test infrastructure with helper utilities
- Tenant status: conditions (Ready, Progressing, Degraded), overall state, per-pool status from StatefulSets (successful reconcile path)
- StatefulSet create/update with safe update validation, applied when the spec changes
- Operator HTTP console API and console CLI subcommand; console-web management UI
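The "safe update validation" item above can be illustrated with a minimal sketch. It assumes a simplified, hypothetical `StsSpec` type standing in for the real k8s-openapi StatefulSet spec, and it is not the operator's actual implementation. It relies on the Kubernetes rule that `serviceName`, `selector`, `podManagementPolicy`, and `volumeClaimTemplates` generally cannot be changed on an existing StatefulSet, so a difference there is reported instead of applied.

```rust
/// Hypothetical, simplified stand-in for the StatefulSet fields of interest.
#[derive(Debug, Clone, PartialEq)]
pub struct StsSpec {
    pub service_name: String,
    pub selector: Vec<(String, String)>,
    pub pod_management_policy: String,
    pub volume_claim_templates: Vec<String>,
    pub replicas: i32,
    pub image: String,
}

/// Outcome of comparing the live spec with the desired one.
#[derive(Debug, PartialEq)]
pub enum UpdateDecision {
    /// Specs are identical; nothing to do.
    NoChange,
    /// Only mutable fields differ; the update can be applied.
    Apply,
    /// Immutable fields differ; lists the offending field names.
    Rejected(Vec<&'static str>),
}

/// Decide whether `desired` can be applied over `current`.
pub fn plan_update(current: &StsSpec, desired: &StsSpec) -> UpdateDecision {
    let mut violations = Vec::new();
    if current.service_name != desired.service_name {
        violations.push("serviceName");
    }
    if current.selector != desired.selector {
        violations.push("selector");
    }
    if current.pod_management_policy != desired.pod_management_policy {
        violations.push("podManagementPolicy");
    }
    if current.volume_claim_templates != desired.volume_claim_templates {
        violations.push("volumeClaimTemplates");
    }
    if !violations.is_empty() {
        return UpdateDecision::Rejected(violations);
    }
    if current == desired {
        UpdateDecision::NoChange
    } else {
        UpdateDecision::Apply
    }
}
```

Returning the list of violated fields, rather than a bare boolean, makes it straightforward to surface the reason in a Kubernetes event or a status condition.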
- No integration or E2E tests in-repo (unit tests only)
- Tenant status not always updated when reconcile returns early with an error (e.g. credential/KMS validation)
- Status subresource patch: only one conflict retry; stronger backoff optional
- TLS certificate rotation not automated
- Advanced StatefulSet rollout (rollback, extra strategy options) still open
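For the status patch item above (a single conflict retry today, stronger backoff optional), a bounded retry with capped exponential backoff might look like the following sketch. All names here (`PatchError`, `backoff_delay`, `retry_on_conflict`) are hypothetical, and the code is synchronous with an injected `sleep` callback to stay self-contained; an operator built on an async runtime would await a timer instead.

```rust
use std::time::Duration;

/// Hypothetical error type: only conflicts are worth retrying.
#[derive(Debug, PartialEq)]
pub enum PatchError {
    Conflict,
    Other(String),
}

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
pub fn backoff_delay(attempt: u32, base: Duration, cap: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(cap)
}

/// Run `op` at least once and at most `max_attempts` times, retrying only
/// on `PatchError::Conflict` and sleeping between attempts.
pub fn retry_on_conflict<T>(
    max_attempts: u32,
    base: Duration,
    cap: Duration,
    mut op: impl FnMut() -> Result<T, PatchError>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, PatchError> {
    let mut attempt = 0;
    loop {
        match op() {
            Err(PatchError::Conflict) if attempt + 1 < max_attempts => {
                sleep(backoff_delay(attempt, base, cap));
                attempt += 1;
            }
            // Success, a non-conflict error, or retries exhausted.
            other => return other,
        }
    }
}
```

In a real status patch the closure would re-read the latest resource version before each attempt, since retrying with stale data would simply conflict again.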
Focus: Production readiness for basic deployments
- Secret-based credential management ✅ COMPLETED (2025-11-15)
  - ✅ Support for reading credentials from Kubernetes Secrets
  - ✅ Secure credential injection via secretKeyRef (credentials never loaded into operator memory)
  - ✅ Validation of Secret structure:
    - Secret exists in same namespace
    - Contains required keys (accesskey, secretkey)
    - Valid UTF-8 encoding
    - Minimum 8 characters for both keys
  - ✅ Backward compatibility with environment variables
  - ✅ Comprehensive error messages and event recording
  - ✅ Smart retry logic (60s for credential errors, 5s for API errors)
  - ✅ Production-ready examples and documentation
  - See: examples/secret-credentials-tenant.yaml, Issue #41
- Status conditions (happy path) ✅: Ready/Progressing/Degraded, pool-level status; see src/reconcile.rs
- Status on reconciliation errors: Surface failing state when reconcile exits early (credentials, validation, etc.); related to Issue #42
- StatefulSet update (core) ✅: Validate immutable fields, apply when statefulset_needs_update; per-pool status from STS
- StatefulSet rollout extras: Rollback, configurable strategies, richer revision tracking (beyond current behavior)
- Improved error handling and observability
  - Prometheus metrics (reconciliation duration, error rates, pool health)
  - Broader event coverage if gaps remain
  - Note: structured logging (tracing) and error_policy requeue tiers exist today
- Configuration validation enhancements
  - Validate storage class exists before creating PVCs
  - Check node selector labels match available nodes
  - Validate resource requests don't exceed node capacity
  - Warn on mixing storage classes (performance implications)
- Documentation improvements
  - API reference documentation (CRD fields)
  - Operator deployment guide (Helm chart, manifests)
  - Troubleshooting guide with common issues
  - Migration guide from direct StatefulSet deployments
- Integration test suite
  - Kind/k3s-based integration tests
  - Test tenant lifecycle (create, update, delete)
  - Test pool scaling operations
  - Test error recovery scenarios
- E2E tests
  - Real RustFS deployment testing
  - Data persistence verification
  - Upgrade/downgrade scenarios
  - Disaster recovery testing
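The Secret validation rules and retry tiers listed under secret-based credential management can be sketched roughly as follows. This is an illustration under stated assumptions, not the operator's code: the type and function names are hypothetical, the Secret's data is modeled as a plain map of byte values, the namespace lookup is omitted, and "8 characters" is interpreted here as Unicode scalar values rather than bytes.

```rust
use std::collections::BTreeMap;
use std::time::Duration;

/// Hypothetical error type for the structural checks on a credentials Secret.
#[derive(Debug, PartialEq)]
pub enum CredentialError {
    MissingKey(&'static str),
    NotUtf8(&'static str),
    TooShort { key: &'static str, len: usize },
}

/// Keys and minimum length as stated in the roadmap.
const REQUIRED_KEYS: [&str; 2] = ["accesskey", "secretkey"];
const MIN_LEN: usize = 8;

/// Check that both keys exist, decode as UTF-8, and are long enough.
pub fn validate_secret_data(data: &BTreeMap<String, Vec<u8>>) -> Result<(), CredentialError> {
    for key in REQUIRED_KEYS {
        let raw = data.get(key).ok_or(CredentialError::MissingKey(key))?;
        let text = std::str::from_utf8(raw).map_err(|_| CredentialError::NotUtf8(key))?;
        // Assumption: length is counted in characters, not bytes.
        let len = text.chars().count();
        if len < MIN_LEN {
            return Err(CredentialError::TooShort { key, len });
        }
    }
    Ok(())
}

/// Hypothetical top-level error split into the two retry tiers.
#[derive(Debug, PartialEq)]
pub enum ReconcileError {
    Credential(CredentialError),
    KubeApi(String),
}

/// Requeue delay per tier: 60s for credential errors, 5s for API errors.
pub fn requeue_after(err: &ReconcileError) -> Duration {
    match err {
        ReconcileError::Credential(_) => Duration::from_secs(60),
        ReconcileError::KubeApi(_) => Duration::from_secs(5),
    }
}
```

The longer delay for credential errors reflects that they usually need a human to fix the Secret, whereas API errors are often transient and worth retrying quickly.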
Focus: Advanced lifecycle management and operational features
- Tenant lifecycle management
  - Finalizers for graceful deletion
  - Orphaned resource cleanup
  - Pre-deletion validation (check for data)
  - Backup integration hooks
- Pool lifecycle management
  - Safe pool addition with data rebalancing awareness
  - Pool removal with decommissioning checks
  - Pool expansion (increase servers/volumes)
  - Pool migration support
- TLS/Certificate management
  - Automatic certificate generation (cert-manager integration)
  - Certificate rotation automation
  - Support for custom CA certificates
  - mTLS between RustFS servers
- Monitoring and alerting
  - RustFS metrics scraping and exposure
  - ServiceMonitor CRD for Prometheus Operator
  - Grafana dashboard templates
  - Alert rules for common issues
- Backup and disaster recovery
  - Integration with Velero
  - Snapshot management
  - Point-in-time recovery documentation
  - Multi-cluster replication guidance
- Resource optimization
  - Automatic resource right-sizing recommendations
  - Storage capacity monitoring and alerts
  - Cost optimization insights (spot instance viability)
  - Performance profiling tools
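The "finalizers for graceful deletion" item under tenant lifecycle management follows the standard Kubernetes finalizer pattern, which can be sketched as a small decision function. The finalizer name, the `ObjectMeta` stand-in, and the `FinalizerAction` enum below are all hypothetical and chosen for illustration; a real implementation would work on the Kubernetes object metadata (or use the finalizer helper from kube-runtime mentioned elsewhere in this roadmap).

```rust
/// Hypothetical finalizer name; the real one is not defined in this roadmap.
pub const FINALIZER: &str = "example.rustfs.com/tenant-cleanup";

/// Simplified stand-in for the relevant parts of Kubernetes object metadata.
#[derive(Debug, Clone, Default)]
pub struct ObjectMeta {
    /// True when `metadata.deletionTimestamp` is set.
    pub deletion_requested: bool,
    pub finalizers: Vec<String>,
}

/// What the reconciler should do next for this object.
#[derive(Debug, PartialEq)]
pub enum FinalizerAction {
    /// Live object without our finalizer: add it before creating resources.
    AddFinalizer,
    /// Live object with finalizer: run the normal reconcile.
    Apply,
    /// Deletion requested and finalizer present: clean up, then remove it.
    Cleanup,
    /// Deletion requested and finalizer already gone: nothing left to do.
    Nothing,
}

pub fn next_action(meta: &ObjectMeta) -> FinalizerAction {
    let has_finalizer = meta.finalizers.iter().any(|f| f.as_str() == FINALIZER);
    match (meta.deletion_requested, has_finalizer) {
        (false, false) => FinalizerAction::AddFinalizer,
        (false, true) => FinalizerAction::Apply,
        (true, true) => FinalizerAction::Cleanup,
        (true, false) => FinalizerAction::Nothing,
    }
}

/// Finalizer list to write back once cleanup has succeeded.
pub fn without_finalizer(meta: &ObjectMeta) -> Vec<String> {
    meta.finalizers
        .iter()
        .filter(|f| f.as_str() != FINALIZER)
        .cloned()
        .collect()
}
```

Pre-deletion validation and backup hooks would naturally run in the `Cleanup` branch, before the finalizer is removed and Kubernetes completes the deletion.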
Focus: Multi-tenancy, security, and compliance
- Multi-tenancy enhancements
  - Namespace isolation best practices
  - Resource quota integration
  - Network policy templates
  - Tenant isolation verification
- Security hardening
  - Pod Security Standards compliance (restricted profile)
  - Seccomp and AppArmor profiles
  - Read-only root filesystem support
  - Non-root container support
  - Secrets encryption at rest
- Compliance and audit
  - Audit logging for all operator actions
  - Compliance report generation (PCI, HIPAA, SOC2)
  - RBAC audit tools
  - Security scanning integration (Trivy, Snyk)
- Advanced scheduling
  - Cluster autoscaler integration
  - Pod disruption budgets
  - Priority classes for critical workloads
  - Custom scheduler support
- Networking enhancements
  - Ingress/Gateway API integration
  - Service mesh compatibility (Istio, Linkerd)
  - Network policy generation
  - External DNS integration
- Storage enhancements
  - Storage class auto-detection
  - Volume expansion support
  - Snapshot scheduling
  - Tiering policy management (RustFS lifecycle)
Focus: Stability, documentation, and ecosystem integration
- Stability requirements
  - 3 months without critical bugs
  - 95%+ test coverage
  - Performance benchmarks published
  - Upgrade path from all 0.x versions
- Documentation completeness
  - Complete API documentation
  - Production deployment guides
  - Architecture deep-dive
  - Runbooks for common operations
  - Video tutorials and demos
- Ecosystem integration
  - OperatorHub.io listing
  - Artifact Hub listing
  - Helm chart repository
  - OLM (Operator Lifecycle Manager) support
  - Kustomize examples
- Community and support
  - Active community channels (Slack, Discord, forum)
  - Regular release cadence (monthly)
  - Public roadmap with user voting
  - Commercial support options documented
- GitOps integration: ArgoCD/Flux declarative configuration
- Multi-cluster management: Federated tenant deployments
- Advanced replication: Cross-cluster data replication
- AI/ML workload optimization: Specialized configurations for AI storage patterns
- Edge deployment support: Lightweight operator for edge Kubernetes
- Operator SDK migration: Consider migrating to operator-sdk framework
- Custom admission webhooks: Additional validation and mutation logic
- Backup operator integration: Dedicated backup operator with CRD
- Refactor reconciliation loop for better testability
- Extract stateful set generation into separate module
- Improve error types with more context
- Add comprehensive inline documentation
- Standardize naming conventions across codebase
- Consider using kube-runtime finalizers API
- k8s-openapi / kube from crates.io: Using crates.io versions (see Cargo.toml); keep pinned upgrades deliberate
- Performance profiling and optimization
- Memory usage analysis and optimization
- Reduce binary size (investigate dependencies)
- Migrate build system to modern Rust practices
- Consider async runtime optimizations
- Evaluate alternative Kubernetes client libraries
- Code generation for boilerplate reduction
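As one possible reading of "improve error types with more context" from the list above, the sketch below wraps an underlying error together with the tenant name and the reconcile stage in which it occurred. `OperatorError` and the `ResultExt::in_stage` helper are hypothetical names introduced for illustration; the operator's real error types may look quite different.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error carrying the tenant and reconcile stage as context.
#[derive(Debug)]
pub struct OperatorError {
    pub tenant: String,
    pub stage: &'static str,
    pub cause: Box<dyn Error + Send + Sync + 'static>,
}

impl fmt::Display for OperatorError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "tenant {}: {} failed: {}", self.tenant, self.stage, self.cause)
    }
}

impl Error for OperatorError {
    /// Expose the wrapped error so callers can walk the error chain.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.cause)
    }
}

/// Extension trait that attaches context to any `Result` with a std error.
pub trait ResultExt<T> {
    fn in_stage(self, tenant: &str, stage: &'static str) -> Result<T, OperatorError>;
}

impl<T, E: Error + Send + Sync + 'static> ResultExt<T> for Result<T, E> {
    fn in_stage(self, tenant: &str, stage: &'static str) -> Result<T, OperatorError> {
        self.map_err(|e| OperatorError {
            tenant: tenant.to_string(),
            stage,
            cause: Box::new(e),
        })
    }
}
```

A call site would then read along the lines of `"abc".parse::<i32>().in_stage("tenant-a", "parse replicas")`, producing a message that names both the tenant and the failing step while keeping the original error reachable through `source()`.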
- CONTRIBUTING.md and contributor workflow (make pre-commit)
- Pull request template (.github/pull_request_template.md)
- GitHub issue templates and good-first-issue labels
- Regular community meetings (monthly)
- Core developer docs (docs/DEVELOPMENT.md, CLAUDE.md, docs/architecture-decisions.md): expand as needed
- Collaborate with RustFS core team
- Partner with Kubernetes SIG Storage
- Engage with CNCF projects (cert-manager, external-secrets)
- Work with cloud providers for validation
- Collaborate with observability vendors (Datadog, New Relic)
- Kubernetes: v1.27+ (current target: v1.30)
- Rust: 1.91+ (edition 2024)
- RustFS: Version compatibility matrix TBD
- kube / k8s-openapi: crates.io versions in Cargo.toml
- cert-manager: v1.12+ (for TLS automation)
- Prometheus Operator: v0.68+ (for monitoring)
- Velero: v1.12+ (for backup)
- external-secrets: v0.9+ (for secret management)
We welcome community input on this roadmap. You can:
- Vote on features: Comment on issues with 👍 for features you need
- Propose new features: Open an issue with the enhancement label
- Discuss priorities: Join our community meetings
- Share use cases: Help us understand your deployment scenarios
- Contribute code: Pick up items marked as good-first-issue
- Discussion Forum: https://github.com/orgs/rustfs/discussions
- Issue Tracker: https://github.com/rustfs/operator/issues
We track these metrics to measure progress:
- Stability: Mean time between failures (MTBF)
- Performance: Reconciliation time, resource usage
- Quality: Test coverage, bug count, security vulnerabilities
- Adoption: GitHub stars, downloads, production deployments
- Community: Contributors, PR velocity, issue resolution time
Note: This roadmap is a living document and subject to change based on community feedback, RustFS evolution, and Kubernetes ecosystem developments. Dates are estimates and may shift based on priorities and available resources.