Add Rook-Ceph installation procedure #32
Open
zdover23
wants to merge
1
commit into
cobaltcore-dev:main
from
zdover23:docs-2026-04-04-docs-architecture-rook-ceph-install
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,326 @@ | ||
| --- | ||
| title: Installing Rook-Ceph on Kubernetes | ||
| --- | ||
|
|
||
| # Installing Rook-Ceph on Kubernetes | ||
|
|
||
| ## Overview | ||
|
|
||
| This guide provides step-by-step instructions for deploying a Ceph storage | ||
| cluster using the Rook operator on Kubernetes. Rook automates the deployment, | ||
| configuration, and management of Ceph clusters within Kubernetes environments. | ||
|
|
||
| The instructions here are meant only as a general guideline. We recommend that | ||
| you use the instructions found in the [official Rook | ||
| documentation](https://rook.io/docs/rook/latest/) and the [upstream Ceph | ||
| documentation](https://docs.ceph.com/). | ||
|
|
||
|
|
||
| ## Prerequisites | ||
|
|
||
| Before beginning the installation, ensure the following requirements are met: | ||
|
|
||
| ### Kubernetes Cluster Requirements | ||
|
|
||
| - Kubernetes v1.25 or higher | ||
| - `kubectl` configured to communicate with your cluster | ||
| - Administrator access to the Kubernetes cluster | ||
| - At least 3 worker nodes for a production cluster (1 node minimum for testing) | ||
| - Verify compatibility between your Kubernetes version and the Rook version you | ||
| intend to deploy — see the [Rook releases page](https://github.com/rook/rook/releases) | ||
| for version compatibility information | ||
|
|
||
| ### Storage Requirements | ||
|
|
||
| - Raw block devices available on worker nodes (unformatted, no filesystem) | ||
| - Minimum 10 GB of storage per OSD | ||
| - Devices should not be mounted or in use by the operating system | ||
|
|
||
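One way to look for candidate devices is to list block devices whose filesystem column is empty. The following is a minimal sketch, not part of the Rook tooling: `list_unformatted` is a hypothetical helper, and it assumes its input has the two-column shape printed by `lsblk --raw --noheadings --output NAME,FSTYPE`.

```bash
# Hypothetical helper: reads "NAME FSTYPE" rows on stdin and prints the names
# of the rows whose FSTYPE column is empty (only one field present).
list_unformatted() {
  awk 'NF == 1 { print $1 }'
}

# Example with canned input. On a node, pipe in the real listing instead:
#   lsblk --raw --noheadings --output NAME,FSTYPE | list_unformatted
printf 'sda \nsda1 ext4\nsdb \n' | list_unformatted
```

A disk that carries partitions also reports an empty FSTYPE (like `sda` in the canned input), so the result is only a starting point; confirm each candidate against the full `lsblk` tree before handing it to Rook.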
| ### Network Requirements | ||
|
|
||
| - Network connectivity between all cluster nodes | ||
| - Network access between pods is handled by the Kubernetes network plugin (CNI). | ||
| Ensure your CNI supports the required pod-to-pod communication. If you need | ||
| to open ports for external access to Ceph services, the typical ports are | ||
| 6789 and 3300 for the monitors (msgr1 and msgr2) and 6800-7300 for the OSD, | ||
| MGR, and MDS daemons. | ||
|
|
||
| ### System Requirements | ||
|
|
||
| - Linux kernel 4.5 or higher (5.x recommended) | ||
| - LVM2 packages installed on all nodes | ||
| - Minimum 2 GB RAM per node (4 GB+ recommended) | ||
|
Are these recommended by the Rook project? Seems oddly low for converged storage nodes.
||
| - `helm` installed if using Helm-based deployment (optional) | ||
|
|
||
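The kernel requirement above can be checked on each node with a small version comparison. This is a sketch under stated assumptions: `kernel_at_least` is a hypothetical helper, and the 4.5 threshold is simply the figure quoted in this guide, not a value taken from the Rook documentation.

```bash
# Hypothetical helper: succeeds when the kernel release (third argument,
# defaulting to `uname -r`) is at least MAJOR.MINOR.
kernel_at_least() {
  want_major=$1
  want_minor=$2
  release=${3:-$(uname -r)}
  major=${release%%.*}        # text before the first dot
  rest=${release#*.}          # text after the first dot
  minor=${rest%%[!0-9]*}      # leading digits of the remainder
  [ "$major" -gt "$want_major" ] ||
    { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

# Example with an explicit release string; omit it to test the local node.
if kernel_at_least 4 5 "5.15.0-91-generic"; then
  echo "kernel is new enough"
fi
```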
| ## Configuration Options | ||
|
|
||
| ### Customizing the Cluster | ||
|
|
||
| Edit `cluster.yaml` to customize your deployment before creating the cluster: | ||
|
|
||
| #### Storage Configuration | ||
|
|
||
| Specify which devices to use for OSDs: | ||
|
|
||
| ```yaml | ||
| storage: | ||
|   useAllNodes: true | ||
|   useAllDevices: false | ||
|   deviceFilter: "^sd[b-z]" # Use sdb, sdc, etc. | ||
| ``` | ||
|
|
||
| Or specify devices explicitly: | ||
|
|
||
| ```yaml | ||
| storage: | ||
|   nodes: | ||
|     - name: "node1" | ||
|       devices: | ||
|         - name: "/dev/sdb" | ||
|     - name: "node2" | ||
|       devices: | ||
|         - name: "/dev/sdc" | ||
| ``` | ||
|
|
||
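Before relying on a `deviceFilter` pattern, it can help to preview which device names it would select. The sketch below uses `grep -E` as a stand-in; this is an assumption, since it only approximates how Rook evaluates the expression, and `match_filter` is a hypothetical helper.

```bash
# Hypothetical helper: prints the device names on stdin that match the
# given extended regular expression.
match_filter() {
  grep -E -- "$1" || true   # no match is not an error here
}

# Example with canned names. On a node, feed in real names, for instance
# from: lsblk --raw --noheadings --output NAME
printf 'sda\nsdb\nsdc1\nnvme0n1\n' | match_filter '^sd[b-z]'
```

With this input the pattern selects `sdb` and `sdc1`: it is anchored only at the start, so anything beginning with `sdb` through `sdz` matches, while `sda` and NVMe names do not.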
| #### Resource Limits | ||
|
|
||
| Set resource limits for Ceph daemons: | ||
|
|
||
| ```yaml | ||
| resources: | ||
|   mon: | ||
|     limits: | ||
|       cpu: "2000m" | ||
|       memory: "4Gi" | ||
|     requests: | ||
|       cpu: "1000m" | ||
|       memory: "2Gi" | ||
|   osd: | ||
|     limits: | ||
|       cpu: "2000m" | ||
|       memory: "4Gi" | ||
|     requests: | ||
|       cpu: "1000m" | ||
|       memory: "2Gi" | ||
| ``` | ||
|
|
||
| #### Network Configuration | ||
|
|
||
| Configure network settings for client and cluster traffic: | ||
|
|
||
| ```yaml | ||
| network: | ||
|   provider: host # or multus for advanced networking | ||
|   # Uncomment to encrypt Ceph traffic on the wire (requires msgr2) | ||
|   # connections: | ||
|   #   encryption: | ||
|   #     enabled: true | ||
| ``` | ||
|
|
||
| ### Dashboard Access | ||
|
|
||
| Enable and access the Ceph dashboard: | ||
|
|
||
| ```bash | ||
| # The dashboard is enabled by default in cluster.yaml | ||
|
|
||
| # Get the dashboard password | ||
| kubectl -n rook-ceph get secret rook-ceph-dashboard-password \ | ||
| -o jsonpath="{['data']['password']}" | base64 --decode && echo | ||
|
|
||
| # Port-forward to access the dashboard | ||
| kubectl -n rook-ceph port-forward service/rook-ceph-mgr-dashboard 8443:8443 | ||
| ``` | ||
|
|
||
| Access the dashboard at: `https://localhost:8443` | ||
|
|
||
| Username: `admin` | ||
| Password: (from the command above) | ||
|
|
||
| ## Creating Storage Classes | ||
|
|
||
| ### Block Storage (RBD) | ||
|
|
||
| Create a storage class for block devices: | ||
|
|
||
| ```bash | ||
| kubectl create -f csi/rbd/storageclass.yaml | ||
| ``` | ||
|
|
||
| Test the storage class: | ||
|
|
||
| ```bash | ||
| # Create a test PVC | ||
| cat <<EOF | kubectl apply -f - | ||
| apiVersion: v1 | ||
| kind: PersistentVolumeClaim | ||
| metadata: | ||
|   name: rbd-pvc | ||
| spec: | ||
|   accessModes: | ||
|     - ReadWriteOnce | ||
|   resources: | ||
|     requests: | ||
|       storage: 1Gi | ||
|   storageClassName: rook-ceph-block | ||
| EOF | ||
|
|
||
| # Verify PVC is bound | ||
| kubectl get pvc rbd-pvc | ||
| ``` | ||
|
|
||
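A PVC can take a few seconds to bind, so a single `kubectl get pvc` may still show `Pending`. A minimal polling sketch follows; `retry` and `pvc_is_bound` are hypothetical helpers, and the attempt count and delay are arbitrary choices rather than recommended values.

```bash
# Hypothetical helper: runs a command up to ATTEMPTS times, sleeping DELAY
# seconds after each failure, and succeeds as soon as the command does.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Hypothetical check: succeeds once the named PVC reports phase "Bound".
pvc_is_bound() {
  [ "$(kubectl get pvc "$1" -o jsonpath='{.status.phase}')" = "Bound" ]
}

# Usage against a live cluster (left commented so the sketch has no side effects):
#   retry 30 2 pvc_is_bound rbd-pvc && echo "rbd-pvc is bound"
```

The same `retry` wrapper could be pointed at any other readiness check in this guide, such as waiting for the RGW pods.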
| ### File Storage (CephFS) | ||
|
|
||
| Deploy the CephFS filesystem: | ||
|
|
||
| ```bash | ||
| kubectl create -f filesystem.yaml | ||
| ``` | ||
|
|
||
| Create a storage class for shared filesystem: | ||
|
|
||
| ```bash | ||
| kubectl create -f csi/cephfs/storageclass.yaml | ||
| ``` | ||
|
|
||
| ### Object Storage (RGW) | ||
|
|
||
| Deploy the object storage service: | ||
|
|
||
| ```bash | ||
| kubectl create -f object.yaml | ||
| ``` | ||
|
|
||
| Wait for the RGW pods to be ready: | ||
|
|
||
| ```bash | ||
| kubectl -n rook-ceph get pods -l app=rook-ceph-rgw | ||
| ``` | ||
|
|
||
| ## Verification | ||
|
|
||
| ### Verify All Storage Types | ||
|
|
||
| Check that all storage components are operational: | ||
|
|
||
| ```bash | ||
| # Check block storage | ||
| kubectl get storageclass rook-ceph-block | ||
|
|
||
| # Check filesystem storage | ||
| kubectl get storageclass rook-cephfs | ||
|
|
||
| # Check object storage | ||
| kubectl -n rook-ceph get cephobjectstore | ||
| ``` | ||
|
|
||
| ### Test Storage Functionality | ||
|
|
||
| Create test workloads using each storage type: | ||
|
|
||
| ```bash | ||
| # Test RBD block storage | ||
| kubectl create -f csi/rbd/pvc.yaml | ||
| kubectl create -f csi/rbd/pod.yaml | ||
|
|
||
| # Test CephFS | ||
| kubectl create -f csi/cephfs/pvc.yaml | ||
| kubectl create -f csi/cephfs/pod.yaml | ||
| ``` | ||
|
|
||
| ## Troubleshooting | ||
|
|
||
| ### Common Issues | ||
|
|
||
| **Operator not starting:** | ||
|
|
||
| ```bash | ||
| # Check operator logs | ||
| kubectl -n rook-ceph logs -l app=rook-ceph-operator | ||
| ``` | ||
|
|
||
| **OSDs not starting:** | ||
|
|
||
| ```bash | ||
| # Check OSD prepare logs | ||
| kubectl -n rook-ceph logs -l app=rook-ceph-osd-prepare | ||
|
|
||
| # Verify devices are available and unused | ||
| kubectl -n rook-ceph exec -it deployment/rook-ceph-tools -- ceph-volume inventory | ||
| ``` | ||
|
|
||
| **Cluster stuck in HEALTH_WARN:** | ||
|
|
||
| ```bash | ||
| # Check detailed cluster status | ||
| kubectl -n rook-ceph exec -it deployment/rook-ceph-tools -- ceph health detail | ||
|
|
||
| # Check for common issues | ||
| kubectl -n rook-ceph exec -it deployment/rook-ceph-tools -- ceph -s | ||
| ``` | ||
|
|
||
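When these checks run from a script, it can be convenient to reduce the `ceph health` summary to a single word. The sketch below assumes only that the summary line starts with `HEALTH_OK`, `HEALTH_WARN`, or `HEALTH_ERR`; `classify_health` is a hypothetical helper.

```bash
# Hypothetical helper: maps a `ceph health` summary line to a short label.
classify_health() {
  case "$1" in
    HEALTH_OK*)   echo "ok" ;;
    HEALTH_WARN*) echo "warning" ;;
    HEALTH_ERR*)  echo "error" ;;
    *)            echo "unknown" ;;
  esac
}

# Example with a canned summary line. Against a live cluster:
#   classify_health "$(kubectl -n rook-ceph exec deployment/rook-ceph-tools -- ceph health)"
classify_health "HEALTH_WARN 1 pool(s) have no replicas configured"
```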
| ## Cleanup | ||
|
|
||
| To remove the Rook-Ceph cluster: | ||
|
|
||
| **Note:** Rook uses Kubernetes finalizers to protect resources from accidental | ||
| deletion. If `kubectl delete` commands hang, you may need to manually remove | ||
| finalizers from the relevant custom resources. See the | ||
| [Rook cleanup documentation](https://rook.io/docs/rook/latest/Storage-Configuration/ceph-teardown/) | ||
| for details. | ||
|
|
||
| ```bash | ||
| # Delete object storage (if created) | ||
| kubectl delete -f object.yaml | ||
|
|
||
| # Delete filesystem (if created) | ||
| kubectl delete -f filesystem.yaml | ||
|
|
||
| # Delete the cluster (after the object store and filesystem are gone) | ||
| kubectl delete -f cluster.yaml | ||
|
|
||
| # Delete the operator | ||
| kubectl delete -f operator.yaml | ||
| kubectl delete -f common.yaml | ||
| kubectl delete -f crds.yaml | ||
| ``` | ||
|
|
||
| **Cleaning up storage on nodes (CAUTION: This deletes all data):** | ||
|
|
||
| Run the following on each node that had OSDs. In addition to removing the Rook | ||
| data directory, the raw block devices used by OSDs must be wiped before they | ||
| can be reused: | ||
|
|
||
| ```bash | ||
| # Remove Rook data directory | ||
| sudo rm -rf /var/lib/rook | ||
|
|
||
| # Wipe each OSD device (replace /dev/sdX with the actual device name) | ||
| sudo sgdisk --zap-all /dev/sdX | ||
| ``` | ||
|
|
||
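Because wiping the wrong device is unrecoverable, it may be worth guarding the wipe with a mount check. This is a sketch only: `device_in_use` is a hypothetical helper that matches device paths by prefix against a mounts table (by default `/proc/mounts`), and it does not detect other kinds of use such as LVM or swap.

```bash
# Hypothetical helper: succeeds if DEVICE, or any path that starts with it
# (for example a partition), appears in the first column of the mounts table.
device_in_use() {
  dev=$1
  mounts=${2:-/proc/mounts}
  awk -v dev="$dev" '
    index($1, dev) == 1 { found = 1 }
    END { exit found ? 0 : 1 }
  ' "$mounts"
}

# Usage on a node (left commented because it destroys data):
#   device_in_use /dev/sdX || sudo sgdisk --zap-all /dev/sdX
```

The prefix match is deliberately conservative: it can refuse a device whose name is a prefix of another mounted device, which is the safer failure mode here.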
| ## Next Steps | ||
|
|
||
| After successful installation: | ||
|
|
||
| 1. Configure monitoring with Prometheus and Grafana | ||
| 2. Set up backup and disaster recovery procedures | ||
| 3. Implement resource quotas and limits | ||
| 4. Configure advanced networking if required | ||
| 5. Review and adjust Ceph configuration parameters | ||
| 6. Set up regular maintenance schedules | ||
|
|
||
| ## Additional Resources | ||
|
|
||
| - Official Rook documentation: https://rook.io/docs/rook/latest/ | ||
| - Ceph documentation: https://docs.ceph.com/ | ||
| - Rook GitHub repository: https://github.com/rook/rook | ||
| - Rook Slack community: https://rook-io.slack.com/ | ||
|
|
||
| ## Notes | ||
|
|
||
| - This guide provides a basic Rook-Ceph deployment. While the prerequisites | ||
| describe a production-grade setup, additional considerations apply for | ||
| production environments, including high availability, performance tuning, | ||
| and security hardening. | ||
| - Always test deployment procedures in a non-production environment first. | ||
| - Keep Rook and Ceph versions updated for security and stability improvements. | ||
Might be worth mentioning what services these give access to, roughly? mon, osd, etc. Also, will rgw be considered in this scenario? Those would likely end up with higher order ports.