326 changes: 326 additions & 0 deletions docs/architecture/cloud-storage/rook-ceph.md
@@ -0,0 +1,326 @@
---
title: Installing Rook-Ceph on Kubernetes
---

# Installing Rook-Ceph on Kubernetes

## Overview

This guide provides step-by-step instructions for deploying a Ceph storage
cluster using the Rook operator on Kubernetes. Rook automates the deployment,
configuration, and management of Ceph clusters within Kubernetes environments.

The instructions here are meant only as a general guideline. We recommend that
you use the instructions found in the [official Rook
documentation](https://rook.io/docs/rook/latest/) and the [upstream Ceph
documentation](https://docs.ceph.com/).


## Prerequisites

Before beginning the installation, ensure the following requirements are met:

### Kubernetes Cluster Requirements

- Kubernetes v1.25 or higher
- `kubectl` configured to communicate with your cluster
- Administrator access to the Kubernetes cluster
- At least 3 worker nodes for a production cluster (1 node minimum for testing)
- Verify compatibility between your Kubernetes version and the Rook version you
  intend to deploy; see the [Rook releases page](https://github.com/rook/rook/releases)
  for version compatibility information
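
The version requirement above can be checked mechanically. The following is a minimal sketch, not part of the original guide: `version_ge` is a hypothetical helper name, and it assumes a `sort` that supports `-V` (GNU coreutils or BusyBox).

```shell
# version_ge HAVE WANT: succeeds when version HAVE is at least version WANT.
# A leading "v" (as in "v1.28.3") is stripped before comparing.
version_ge() {
  have="${1#v}"
  want="${2#v}"
  # With a version-aware sort, the smaller version comes first; HAVE >= WANT
  # exactly when WANT is the first line.
  [ "$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)" = "$want" ]
}
```

Pass it the server version reported by `kubectl version`, for example `version_ge v1.28.3 1.25 && echo "Kubernetes version OK"`.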

### Storage Requirements

- Raw block devices available on worker nodes (unformatted, no filesystem)
- Minimum 10 GB of storage per OSD
- Devices should not be mounted or in use by the operating system

### Network Requirements

- Network connectivity between all cluster nodes
- Network access between pods is handled by the Kubernetes network plugin (CNI).
  Ensure your CNI supports the required pod-to-pod communication. If you need
  to open ports for external access to Ceph services, the typical ports are
  3300 and 6789 (monitors) and 6800-7300 (OSD, MGR, and MDS daemons). RGW
  endpoints listen on the port configured in the object store's gateway settings.
Comment on lines +44 to +45

Might be worth mentioning what services these give access to, roughly? mon, osd, etc. Also, will rgw be considered in this scenario? Those would likely end up with higher order ports.


### System Requirements

- Linux kernel 4.5 or higher (5.x recommended)
- LVM2 packages installed on all nodes
- Minimum 2 GB RAM per node (4 GB+ recommended)

Are these recommended by the rook project? Seems oddly low for converged storage nodes.

- `helm` installed if using Helm-based deployment (optional)
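
As a rough illustration of the system requirements, a node-level preflight might look like the sketch below. The function names `kernel_ok` and `preflight_node` are hypothetical, and the check for the `lvm` binary is only an approximation of "LVM2 packages installed".

```shell
# kernel_ok RELEASE: succeeds when RELEASE (as printed by `uname -r`) is 4.5+.
kernel_ok() {
  # Split "5.15.0-91-generic" on "." and "-" into major, minor, and the rest.
  IFS=.- read -r k_major k_minor _ <<EOF
$1
EOF
  [ "$k_major" -gt 4 ] || { [ "$k_major" -eq 4 ] && [ "${k_minor:-0}" -ge 5 ]; }
}

# preflight_node: report problems found on the node this runs on.
preflight_node() {
  kernel_ok "$(uname -r)" || echo "kernel $(uname -r) is older than 4.5"
  command -v lvm >/dev/null 2>&1 || echo "lvm binary not found (install LVM2)"
}
```

Run `preflight_node` on each worker node; no output means neither check failed.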

## Configuration Options

### Customizing the Cluster

Edit `cluster.yaml` to customize your deployment before creating the cluster:

#### Storage Configuration

Specify which devices to use for OSDs:

```yaml
storage:
  useAllNodes: true
  useAllDevices: false
  deviceFilter: "^sd[b-z]"  # Use sdb, sdc, etc.
```
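
To preview which devices a filter such as `^sd[b-z]` would select, the pattern can be tried against a list of device names. This is a sketch only: `filter_devices` is a hypothetical helper, and it uses `grep -E`, whose regex dialect may differ in edge cases from the one Rook applies.

```shell
# filter_devices REGEX NAME...: print the device names that match REGEX.
# Names are given without the /dev/ prefix, as in the deviceFilter setting.
filter_devices() {
  regex="$1"
  shift
  printf '%s\n' "$@" | grep -E -- "$regex"
}
```

For example, `filter_devices '^sd[b-z]' sda sdb sdc nvme0n1 vdb` prints `sdb` and `sdc`. On a node, the candidate names can be taken from `lsblk -dno NAME`.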

Or specify devices explicitly:

```yaml
storage:
  nodes:
  - name: "node1"
    devices:
    - name: "/dev/sdb"
  - name: "node2"
    devices:
    - name: "/dev/sdc"
```

#### Resource Limits

Set resource limits for Ceph daemons:

```yaml
resources:
  mon:
    limits:
      cpu: "2000m"
      memory: "4Gi"
    requests:
      cpu: "1000m"
      memory: "2Gi"
  osd:
    limits:
      cpu: "2000m"
      memory: "4Gi"
    requests:
      cpu: "1000m"
      memory: "2Gi"
```

#### Network Configuration

Configure network settings for client and cluster traffic:

```yaml
network:
  provider: host  # or multus for advanced networking
  # Uncomment to enable encryption of Ceph traffic in transit
  # connections:
  #   encryption:
  #     enabled: true
```

### Dashboard Access

Enable and access the Ceph dashboard:

```bash
# The dashboard is enabled by default in cluster.yaml

# Get the dashboard password
kubectl -n rook-ceph get secret rook-ceph-dashboard-password \
  -o jsonpath="{['data']['password']}" | base64 --decode && echo

# Port-forward to access the dashboard
kubectl -n rook-ceph port-forward service/rook-ceph-mgr-dashboard 8443:8443
```

Access the dashboard at: `https://localhost:8443`

- Username: `admin`
- Password: the value printed by the command above

## Creating Storage Classes

### Block Storage (RBD)

Create a storage class for block devices:

```bash
kubectl create -f csi/rbd/storageclass.yaml
```

Test the storage class:

```bash
# Create a test PVC
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-ceph-block
EOF

# Verify PVC is bound
kubectl get pvc rbd-pvc
```
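
Instead of polling `kubectl get pvc` by hand, the wait can be scripted. The sketch below assumes kubectl v1.23 or newer (for `--for=jsonpath`); `wait_for_pvc_bound` is a hypothetical helper name and the 120-second default timeout is an arbitrary choice.

```shell
# wait_for_pvc_bound NAME [TIMEOUT]: block until the PVC reports phase Bound,
# or fail once TIMEOUT (default 120s) has elapsed.
wait_for_pvc_bound() {
  kubectl wait "pvc/$1" \
    --for=jsonpath='{.status.phase}'=Bound \
    --timeout="${2:-120s}"
}
```

Usage: `wait_for_pvc_bound rbd-pvc`. A timeout here usually means the CSI provisioner could not create the RBD image, so the operator and provisioner logs are the next place to look.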

### File Storage (CephFS)

Deploy the CephFS filesystem:

```bash
kubectl create -f filesystem.yaml
```

Create a storage class for shared filesystem:

```bash
kubectl create -f csi/cephfs/storageclass.yaml
```
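
A shared-filesystem claim can be tested the same way as the RBD claim. The sketch below assumes the storage class is named `rook-cephfs` (the name this guide checks for in the Verification section); the helper name `cephfs_pvc_manifest` and the default claim name `cephfs-pvc` are hypothetical.

```shell
# cephfs_pvc_manifest [NAME]: print a PVC manifest that requests a shared
# (ReadWriteMany) volume from the rook-cephfs storage class.
cephfs_pvc_manifest() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${1:-cephfs-pvc}
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-cephfs
EOF
}
```

Apply it with `cephfs_pvc_manifest | kubectl apply -f -` and confirm it binds with `kubectl get pvc cephfs-pvc`. `ReadWriteMany` is what distinguishes this from the block-storage claim: several pods can mount the volume at once.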

### Object Storage (RGW)

Deploy the object storage service:

```bash
kubectl create -f object.yaml
```

Wait for the RGW pods to be ready:

```bash
kubectl -n rook-ceph get pods -l app=rook-ceph-rgw
```

## Verification

### Verify All Storage Types

Check that all storage components are operational:

```bash
# Check block storage
kubectl get storageclass rook-ceph-block

# Check filesystem storage
kubectl get storageclass rook-cephfs

# Check object storage
kubectl -n rook-ceph get cephobjectstore
```

### Test Storage Functionality

Create test workloads using each storage type:

```bash
# Test RBD block storage
kubectl create -f csi/rbd/pvc.yaml
kubectl create -f csi/rbd/pod.yaml

# Test CephFS
kubectl create -f csi/cephfs/pvc.yaml
kubectl create -f csi/cephfs/pod.yaml
```

## Troubleshooting

### Common Issues

**Operator not starting:**

```bash
# Check operator logs
kubectl -n rook-ceph logs -l app=rook-ceph-operator
```

**OSDs not starting:**

```bash
# Check OSD prepare logs
kubectl -n rook-ceph logs -l app=rook-ceph-osd-prepare

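# Verify devices are available and unused (the toolbox pod may not see host
# devices; if it reports nothing, run `lsblk -f` directly on the node instead)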
kubectl -n rook-ceph exec -it deployment/rook-ceph-tools -- ceph-volume inventory
```

**Cluster stuck in HEALTH_WARN:**

```bash
# Check detailed cluster status
kubectl -n rook-ceph exec -it deployment/rook-ceph-tools -- ceph health detail

# Check for common issues
kubectl -n rook-ceph exec -it deployment/rook-ceph-tools -- ceph -s
```
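
For scripted checks, the first word of the `ceph health` output can be mapped to an exit code. The sketch below is illustrative: `health_exit_code` is a hypothetical helper and the exit-code numbering is an arbitrary convention.

```shell
# health_exit_code STATUS: translate `ceph health` output into an exit code:
# 0 = HEALTH_OK, 1 = HEALTH_WARN, 2 = HEALTH_ERR, 3 = unrecognized output.
health_exit_code() {
  case "$1" in
    HEALTH_OK*)   return 0 ;;
    HEALTH_WARN*) return 1 ;;
    HEALTH_ERR*)  return 2 ;;
    *)            return 3 ;;
  esac
}
```

Usage: `health_exit_code "$(kubectl -n rook-ceph exec deployment/rook-ceph-tools -- ceph health)"`, then inspect `$?`. A non-zero result is the cue to run `ceph health detail` as shown above.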

## Cleanup

To remove the Rook-Ceph cluster:

**Note:** Rook uses Kubernetes finalizers to protect resources from accidental
deletion. If `kubectl delete` commands hang, you may need to manually remove
finalizers from the relevant custom resources. See the
[Rook cleanup documentation](https://rook.io/docs/rook/latest/Storage-Configuration/ceph-teardown/)
for details.

```bash
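# Delete object storage (if created)
kubectl delete -f object.yaml

# Delete filesystem (if created)
kubectl delete -f filesystem.yaml

# Delete the cluster last among the Ceph resources; Rook blocks CephCluster
# deletion while dependent object stores or filesystems still exist
kubectl delete -f cluster.yaml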

# Delete the operator
kubectl delete -f operator.yaml
kubectl delete -f common.yaml
kubectl delete -f crds.yaml
```

**Cleaning up storage on nodes (CAUTION: This deletes all data):**

Run the following on each node that had OSDs. In addition to removing the Rook
data directory, the raw block devices used by OSDs must be wiped before they
can be reused:

```bash
# Remove Rook data directory
sudo rm -rf /var/lib/rook

# Wipe each OSD device (replace /dev/sdX with the actual device name)
sudo sgdisk --zap-all /dev/sdX
```
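
Because wiping the wrong device is unrecoverable, it can help to wrap the command in a guard. The sketch below is one possible approach, not part of the original procedure: `wipe_osd_device` and the `DRY_RUN` variable are hypothetical, and the function only prints the command unless `DRY_RUN=0` is set.

```shell
# wipe_osd_device DEVICE: zap the partition table of DEVICE.
# By default (DRY_RUN unset or 1) it only prints what it would run.
wipe_osd_device() {
  dev="$1"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "would run: sgdisk --zap-all $dev"
    return 0
  fi
  # Refuse anything that is not a block device (typos, regular files).
  [ -b "$dev" ] || { echo "not a block device: $dev" >&2; return 1; }
  sgdisk --zap-all "$dev"
}
```

Run it once as-is to review the command, then repeat as root with `DRY_RUN=0` exported to perform the wipe.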

## Next Steps

After successful installation:

1. Configure monitoring with Prometheus and Grafana
2. Set up backup and disaster recovery procedures
3. Implement resource quotas and limits
4. Configure advanced networking if required
5. Review and adjust Ceph configuration parameters
6. Set up regular maintenance schedules

## Additional Resources

- Official Rook documentation: https://rook.io/docs/rook/latest/
- Ceph documentation: https://docs.ceph.com/
- Rook GitHub repository: https://github.com/rook/rook
- Rook Slack community: https://rook-io.slack.com/

## Notes

- This guide provides a basic Rook-Ceph deployment. While the prerequisites
  describe a production-grade setup, additional considerations apply for
  production environments, including high availability, performance tuning,
  and security hardening.
- Always test deployment procedures in a non-production environment first.
- Keep Rook and Ceph versions updated for security and stability improvements.
4 changes: 4 additions & 0 deletions docs/architecture/index.md
@@ -14,3 +14,7 @@ CobaltCore is built on top of OpenStack and IronCore, leveraging their capabilit
- [**HA Service**](./cluster#ha-service): The high availability service that ensures critical workloads remain operational even in the event of failures.
- [**Cortex**](./cortex): Smart initial placement and scheduling service for compute, storage, and network in cloud-native cloud environments.
- [**Cloud Storage**](./cloud-storage/): Ceph-based distributed storage stack including Rook, Chorus, Arbiter, and Prysm for lifecycle management, replication, quorum, and observability.
- [**Ceph**](./cloud-storage/ceph): An all-in-one storage system that provides object, block, and file storage and delivers extraordinary scalability.
- [**Rook-Ceph Installation**](./cloud-storage/rook-ceph.md): A procedure for
deploying the all-in-one storage system that provides object, block, and file
storage and delivers extraordinary scalability.