FLPATH-4176: k8s-storage-sp enhancement and cross-references#54

Open
LinskId wants to merge 1 commit into dcm-project:main from LinskId:FLPATH-4176_k8s_storage_sp

@LinskId LinskId commented Jun 8, 2026

Summary

Adds the Kubernetes Storage Service Provider enhancement and related DCM docs for standalone persistent storage (FLPATH-4176 / FLPATH-4113).

Introduces storage as the fifth portable service type in service-type-definitions (capacity, accessMode, optional providerHints.kubernetes for StorageClass / volumeMode).

Defines the K8s reference SP: a DCM control-plane adapter that submits and watches PVCs via the Kubernetes API — not a CSI or storage backend. One SP instance per managed cluster; cluster must already have StorageClass + CSI.

v1 operations: CREATE, READ, UPDATE (capacity expansion only), DELETE; status via CloudEvents on NATS; deployment model aligned with k8s-container-service-provider.

Also updates user flows, storage status enum (PROVISIONING, RUNNING, FAILED, …), and dcm.storage in the status-reader doc.

Flows defined
Catalog / service type — User orders a portable storage catalog item; Placement → SP Resource Manager invokes the registered K8s Storage SP.

SP registration — On startup, SP self-registers with DCM Service Provider Manager (serviceType: storage, ops CREATE/READ/UPDATE/DELETE, endpoint /api/v1alpha1/volumes).

Volume create — POST /volumes → PVC in configured namespace (SP_K8S_NAMESPACE) with DCM labels → cluster/CSI provisions → SP watches PVC phase → publishes status to NATS.

Volume expand — PATCH /volumes/{id} with larger capacity after validating allowVolumeExpansion (optional ResourceQuota check); expansion completes asynchronously on the cluster.

Volume read / delete — GET (list/get by dcm-instance-id); DELETE removes PVC.

Status reporting — SharedIndexInformer on labeled PVCs maps Pending/Bound/resize conditions to DCM storage status.

SP health — DCM polls SP /health (same pattern as other SPs).
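The create and status-reporting flows above can be sketched roughly as follows. This is a minimal illustration, not the SP's actual code: the PVC naming scheme, the function names, and the choice to report resize conditions as PROVISIONING are assumptions. Only the `dcm-instance-id` label and the PROVISIONING / RUNNING / FAILED status names come from this PR.

```python
from typing import Optional, Sequence

# Label key mentioned in the PR; any further DCM labels are not shown here.
INSTANCE_LABEL = "dcm-instance-id"

# PVC condition types that indicate an expansion is still in progress.
RESIZE_CONDITIONS = {"Resizing", "FileSystemResizePending"}


def build_pvc(instance_id: str, namespace: str, capacity: str,
              access_mode: str = "ReadWriteOnce",
              storage_class: Optional[str] = None) -> dict:
    """Build a PersistentVolumeClaim manifest carrying the DCM instance label."""
    spec = {
        "accessModes": [access_mode],
        "resources": {"requests": {"storage": capacity}},
    }
    if storage_class is not None:
        # Omitting storageClassName lets the cluster default apply.
        spec["storageClassName"] = storage_class
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {
            "name": f"dcm-{instance_id}",  # assumed naming scheme
            "namespace": namespace,        # e.g. the value of SP_K8S_NAMESPACE
            "labels": {INSTANCE_LABEL: instance_id},
        },
        "spec": spec,
    }


def map_pvc_status(phase: str, conditions: Sequence[dict] = ()) -> str:
    """Map a PVC phase plus resize conditions to a DCM storage status."""
    resizing = any(
        c.get("type") in RESIZE_CONDITIONS and c.get("status") == "True"
        for c in conditions
    )
    if phase == "Pending" or (phase == "Bound" and resizing):
        return "PROVISIONING"  # assumption: an in-flight resize is reported this way
    if phase == "Bound":
        return "RUNNING"
    if phase == "Lost":
        return "FAILED"
    raise ValueError(f"unexpected PVC phase: {phase!r}")
```

In a real SP the manifest would be submitted through a Kubernetes client and `map_pvc_status` would be called from the informer's event handlers.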

Questions for reviewers

  1. Epic scope (FLPATH-4113): Is the epic satisfied by portable storage + K8s reference SP only, or are enhancement docs required for other platforms (VMware, OpenStack, …)?

  2. Which NATS subject format should the storage SP use? Service Provider Status Reporting and SP Resource Status Reader specify dcm.{serviceType} (e.g., dcm.storage), but k8s-container-sp documents a hierarchical subject (dcm.providers.{providerName}.container.instances.{instanceId}.status).
    Which contract should this implementation follow?

  3. Namespace: v1 single SP_K8S_NAMESPACE vs per-instance namespace (Alternative deferred to v2; multi-SP-per-tenant workaround documented).

Signed-off-by: Idit Gavra igavra@redhat.com
Assisted-By: Cursor AI

@machacekondra

Collaborator

The DCO job fails because the commit message is missing the Signed-off-by: line. Can you please add it?

@machacekondra

Collaborator

> Epic scope (FLPATH-4113): Is the epic satisfied by portable storage + K8s reference SP only, or are enhancement docs required for other platforms (VMware, OpenStack, …)?

I think we should have an enhancement per platform.

> Which NATS subject format should the storage SP use? Service Provider Status Reporting and SP Resource Status Reader specify dcm.{serviceType} (e.g., dcm.storage), but k8s-container-sp documents a hierarchical subject (dcm.providers.{providerName}.container.instances.{instanceId}.status).
> Which contract should this implementation follow?

Currently, we more or less ignore the subject; it just needs the dcm. prefix. The rest of the data is sent via the CloudEvent, see: https://github.com/dcm-project/service-provider-manager/blob/main/internal/consumer/consumer.go#L130
We need to fix those inconsistencies in the enhancement documents to match what is implemented.
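As a rough illustration of that contract (subject only needs the `dcm.` prefix, payload travels in the CloudEvent), a publisher-side sketch might look like the following. The `source` and `type` values, the `data` field names, and the helper names are hypothetical; only the `dcm.{serviceType}` subject shape and the use of CloudEvents come from this thread.

```python
import json
import uuid
from datetime import datetime, timezone


def status_subject(service_type: str) -> str:
    """Subject in the dcm.{serviceType} form, e.g. dcm.storage."""
    return f"dcm.{service_type}"


def is_dcm_subject(subject: str) -> bool:
    """Consumer-side check described above: only the prefix matters."""
    return subject.startswith("dcm.")


def build_status_event(provider_name: str, instance_id: str,
                       status: str, message: str = "") -> dict:
    """Build a CloudEvents v1.0 envelope (structured JSON form)."""
    return {
        # Required CloudEvents attributes.
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": f"dcm/providers/{provider_name}",  # hypothetical source URI
        "type": "dcm.status.storage",                # hypothetical event type
        # Optional attributes.
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        # Hypothetical payload shape.
        "data": {"id": instance_id, "status": status, "message": message},
    }


event = build_status_event("k8s-storage-sp", "abc123", "RUNNING")
payload = json.dumps(event).encode("utf-8")  # bytes to hand to a NATS client
```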

> Namespace: v1 single SP_K8S_NAMESPACE vs per-instance namespace (Alternative deferred to v2; multi-SP-per-tenant workaround documented).

This is another subject we need to figure out. IMHO as part of another enhancement.

@LinskId

LinskId commented Jun 10, 2026

Author

I added a comment about volume encryption to the enhancement.

Comment thread enhancements/k8s-storage-sp/k8s-storage-sp.md
Comment on lines +133 to +139
> **Question for reviewers:** Which NATS subject format should the storage SP
> use? [Service Provider Status Reporting](../state-management/service-provider-status-reporting.md)
> and [SP Resource Status Reader](../sp-resource-status-reader/sp-resource-status-reader.md)
> specify `dcm.{serviceType}` (e.g., `dcm.storage`), but
> [k8s-container-sp](k8s-container-sp.md) documents a hierarchical subject
> (`dcm.providers.{providerName}.container.instances.{instanceId}.status`).
> Which contract should this implementation follow?

Collaborator

I suggest moving this into the Open Questions section (as per the template) so we won't forget to answer all the questions.

I vote for dcm.storage because that matches what the existing SPs use. @gabriel-farache wdyt?

Collaborator

+1 on the topic name, I think dcm.{serviceType} is the name used in the implementations of existing SPs

Author

should I keep the question open?

Collaborator

I think you can update the doc and remove the question according to the info in this thread :)

Comment on lines +163 to +164
Users can override StorageClass and access mode per volume via
`providerHints.kubernetes` (see POST endpoint documentation).

Collaborator

This PR defines accessMode in service-type-definitions.md as a top-level Storage field (not under providerHints.kubernetes). This line says users override access mode via providerHints.kubernetes, which contradicts that schema. What is the correct way?

Collaborator

Additionally, I think we currently have no way to leverage the providerHints data. We had this issue in the K8s Container SP with the visibility, and we ended up adding a dedicated field instead.

Comment thread enhancements/k8s-storage-sp/k8s-storage-sp.md
(`allowVolumeExpansion: false` or not set)
- **400 Bad Request**: New capacity is smaller than or equal to current capacity
(shrinking not supported)
- **409 Conflict**: Namespace `ResourceQuota` would be exceeded (when quota check

Collaborator

ResourceQuota handling is ambiguous: lines 384–386 say the SP may reject (optional), but here a 409 Conflict is returned “when quota check is implemented”. Can you clarify?

Collaborator

“when quota check is implemented” is indeed confusing. Either there is a ResourceQuota, in which case the SP should check it before proceeding and return 409 if the patch would violate it, or there is no ResourceQuota and the SP does nothing.
But the SP should definitely implement the ResourceQuota check.

Collaborator

+1 @gabriel-farache

Maybe like "when a namespace ResourceQuota on requests.storage exists, SP must pre-check on PATCH and return 409 if exceeded; if no quota object, skip"

Author

Updated: when a namespace-wide ResourceQuota on requests.storage exists, SP pre-checks on PATCH and returns 409 if exceeded; if no such quota object, skip the check.
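A minimal sketch of the PATCH pre-check as agreed in this thread, assuming capacities use binary suffixes (Ki, Mi, Gi, Ti) or plain bytes. The function names and return shape are hypothetical. The `allowVolumeExpansion` check is left out because its status code is not visible in the excerpt above.

```python
from typing import Optional, Tuple

# Binary suffixes only; other quantity forms are out of scope for this sketch.
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_capacity(value: str) -> int:
    """Convert a quantity such as '100Gi' to bytes."""
    for suffix, factor in _UNITS.items():
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * factor
    return int(value)  # no suffix: plain bytes


def check_expansion(current: str, requested: str,
                    quota_hard: Optional[str] = None,
                    quota_used: Optional[str] = None) -> Optional[Tuple[int, str]]:
    """Return (http_status, reason) if the PATCH must be rejected, else None.

    quota_hard / quota_used stand for the requests.storage values of the
    namespace ResourceQuota; pass quota_hard=None when no such quota exists.
    """
    cur, new = parse_capacity(current), parse_capacity(requested)
    if new <= cur:
        return 400, "new capacity must be larger than current capacity"
    if quota_hard is not None:
        used = parse_capacity(quota_used or "0")
        # 'used' already counts this PVC's current request, so swap it out.
        projected = used - cur + new
        if projected > parse_capacity(quota_hard):
            return 409, "ResourceQuota on requests.storage would be exceeded"
    return None  # no quota object, or the request fits: proceed
```

For example, growing a 10Gi volume to 20Gi in a namespace with 10Gi used out of a 15Gi quota is rejected with 409, while the same request with no quota object passes the check.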

@gabriel-farache gabriel-farache left a comment

Collaborator

Some comments.
Also, the changes to the other files are unrelated to the current PR, right? Maybe do not add them to this PR, to keep it clean.
We should have another PR in which we run the make targets to format the files so we are all on the same page.

@jenniferubah

jenniferubah commented Jun 10, 2026

Collaborator

I agree with @gabriel-farache, let's move the formatting and cross-reference updates into another PR to keep the current PR easier to review.

| Field | Required | Type | Description |
| :--------- | :------- | :----- | :------------------------------------------------------------- |
| capacity | Yes | string | Volume size with unit (e.g., _100Gi_, _1TB_) |
| accessMode | No | string | Access mode (_ReadWriteOnce_, _ReadOnlyMany_, _ReadWriteMany_) |

Collaborator

I am unsure whether OpenStack, VMware, etc. would have the same accessMode as Kubernetes. Maybe this should go to providerHints as well?

@gciavarrini gciavarrini Jun 16, 2026

Collaborator

I think @machacekondra has a good point on this. But this contrasts with @gabriel-farache's opinion; see #54 (comment)

Author

You are right that accessMode is a k8s term, but, although it's rare, the concept exists on other platforms:
- OpenStack: “multiattach=True”
- VMware: “multi-writer”
- Ceph RBD: “exclusive-lock=false”
- NFS/CephFS: “inherently shared”
- Kubernetes: accessModes (Pod-level, not host-level)

Do you think attachmentMode is better?

Author

Ok, after rethinking I think you are right, I will change it

@LinskId LinskId force-pushed the FLPATH-4176_k8s_storage_sp branch from 0424bb2 to d05138c Compare June 11, 2026 13:07
not hold multiple kubeconfigs or manage multiple clusters.
- Production: runs as a Kubernetes Deployment in the target cluster (using
in-cluster service account) or as an external service with kubeconfig access.
- Local development: uses the `k8s-storage` profile in

@machacekondra machacekondra Jun 12, 2026

Collaborator

Note that those scripts will be part of control-plane for now:
dcm-project/control-plane#10

But I think it's not really important to mention anything like this in the enhancement doc.

Author

Removed the test plan. I also explained the usage for Multi-Backend + Multi-Tenant Storage more clearly.

@LinskId LinskId force-pushed the FLPATH-4176_k8s_storage_sp branch from d05138c to 4220917 Compare June 15, 2026 14:51
storage that can be requested through DCM catalog items and attached to
workloads (for example, a database PVC for a three-tier application). This
enhancement defines the Kubernetes Storage Service Provider that implements the
portable `storage` service type on Kubernetes clusters.

Collaborator

> the portable storage service type

Where is this service type defined? According to https://github.com/LinskId/enhancements/blob/main/enhancements/service-type-definitions/service-type-definitions.md the service types are:

  • Virtual Machine
    Virtual machines with CPU, memory, storage, and OS specifications

  • Container
    Fields common to Kubernetes, Docker, Podman, Openshift, CRI-O, containerd

  • Cluster
    Fields common to Kubernetes, OpenShift, EKS, GKE, AKS, and other distributions

  • Database
    Fields common across all database types (SQL, NoSQL, search, time-series, etc.)

service-type-definitions.md should be updated accordingly

Author

It used to be in service-type-definitions.md, I thought we wanted to move those changes into another PR.

Collaborator

ah, I may not have followed that part.
I am fine with having it in another PR, just a matter of ordering: if this SP's enhancement is to implement the storage service types, it is odd to have the SP enhancement done before the new service type enhancement is there

@LinskId LinskId Jun 18, 2026

Author

done, I added the type and flow back and other cross refs

@gabriel-farache gabriel-farache left a comment

Collaborator

The definition of the storage service type is missing; apart from that, all looks good to me.
We can keep the providerHints in here, but until we define how it works "in real life", the implementation will likely diverge from that and introduce a new static field.

@LinskId LinskId force-pushed the FLPATH-4176_k8s_storage_sp branch from 6d0556b to b0de216 Compare June 18, 2026 10:20
Comment thread enhancements/user-flows/user-flows.md Outdated
SP->>SP: Map platform status → DCM status
SP->>SP: Build CloudEvent
- SP->>MSG: Publish to:<br/>dcm.providers.{provider}.{serviceType}<br/>.instances.{instanceId}.status
+ SP->>MSG: Publish to:<br/>dcm.providers.{provider}.{serviceType}<br/>.instances.{instanceId}.status<br/>(or e.g. dcm.storage)

Collaborator

why this change?

@LinskId LinskId Jun 18, 2026

Author

to align with the storage NATS subject suggested in k8s-storage-sp, which is dcm.storage

Author

I can remove it and only refer to this path in enhancement for now

Author

removed

Collaborator

user-flows.md (sections 5.3 and 6.5) still uses the hierarchical NATS subject.
k8s-storage-sp.md and service-provider-status-reporting.md define storage status on dcm.storage. It's confusing; can you please double-check?

Comment thread enhancements/user-flows/user-flows.md Outdated
A --> B[Map platform status<br/>to DCM status enum]
B --> C[Build CloudEvent v1.0]
- C --> D[Publish to NATS<br/>dcm.providers.provider.serviceType<br/>.instances.instanceId.status]
+ C --> D[Publish to NATS<br/>dcm.providers.{provider}.{serviceType}<br/>.instances.{instanceId}.status<br/>(or e.g. dcm.storage)]

Collaborator

same here

@LinskId LinskId Jun 18, 2026

Author

it is related to the question (I can ignore it for now):

> Which NATS subject format should the storage SP use? Service Provider Status Reporting and SP Resource Status Reader specify dcm.{serviceType} (e.g., dcm.storage), but k8s-container-sp documents a hierarchical subject (dcm.providers.{providerName}.container.instances.{instanceId}.status).
> Which contract should this implementation follow?

> Currently, we more or less ignore the subject; it just needs the dcm. prefix. The rest of the data is sent via the CloudEvent, see: https://github.com/dcm-project/service-provider-manager/blob/main/internal/consumer/consumer.go#L130
> We need to fix those inconsistencies in the enhancement documents to match what is implemented.

Author

removed

Collaborator

> We need to fix those inconsistencies in enhancement documents, per what is implemented.

Agreed, but I think we should do that in another PR, once and for all.

I am not against defining the name of the topic here; it's just odd to have it in the flow diagram instead of in a table or something like that.

Collaborator

Is this in contrast with my #54 (comment) ?
@gabriel-farache it's better to clarify what we expect so @LinskId can understand the direction.

@gabriel-farache gabriel-farache left a comment

Collaborator

apart from the weird change in the user-flows, LGTM

@LinskId LinskId force-pushed the FLPATH-4176_k8s_storage_sp branch from b0de216 to a0aeff7 Compare June 18, 2026 11:17
@LinskId LinskId force-pushed the FLPATH-4176_k8s_storage_sp branch from a0aeff7 to f0681b0 Compare June 23, 2026 14:14
Signed-off-by: igavra <igavra@redhat.com>

Assisted-By: Cursor AI
@LinskId LinskId force-pushed the FLPATH-4176_k8s_storage_sp branch from f0681b0 to b2f2188 Compare June 23, 2026 15:29