@jbauer12 jbauer12 commented Jan 27, 2026

What this PR does / why we need it:

Adjustments to support ADLS Gen2 Azure Storage in Ray Offline Store.

Which issue(s) this PR fixes:

Fixes #5844


@jbauer12 jbauer12 requested a review from a team as a code owner January 27, 2026 10:52
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 1 potential issue.

  destination_path = storage.file_options.uri
- if not destination_path.startswith(("s3://", "gs://", "hdfs://")):
+ if not destination_path.startswith(
+     ("s3://", "gs://", "hdfs://", "abfs://", "abfss://")
Member

# Remote storage URI schemes supported by the Ray offline store
# S3: Amazon S3
# GCS: Google Cloud Storage
# HDFS: Hadoop Distributed File System
# Azure: Azure Data Lake Storage Gen2 (ADLS Gen2)
REMOTE_STORAGE_SCHEMES = ("s3://", "gs://", "hdfs://", "abfs://", "abfss://")

Can we define a constant for the supported remote storage URI schemes at the top of the module and use it at all three locations?
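
For illustration, a minimal sketch of how such a constant could be consumed. The helper name `is_remote_uri` is hypothetical and not part of this PR; the only thing taken from the thread is the tuple itself. `str.startswith` accepts a tuple of prefixes, so the constant can be passed to it directly.

```python
# Remote storage URI schemes, as proposed in the review comment above.
REMOTE_STORAGE_SCHEMES = ("s3://", "gs://", "hdfs://", "abfs://", "abfss://")


def is_remote_uri(path: str) -> bool:
    """Return True if the path starts with one of the remote schemes.

    Hypothetical helper for illustration only; the PR may inline the
    startswith() call at each location instead.
    """
    return path.startswith(REMOTE_STORAGE_SCHEMES)


if __name__ == "__main__":
    for candidate in (
        "abfss://container@account.dfs.core.windows.net/path",
        "s3://bucket/key",
        "/tmp/data.parquet",
    ):
        print(candidate, is_remote_uri(candidate))
```

With one definition, each call site reduces to a single check against the shared tuple, so adding a scheme later is a one-line change.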

@ntkathole
Member

Also, let's update the doc: https://github.com/feast-dev/feast/blob/master/docs/reference/offline-stores/ray.md?plain=1#L30

@jbauer12 jbauer12 force-pushed the ray_azure_integration branch from a77ca9e to 58d8df3 Compare January 27, 2026 14:41
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 1 new potential issue.

🔴 1 issue in files not directly in the diff

🔴 Ray compute engine job.py not updated to support Azure storage schemes (sdk/python/feast/infra/compute_engines/ray/job.py:208)

The PR adds Azure storage support (abfs://, abfss://) to the Ray offline store by introducing a REMOTE_STORAGE_SCHEMES constant, but fails to update the RayDAGRetrievalJob.persist() method in job.py which still uses a hardcoded tuple ("s3://", "gs://", "hdfs://").

Impact

When users try to persist datasets to Azure storage using the Ray compute engine (via RayDAGRetrievalJob), the code will incorrectly:

  1. Try to check if the Azure path exists as a local file (os.path.exists(destination_path))
  2. Try to create local directories (os.makedirs(os.path.dirname(destination_path), exist_ok=True))

This happens because Azure paths like abfss://container@account.dfs.core.windows.net/path are not recognized as remote storage.
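
The failure mode described above can be sketched as follows. This is not the code in job.py; `prepare_destination` and its `allow_overwrite` parameter are hypothetical names, and the sketch only assumes that the local-filesystem steps (`os.path.exists`, `os.makedirs`) should be skipped for any path matching a remote scheme.

```python
import os

REMOTE_STORAGE_SCHEMES = ("s3://", "gs://", "hdfs://", "abfs://", "abfss://")


def prepare_destination(destination_path: str, allow_overwrite: bool = False) -> bool:
    """Return True for remote paths; prepare the parent directory for local ones.

    Hypothetical helper: the names and the return value are assumptions
    made for this sketch, not the signature used in the PR.
    """
    if destination_path.startswith(REMOTE_STORAGE_SCHEMES):
        # Remote URI: no local existence check and no local mkdir.
        return True

    # Local path: only here do the os.path / os.makedirs calls make sense.
    if not allow_overwrite and os.path.exists(destination_path):
        raise FileExistsError(f"Destination already exists: {destination_path}")
    parent = os.path.dirname(destination_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    return False
```

If the scheme tuple omits `abfs://` and `abfss://`, an Azure URI falls through to the local branch, which is the behavior the review flags.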

Actual vs Expected

  • Actual: job.py:208 uses ("s3://", "gs://", "hdfs://") which doesn't include Azure schemes
  • Expected: Should use REMOTE_STORAGE_SCHEMES from ray.py which includes ("s3://", "gs://", "hdfs://", "abfs://", "abfss://")

Code comparison

ray.py:70 defines:

REMOTE_STORAGE_SCHEMES = ("s3://", "gs://", "hdfs://", "abfs://", "abfss://")

But job.py:208 still uses:

if not destination_path.startswith(("s3://", "gs://", "hdfs://")):

Recommendation: Import and use REMOTE_STORAGE_SCHEMES from feast.infra.offline_stores.contrib.ray_offline_store.ray or define a shared constant in a common module that both files can import.
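
One way to guard against this kind of drift is a small check that runs every scheme in the shared constant through each call site's predicate. The sketch below is an assumption about how that could look; `unrecognized_schemes`, `legacy_check`, and `shared_check` are hypothetical names, and `legacy_check` merely mirrors the hardcoded tuple quoted above.

```python
from typing import Callable, List, Sequence

# Single source of truth; where this lives is for the maintainers to decide.
REMOTE_STORAGE_SCHEMES = ("s3://", "gs://", "hdfs://", "abfs://", "abfss://")


def legacy_check(path: str) -> bool:
    # Mirrors the hardcoded tuple reported for job.py.
    return path.startswith(("s3://", "gs://", "hdfs://"))


def shared_check(path: str) -> bool:
    # Same check, driven by the shared constant.
    return path.startswith(REMOTE_STORAGE_SCHEMES)


def unrecognized_schemes(
    is_remote: Callable[[str], bool],
    schemes: Sequence[str] = REMOTE_STORAGE_SCHEMES,
) -> List[str]:
    """Return the schemes that a call site's predicate fails to treat as remote."""
    return [scheme for scheme in schemes if not is_remote(f"{scheme}bucket/path")]


if __name__ == "__main__":
    print(unrecognized_schemes(legacy_check))  # ['abfs://', 'abfss://']
    print(unrecognized_schemes(shared_check))  # []
```

A check along these lines could sit in a unit test so that a newly added scheme cannot be supported in one file and missed in another.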

@jbauer12 jbauer12 changed the title from "feat: adjust ray offline store to support abfs(s) ADLS Azure Storage" to "feat: Adjust ray offline store to support abfs(s) ADLS Azure Storage" Jan 27, 2026
@ntkathole
Member

Ray compute engine job.py not updated to support Azure storage schemes (sdk/python/feast/infra/compute_engines/ray/job.py:208)

The PR adds Azure storage support (abfs://, abfss://) to the Ray offline store by introducing a REMOTE_STORAGE_SCHEMES constant, but fails to update the RayDAGRetrievalJob.persist() method in job.py which still uses a hardcoded tuple ("s3://", "gs://", "hdfs://").

@jbauer12 I think this comment makes sense; we need to update job.py as well.

Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 2 new potential issues.

@jbauer12 jbauer12 force-pushed the ray_azure_integration branch from bb4d75a to 698bd1a Compare January 29, 2026 15:48
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 1 new potential issue.

Jonas Bauer added 8 commits January 29, 2026 22:25
Signed-off-by: Jonas Bauer <jbauer@easy2parts.com>
Signed-off-by: Jonas Bauer <jbauer@easy2parts.com>
Signed-off-by: Jonas Bauer <jbauer@easy2parts.com>
Signed-off-by: Jonas Bauer <jbauer@easy2parts.com>
Signed-off-by: Jonas Bauer <jbauer@easy2parts.com>
Signed-off-by: Jonas Bauer <jbauer@easy2parts.com>
Signed-off-by: Jonas Bauer <jbauer@easy2parts.com>
Signed-off-by: Jonas Bauer <jbauer@easy2parts.com>
@ntkathole ntkathole force-pushed the ray_azure_integration branch from 177d70d to 764be5a Compare January 29, 2026 16:55