Contributor

@github-actions github-actions bot commented Oct 9, 2025

🚀 Version bump to Version(__version__).major.Version(__version__).minor.Version(__version__).microrc1.dev0

pablo-garay and others added 28 commits August 19, 2025 11:22
Signed-off-by: Pablo Garay <palenq@gmail.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Remove Ray's deprecated dashboard-grpc-port arg

Signed-off-by: Charlie Truong <chtruong@nvidia.com>

* Fix nemo run ray cluster tests

Signed-off-by: Charlie Truong <chtruong@nvidia.com>

* Remove DASHBOARD_GRPC_PORT

Signed-off-by: Charlie Truong <chtruong@nvidia.com>

---------

Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
* add a grace period for Jobs that may start in Unknown

Signed-off-by: Prekshi Vyas <prekshivyas@gmail.com>

* add a grace period for Jobs that may start in Unknown

Signed-off-by: Prekshi Vyas <prekshivyas@gmail.com>

* add a grace period for Jobs that may start in Unknown

Signed-off-by: Prekshi Vyas <prekshivyas@gmail.com>

* fix linting

Signed-off-by: Prekshi Vyas <prekshivyas@gmail.com>

* make the handling of Unknown job status better by polling

Signed-off-by: prekshivyas <prekhsivyas@gmail.com>

---------

Signed-off-by: Prekshi Vyas <prekshivyas@gmail.com>
Signed-off-by: prekshivyas <prekhsivyas@gmail.com>
Co-authored-by: prekshivyas <prekhsivyas@gmail.com>
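
For readers unfamiliar with the pattern, the "grace plus polling" handling of an Unknown job status described in the commits above could look roughly like the sketch below. It is not the code from this PR: `wait_for_known_status`, its parameters, and the literal `"Unknown"` string are assumptions chosen for illustration.

```python
import time
from typing import Callable


def wait_for_known_status(
    get_status: Callable[[], str],
    grace_seconds: float = 60.0,
    poll_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a status getter until it stops reporting "Unknown".

    Hypothetical helper: gives a freshly submitted job a grace period
    instead of treating an initial "Unknown" status as a failure.
    """
    deadline = time.monotonic() + grace_seconds
    status = get_status()
    # Keep polling only while the job is still Unknown and the grace
    # period has not run out.
    while status == "Unknown" and time.monotonic() < deadline:
        sleep(poll_interval)
        status = get_status()
    # The caller decides what to do if the status is still "Unknown".
    return status
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.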
* add image pull secrets for lepton

Signed-off-by: Pablo Garay <pagaray@nvidia.com>

* update format

Signed-off-by: Pablo Garay <pagaray@nvidia.com>

---------

Signed-off-by: Pablo Garay <pagaray@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Allow users to specify an existing node reservation with the
LeptonExecutor to be able to run on dedicated resources.

Signed-off-by: Robert Clark <roclark@nvidia.com>
Signed-off-by: Romil Bhardwaj <romil.bhardwaj@gmail.com>
Signed-off-by: Romil Bhardwaj <romil.bhardwaj@gmail.com>
…c cloud sync (#335)

* fix: support for SkyPilot Storage configurations in file_mounts

- Modified SkypilotExecutor to handle both string paths and dict configs in file_mounts
- Dictionary configs are automatically converted to sky.Storage objects
- Enables automatic cloud storage mounting (GCS, S3, etc.) for outputs

This change allows users to specify cloud storage backends directly in
file_mounts, enabling automatic synchronization of training outputs to
cloud storage without manual rsync operations.

Signed-off-by: Andy Lee <andylizf@outlook.com>

* refactor: Separate storage_mounts from file_mounts for cleaner API

Signed-off-by: Andy Lee <andylizf@outlook.com>

* test: Add unit tests for storage_mounts functionality

- Test storage_mounts parameter initialization
- Test to_task() method with storage_mounts configurations
- Test combined file_mounts and storage_mounts usage
- Verify Storage.from_yaml_config() integration
- Ensure backward compatibility when storage_mounts is None

Signed-off-by: Andy Lee <andylizf@outlook.com>

* fix tests

Signed-off-by: Hemil Desai <hemild@nvidia.com>

---------

Signed-off-by: Andy Lee <andylizf@outlook.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Co-authored-by: Hemil Desai <hemild@nvidia.com>
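
As a rough illustration of the first commit's idea (string paths versus dictionary configs), the sketch below separates the two kinds of mount entries. It deliberately does not import SkyPilot; per the test notes above, the dictionary configs would then be handed to `Storage.from_yaml_config()`. The function name `split_mounts`, the bucket name, and the example config keys are assumptions for illustration, not the executor's actual code.

```python
from typing import Any, Dict, Tuple, Union

MountSpec = Union[str, Dict[str, Any]]


def split_mounts(
    mounts: Dict[str, MountSpec],
) -> Tuple[Dict[str, str], Dict[str, Dict[str, Any]]]:
    """Split mount entries into plain file mounts and storage configs.

    Hypothetical helper: string values are treated as local/remote paths,
    dict values as cloud-storage configurations.
    """
    file_mounts: Dict[str, str] = {}
    storage_configs: Dict[str, Dict[str, Any]] = {}
    for target, spec in mounts.items():
        if isinstance(spec, str):
            file_mounts[target] = spec
        elif isinstance(spec, dict):
            storage_configs[target] = spec
        else:
            raise TypeError(f"Unsupported mount spec for {target!r}: {type(spec).__name__}")
    return file_mounts, storage_configs


# Example input; the bucket name and keys are made up for illustration.
example = {
    "/data": "~/datasets",
    "/outputs": {"name": "my-training-outputs", "store": "gcs", "mode": "MOUNT"},
}
files, storages = split_mounts(example)
```

The second commit in this group moves the dictionary configs into a dedicated `storage_mounts` parameter, which makes this kind of type-based splitting unnecessary for callers.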
Signed-off-by: Romil Bhardwaj <romil.bhardwaj@gmail.com>
Signed-off-by: Pablo Garay <palenq@gmail.com>
* Create SkypilotJobsExecutor to allow running managed jobs with Skypilot API

Signed-off-by: Rahim Dharssi <rahimftd@cisco.com>

* Remove unnecessary comments

Signed-off-by: Rahim Dharssi <rahimftd@cisco.com>

* fix lints

Signed-off-by: Rahim Dharssi <rahimftd@cisco.com>

* Add comment for suppressing import error

Signed-off-by: Rahim Dharssi <rahimftd@cisco.com>

* Write unit tests for _save_job_dir and _get_job_dirs

Signed-off-by: Rahim Dharssi <rahimftd@cisco.com>

* Fix lints

Signed-off-by: Rahim Dharssi <rahimftd@cisco.com>

---------

Signed-off-by: Rahim Dharssi <rahimftd@cisco.com>
* Refactor tar packaging logic for improved performance and simplicity

Signed-off-by: smajumdar <titu1994@gmail.com>

* Clarify tar repacking logic to avoid issues with concatenating tar files

Signed-off-by: smajumdar <titu1994@gmail.com>

* Remove redundant test for concatenating tar files on Linux

Signed-off-by: smajumdar <titu1994@gmail.com>

* spell check fix

Signed-off-by: Hemil Desai <hemild@nvidia.com>

---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Co-authored-by: Hemil Desai <hemild@nvidia.com>
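
Some background on why repacking can be safer than concatenating: a tar archive ends with an end-of-archive marker of zero-filled blocks, so after a byte-level concatenation many readers stop at the end of the first archive unless told to skip zero blocks (for example GNU tar's `--ignore-zeros`). The sketch below shows one way to repack with Python's standard `tarfile` module; `repack_tars` is a hypothetical name and this is not the packaging code from the PR.

```python
import tarfile
from typing import Iterable


def repack_tars(sources: Iterable[str], destination: str) -> None:
    """Copy every member of each source archive into one new archive.

    Hypothetical helper: writes a single well-formed tar with one
    end-of-archive marker, instead of concatenating raw bytes.
    """
    with tarfile.open(destination, "w") as out:
        for path in sources:
            with tarfile.open(path, "r") as src:
                for member in src:
                    # Only regular files carry data; directories and
                    # links are copied as header-only entries.
                    fileobj = src.extractfile(member) if member.isfile() else None
                    out.addfile(member, fileobj)
```

Duplicate member names are not deduplicated here; a later entry with the same name would shadow an earlier one on extraction.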
* Fixing documentation layout

Signed-off-by: Andrew Schilling <aschilling@nvidia.com>

* documentation.md

Signed-off-by: Andrew Schilling <aschilling@nvidia.com>

* Removing live-server

Signed-off-by: Andrew Schilling <aschilling@nvidia.com>

* Correcting .vscode

Signed-off-by: Andrew Schilling <aschilling@nvidia.com>

---------

Signed-off-by: Andrew Schilling <aschilling@nvidia.com>
Signed-off-by: Andrew Schilling <aschilling@nvidia.com>
Signed-off-by: Pablo Garay <pagaray@nvidia.com>
* fix: Emit exit-code of docker runs

Signed-off-by: oliver könig <okoenig@nvidia.com>

* fix test

Signed-off-by: oliver könig <okoenig@nvidia.com>

* fixes

Signed-off-by: oliver könig <okoenig@nvidia.com>

* refactor

Signed-off-by: oliver könig <okoenig@nvidia.com>

* cleanup

Signed-off-by: oliver könig <okoenig@nvidia.com>

* add scheduler test

Signed-off-by: oliver könig <okoenig@nvidia.com>

* more scheduler tests

Signed-off-by: oliver könig <okoenig@nvidia.com>

* test executor

Signed-off-by: oliver könig <okoenig@nvidia.com>

* formatting

Signed-off-by: oliver könig <okoenig@nvidia.com>

---------

Signed-off-by: oliver könig <okoenig@nvidia.com>
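
To make the "emit exit-code" idea concrete: `docker run` normally exits with the exit code of the command run inside the container, so a wrapper only needs to capture and forward the child's return code. The sketch below uses plain `subprocess` and is an assumption about the general approach, not the scheduler code touched by these commits; `run_and_report` and the image name in the comment are hypothetical.

```python
import subprocess
from typing import Sequence


def run_and_report(cmd: Sequence[str]) -> int:
    """Run a command, print its exit code, and return it to the caller.

    Hypothetical helper: check=False keeps a non-zero exit from raising,
    so the code can be logged and propagated instead.
    """
    completed = subprocess.run(cmd, check=False)
    print(f"exit code: {completed.returncode}")
    return completed.returncode


# Possible usage with a made-up image name:
#   sys.exit(run_and_report(["docker", "run", "--rm", "my-image", "python", "train.py"]))
```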
Signed-off-by: Pablo Garay <pagaray@nvidia.com>
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…_version__).minor.Version(__version__).microrc1.dev0` !

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@github-actions github-actions bot force-pushed the ci/bump-Version(__version__).major.Version(__version__).minor.Version(__version__).microrc1.dev0 branch from b4a3020 to 0ba0b24 Compare December 3, 2025 23:25