Directory names conflicting with file names #12407

Open
jo-pol wants to merge 2 commits into IQSS:develop from DANS-KNAW-jp:directory-name-conflict
Contributor

@jo-pol jo-pol commented May 26, 2026

What this PR does / why we need it:

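Downloading a dataset produces a zip file that cannot be unzipped when the full path of one file (directoryLabel plus file name) is the same as a directory used by another file. Note that "directory" does not just mean the directoryLabel itself, but also every parent directory in the directoryLabel.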
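To illustrate the unzip problem, here is a minimal sketch using Python's standard `zipfile` module. It is not the download code of this PR; the helper name `extract_entries` is hypothetical, and the entry names `foo` and `foo/bar` are taken from the test scenario described under "Additional documentation". Other unzip tools may report the problem differently.

```python
import io
import tempfile
import zipfile


def extract_entries(entry_names):
    """Zip the given entry names in memory, then extract them to a temp dir.

    Returns None on success, or the OSError raised during extraction.
    (Hypothetical helper, for illustration only.)
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in entry_names:
            archive.writestr(name, "content of " + name)
    buffer.seek(0)
    with tempfile.TemporaryDirectory() as target:
        try:
            with zipfile.ZipFile(buffer) as archive:
                archive.extractall(target)
        except OSError as error:
            return error
    return None


# "foo" is written as a file, but "foo/bar" needs "foo" to be a directory.
conflict_error = extract_entries(["foo", "foo/bar"])
# No path is both a file and a directory here.
healthy_error = extract_entries(["foo.txt", "foo/bar"])

print("conflicting archive:", type(conflict_error).__name__)
print("healthy archive:", healthy_error)
```

The exact exception type depends on the operating system and on the order of the entries, which is why the sketch only catches the common base class `OSError`.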

  • EditDatafilesPage: message shows all conflicting files, not just the first
  • a file with a directory conflicting with an existing full path is rejected
  • files with a full path that conflicts with an existing directory will get a sequence number added
  • additional unit test
  • manual test script
  • scripts to detect (latest version of) datasets with conflicting directory paths
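The conflict rule behind the list above can be sketched as follows. This is an illustration of the idea only, not the Java code of the PR nor the contents of the detection scripts; the function names are hypothetical, and each file is assumed to be given as a `(directoryLabel, file name)` pair with `/` as separator.

```python
from pathlib import PurePosixPath


def all_directories(directory_label):
    """Return the directory plus all its parents, e.g. 'a/b' -> {'a', 'a/b'}."""
    if not directory_label:
        return set()
    path = PurePosixPath(directory_label)
    directories = {str(path)}
    # PurePosixPath.parents ends with '.', which is not a real directory here.
    directories.update(str(p) for p in path.parents if str(p) != ".")
    return directories


def find_conflicts(files):
    """Return the sorted paths that are used both as a file and as a directory.

    files: iterable of (directory_label, file_name); directory_label may be
    None or empty for files in the root of the dataset.
    """
    file_paths = set()
    directories = set()
    for directory_label, file_name in files:
        directories |= all_directories(directory_label)
        file_paths.add(str(PurePosixPath(directory_label or "", file_name)))
    return sorted(file_paths & directories)


example = [
    (None, "foo"),            # file "foo" in the root
    ("foo", "bar"),           # file "foo/bar": "foo" is now also a directory
    ("foo/bar", "baz.txt"),   # "foo/bar" is now also a directory
    ("docs", "readme.txt"),   # unrelated, no conflict
]
print(find_conflicts(example))
```

Because the parents of each directoryLabel are included, a file named `foo` also conflicts with a file stored under `foo/bar`, even when no file sits directly in `foo`.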

Which issue(s) this PR closes:

  • Closes #

Special notes for your reviewer:

Suggestions on how to test this:

  • Adjust the constants at the start of test-apis.py to match your system and point them at an empty dataset. Output with dashed lines shows the expectations after deploy. See also the screenshots below.
  • Before deploying this fix: download all files created by test-apis.py; unzipping the download fails.
  • Deploy the fix, then try to add a non-conflicting file to the dataset that has conflicts: saving the dataset fails.
  • Try to add files whose full path conflicts with an existing directory to a healthy dataset: they will get a sequence number.
  • Try to add a file with a (parent) directory that exists as a file: saving the dataset will fail.
  • Adding files to a directory that already exists should succeed.
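The outcomes expected for a healthy dataset in the steps above can be summarised in a small sketch. It only models the decision, under the assumption that directoryLabels are normalized (no leading or trailing `/`); the function name and the returned strings are hypothetical, and the actual format of the sequence number is not modelled.

```python
def classify_new_file(existing, directory_label, file_name):
    """Predict what happens when a file is added to a healthy dataset.

    existing: list of (directory_label, file_name) already in the dataset.
    Returns "rejected", "sequence number added" or "accepted".
    """

    def full_path(label, name):
        return f"{label}/{name}" if label else name

    def with_parents(label):
        # "a/b/c" -> {"a", "a/b", "a/b/c"}; empty or None -> empty set
        parts = label.split("/") if label else []
        return {"/".join(parts[:i]) for i in range(1, len(parts) + 1)}

    existing_files = {full_path(label, name) for label, name in existing}
    existing_dirs = set()
    for label, _ in existing:
        existing_dirs |= with_parents(label)

    # A (parent) directory of the new file already exists as a file.
    if with_parents(directory_label) & existing_files:
        return "rejected"
    # The full path of the new file already exists as a directory.
    if full_path(directory_label, file_name) in existing_dirs:
        return "sequence number added"
    return "accepted"


dataset = [("", "foo"), ("data", "a.csv")]
print(classify_new_file(dataset, "foo", "bar"))      # directory "foo" is a file
print(classify_new_file(dataset, "", "data"))        # file "data" is a directory
print(classify_new_file(dataset, "data", "b.csv"))   # existing directory, new file
```

The case of a dataset that already contains conflicts, where saving fails even for a non-conflicting file, is deliberately left out of this sketch.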

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

See the changed dataset-management.rst file. The change in the other rst file only fixes a typo.

Is there a release notes update needed for this change?:

Existing datasets with the new type of duplicate names should be identified, see scripts/issues/dirs-duplicating-files/find_duplicates.py. The reported datasets should be fixed manually, preferably before deploying this fix. Depending on your preferences and the size of your database, you might want a variation of the scripts.

Additional documentation:

Screenshots of the results of test-apis.py before and after deploy. Before deploy we see conflicts on foo and foo/bar; after deploy some requests return 400 Bad Request.
[2 screenshots]

jo-pol added 2 commits May 26, 2026 14:21
@jo-pol jo-pol changed the title from "squash of DANS PR242 up to e24bc721" to "Directory names conflicting with file names" May 26, 2026
@jo-pol jo-pol marked this pull request as ready for review May 26, 2026 13:30
@pdurbin pdurbin moved this to Ready for Triage in IQSS Dataverse Project May 26, 2026