diff --git a/doc/release-notes/11744-cors-echo-origin-vary.md b/doc/release-notes/11744-cors-echo-origin-vary.md
new file mode 100644
index 00000000000..48eaa3b96f9
--- /dev/null
+++ b/doc/release-notes/11744-cors-echo-origin-vary.md
@@ -0,0 +1,41 @@
+# 11744: CORS handling improvements
+
+Modernizes CORS so browser integrations (previewers, external tools, JS clients) work correctly with multiple origins and proper caching.
+
+## Highlights
+
+- Echoes the request origin in `Access-Control-Allow-Origin` when it matches an entry in `dataverse.cors.origin`.
+- Adds `Vary: Origin` for per-origin responses (not for wildcard).
+- Supports a comma-separated list of origins; any `*` in the list enables wildcard mode.
+- CORS is now enabled only when `dataverse.cors.origin` is set (the removed `:AllowCors` setting no longer enables it).
+- All comma-separated configuration settings (database properties and MicroProfile config) now ignore spaces around commas; tokens are otherwise left unchanged (no quote parsing). Examples: `dataverse.cors.methods`, `dataverse.cors.headers.allow`, `dataverse.cors.headers.expose`. See "Comma-separated configuration values" in the Installation Guide.
+- Docs updated (Installation, Big Data Support, External Tools, File Previews); new tests cover edge cases.
+
+## Admin Action
+
+Set `dataverse.cors.origin` explicitly (required). Use explicit origins (not `*`) for credentialed requests. Ensure proxies preserve the `Vary: Origin` header.
+
+Examples:
+
+```
+dataverse.cors.origin=https://example.org
+dataverse.cors.origin=https://libis.github.io,https://gdcc.github.io
+dataverse.cors.origin=*
+```
+
+Optional (unquoted):
+
+```
+dataverse.cors.methods=GET, POST, OPTIONS, PUT, DELETE
+```
+
+## Compatibility
+
+- `dataverse.cors.origin` must be configured to enable CORS; `:AllowCors` was deprecated and has now been removed.
+- Any `*` triggers wildcard mode (no per-origin echo and no `Vary` header).
+
+## Docs
+
+See the updated `dataverse.cors.origin` section and related notes in Big Data Support (S3), External Tools, and File Previews.
+
+
diff --git a/doc/sphinx-guides/source/api/external-tools.rst b/doc/sphinx-guides/source/api/external-tools.rst
index 389519318db..57a98a0c7c2 100644
--- a/doc/sphinx-guides/source/api/external-tools.rst
+++ b/doc/sphinx-guides/source/api/external-tools.rst
@@ -11,6 +11,9 @@ Introduction
 
 External tools are additional applications the user can access or open from your Dataverse installation to preview, explore, and manipulate data files and datasets. The term "external" is used to indicate that the tool is not part of the main Dataverse Software.
 
+.. note::
+    Browser-based tools must have CORS explicitly enabled via :ref:`dataverse.cors.origin `. List every origin that will host your tool (or use ``*`` when a wildcard is acceptable). If an origin is not listed, the browser will block that tool's API requests even if the tool page itself loads.
+
 Once you have created the external tool itself (which is most of the work!), you need to teach a Dataverse installation how to construct URLs that your tool needs to operate. For example, if you've deployed your tool to fabulousfiletool.com your tool might want the ID of a file and the siteUrl of the Dataverse installation like this: https://fabulousfiletool.com?fileId=42&siteUrl=https://demo.dataverse.org
 
 In short, you will be creating a manifest in JSON format that describes not only how to construct URLs for your tool, but also what types of files your tool operates on, where it should appear in the Dataverse installation web interfaces, etc.
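Reviewer sketch (not part of the patch): the origin handling described in the release notes above (echo a matching `Origin`, add `Vary: Origin`, treat any `*` as wildcard, disable CORS when nothing is configured) can be illustrated with a small standalone program. The class `CorsOriginSketch` and its methods `splitList` and `originHeaders` are hypothetical names invented for this illustration; they are not the PR's `CorsFilter` or `ListSplitUtil`, and dropping empty tokens is an assumption about how list parsing behaves.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names and structure are assumptions, not the PR's code.
public class CorsOriginSketch {

    // Split on commas and ignore whitespace around them; no quote or escape handling.
    // Assumption: empty tokens are dropped.
    static List<String> splitList(String raw) {
        List<String> tokens = new ArrayList<>();
        if (raw == null) {
            return tokens;
        }
        for (String token : raw.split(",")) {
            String trimmed = token.trim();
            if (!trimmed.isEmpty()) {
                tokens.add(trimmed);
            }
        }
        return tokens;
    }

    // Compute the origin-related response headers for one request.
    static Map<String, String> originHeaders(String configuredOrigins, String requestOrigin) {
        Map<String, String> headers = new LinkedHashMap<>();
        List<String> origins = splitList(configuredOrigins);
        if (origins.isEmpty()) {
            return headers; // setting not defined: no CORS headers at all
        }
        if (origins.contains("*")) {
            // Wildcard mode: no per-origin echo, Vary is left alone.
            headers.put("Access-Control-Allow-Origin", "*");
        } else if (requestOrigin != null && origins.contains(requestOrigin.trim())) {
            // Echo the single matching origin and tell caches the response depends on it.
            headers.put("Access-Control-Allow-Origin", requestOrigin.trim());
            headers.put("Vary", "Origin");
        }
        return headers;
    }

    public static void main(String[] args) {
        String configured = "https://libis.github.io, https://gdcc.github.io";
        System.out.println(originHeaders(configured, "https://gdcc.github.io"));
        System.out.println(originHeaders(configured, "https://unlisted.example"));
        System.out.println(originHeaders("*", "https://example.org"));
    }
}
```

An unlisted origin gets no `Access-Control-Allow-Origin` header in this sketch, which is what makes the browser block the tool's API requests even though the tool page itself loads.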
diff --git a/doc/sphinx-guides/source/developers/big-data-support.rst b/doc/sphinx-guides/source/developers/big-data-support.rst index 75a50e2513d..ef13143be02 100644 --- a/doc/sphinx-guides/source/developers/big-data-support.rst +++ b/doc/sphinx-guides/source/developers/big-data-support.rst @@ -57,6 +57,15 @@ Allow CORS for S3 Buckets **IMPORTANT:** One additional step that is required to enable direct uploads via a Dataverse installation and for direct download to work with previewers and direct upload to work with dvwebloader (:ref:`folder-upload`) is to allow cross site (CORS) requests on your S3 store. The example below shows how to enable CORS rules (to support upload and download) on a bucket using the AWS CLI command line tool. Note that you may want to limit the AllowedOrigins and/or AllowedHeaders further. https://github.com/gdcc/dataverse-previewers/wiki/Using-Previewers-with-download-redirects-from-S3 has some additional information about doing this. +Dataverse itself will only emit the necessary ``Access-Control-*`` headers to browsers when CORS has been explicitly enabled via the JVM/MicroProfile setting :ref:`dataverse.cors.origin `. You must both: + +* Configure an appropriate ``dataverse.cors.origin`` value (single origin, comma-separated list, or ``*``) on the Dataverse application server; and +* Configure a matching/compatible CORS policy on each S3 bucket (and any CDN/proxy in front of it) that will be used for direct upload or for redirect (download-redirect) operations consumed by previewers. + +If you specify multiple origins in ``dataverse.cors.origin`` Dataverse will echo back the requesting origin (when it matches) and will include ``Vary: Origin`` so that shared caches do not serve one origin's response to another. If you configure ``*`` Dataverse will respond with ``Access-Control-Allow-Origin: *`` (note that browsers will not allow credentialed requests with a wildcard). 
+ +Make sure the bucket CORS configuration ``AllowedOrigins`` is at least as permissive as the origins you configure in ``dataverse.cors.origin``. If the bucket allows ``*`` but the Dataverse application only allows a subset, the browser will still enforce the more restrictive application response. + If you'd like to check the CORS configuration on your bucket before making changes: ``aws s3api get-bucket-cors --bucket `` diff --git a/doc/sphinx-guides/source/installation/config.rst b/doc/sphinx-guides/source/installation/config.rst index e7d7b9f2592..6c19464489d 100644 --- a/doc/sphinx-guides/source/installation/config.rst +++ b/doc/sphinx-guides/source/installation/config.rst @@ -10,6 +10,27 @@ Once you have finished securing and configuring your Dataverse installation, you .. contents:: |toctitle| :local: +.. _comma-separated-config-values: + +Comma-separated configuration values +------------------------------------ + +Many configuration options (both MicroProfile/JVM settings and database settings) accept comma-separated lists. For all such settings, Dataverse applies consistent, lightweight parsing: + +- Whitespace immediately around commas is ignored (e.g., ``GET, POST`` is equivalent to ``GET,POST``). +- Tokens are otherwise preserved exactly as typed. There is no quote parsing and no escape processing. +- Embedded commas within a token are not supported. + +Examples include (but are not limited to): + +- :ref:`dataverse.cors.origin ` +- :ref:`dataverse.cors.methods ` +- :ref:`dataverse.cors.headers.allow ` +- :ref:`dataverse.cors.headers.expose ` +- :ref:`:UploadMethods` + +This behavior is implemented centrally and applies across all Dataverse settings that accept comma-separated values. + .. _securing-your-installation: Securing Your Installation @@ -3704,10 +3725,9 @@ The following settings control Cross-Origin Resource Sharing (CORS) for your Dat dataverse.cors.origin +++++++++++++++++++++ -Allowed origins for CORS requests. 
The default with no value set is to not include CORS headers. However, if the deprecated :AllowCors setting is explicitly set to true the default is "\*" (all origins). -When the :AllowsCors setting is not used, you must set this setting to "\*" or a list of origins to enable CORS headers. +Allowed origins for CORS requests. If this setting is not defined, CORS headers are not added. Set to ``*`` to allow all origins (note that browsers will not allow credentialed requests with ``*``) or provide a comma-separated list of explicit origins. -Multiple origins can be specified as a comma-separated list. +Multiple origins can be specified as a comma-separated list (whitespace is ignored): Example: @@ -3715,6 +3735,11 @@ Example: Can also be set via any `supported MicroProfile Config API source`_, e.g. the environment variable ``DATAVERSE_CORS_ORIGIN``. +Behavior: + +* When a list of origins is configured, Dataverse echoes the single matching request ``Origin`` value in ``Access-Control-Allow-Origin`` and adds ``Vary: Origin`` to support correct proxy/CDN caching. +* When ``*`` is configured, ``Access-Control-Allow-Origin: *`` is sent and ``Vary`` is not modified. + .. _dataverse.cors.methods: dataverse.cors.methods @@ -5028,20 +5053,6 @@ This can be helpful in situations where multiple organizations are sharing one D or ``curl -X PUT -d '*' http://localhost:8080/api/admin/settings/:InheritParentRoleAssignments`` -:AllowCors (Deprecated) -+++++++++++++++++++++++ - -.. note:: - This setting is deprecated. Please use the JVM settings above instead. - This legacy setting will only be used if the newer JVM settings are not set. - -Enable or disable support for Cross-Origin Resource Sharing (CORS) by setting ``:AllowCors`` to ``true`` or ``false``. - -``curl -X PUT -d true http://localhost:8080/api/admin/settings/:AllowCors`` - -.. note:: - New values for this setting will only be used after a server restart. 
- :ChronologicalDateFacets ++++++++++++++++++++++++ diff --git a/doc/sphinx-guides/source/user/dataset-management.rst b/doc/sphinx-guides/source/user/dataset-management.rst index 0802d1255b6..ff8cbe79a46 100755 --- a/doc/sphinx-guides/source/user/dataset-management.rst +++ b/doc/sphinx-guides/source/user/dataset-management.rst @@ -175,6 +175,9 @@ File Previews Dataverse installations can add previewers for common file types uploaded by their research communities. The previews appear on the file page. If a preview tool for a specific file type is available, the preview will be created and will display automatically, after terms have been agreed to or a guestbook entry has been made, if necessary. File previews are not available for restricted files unless they are being accessed using a Preview URL. See also :ref:`previewUrl`. When the dataset license is not the default license, users will be prompted to accept the license/data use agreement before the preview is shown. See also :ref:`license-terms`. +.. note:: + Some previewers run purely in the browser and make direct (JavaScript) requests back to the Dataverse API endpoints to retrieve file contents, metadata, or signed URLs. For these previewers to function when hosted on a different origin (e.g., a CDN or a separate previewer service), the Dataverse installation must have CORS enabled via :ref:`dataverse.cors.origin `. Administrators should configure the list of allowed origins to include the host serving the previewers. 
+ Previewers are available for the following file types: - Text diff --git a/src/main/java/edu/harvard/iq/dataverse/DatasetFieldServiceBean.java b/src/main/java/edu/harvard/iq/dataverse/DatasetFieldServiceBean.java index ebee7c20ba2..e6b2711b443 100644 --- a/src/main/java/edu/harvard/iq/dataverse/DatasetFieldServiceBean.java +++ b/src/main/java/edu/harvard/iq/dataverse/DatasetFieldServiceBean.java @@ -53,6 +53,7 @@ import org.apache.http.protocol.HttpContext; import org.apache.http.util.EntityUtils; import edu.harvard.iq.dataverse.settings.SettingsServiceBean; +import edu.harvard.iq.dataverse.util.ListSplitUtil; /** * @@ -908,12 +909,12 @@ public String getFieldLanguage(String languages, String localeCode) { // If the fields list of supported languages contains the current locale (e.g. // the lang of the UI, or the current metadata input/display lang (tbd)), use // that. Otherwise, return the first in the list - String[] langStrings = languages.split("\\s*,\\s*"); - if (langStrings.length > 0) { - if (Arrays.asList(langStrings).contains(localeCode)) { + final List langStrings = ListSplitUtil.split(languages); + if (!langStrings.isEmpty()) { + if (langStrings.contains(localeCode)) { return localeCode; } else { - return langStrings[0]; + return langStrings.get(0); } } return null; diff --git a/src/main/java/edu/harvard/iq/dataverse/FileMetadata.java b/src/main/java/edu/harvard/iq/dataverse/FileMetadata.java index 932bbd60be6..b60b5afedd3 100644 --- a/src/main/java/edu/harvard/iq/dataverse/FileMetadata.java +++ b/src/main/java/edu/harvard/iq/dataverse/FileMetadata.java @@ -49,6 +49,7 @@ import edu.harvard.iq.dataverse.datavariable.VarGroup; import edu.harvard.iq.dataverse.datavariable.VariableMetadata; import edu.harvard.iq.dataverse.util.DateUtil; +import edu.harvard.iq.dataverse.util.ListSplitUtil; import edu.harvard.iq.dataverse.util.StringUtil; import java.util.HashSet; import java.util.Set; @@ -605,18 +606,18 @@ public int compare(FileMetadata o1, FileMetadata 
o2) { } }; - static Map categoryMap=null; + static Map categoryMap = null; public static void setCategorySortOrder(String categories) { - categoryMap=new HashMap(); - long i=1; - for(String cat: categories.split(",\\s*")) { - categoryMap.put(cat.toUpperCase(), i); - i++; - } + categoryMap = new HashMap(); + long i = 1; + for (String cat : ListSplitUtil.split(categories)) { + categoryMap.put(cat.toUpperCase(), i); + i++; + } } - public static Map getCategorySortOrder() { + public static Map getCategorySortOrder() { return categoryMap; } diff --git a/src/main/java/edu/harvard/iq/dataverse/SettingsWrapper.java b/src/main/java/edu/harvard/iq/dataverse/SettingsWrapper.java index 653632ba719..23a26a8cf2c 100644 --- a/src/main/java/edu/harvard/iq/dataverse/SettingsWrapper.java +++ b/src/main/java/edu/harvard/iq/dataverse/SettingsWrapper.java @@ -14,6 +14,7 @@ import edu.harvard.iq.dataverse.settings.SettingsServiceBean; import edu.harvard.iq.dataverse.settings.SettingsServiceBean.Key; import edu.harvard.iq.dataverse.util.BundleUtil; +import edu.harvard.iq.dataverse.util.ListSplitUtil; import edu.harvard.iq.dataverse.util.StringUtil; import edu.harvard.iq.dataverse.util.SystemConfig; import edu.harvard.iq.dataverse.UserNotification.Type; @@ -50,8 +51,7 @@ public class SettingsWrapper implements java.io.Serializable { static final Logger logger = Logger.getLogger(SettingsWrapper.class.getCanonicalName()); - public static final String COMMA_BETWEEN_OPTIONAL_WHITE_SPACE = "\\s*,\\s*"; - + @EJB SettingsServiceBean settingsService; @@ -393,10 +393,12 @@ public boolean isRsyncOnly() { rsyncOnly = false; } else { String uploadMethods = getValueForKey(SettingsServiceBean.Key.UploadMethods); - if (uploadMethods==null){ + if (uploadMethods == null) { rsyncOnly = false; } else { - rsyncOnly = Arrays.asList(uploadMethods.toLowerCase().split(COMMA_BETWEEN_OPTIONAL_WHITE_SPACE)).size() == 1 && uploadMethods.toLowerCase().equals(SystemConfig.FileUploadMethods.RSYNC.toString()); + String 
normalizedUploadMethods = uploadMethods.toLowerCase(); + rsyncOnly = ListSplitUtil.split(normalizedUploadMethods).size() == 1 + && normalizedUploadMethods.equals(SystemConfig.FileUploadMethods.RSYNC.toString()); } } } @@ -424,11 +426,11 @@ public String getSupportTeamEmail() { public Integer getUploadMethodsCount() { if (uploadMethodsCount == null) { - String uploadMethods = getValueForKey(SettingsServiceBean.Key.UploadMethods); - if (uploadMethods==null){ + String uploadMethods = getValueForKey(SettingsServiceBean.Key.UploadMethods); + if (uploadMethods == null) { uploadMethodsCount = 0; } else { - uploadMethodsCount = Arrays.asList(uploadMethods.toLowerCase().split(COMMA_BETWEEN_OPTIONAL_WHITE_SPACE)).size(); + uploadMethodsCount = ListSplitUtil.split(uploadMethods).size(); } } return uploadMethodsCount; @@ -502,7 +504,7 @@ public boolean shouldBeAnonymized(DatasetField df) { if (anonymizedFieldTypes == null) { anonymizedFieldTypes = new ArrayList(); String names = get(SettingsServiceBean.Key.AnonymizedFieldTypeNames.toString(), ""); - anonymizedFieldTypes.addAll(Arrays.asList(names.split(COMMA_BETWEEN_OPTIONAL_WHITE_SPACE))); + anonymizedFieldTypes.addAll(ListSplitUtil.split(names)); } return anonymizedFieldTypes.contains(df.getDatasetFieldType().getName()); } @@ -826,11 +828,11 @@ public String getMetricsUrl() { } private Boolean getUploadMethodAvailable(String method){ - String uploadMethods = getValueForKey(SettingsServiceBean.Key.UploadMethods); - if (uploadMethods==null){ + String uploadMethods = getValueForKey(SettingsServiceBean.Key.UploadMethods); + if (uploadMethods == null) { return false; } else { - return Arrays.asList(uploadMethods.toLowerCase().split(COMMA_BETWEEN_OPTIONAL_WHITE_SPACE)).contains(method); + return ListSplitUtil.splitToLowerCaseSet(uploadMethods).contains(method); } } diff --git a/src/main/java/edu/harvard/iq/dataverse/api/Admin.java b/src/main/java/edu/harvard/iq/dataverse/api/Admin.java index 245b76b2cdb..75aedb038dc 100644 --- 
a/src/main/java/edu/harvard/iq/dataverse/api/Admin.java +++ b/src/main/java/edu/harvard/iq/dataverse/api/Admin.java @@ -115,6 +115,7 @@ import edu.harvard.iq.dataverse.util.ArchiverUtil; import edu.harvard.iq.dataverse.util.BundleUtil; import edu.harvard.iq.dataverse.util.FileUtil; +import edu.harvard.iq.dataverse.util.ListSplitUtil; import edu.harvard.iq.dataverse.util.SystemConfig; import edu.harvard.iq.dataverse.util.URLTokenUtil; import edu.harvard.iq.dataverse.util.UrlSignerUtil; @@ -2243,7 +2244,7 @@ public Response addRoleAssignementsToChildren(@Context ContainerRequestContext c boolean inheritAllRoles = false; String rolesString = settingsSvc.getValueForKey(SettingsServiceBean.Key.InheritParentRoleAssignments, ""); if (rolesString.length() > 0) { - ArrayList rolesToInherit = new ArrayList(Arrays.asList(rolesString.split("\\s*,\\s*"))); + ArrayList rolesToInherit = new ArrayList<>(ListSplitUtil.split(rolesString)); if (!rolesToInherit.isEmpty()) { if (rolesToInherit.contains("*")) { inheritAllRoles = true; diff --git a/src/main/java/edu/harvard/iq/dataverse/api/Datasets.java b/src/main/java/edu/harvard/iq/dataverse/api/Datasets.java index df292762353..4b3db65556c 100644 --- a/src/main/java/edu/harvard/iq/dataverse/api/Datasets.java +++ b/src/main/java/edu/harvard/iq/dataverse/api/Datasets.java @@ -5317,7 +5317,8 @@ public Response getPrivateUrlDatasetVersion(@PathParam("privateUrlToken") String } JsonObjectBuilder responseJson; if (isAnonymizedAccess) { - List anonymizedFieldTypeNamesList = new ArrayList<>(Arrays.asList(anonymizedFieldTypeNames.split(SettingsWrapper.COMMA_BETWEEN_OPTIONAL_WHITE_SPACE))); + // Use ListSplitUtil for consistent CSV parsing + List anonymizedFieldTypeNamesList = new ArrayList<>(ListSplitUtil.split(anonymizedFieldTypeNames)); responseJson = json(dsv, anonymizedFieldTypeNamesList, true, returnOwners); } else { responseJson = json(dsv, null, true, returnOwners); @@ -5343,7 +5344,8 @@ public Response 
getPreviewUrlDatasetVersion(@PathParam("previewUrlToken") String } JsonObjectBuilder responseJson; if (isAnonymizedAccess) { - List anonymizedFieldTypeNamesList = new ArrayList<>(Arrays.asList(anonymizedFieldTypeNames.split(SettingsWrapper.COMMA_BETWEEN_OPTIONAL_WHITE_SPACE))); + // Use ListSplitUtil for consistent CSV parsing + List anonymizedFieldTypeNamesList = new ArrayList<>(ListSplitUtil.split(anonymizedFieldTypeNames)); responseJson = json(dsv, anonymizedFieldTypeNamesList, true, returnOwners); } else { responseJson = json(dsv, null, true, returnOwners); diff --git a/src/main/java/edu/harvard/iq/dataverse/batch/jobs/importer/filesystem/FileRecordReader.java b/src/main/java/edu/harvard/iq/dataverse/batch/jobs/importer/filesystem/FileRecordReader.java index 9ce30683a87..175683bbb16 100644 --- a/src/main/java/edu/harvard/iq/dataverse/batch/jobs/importer/filesystem/FileRecordReader.java +++ b/src/main/java/edu/harvard/iq/dataverse/batch/jobs/importer/filesystem/FileRecordReader.java @@ -25,6 +25,7 @@ import edu.harvard.iq.dataverse.authorization.users.AuthenticatedUser; import edu.harvard.iq.dataverse.batch.jobs.importer.ImportMode; import edu.harvard.iq.dataverse.settings.JvmSettings; +import edu.harvard.iq.dataverse.util.ListSplitUtil; import org.apache.commons.io.filefilter.NotFileFilter; import org.apache.commons.io.filefilter.WildcardFileFilter; @@ -43,7 +44,6 @@ import java.io.FileFilter; import java.io.Serializable; import java.util.ArrayList; -import java.util.Arrays; import java.util.HashMap; import java.util.Iterator; import java.util.List; @@ -152,8 +152,13 @@ public File readItem() { * @return list of files */ private List getFiles(final File directory) { - // create filter from job xml excludes property - FileFilter excludeFilter = new NotFileFilter(new WildcardFileFilter(Arrays.asList(excludes.split("\\s*,\\s*")))); + // create filter from job xml excludes property using builder to avoid deprecated constructors + final String[] excludedPatterns = 
ListSplitUtil.split(excludes).toArray(new String[0]); + FileFilter excludeFilter = new NotFileFilter( + WildcardFileFilter.builder() + .setWildcards(excludedPatterns) + .get() + ); List files = new ArrayList<>(); File[] filesList = directory.listFiles(excludeFilter); if (filesList != null) { diff --git a/src/main/java/edu/harvard/iq/dataverse/dataaccess/GlobusAccessibleStore.java b/src/main/java/edu/harvard/iq/dataverse/dataaccess/GlobusAccessibleStore.java index 8bed60d8302..032ec1cfe48 100644 --- a/src/main/java/edu/harvard/iq/dataverse/dataaccess/GlobusAccessibleStore.java +++ b/src/main/java/edu/harvard/iq/dataverse/dataaccess/GlobusAccessibleStore.java @@ -1,5 +1,6 @@ package edu.harvard.iq.dataverse.dataaccess; +import edu.harvard.iq.dataverse.util.ListSplitUtil; import jakarta.json.Json; import jakarta.json.JsonArray; import jakarta.json.JsonArrayBuilder; @@ -38,10 +39,10 @@ public static String getTransferPath(String driverId) { } public static JsonArray getReferenceEndpointsWithPaths(String driverId) { - String[] endpoints = StorageIO.getConfigParamForDriver(driverId, AbstractRemoteOverlayAccessIO.REFERENCE_ENDPOINTS_WITH_BASEPATHS).split("\\s*,\\s*"); JsonArrayBuilder builder = Json.createArrayBuilder(); - for(int i=0;i allowedEndpoints = ListSplitUtil.split(rawEndpoints); + if (allowedEndpoints.isEmpty()) { + throw new IOException("dataverse.files." 
+ driverId + ".base-url is required"); } - return allowedEndpoints; + return allowedEndpoints.toArray(new String[0]); } diff --git a/src/main/java/edu/harvard/iq/dataverse/dataaccess/RemoteOverlayAccessIO.java b/src/main/java/edu/harvard/iq/dataverse/dataaccess/RemoteOverlayAccessIO.java index bca70259cb7..1613d1ec7cc 100644 --- a/src/main/java/edu/harvard/iq/dataverse/dataaccess/RemoteOverlayAccessIO.java +++ b/src/main/java/edu/harvard/iq/dataverse/dataaccess/RemoteOverlayAccessIO.java @@ -5,6 +5,7 @@ import edu.harvard.iq.dataverse.Dataverse; import edu.harvard.iq.dataverse.DvObject; import edu.harvard.iq.dataverse.datavariable.DataVariable; +import edu.harvard.iq.dataverse.util.ListSplitUtil; import edu.harvard.iq.dataverse.util.UrlSignerUtil; import java.io.FileNotFoundException; @@ -33,10 +34,10 @@ */ /* * Remote Overlay Driver - * + * * StorageIdentifier format: * ://// - * + * * baseUrl: http(s):// */ public class RemoteOverlayAccessIO extends AbstractRemoteOverlayAccessIO { @@ -48,7 +49,7 @@ public class RemoteOverlayAccessIO extends AbstractRemoteOve public RemoteOverlayAccessIO() { super(); } - + public RemoteOverlayAccessIO(T dvObject, DataAccessRequest req, String driverId) throws IOException { super(dvObject, req, driverId); this.setIsLocalFile(false); @@ -124,10 +125,10 @@ public void open(DataAccessOption... 
options) throws IOException { logger.fine("Setting size"); this.setSize(retrieveSizeFromMedia()); } - if (dataFile.getContentType() != null + if (dataFile.getContentType() != null && dataFile.getContentType().equals("text/tab-separated-values") - && dataFile.isTabularData() - && dataFile.getDataTable() != null + && dataFile.isTabularData() + && dataFile.getDataTable() != null && (!this.noVarHeader()) && (!dataFile.getDataTable().isStoredWithVariableHeader())) { @@ -317,7 +318,7 @@ protected void configureRemoteEndpoints() throws IOException { baseUrl = getConfigParam(BASE_URL); if (baseUrl == null) { //Will accept the first endpoint using the newer setting - baseUrl = getConfigParam(REFERENCE_ENDPOINTS_WITH_BASEPATHS).split("\\s*,\\s*")[0]; + baseUrl = ListSplitUtil.split(getConfigParam(REFERENCE_ENDPOINTS_WITH_BASEPATHS)).stream().findFirst().orElse(baseUrl); if (baseUrl == null) { throw new IOException("dataverse.files." + this.driverId + ".base-url is required"); } diff --git a/src/main/java/edu/harvard/iq/dataverse/datacapturemodule/DataCaptureModuleUtil.java b/src/main/java/edu/harvard/iq/dataverse/datacapturemodule/DataCaptureModuleUtil.java index 094d3976133..de2aa0aaee8 100644 --- a/src/main/java/edu/harvard/iq/dataverse/datacapturemodule/DataCaptureModuleUtil.java +++ b/src/main/java/edu/harvard/iq/dataverse/datacapturemodule/DataCaptureModuleUtil.java @@ -5,8 +5,8 @@ import edu.harvard.iq.dataverse.Dataset; import edu.harvard.iq.dataverse.DatasetVersion; import edu.harvard.iq.dataverse.authorization.users.AuthenticatedUser; +import edu.harvard.iq.dataverse.util.ListSplitUtil; import edu.harvard.iq.dataverse.util.SystemConfig; -import java.util.Arrays; import java.util.logging.Logger; import jakarta.json.Json; import jakarta.json.JsonObject; @@ -19,11 +19,11 @@ public class DataCaptureModuleUtil { @Deprecated(forRemoval = true, since = "2024-07-07") public static boolean rsyncSupportEnabled(String uploadMethodsSettings) { - 
logger.fine("uploadMethodsSettings: " + uploadMethodsSettings);; + logger.fine("uploadMethodsSettings: " + uploadMethodsSettings);; if (uploadMethodsSettings==null){ return false; } else { - return Arrays.asList(uploadMethodsSettings.toLowerCase().split("\\s*,\\s*")).contains(SystemConfig.FileUploadMethods.RSYNC.toString()); + return ListSplitUtil.splitToLowerCaseSet(uploadMethodsSettings).contains(SystemConfig.FileUploadMethods.RSYNC.toString()); } } diff --git a/src/main/java/edu/harvard/iq/dataverse/dataset/DatasetUtil.java b/src/main/java/edu/harvard/iq/dataverse/dataset/DatasetUtil.java index 46f458c5403..2ce5471a523 100644 --- a/src/main/java/edu/harvard/iq/dataverse/dataset/DatasetUtil.java +++ b/src/main/java/edu/harvard/iq/dataverse/dataset/DatasetUtil.java @@ -13,6 +13,7 @@ import edu.harvard.iq.dataverse.dataaccess.ImageThumbConverter; import edu.harvard.iq.dataverse.util.BundleUtil; import edu.harvard.iq.dataverse.util.FileUtil; +import edu.harvard.iq.dataverse.util.ListSplitUtil; import java.awt.image.BufferedImage; import java.io.ByteArrayInputStream; import java.io.File; @@ -531,7 +532,7 @@ public static String[] getDatasetSummaryFieldNames(String customFieldNames) { } else { summaryFieldNames = customFieldNames; } - return summaryFieldNames.split("\\s*,\\s*"); + return ListSplitUtil.split(summaryFieldNames).toArray(new String[0]); } public static boolean isRsyncAppropriateStorageDriver(Dataset dataset){ diff --git a/src/main/java/edu/harvard/iq/dataverse/engine/command/impl/CreateDataverseCommand.java b/src/main/java/edu/harvard/iq/dataverse/engine/command/impl/CreateDataverseCommand.java index b28302ba861..3071cfaea8f 100644 --- a/src/main/java/edu/harvard/iq/dataverse/engine/command/impl/CreateDataverseCommand.java +++ b/src/main/java/edu/harvard/iq/dataverse/engine/command/impl/CreateDataverseCommand.java @@ -12,10 +12,10 @@ import edu.harvard.iq.dataverse.engine.command.exception.IllegalCommandException; import 
edu.harvard.iq.dataverse.settings.SettingsServiceBean; import edu.harvard.iq.dataverse.util.BundleUtil; +import edu.harvard.iq.dataverse.util.ListSplitUtil; import java.sql.Timestamp; import java.util.ArrayList; -import java.util.Arrays; import java.util.Date; import java.util.List; @@ -107,7 +107,7 @@ protected Dataverse innerExecute(CommandContext ctxt) throws IllegalCommandExcep // Add additional role assignments if inheritance is set boolean inheritAllRoles = false; String rolesString = ctxt.settings().getValueForKey(SettingsServiceBean.Key.InheritParentRoleAssignments, ""); - ArrayList rolesToInherit = new ArrayList(Arrays.asList(rolesString.split("\\s*,\\s*"))); + ArrayList rolesToInherit = new ArrayList<>(ListSplitUtil.split(rolesString)); if (rolesString.length() > 0) { if (!rolesToInherit.isEmpty()) { if (rolesToInherit.contains("*")) { diff --git a/src/main/java/edu/harvard/iq/dataverse/filter/CorsFilter.java b/src/main/java/edu/harvard/iq/dataverse/filter/CorsFilter.java index 7d99d9ee4d2..d7f14fff245 100644 --- a/src/main/java/edu/harvard/iq/dataverse/filter/CorsFilter.java +++ b/src/main/java/edu/harvard/iq/dataverse/filter/CorsFilter.java @@ -1,20 +1,30 @@ package edu.harvard.iq.dataverse.filter; -import jakarta.inject.Inject; -import jakarta.servlet.*; -import jakarta.servlet.annotation.WebFilter; -import jakarta.servlet.http.HttpServletResponse; import java.io.IOException; +import java.util.Collections; +import java.util.HashSet; +import java.util.List; +import java.util.Set; +import java.util.stream.Collectors; import edu.harvard.iq.dataverse.settings.JvmSettings; -import edu.harvard.iq.dataverse.settings.SettingsServiceBean; +import edu.harvard.iq.dataverse.util.ListSplitUtil; +import jakarta.servlet.Filter; +import jakarta.servlet.FilterChain; +import jakarta.servlet.FilterConfig; +import jakarta.servlet.ServletException; +import jakarta.servlet.ServletRequest; +import jakarta.servlet.ServletResponse; +import jakarta.servlet.annotation.WebFilter; 
+import jakarta.servlet.http.HttpServletRequest; +import jakarta.servlet.http.HttpServletResponse; /** * CorsFilter is a servlet filter that handles Cross-Origin Resource Sharing (CORS) for the Dataverse application. * It configures and applies CORS headers to HTTP responses based on application settings. * * This filter: - * 1. Reads CORS configuration from JVM settings or (deprecated) the SettingsServiceBean. See the Dataverse Configuration Guide for more details. + * 1. Reads CORS configuration from JVM settings (dataverse.cors.*). See the Dataverse Configuration Guide for more details. * 2. Determines whether CORS should be allowed based on these settings. * 3. If CORS is allowed, it adds the appropriate CORS headers to all HTTP responses. The JVMSettings allow customization of the header contents if desired. * @@ -24,32 +34,33 @@ @WebFilter("/*") public class CorsFilter implements Filter { - @Inject - private SettingsServiceBean settingsSvc; - private boolean allowCors; - private String origin; + private boolean allowAllOrigins; + private Set allowedOrigins = Collections.emptySet(); private String methods; private String allowHeaders; private String exposeHeaders; @Override public void init(FilterConfig filterConfig) throws ServletException { - origin = JvmSettings.CORS_ORIGIN.lookupOptional().orElse(null); - boolean corsSetting = settingsSvc.isTrueForKey(SettingsServiceBean.Key.AllowCors, true); - - if (origin == null && !corsSetting) { - allowCors = false; - } else { - allowCors = true; - origin = (origin != null) ? 
origin : "*"; - } + List origins = JvmSettings.CORS_ORIGIN.lookupSplittedListOptional().orElse(List.of()); + allowCors = !origins.isEmpty(); if (allowCors) { - methods = JvmSettings.CORS_METHODS.lookupOptional().orElse("PUT, GET, POST, DELETE, OPTIONS"); - allowHeaders = JvmSettings.CORS_ALLOW_HEADERS.lookupOptional() + if (origins.contains("*")) { + allowAllOrigins = true; + } else { + allowedOrigins = Set.copyOf(origins); + } + + methods = JvmSettings.CORS_METHODS.lookupSplittedListOptional() + .map(values -> String.join(", ", values)) + .orElse("GET, POST, OPTIONS, PUT, DELETE"); + allowHeaders = JvmSettings.CORS_ALLOW_HEADERS.lookupSplittedListOptional() + .map(values -> String.join(", ", values)) .orElse("Accept, Content-Type, X-Dataverse-key, Range"); - exposeHeaders = JvmSettings.CORS_EXPOSE_HEADERS.lookupOptional() + exposeHeaders = JvmSettings.CORS_EXPOSE_HEADERS.lookupSplittedListOptional() + .map(values -> String.join(", ", values)) .orElse("Accept-Ranges, Content-Range, Content-Encoding"); } } @@ -58,12 +69,35 @@ public void init(FilterConfig filterConfig) throws ServletException { public void doFilter(ServletRequest servletRequest, ServletResponse servletResponse, FilterChain chain) throws IOException, ServletException { if (allowCors) { + HttpServletRequest request = (HttpServletRequest) servletRequest; HttpServletResponse response = (HttpServletResponse) servletResponse; - response.addHeader("Access-Control-Allow-Origin", origin); - response.addHeader("Access-Control-Allow-Methods", methods); - response.addHeader("Access-Control-Allow-Headers", allowHeaders); - response.addHeader("Access-Control-Expose-Headers", exposeHeaders); + + String originHeader = request.getHeader("Origin"); + String requestOrigin = originHeader == null ? 
null : originHeader.trim(); + + if (allowAllOrigins) { + response.setHeader("Access-Control-Allow-Origin", "*"); + } else if (requestOrigin != null && allowedOrigins.contains(requestOrigin)) { + response.setHeader("Access-Control-Allow-Origin", requestOrigin); + response.setHeader("Vary", appendVary(response.getHeader("Vary"), "Origin")); + } + + response.setHeader("Access-Control-Allow-Methods", methods); + response.setHeader("Access-Control-Allow-Headers", allowHeaders); + response.setHeader("Access-Control-Expose-Headers", exposeHeaders); } chain.doFilter(servletRequest, servletResponse); } + + private String appendVary(String existing, String value) { + if (existing == null || existing.isEmpty()) { + return value; + } + Set<String> tokens = ListSplitUtil.split(existing).stream() + .map(String::trim) + .filter(token -> !token.isEmpty()) + .collect(Collectors.toCollection(HashSet::new)); + tokens.add(value); + return String.join(", ", tokens); + } } diff --git a/src/main/java/edu/harvard/iq/dataverse/pidproviders/AbstractPidProvider.java b/src/main/java/edu/harvard/iq/dataverse/pidproviders/AbstractPidProvider.java index 0b5b49fc52d..0affd32eb99 100644 --- a/src/main/java/edu/harvard/iq/dataverse/pidproviders/AbstractPidProvider.java +++ b/src/main/java/edu/harvard/iq/dataverse/pidproviders/AbstractPidProvider.java @@ -7,6 +7,7 @@ import edu.harvard.iq.dataverse.DatasetVersion; import edu.harvard.iq.dataverse.DvObject; import edu.harvard.iq.dataverse.GlobalId; +import edu.harvard.iq.dataverse.util.ListSplitUtil; import edu.harvard.iq.dataverse.util.SystemConfig; import jakarta.json.Json; import jakarta.json.JsonObject; @@ -60,10 +61,10 @@ protected AbstractPidProvider(String id, String label, String protocol, String a this.identifierGenerationStyle = identifierGenerationStyle; this.datafilePidFormat = datafilePidFormat; if(!managedList.isEmpty()) { - this.managedSet.addAll(Arrays.asList(managedList.split(",\\s"))); + 
this.managedSet.addAll(ListSplitUtil.split(managedList)); } if(!excludedList.isEmpty()) { - this.excludedSet.addAll(Arrays.asList(excludedList.split(",\\s"))); + this.excludedSet.addAll(ListSplitUtil.split(excludedList)); } if (logger.isLoggable(Level.FINE)) { Iterator iter = managedSet.iterator(); diff --git a/src/main/java/edu/harvard/iq/dataverse/pidproviders/PidProviderFactoryBean.java b/src/main/java/edu/harvard/iq/dataverse/pidproviders/PidProviderFactoryBean.java index 1bd49bc7f6e..267cbab3edd 100644 --- a/src/main/java/edu/harvard/iq/dataverse/pidproviders/PidProviderFactoryBean.java +++ b/src/main/java/edu/harvard/iq/dataverse/pidproviders/PidProviderFactoryBean.java @@ -12,7 +12,6 @@ import java.util.HashMap; import java.util.List; import java.util.Map; -import java.util.NoSuchElementException; import java.util.Optional; import java.util.ServiceLoader; import java.util.logging.Level; @@ -23,11 +22,9 @@ import jakarta.ejb.Singleton; import jakarta.ejb.Startup; import jakarta.inject.Inject; -import jakarta.json.JsonObject; import edu.harvard.iq.dataverse.settings.JvmSettings; import edu.harvard.iq.dataverse.settings.SettingsServiceBean; import edu.harvard.iq.dataverse.util.SystemConfig; -import edu.harvard.iq.dataverse.DatasetFieldServiceBean; import edu.harvard.iq.dataverse.DataverseServiceBean; import edu.harvard.iq.dataverse.DvObjectServiceBean; import edu.harvard.iq.dataverse.GlobalId; @@ -121,14 +118,12 @@ private void loadProviderFactories() { } private void loadProviders() { - Optional providers = JvmSettings.PID_PROVIDERS.lookupOptional(String[].class); - if (!providers.isPresent()) { + Optional> providersOpt = JvmSettings.PID_PROVIDERS.lookupSplittedListOptional(); + if (!providersOpt.isPresent() || providersOpt.get().isEmpty()) { logger.warning( "No PidProviders configured via dataverse.pid.providers. 
Please consider updating as older PIDProvider configuration mechanisms will be removed in a future version of Dataverse."); } else { - for (String id : providers.get()) { - //Allows spaces in PID_PROVIDERS setting - id=id.trim(); + for (String id : providersOpt.get()) { Optional type = JvmSettings.PID_PROVIDER_TYPE.lookupOptional(id); if (!type.isPresent()) { logger.warning("PidProvider " + id diff --git a/src/main/java/edu/harvard/iq/dataverse/settings/JvmSettings.java b/src/main/java/edu/harvard/iq/dataverse/settings/JvmSettings.java index 87123801a3e..07dc417ba1f 100644 --- a/src/main/java/edu/harvard/iq/dataverse/settings/JvmSettings.java +++ b/src/main/java/edu/harvard/iq/dataverse/settings/JvmSettings.java @@ -309,6 +309,7 @@ public enum JvmSettings { private final String key; private final String scopedKey; + @SuppressWarnings("unused") private final JvmSettings parent; private final List oldNames; private final int placeholders; @@ -608,4 +609,73 @@ public String insert(String... arguments) { return String.format(this.getScopedKey(), (Object[]) arguments); } + /** + * Lookup an optional comma-separated value and return the tokens as an immutable list. + * MicroProfile Config removes zero-length segments when it converts to {@code String[]}, but + * it leaves any leading or trailing whitespace on the surviving tokens (including tokens that + * contain only spaces). This convenience overload trims each token; after trimming, any token + * that becomes empty (because it consisted solely of whitespace) is discarded so callers still + * receive a list that is free of empty strings. Use the boolean overload with {@code false} if + * you need the exact whitespace that MicroProfile provided. 
+ * + * @return an {@link Optional} containing the list of tokens when the setting is present; + * an empty {@link Optional} if the setting is not configured + */ + public Optional<List<String>> lookupSplittedListOptional() { + return lookupSplittedListOptional(true); + } + + /** + * Lookup an optional comma-separated value and return the tokens as an immutable list. + * + * @param trimSpaces when {@code true}, individual elements are trimmed; tokens that become empty after + * trimming (because they were all whitespace) are removed to preserve MicroProfile's + * "no empty entries" guarantee; when {@code false}, the tokens are returned exactly as + * produced by MicroProfile Config + * @return an {@link Optional} containing the list of tokens when the setting is present; + * an empty {@link Optional} if the setting is not configured + */ + public Optional<List<String>> lookupSplittedListOptional(boolean trimSpaces) { + return lookupOptional(String[].class) + .map(values -> Arrays.stream(values) + .map(s -> trimSpaces ? s.trim() : s) + .filter(s -> trimSpaces ? !s.isEmpty() : true) + .toList()); + } + + /** + * Lookup a required comma-separated value and return the tokens as an immutable list. + * MicroProfile Config removes zero-length segments when it converts to {@code String[]}, but it + * leaves any leading or trailing whitespace on the surviving tokens (including tokens that contain + * only spaces). This convenience overload trims each token; after trimming, any token that becomes + * empty (because it consisted solely of whitespace) is discarded so callers still receive a list that + * is free of empty strings. Use the boolean overload with {@code false} if you need the exact whitespace + * that MicroProfile provided. 
+ * + * @return the list of tokens for the configured setting + * @throws java.util.NoSuchElementException if the setting is missing or blank + * @throws IllegalArgumentException if conversion to {@code String[]} fails + */ + public List<String> lookupSplittedList() { + return lookupSplittedList(true); + } + + /** + * Lookup a required comma-separated value and return the tokens as an immutable list. + * + * @param trimSpaces when {@code true}, individual elements are trimmed; tokens that become empty after + * trimming (because they were all whitespace) are removed to preserve MicroProfile's + * "no empty entries" guarantee; when {@code false}, the tokens are returned exactly as + * produced by MicroProfile Config + * @return the list of tokens for the configured setting + * @throws java.util.NoSuchElementException if the setting is missing or blank + * @throws IllegalArgumentException if conversion to {@code String[]} fails + */ + public List<String> lookupSplittedList(boolean trimSpaces) { + return Arrays.stream(lookup(String[].class)) + .map(s -> trimSpaces ? s.trim() : s) + .filter(s -> trimSpaces ? !s.isEmpty() : true) + .toList(); + } + + } diff --git a/src/main/java/edu/harvard/iq/dataverse/util/CSLUtil.java b/src/main/java/edu/harvard/iq/dataverse/util/CSLUtil.java index fe9e00bd837..213737ffeeb 100644 --- a/src/main/java/edu/harvard/iq/dataverse/util/CSLUtil.java +++ b/src/main/java/edu/harvard/iq/dataverse/util/CSLUtil.java @@ -87,7 +87,7 @@ public static List getSupportedStyles(String localeCode) { * Adapted from private retrieveStyle method in de.undercouch.citeproc.CSL * Retrieves a CSL style from the classpath. 
For example, if the given name is * ieee this method will load the file /ieee.csl - * + * * @param styleName the style's name * @return the serialized XML representation of the style * @throws IOException if the style could not be loaded @@ -119,8 +119,9 @@ public static String getCitationFormat(String styleName) throws IOException { private static String[] getCommonStyles() { if (commonStyles == null) { - commonStyles = JvmSettings.CSL_COMMON_STYLES.lookupOptional().orElse("chicago-author-date, ieee") - .split("\\s*,\\s*"); + commonStyles = ListSplitUtil.split( + JvmSettings.CSL_COMMON_STYLES.lookupOptional().orElse("chicago-author-date, ieee") + ).toArray(new String[0]); } return commonStyles; } diff --git a/src/main/java/edu/harvard/iq/dataverse/util/ListSplitUtil.java b/src/main/java/edu/harvard/iq/dataverse/util/ListSplitUtil.java new file mode 100644 index 00000000000..793eef1db7b --- /dev/null +++ b/src/main/java/edu/harvard/iq/dataverse/util/ListSplitUtil.java @@ -0,0 +1,45 @@ +package edu.harvard.iq.dataverse.util; + +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import java.util.Set; +import java.util.regex.Pattern; + +/** + * Helpers for simple admin settings that accept comma-separated lists (origins, methods, headers, etc.). + *

+ * Behavior: + * - Leading/trailing whitespace of the whole input is ignored. + * - Whitespace immediately around commas is ignored ("GET, POST" == "GET,POST"). + * - Tokens are otherwise preserved exactly as typed (no quote stripping, no escape processing). + * Not a full CSV parser: embedded commas, quoted fields with separators, and newlines inside tokens are NOT supported. + */ +public final class ListSplitUtil { + /** Split on commas, consuming any whitespace adjacent to each comma. */ + private static final Pattern SPLIT = Pattern.compile("\\s*,\\s*"); + + /** + * Split a comma-separated string into tokens preserving user input (beyond removing cosmetic + * whitespace around commas and overall leading/trailing whitespace). Returns an empty list for + * null or blank input. + */ + public static List<String> split(final String rawCsv) { + if (rawCsv == null) { + return Collections.emptyList(); + } + final String trimmedCsv = rawCsv.trim(); + if (trimmedCsv.isEmpty()) { + return Collections.emptyList(); + } + return Arrays.asList(SPLIT.split(trimmedCsv)); + } + + /** Convenience: split into a lowercase set. 
*/ + public static Set<String> splitToLowerCaseSet(final String rawCsv) { + if (rawCsv == null || rawCsv.trim().isEmpty()) { + return Collections.emptySet(); + } + return Set.copyOf(split(rawCsv.toLowerCase())); + } +} diff --git a/src/main/java/edu/harvard/iq/dataverse/util/SystemConfig.java b/src/main/java/edu/harvard/iq/dataverse/util/SystemConfig.java index 71f24b1fe3a..c1d61378f42 100644 --- a/src/main/java/edu/harvard/iq/dataverse/util/SystemConfig.java +++ b/src/main/java/edu/harvard/iq/dataverse/util/SystemConfig.java @@ -1001,11 +1001,12 @@ public boolean isRsyncOnly(){ return false; } String uploadMethods = settingsService.getValueForKey(SettingsServiceBean.Key.UploadMethods); - if (uploadMethods==null){ + if (uploadMethods == null) { return false; - } else { - return Arrays.asList(uploadMethods.toLowerCase().split("\\s*,\\s*")).size() == 1 && uploadMethods.toLowerCase().equals(SystemConfig.FileUploadMethods.RSYNC.toString()); } + String normalizedUploadMethods = uploadMethods.toLowerCase(); + return ListSplitUtil.split(normalizedUploadMethods).size() == 1 + && normalizedUploadMethods.equals(SystemConfig.FileUploadMethods.RSYNC.toString()); } @Deprecated(forRemoval = true, since = "2024-07-07") @@ -1035,18 +1036,16 @@ private Boolean getMethodAvailable(String method, boolean upload) { upload ? 
SettingsServiceBean.Key.UploadMethods : SettingsServiceBean.Key.DownloadMethods); if (methods == null) { return false; - } else { - return Arrays.asList(methods.toLowerCase().split("\\s*,\\s*")).contains(method); } + return ListSplitUtil.split(methods.toLowerCase()).contains(method); } public Integer getUploadMethodCount(){ String uploadMethods = settingsService.getValueForKey(SettingsServiceBean.Key.UploadMethods); - if (uploadMethods==null){ + if (uploadMethods == null) { return 0; - } else { - return Arrays.asList(uploadMethods.toLowerCase().split("\\s*,\\s*")).size(); - } + } + return ListSplitUtil.split(uploadMethods.toLowerCase()).size(); } public boolean isAllowCustomTerms() { diff --git a/src/main/java/edu/harvard/iq/dataverse/workflow/internalspi/LDNAnnounceDatasetVersionStep.java b/src/main/java/edu/harvard/iq/dataverse/workflow/internalspi/LDNAnnounceDatasetVersionStep.java index d96c4db1305..49ca77573da 100644 --- a/src/main/java/edu/harvard/iq/dataverse/workflow/internalspi/LDNAnnounceDatasetVersionStep.java +++ b/src/main/java/edu/harvard/iq/dataverse/workflow/internalspi/LDNAnnounceDatasetVersionStep.java @@ -5,6 +5,7 @@ import edu.harvard.iq.dataverse.DatasetFieldType; import edu.harvard.iq.dataverse.DatasetVersion; import edu.harvard.iq.dataverse.branding.BrandingUtil; +import edu.harvard.iq.dataverse.util.ListSplitUtil; import static edu.harvard.iq.dataverse.settings.SettingsServiceBean.Key.LDNAnnounceRequiredFields; import static edu.harvard.iq.dataverse.settings.SettingsServiceBean.Key.LDNTarget; import edu.harvard.iq.dataverse.util.SystemConfig; @@ -48,7 +49,7 @@ * anounce new dataset versions to the Harvard DASH preprint repository so that * a DASH admin can create a backlink for any dataset versions that reference a * DASH deposit or a paper with a DOI where DASH has a preprint copy. 
- * + * * @author qqmyers */ @@ -76,7 +77,7 @@ public WorkflowStepResult run(WorkflowContext context) { CloseableHttpClient client = HttpClients.createDefault(); // build method - + HttpPost announcement; try { announcement = buildAnnouncement(false, context, target); @@ -126,8 +127,7 @@ HttpPost buildAnnouncement(boolean qb, WorkflowContext ctxt, JsonObject target) DatasetVersion dv = ctxt.getDataset().getReleasedVersion(); List dvf = dv.getDatasetFields(); Map fields = new HashMap(); - String[] requiredFields = ((String) ctxt.getSettings().getOrDefault(REQUIRED_FIELDS, "")).split(",\\s*"); - for (String field : requiredFields) { + for (String field : ListSplitUtil.split((String) ctxt.getSettings().getOrDefault(REQUIRED_FIELDS, ""))) { fields.put(field, null); } Set reqFields = fields.keySet(); diff --git a/src/test/java/edu/harvard/iq/dataverse/export/ddi/DdiExportUtilTest.java b/src/test/java/edu/harvard/iq/dataverse/export/ddi/DdiExportUtilTest.java index 360e9dfbafe..03000d55b5a 100644 --- a/src/test/java/edu/harvard/iq/dataverse/export/ddi/DdiExportUtilTest.java +++ b/src/test/java/edu/harvard/iq/dataverse/export/ddi/DdiExportUtilTest.java @@ -19,8 +19,6 @@ import java.nio.charset.StandardCharsets; import java.nio.file.Files; import java.nio.file.Path; -import java.nio.file.Paths; -import java.util.Arrays; import java.util.HashMap; import java.util.List; import java.util.Map; @@ -93,7 +91,7 @@ public static void setUpClass() throws Exception { PidUtil.clearPidProviders(); //Read list of providers to add - List providers = Arrays.asList(JvmSettings.PID_PROVIDERS.lookup().split(",\\s")); + List providers = JvmSettings.PID_PROVIDERS.lookupSplittedList(); //Iterate through the list of providers and add them using the PidProviderFactory of the appropriate type for (String providerId : providers) { System.out.println("Loading provider: " + providerId); diff --git a/src/test/java/edu/harvard/iq/dataverse/filter/CorsFilterTest.java 
b/src/test/java/edu/harvard/iq/dataverse/filter/CorsFilterTest.java new file mode 100644 index 00000000000..8db5d43e14d --- /dev/null +++ b/src/test/java/edu/harvard/iq/dataverse/filter/CorsFilterTest.java @@ -0,0 +1,224 @@ +package edu.harvard.iq.dataverse.filter; + +import jakarta.servlet.FilterChain; +import jakarta.servlet.ServletRequest; +import jakarta.servlet.ServletResponse; +import jakarta.servlet.http.HttpServletRequest; +import jakarta.servlet.http.HttpServletResponse; +import org.junit.jupiter.api.AfterEach; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.Test; +import org.mockito.ArgumentCaptor; + +import java.util.HashMap; +import java.util.Map; + +import static org.junit.jupiter.api.Assertions.*; +import static org.mockito.ArgumentMatchers.any; +import static org.mockito.ArgumentMatchers.anyString; +import static org.mockito.ArgumentMatchers.argThat; +import static org.mockito.ArgumentMatchers.contains; +import static org.mockito.ArgumentMatchers.eq; +import static org.mockito.Mockito.*; + +class CorsFilterTest { + + private final Map sysPropsBackup = new HashMap<>(); + + @BeforeEach + void setUp() { + // backup potentially touched props + backupAndClear("dataverse.cors.origin"); + backupAndClear("dataverse.cors.methods"); + backupAndClear("dataverse.cors.headers.allow"); + backupAndClear("dataverse.cors.headers.expose"); + } + + @AfterEach + void tearDown() { + restore("dataverse.cors.origin"); + restore("dataverse.cors.methods"); + restore("dataverse.cors.headers.allow"); + restore("dataverse.cors.headers.expose"); + } + + @Test + void wildcardOrigin_allowsAny_noVary() throws Exception { + System.setProperty("dataverse.cors.origin", "*"); + + CorsFilter sut = new CorsFilter(); + sut.init(null); + + HttpServletRequest req = mock(HttpServletRequest.class); + when(req.getHeader("Origin")).thenReturn("https://a.example"); + HttpServletResponse res = mock(HttpServletResponse.class); + FilterChain chain = mock(FilterChain.class); 
+ + sut.doFilter(req, res, chain); + + verify(res).setHeader("Access-Control-Allow-Origin", "*"); + // By design, Vary is not required for wildcard + verify(res, never()).setHeader(eq("Vary"), anyString()); + verify(chain).doFilter(any(ServletRequest.class), any(ServletResponse.class)); + } + + @Test + void singleOrigin_echoesAndAddsVary() throws Exception { + System.setProperty("dataverse.cors.origin", "https://libis.github.io"); + + CorsFilter sut = new CorsFilter(); + sut.init(null); + + HttpServletRequest req = mock(HttpServletRequest.class); + when(req.getHeader("Origin")).thenReturn("https://libis.github.io"); + HttpServletResponse res = mock(HttpServletResponse.class); + when(res.getHeader("Vary")).thenReturn(null); + FilterChain chain = mock(FilterChain.class); + + sut.doFilter(req, res, chain); + + verify(res).setHeader("Access-Control-Allow-Origin", "https://libis.github.io"); + + ArgumentCaptor<String> varyVal = ArgumentCaptor.forClass(String.class); + verify(res).setHeader(eq("Vary"), varyVal.capture()); + assertTrue(varyVal.getValue().contains("Origin")); + verify(chain).doFilter(any(ServletRequest.class), any(ServletResponse.class)); + } + + @Test + void multipleOrigins_echoesMatch_onlyWhenAllowed() throws Exception { + // Comma-separated list as set via JVM options/MicroProfile + System.setProperty("dataverse.cors.origin", "https://a.example, https://b.example"); + + CorsFilter sut = new CorsFilter(); + sut.init(null); + + // allowed origin + HttpServletRequest reqAllowed = mock(HttpServletRequest.class); + when(reqAllowed.getHeader("Origin")).thenReturn("https://b.example"); + HttpServletResponse resAllowed = mock(HttpServletResponse.class); + FilterChain chain = mock(FilterChain.class); + + sut.doFilter(reqAllowed, resAllowed, chain); + verify(resAllowed).setHeader("Access-Control-Allow-Origin", "https://b.example"); + verify(resAllowed).setHeader(eq("Vary"), contains("Origin")); + + // not allowed origin -> no ACAO header set + HttpServletRequest reqDenied = 
mock(HttpServletRequest.class); + when(reqDenied.getHeader("Origin")).thenReturn("https://c.example"); + HttpServletResponse resDenied = mock(HttpServletResponse.class); + + sut.doFilter(reqDenied, resDenied, chain); + verify(resDenied, never()).setHeader(eq("Access-Control-Allow-Origin"), anyString()); + } + + @Test + void whitespaceAndMixedCasingParsing() throws Exception { + System.setProperty("dataverse.cors.origin", + " https://one.example ,\n\t https://two.example , https://three.example "); + + CorsFilter sut = new CorsFilter(); + sut.init(null); + + HttpServletRequest req = mock(HttpServletRequest.class); + when(req.getHeader("Origin")).thenReturn("https://two.example"); + HttpServletResponse res = mock(HttpServletResponse.class); + when(res.getHeader("Vary")).thenReturn("Accept-Encoding"); + + sut.doFilter(req, res, mock(FilterChain.class)); + + verify(res).setHeader("Access-Control-Allow-Origin", "https://two.example"); + // ensure existing Vary preserved and Origin added + verify(res).setHeader(eq("Vary"), argThat(v -> v.contains("Origin") && v.contains("Accept-Encoding"))); + } + + @Test + void wildcardAmongOthersTreatsAsWildcard() throws Exception { + System.setProperty("dataverse.cors.origin", "https://a.example,*,https://b.example"); + + CorsFilter sut = new CorsFilter(); + sut.init(null); + + HttpServletRequest req = mock(HttpServletRequest.class); + when(req.getHeader("Origin")).thenReturn("https://random.example"); + HttpServletResponse res = mock(HttpServletResponse.class); + + sut.doFilter(req, res, mock(FilterChain.class)); + + verify(res).setHeader("Access-Control-Allow-Origin", "*"); + verify(res, never()).setHeader(eq("Vary"), anyString()); + } + + @Test + void existingVaryMergedWithoutDuplication() throws Exception { + System.setProperty("dataverse.cors.origin", "https://merge.example"); + + CorsFilter sut = new CorsFilter(); + sut.init(null); + + HttpServletRequest req = mock(HttpServletRequest.class); + 
when(req.getHeader("Origin")).thenReturn("https://merge.example"); + HttpServletResponse res = mock(HttpServletResponse.class); + when(res.getHeader("Vary")).thenReturn("Accept-Encoding, Origin"); + + sut.doFilter(req, res, mock(FilterChain.class)); + + // Origin should not be duplicated + verify(res).setHeader(eq("Vary"), argThat(v -> v.indexOf("Origin") == v.lastIndexOf("Origin"))); + } + + @Test + void quotedHeaderListsPreserved() throws Exception { + System.setProperty("dataverse.cors.origin", "https://x.example"); + System.setProperty("dataverse.cors.headers.allow", "\"Accept, X-Dataverse-key\""); + System.setProperty("dataverse.cors.headers.expose", "\"Accept-Ranges, Content-Range\""); + System.setProperty("dataverse.cors.methods", "GET, POST, OPTIONS"); + + CorsFilter sut = new CorsFilter(); + sut.init(null); + + HttpServletRequest req = mock(HttpServletRequest.class); + when(req.getHeader("Origin")).thenReturn("https://x.example"); + HttpServletResponse res = mock(HttpServletResponse.class); + + sut.doFilter(req, res, mock(FilterChain.class)); + + // List parsing does no quote handling, so surrounding quotes provided by admin config are preserved. 
+ verify(res).setHeader("Access-Control-Allow-Headers", "\"Accept, X-Dataverse-key\""); + verify(res).setHeader("Access-Control-Expose-Headers", "\"Accept-Ranges, Content-Range\""); + verify(res).setHeader("Access-Control-Allow-Methods", "GET, POST, OPTIONS"); + } + + @Test + void disabledCors_skipsHeaders() throws Exception { + // no origin set -> CORS disabled + CorsFilter sut = new CorsFilter(); + sut.init(null); + + HttpServletRequest req = mock(HttpServletRequest.class); + when(req.getHeader("Origin")).thenReturn("https://any.example"); + HttpServletResponse res = mock(HttpServletResponse.class); + + sut.doFilter(req, res, mock(FilterChain.class)); + + verify(res, never()).setHeader(eq("Access-Control-Allow-Origin"), anyString()); + verify(res, never()).setHeader(eq("Access-Control-Allow-Methods"), anyString()); + verify(res, never()).setHeader(eq("Access-Control-Allow-Headers"), anyString()); + verify(res, never()).setHeader(eq("Access-Control-Expose-Headers"), anyString()); + } + + private void backupAndClear(String key) { + String old = System.getProperty(key); + if (old != null) { + sysPropsBackup.put(key, old); + } + System.clearProperty(key); + } + + private void restore(String key) { + System.clearProperty(key); + if (sysPropsBackup.containsKey(key)) { + System.setProperty(key, sysPropsBackup.get(key)); + } + } +} diff --git a/src/test/java/edu/harvard/iq/dataverse/pidproviders/PidUtilTest.java b/src/test/java/edu/harvard/iq/dataverse/pidproviders/PidUtilTest.java index 3f8c198b0fe..201d3c6c25d 100644 --- a/src/test/java/edu/harvard/iq/dataverse/pidproviders/PidUtilTest.java +++ b/src/test/java/edu/harvard/iq/dataverse/pidproviders/PidUtilTest.java @@ -153,7 +153,7 @@ public static void setUpClass() throws Exception { PidUtil.clearPidProviders(); //Read list of providers to add - List providers = Arrays.asList(JvmSettings.PID_PROVIDERS.lookup().split(",\\s")); + List providers = JvmSettings.PID_PROVIDERS.lookupSplittedList(); //Iterate through the list 
of providers and add them using the PidProviderFactory of the appropriate type for (String providerId : providers) { System.out.println("Loading provider: " + providerId); diff --git a/src/test/java/edu/harvard/iq/dataverse/util/ListSplitUtilTest.java b/src/test/java/edu/harvard/iq/dataverse/util/ListSplitUtilTest.java new file mode 100644 index 00000000000..9535cb4cca2 --- /dev/null +++ b/src/test/java/edu/harvard/iq/dataverse/util/ListSplitUtilTest.java @@ -0,0 +1,31 @@ +package edu.harvard.iq.dataverse.util; + +import org.junit.jupiter.api.DisplayName; +import org.junit.jupiter.api.Test; + +import java.util.List; +import java.util.Set; + +import static org.junit.jupiter.api.Assertions.*; + +class ListSplitUtilTest { + + @Test + @DisplayName("split preserves empty tokens and quotes") + void testSplitBasic() { + List<String> tokens = ListSplitUtil.split(" a , b, \"c\" , , d "); + assertEquals(List.of("a", "b", "\"c\"", "", "d"), tokens); + } + + @Test + @DisplayName("splitToLowerCaseSet lowercases and de-dups (order not asserted)") + void testSplitToLowerCaseSet() { + assertTrue(ListSplitUtil.splitToLowerCaseSet(null).isEmpty(), "null should yield empty set"); + assertTrue(ListSplitUtil.splitToLowerCaseSet(" ").isEmpty(), "blank should yield empty set"); + Set<String> set = ListSplitUtil.splitToLowerCaseSet("B, a, b, A, C"); + assertEquals(Set.of("b", "a", "c"), set); + + Set<String> quoted = ListSplitUtil.splitToLowerCaseSet("\"A\" , \"b\" , \"A\""); + assertEquals(Set.of("\"a\"", "\"b\""), quoted); + } +}
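
For reviewers who want to try the per-origin behavior without a servlet container, the decision logic in `CorsFilter.doFilter` and `appendVary` can be approximated with plain static methods. This is a minimal sketch, not the code in the patch: the class `CorsDecisionSketch` and the helpers `splitAndTrim` and `allowOriginFor` are hypothetical names, and it deliberately uses a `LinkedHashSet` (the patch uses a `HashSet`) so that the order of existing `Vary` tokens stays predictable.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-alone sketch of the origin decision and Vary merge.
public class CorsDecisionSketch {

    /** Split on commas, trim each token, drop tokens that are empty after trimming. */
    static List<String> splitAndTrim(String raw) {
        return Arrays.stream(raw.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    /**
     * Value for Access-Control-Allow-Origin, or empty when the header should be omitted.
     * Any "*" in the configured list switches to wildcard mode; otherwise the request
     * origin is echoed only on an exact (case-sensitive) match.
     */
    static Optional<String> allowOriginFor(List<String> configured, String requestOrigin) {
        if (configured.contains("*")) {
            return Optional.of("*");
        }
        if (requestOrigin != null && configured.contains(requestOrigin.trim())) {
            return Optional.of(requestOrigin.trim());
        }
        return Optional.empty();
    }

    /** Merge a value into an existing Vary header without duplicating it (case-sensitive). */
    static String appendVary(String existing, String value) {
        if (existing == null || existing.isBlank()) {
            return value;
        }
        // Assumption: insertion order is kept here; the patch itself does not guarantee order.
        Set<String> tokens = new LinkedHashSet<>(splitAndTrim(existing));
        tokens.add(value);
        return String.join(", ", tokens);
    }

    public static void main(String[] args) {
        List<String> configured = splitAndTrim(" https://a.example , https://b.example ");
        System.out.println(allowOriginFor(configured, "https://b.example"));
        System.out.println(allowOriginFor(configured, "https://c.example"));
        System.out.println(appendVary("Accept-Encoding", "Origin"));
    }
}
```

Because the match is an exact string comparison, `https://a.example` and `https://a.example/` count as different origins in this sketch; the tests in the patch suggest the filter behaves the same way, but that is worth confirming against a real deployment.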