restored suppressing host dataverse field#11865
Conversation
qqmyers
left a comment
There was a problem hiding this comment.
This PR updates #11301 and removes the significant performance issue there. I think that, to move forward, this just needs to have a release note, and to get rid of the unused imports. I've made a couple notes about other possible minor performance improvements - I think unless @jo-pol wants to make those or someone on the team thinks they're needed, they probably aren't needed (I'll poll people at standup/triage tomorrow).
|
|
||
| import org.apache.commons.lang3.StringUtils; | ||
| import org.joda.time.DateTime; | ||
| import org.joda.time.Duration; |
| } | ||
|
|
||
| public boolean isHasDataversesToChoose() { | ||
| this.hasDataversesToChoose = dataverseService.getDataverseCount() > 1; |
There was a problem hiding this comment.
FWIW (and so we don't lose it): the original PR has a suggestion that one could query ~ "SELECT 1 FROM Dataverse d limit 2" which would be more efficient that getting a count and still address the issue (is there more than one dataverse so the user can actually make a choice). I'll also note that this form might be easier to extend to look for dataverses where a user has permission to add a dataset in the future. That said, just getting the count is orders of magnitude faster than the problematic query in the original PR, so I think this is fine unless @jo-pol wants to go further.
There was a problem hiding this comment.
to look for dataverses where a user has permission to add a dataset in the future
That thought crossed my mind too. We now still might show the field while there is nothing to choose.
| } | ||
|
|
||
| public boolean isHasDataversesToChoose() { | ||
| // TODO we actually need to look for dataverses where a user has permission to add a dataset |
There was a problem hiding this comment.
Tried to create a new dataset with a user that had no permission to any of the 100K numbered test dataverses. It took a few seconds when searching with 2 digits.
So a variant of DataverseServiceBean.filterDataversesForHosting (on github) without a pattern for the query and exiting after one (or two?) successfull authourisation checks would take way too long for users with permission for too few dataverses.
A more efficient query might be something like
SELECT dv
FROM roleassignment ra
JOIN dataverse dv
on dv.id = ra.definitionpoint_id
where ra.assigneeidentifier in ('@user001', ':authenticated-users');
However, RoleAssignment.findAssigneesWithPermissionOnDvObject (on github) is dazzling me.
There was a problem hiding this comment.
Yeah - that query is more complex because it has to look for filedownloads configured on the dataset owning a file, etc. I haven't fully thought it through, but this custom code from QDR might be a reasonable model: https://github.com/QualitativeDataRepository/dataverse/blob/b9c0bdc374518cc499eba564e8ad04ec059bc1ff/src/main/java/edu/harvard/iq/dataverse/PermissionServiceBean.java#L1014-L1051 - it checks for superuser first, and if, the user isn't one, gets the list of roles that include the relevant permission (AddDataset in this case) and looks for a case where the user has that role on something. I think you could further constrain by looking for the roleassignment's dvobject having dtype='Dataverse'. Setting the limit to 2 would then let you see if that user has a choice somewhere. I think this is reasonably performant, but I haven't tested on a very large database (as with the case in this PR, it was orders of magnitude faster than the code I replaced).
Do you want to try that? And maybe cache the result so it isn't called 7 times and call this PR ready to move forward?
There was a problem hiding this comment.
Looking at the dazzling query PermissionServiceBean.LIST_ALL_DATAVERSES_USER_HAS_PERMISSION I might have to do something with groups too. That might be more work than @janvanmansum baragained for.
Anyway, even on a simplified solution I get:
jakarta.ejb.EJBException: An exception occurred while creating a query in EntityManager:
Exception Description: Syntax error parsing [SELECT dv FROM roleassignment ra JOIN dataverse dv ON dv.id = ra.definitionpoint_id WHERE ra.assigneeidentifier = '@user001' OR ra.assigneeidentifier = ':authenticated-users' limit 2].
[90, 182] The expression is not a valid conditional expression.
The query works in psql.
There was a problem hiding this comment.
The simplified attempt: DANS-KNAW-jp@4fcf8e1
There was a problem hiding this comment.
Hacked PermissionServiceBean.findPermittedCollections with "limit 2" for when it should have returned 50K instances. That reacted quickly. So I need
permissionService.findPermittedCollections(req, user, Set.of(Permission.CreateDataset))
with a variant of the query in the simplified attempt.
There was a problem hiding this comment.
@jo-pol - Not sure where my comment went - so ~retyping.
The main issue with the 'simplified attempt' is that it's a native sql query and not JPA. That could be addressed by using createNativeQuery() or using JPA. I suggest the following code (have not tested). It handles all the groups the user might be in and limits which roles are checked, along with keeping your check that the definitionpoints are Dataverses.
I'm not sure if it is more efficient that the findPermittedCollections - certainly looks simpler. The code below does break finding the roles and groups out into separate queries (less efficient?), but it may be more efficient to be checking the role ids (which hopefully are indexed) rather than doing the bit-level or on the permissions. I'm not sure how well postgres can optimize in either case.
Set<RoleAssignee> ras = new HashSet<>(groupService.groupsFor(req));
ras.add(user);
List<String> raIds = ras.stream().map(roas -> roas.getIdentifier()).collect(Collectors.toList());
List<Long> roleIds = new ArrayList<>();
for (DataverseRole role : roleService.findAll()) {
if (role.permissions().contains(Permission.AddDataset)) {
roleIds.add(role.getId());
}
}
// Just check for more than one matching record
// More efficient than counting all records
try {
return em.createQuery(
"SELECT ra.id FROM RoleAssignment ra " +
"JOIN Dataverse dv ON dv.id = ra.definitionPoint.id " +
"WHERE ra.assigneeIdentifier IN :assigneeIdentifiers " +
"AND ra.role.id IN :roleIds",
Long.class)
.setParameter("assigneeIdentifiers", raIds)
.setParameter("roleIds", roleIds)
.setMaxResults(2)
.getResultList().size() > 1;
} catch (NoResultException e) {
return false;
}
There was a problem hiding this comment.
That could be addressed by using createNativeQuery()
That is how the permissionService executes the query anyway and how I reused it with limit 2 added. Reusing existing logic reduces maintenance risks IMHO. Updated the cover message with how I tested.
|
discussed during triage - Jim will follow up |
| @@ -0,0 +1,9 @@ | |||
| ### Restored suppression of the host dataverse field | |||
There was a problem hiding this comment.
Since the release note is what users see, I'd suggest changing the title to "Suppression of the Host Dataverse field" and removing the second (for developers) and third (no longer true) paragraphs.
# Conflicts: # src/main/java/edu/harvard/iq/dataverse/PermissionServiceBean.java
| } | ||
| } | ||
| return null; | ||
| } |
There was a problem hiding this comment.
I'm puzzled how to assemble the URL for testing linkingDataverses with the API.
There was a problem hiding this comment.
Weird, Intellij does not blame me for the code inside getIpRange and setSearchParamValues but github does blame me.
There was a problem hiding this comment.
The following command looks more like the intellij result
git blame -w -M -C --minimal -L 954,1040 PermissionServiceBean.java
|
@jo-pol - are you done with changes? If so, we can remove your assignment and let it go into QA. |
I am done with the changes and unassigned myself. I'd recommend to test linkingDataverses with the API. Don't have a clue to do it myself. |
cherry pick stopped at merge IQSS/development
|
Hey @jo-pol, I wasn't able to reproduce the issue on demo where we have trouble editing the host DV - could you please try and reproduce the issue in demo (running v6.8 at the moment) - Thanks! |
The following scenario still blocks the deposit page. The top bar and footer are accessible, the deposit form becomes faint and no new draft appears when selecting "My Data".
|
|
Thank you! |




See #11301
and #11700
How I tested the solution:
On our own VM
Tried cases
Saw no performance difference in these cases. The query was executed quickly and only once in both cases.
After solving the merge conflict
Did not retest with the debugger, but performance still was OK.
Also tried
/api/mydata/retrieve/collectionListwhich returned the expected number.