[WIP] Allow Bulkrax to assign queue based on number of rows in CSV#671
[WIP] Allow Bulkrax to assign queue based on number of rows in CSV#671thenapking wants to merge 8 commits into
Conversation
Basic way of switching queues
Working Tests Set bulk on collections
Most tests working Import Mode Tests
9950d29 to
0f5520c
Compare
| return super if non_tenant_job? | ||
| switch do | ||
| return super unless Flipflop.enabled?(:import_mode) | ||
| puts "Job: #{self.class}. Bulk is #{bulk?}" |
There was a problem hiding this comment.
These logger/puts statements are very useful for debugging but must be removed before this is merged.
| @@ -0,0 +1,15 @@ | |||
| # frozen_string_literal: true | |||
| # rubocop:disable Rails/Output | |||
There was a problem hiding this comment.
This disable statement should be removed, along with the puts statement before merge
| @@ -0,0 +1,15 @@ | |||
| # frozen_string_literal: true | |||
| # rubocop:disable Rails/Output | |||
There was a problem hiding this comment.
This disable statement should be removed, along with the puts statement before merge
| @@ -0,0 +1,14 @@ | |||
| # frozen_string_literal: true | |||
| # rubocop:disable Rails/Output | |||
There was a problem hiding this comment.
This disable statement should be removed, along with the puts statement before merge
| @@ -0,0 +1,18 @@ | |||
| # frozen_string_literal: true | |||
| # rubocop:disable Rails/Output | |||
There was a problem hiding this comment.
This disable statement should be removed, along with the puts statement before merge
|
|
||
| private | ||
|
|
||
| def works |
There was a problem hiding this comment.
These changes clean up the class a little and allow the new code to share logic with the existing code. However they are not strictly needed for this update. Only the total_entries method is needed and that could be refactored if you do not wish to add all these refactors.
Note: In addition to this description a Wiki article has been created to expand on this information
This PR replaces the
import_modefeature flipper with code that checks how many works are being imported as part of an import run and switches to bulk processing queues if a threshold has passed.#Methodology
This adapts the currently logic for import mode, and essentially flips the feature on based on a database lookup. The
ImportModemodule is included onActiveJob::Base.The
portable_objectFor any arbitrary job we need a way of fetch the work or collection the job is operating on (which we call the
portable_object) and looking up the parser class associated with that object in order to fetch the total number of objects being imported. There are a number of different ways a job may refer to aportable_object. Some jobs will be initialised with the UUID of a work or collection. From that UUID we would be able to find whether it is currently being updated by Bulkrax, and if so, how many objects are in the importer run. In some cases the work or collection may not yet exist (perhaps because the job will create it), but aBulkrax::Entrywill exist. Once we have theBulkrax::Entrywe can again fetch the parser and count the total number of objectsThe strategy classes
There are four strategies the jobs use to identify the
portable_object. The most simple is thePortableBulkraxEntryBehaviorwhich finds theBulkrax::Entryfrom the id passed to the job. ThePortableActiveFedoraBehaviorfetches anActiveFedorarecord and finds an associatedBulkrax::Entryeither from thesource_idor theiditself. ThePortableGenericBehavioris used when the job is passed the ActiveFedora object itself rather than itsid. The final strategy is thePortableBulkraxImporterBehavior. This strategy applies only to the job which kicks off the import (Bulkrax::ImportJob), which is called on neither a work nor a collection, but a table internal to Bulkrax.Import mode
Once we have identified the
portable_object, we look up its parser and return thetotalattribute. If this is above a threshold then import mode is triggered. The tests for this can proceed as previously. We create a generic job class and includeImportMode. All we need to add is some mocks which will return the count of the number of works being imported. Which mocks we setup depends on which strategy is used. The spec loops through the four strategies and sets up slightly different mocks.TODO
Whilst the tests demonstrate in principle that the queues are being assigned correctly in practice it appears that the web process picks up most of the Bulkrax jobs, at least in the development environment. This is because the jobs are not being called asynchronously at all, but are called with perform now. Thus it is difficult to tell whether this PR will do what we expect without manual testing on a separate test environment.
Warning: This PR needs to be thoroughly tested before it is merged, and the change will need to be monitored for several weeks after deployment. It risks breaking import mode which will severely impact the business's ability to onboard new clients.