refactor(io): use the registry pattern for IO schemes#709
refactor(io): use the registry pattern for IO schemes#709alessandro-nori wants to merge 9 commits intoapache:mainfrom
Conversation
3ed4082 to
24273ec
Compare
io/registry.go
Outdated
| if factory == nil { | ||
| panic("io: Register factory is nil") | ||
| } | ||
| defaultRegistry.set(scheme, factory) |
There was a problem hiding this comment.
Should we have something here to handle a case where there's already something registered for a given scheme? In this current version, it looks like a user importing two packages registering themselves on the same scheme could result in a surprising behavior.
For reference, database/sql actually checks for prior existence of a driver and panics in case of a duplicate registration.
There was a problem hiding this comment.
That's a fair point. I just mirrored what was done in the catalog package but I think this is worth adding here
There was a problem hiding this comment.
This makes sense, currently the stated semantics of the catalog registry is that if something is registered it will overwrite anything that is already registered with that catalog type. but for file IO it might makes sense to be a bit more strict
There was a problem hiding this comment.
Thanks Alex and Matt!
I implemented the change in ec4790b
There was a problem hiding this comment.
Does it make sense to handle it atomically? Because right now, two goroutines can register same scheme while skipping panic
There was a problem hiding this comment.
One solution would be for the mutex to be in the Register and Unregister functions rather than in the set and remove methods of the registry object.
| gocloud.S3Region: "us-east-1", | ||
| gocloud.S3AccessKeyID: "admin", | ||
| gocloud.S3SecretAccessKey: "password", |
There was a problem hiding this comment.
since these are properties that tend to be shared across any IO implementation, we should probably leave these constants in the io package rather than moving them down to the gocloud package. the assumption being that the properties being looked for in the catalog should not be dependant on the IO implementation being used.
| icebergio.Register("mem", func(ctx context.Context, parsed *url.URL, props map[string]string) (icebergio.IO, error) { | ||
| bucket := memblob.OpenBucket(nil) | ||
|
|
||
| return createBlobFS(ctx, bucket, defaultKeyExtractor(parsed.Host)), nil | ||
| }) |
There was a problem hiding this comment.
I don't think we had any tests for mem, can we add some?
There was a problem hiding this comment.
Sure, I added some basic tests to create, write, read and delete an in-memory file
068d8cf
5411291 to
068d8cf
Compare
zeroshade
left a comment
There was a problem hiding this comment.
Just the one nitpick on atomically handling register/unregister
Otherwise this looks good to me!
io/registry.go
Outdated
| type registry map[string]SchemeFactory | ||
|
|
||
| var ( | ||
| regMutex sync.Mutex |
There was a problem hiding this comment.
it might be better to use sync.RWMutex
io/io.go
Outdated
| } | ||
|
|
||
| var ( | ||
| ErrIONotFound = errors.New("io scheme not registered") |
io/registry.go
Outdated
| regMutex.Unlock() | ||
|
|
||
| if !ok { | ||
| return nil, fmt.Errorf("%w: %s", ErrIONotFound, parsed.Scheme) |
There was a problem hiding this comment.
Previously path was added into error but now it's removed. Is this a change we want? It could be useful for debugging
There was a problem hiding this comment.
I agree, let's keep the path in the error
| S3EndpointURL = "s3.endpoint" | ||
| S3ProxyURI = "s3.proxy-uri" | ||
| S3ConnectTimeout = "s3.connect-timeout" | ||
| S3SignerUri = "s3.signer.uri" |
There was a problem hiding this comment.
| S3SignerUri = "s3.signer.uri" | |
| S3SignerURI = "s3.signer.uri" |
| GCSKeyPath = "gcs.keypath" | ||
| GCSJSONKey = "gcs.jsonkey" | ||
| GCSCredType = "gcs.credtype" | ||
| GCSUseJsonAPI = "gcs.usejsonapi" // set to anything to enable |
There was a problem hiding this comment.
| GCSUseJsonAPI = "gcs.usejsonapi" // set to anything to enable | |
| GCSUseJSONAPI = "gcs.usejsonapi" // set to anything to enable |
| AdlsSasTokenPrefix = "adls.sas-token." | ||
| AdlsConnectionStringPrefix = "adls.connection-string." | ||
| AdlsSharedKeyAccountName = "adls.auth.shared-key.account.name" | ||
| AdlsSharedKeyAccountKey = "adls.auth.shared-key.account.key" | ||
| AdlsEndpoint = "adls.endpoint" | ||
| AdlsProtocol = "adls.protocol" |
There was a problem hiding this comment.
ADLS...
nit: these were here before so feel free to ignore but noting while we're on it
Related to #696
This PR introduces a registry pattern for IO implementations, similar to the existing catalog package pattern.
Moved all cloud storage implementations to
io/gocloud.Extra notes
I decided to use a single subpackage because all the existing implementations use the same dependency and it's easier to import just one package to register all of them. However I think in most of the integration tests only use
s3so multiple subpackages would also work fine.