
user interface #92

@e-kotov

Description

> What I mean by materialization as a vector with new_column = NULL is to keep the exact same behavior as the sf package.

Yes, I got that. I was just saying that it seems confusing to me that new_column = NULL means 'return a vector'. It would definitely work, but I don't think it's good API design.

> Another option I see is that output = "sf" would return the same type of object as the sf package does, not inherently an sf object. That is, if st_is_simple() returns a logical vector, then ddbs_is_simple(..., output = "sf") would return a logical vector, not the sf version of a duckspatial_df. Is this what you meant?

Yes, exactly.

> Do you mean to remove "raw" and "geoarrow" here, and keep the other two (duckspatial_df and sf)?

Not really, as we may need those outputs too.

I think we are doing most things correctly now; we are just not packaging them behind a well-designed, intuitive API, and the main problem is mostly semantics. If we were simply to rename output to mode, things might immediately become clearer: output = 'sf' suggests that the output is an sf object, whereas mode = 'sf' is easier to understand as 'I will get an object the same as, or similar to, what I would expect from sf'.

However, if we do rename output to mode, I'm not sure the other existing options ("raw", "geoarrow") will work well, as we don't really support them in any downstream workflows. So perhaps we need to split mode and output. I'm not sure exactly what to do right now in terms of editing code. I would take some time to experiment with the codebase we have now, and perhaps split this discussion into a separate issue where we can agree on what the final API should look like.
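To make the mode/output distinction concrete, here is a purely hypothetical sketch of what a split API could look like. None of these argument names or values exist in the current codebase; they are only illustrative:

```r
library(duckspatial)

# Hypothetical: 'mode' controls the semantics of the result,
# 'output' controls the materialization format.

# mode = "sf": mirror sf's return types, so this would yield a plain
# logical vector, just as st_is_simple() does
flags <- ddbs_is_simple(my_db_object, mode = "sf")

# mode = "lazy" (hypothetical default): append a new column and keep
# everything as a lazy table in the database
my_db_object <- ddbs_is_simple(my_db_object, mode = "lazy",
                               output = "duckspatial_df")
```

This is just one way to carve it up; the point is that the result's *shape* and its *storage format* would no longer be conflated in a single argument.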

For example, in sf:

```r
library(sf)

# 1. st_is_valid() returns a plain logical vector (TRUE/FALSE/NA)
valid_vector <- st_is_valid(my_sf_object)

# 2. To use it for filtering, you typically attach it back or index directly
my_sf_object$is_valid <- valid_vector
clean_data <- my_sf_object[my_sf_object$is_valid == TRUE, ]
```

and in duckspatial:

```r
library(duckspatial)
library(dplyr)

# 1. ddbs_is_valid() returns a 'tbl_lazy' (a pointer into the database)
#    with a NEW column added
my_db_object <- ddbs_is_valid(my_db_object, new_column = "is_valid")

# or just this, if we assume the 'duckspatial_df' mode/output is the default
my_db_object <- ddbs_is_valid(my_db_object)

# 2. Filtering happens inside the database
clean_data <- my_db_object |>
  filter(is_valid == TRUE)
```

So in my opinion we are already winning, as there is no extra step of manually assigning a vector. But there might be other use cases, so we could take some time to pool our collective experience of working with spatial data and think about which other workflows might run into issues if we keep things as they are designed now, or which user-facing helpers we might need to add for those cases.
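For instance, a user who wants the sf-style vector back can already get one with standard dplyr/dbplyr verbs; this sketch assumes the my_db_object lazy table from the example above:

```r
library(dplyr)

# Materialize just the computed column as a plain logical vector,
# the same shape of result st_is_valid() gives in sf
valid_vector <- my_db_object |>
  pull(is_valid)
```

That keeps the lazy-by-default design while still covering the 'I just want a vector' case without a dedicated helper.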

Originally posted by @e-kotov in #83
