Snapshotting and distributing the database in parquet format #3

@kuzdogan

Description

This issue outlines the task of creating regular snapshots of the Verifier Alliance database and distributing them in Parquet format. The database is currently hosted on Google Cloud SQL.

Goals

  • Create periodic snapshots of the Verifier Alliance database.
  • Distribute these snapshots as Parquet files for potential downstream processing and analysis.

Discussion

  • Snapshotting Approach: We need to decide how to snapshot the Cloud SQL database. Two options (according to GPT):
    • Use BigQuery's federated queries and export functionality (preview) to export the data to Cloud Storage in Parquet format. (related Stack Overflow question)
    • Build a solution with Cloud Functions or Cloud Dataflow that extracts the data from Cloud SQL, converts it to Parquet, and writes it to Cloud Storage.
  • Distribution Needs: Determine the destination for the distributed Parquet files (Cloud Storage etc.).
  • Scheduling: Define the scheduling requirements for creating these snapshots (daily, weekly, etc.).
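
To make the first option more concrete, here is a minimal sketch of how the export statement for one table could be built. It assumes a BigQuery connection to the Cloud SQL instance already exists and that snapshots go to a date-stamped folder in a Cloud Storage bucket. The connection ID, bucket name, table name, and the `snapshots/<date>/<table>/` layout are all hypothetical placeholders, not decisions made in this issue. The function only produces the SQL text; running it (for example from a scheduled query) is left open.

```python
import re
from datetime import date


def build_export_statement(connection_id: str, table: str, bucket: str,
                           snapshot_date: date) -> str:
    """Return a BigQuery EXPORT DATA statement that snapshots one Cloud SQL table.

    All argument values are supplied by the caller; nothing here is taken
    from the actual Verifier Alliance setup.
    """
    # Only allow plain identifiers, since the table name is spliced into SQL.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"unexpected table name: {table!r}")

    # Hypothetical layout: one folder per snapshot date and table.
    # EXPORT DATA requires a single '*' wildcard in the destination URI.
    uri = f"gs://{bucket}/snapshots/{snapshot_date.isoformat()}/{table}/*.parquet"

    # EXTERNAL_QUERY runs the inner query on Cloud SQL through the connection.
    return (
        "EXPORT DATA OPTIONS (\n"
        f"  uri = '{uri}',\n"
        "  format = 'PARQUET',\n"
        "  overwrite = true\n"
        ") AS\n"
        f"SELECT * FROM EXTERNAL_QUERY('{connection_id}', 'SELECT * FROM {table}');"
    )


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_export_statement(
        "my-project.us.my-cloudsql-connection",
        "verified_contracts",
        "my-snapshot-bucket",
        date.today(),
    ))
```

Keeping the date in the path means a daily or weekly schedule never overwrites an earlier snapshot, whichever interval is chosen.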
