d-portal database dump #393
Replies: 8 comments 2 replies
-
|
Update: |
Beta Was this translation helpful? Give feedback.
-
|
@wakibi @akmiller01 what do you think? |
Beta Was this translation helpful? Give feedback.
-
|
Unpacking the file right now. It's about 38 GB fully extracted, so we might not want to run a sync daily. Coincidentally we're both on Postgres 12.5, so there should be no compatibility issues. We would probably need to edit the SQL before importing, as ownership is given to a user |
Beta Was this translation helpful? Give feedback.
-
|
There are options to import schema, and then table-by-table import data @wakibi https://thequantitative.medium.com/restoring-individual-tables-from-postgresql-pg-dump-using-pg-restore-options-ef3ce2b41ab6 Any other thoughts on how we could selectively import things into the proper schema/table names? Maybe we could even do string replacement in Bash e.g. |
Beta Was this translation helpful? Give feedback.
-
|
Tested an import over the weekend. It took 1750 minutes to fully load the .sql file. Unfortunately it's not a binary pg_dump, the file is just raw SQL, so we cannot use pg_restore to selectively import bits and pieces. With a full import taking about 30 hours (which would also significantly slow down all queries against the database during those 30 hours), I don't think this particular file is feasible. If we want a d-portal mirror, I would suggest we reach back out to Kriss and Shi and ask for a streamlined export of just what we need. Otherwise it would be much more efficient to extract budget data ourselves. |
Beta Was this translation helpful? Give feedback.
-
|
With that performance it can’t be of much use to anyone. Worth a chat with Kris?
Get Outlook for iOS<https://aka.ms/o0ukef>
…________________________________
From: Alex Miller <notifications@github.com>
Sent: Monday, January 25, 2021 3:12:48 PM
To: devinit/ddw-analyst-ui <ddw-analyst-ui@noreply.github.com>
Cc: Bill Anderson <Bill.Anderson@devinit.org>; Author <author@noreply.github.com>
Subject: Re: [devinit/ddw-analyst-ui] d-portal database dump (#393)
Tested an import over the weekend. It took 1750 minutes to fully load the .sql file. Unfortunately it's not a binary pg_dump, the file is just raw SQL, so we cannot use pg_restore to selectively import bits and pieces.
With a full import taking about 30 hours (which would also significantly slow down all queries against the database during those 30 hours), I don't think this particular file is feasible.
If we want a d-portal mirror, I would suggest we reach back out to Kriss and Shi and ask for a streamlined export of just what we need. Otherwise it would be much more efficient to extract budget data ourselves.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub<#393 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AA2UN2XA75KEIBXOBWKUCHTS3WC7BANCNFSM4WDYOSLA>.
______________________________________________________________________
This email has been scanned by the Symantec Email Security.cloud service.
For more information please visit http://www.symanteccloud.com
______________________________________________________________________
______________________________________________________________________
This email has been scanned by the Symantec Email Security.cloud service.
For more information please visit http://www.symanteccloud.com
______________________________________________________________________
|
Beta Was this translation helpful? Give feedback.
-
|
If the main reason we want this is for budget data, I think it might be better to work with an IATI tool that we know has long term support eg the registry. |
Beta Was this translation helpful? Give feedback.
-
|
Hello, Thanks for having a go at this and sharing your insights at the process. So as well as the file we've mentioned, there are also a couple more options, all updated nightly: pg_dump custom format so pg_restore can be used with its various options A zip of all the raw cached xml For us, a full restore of the database takes about 7 hours or 2 hours if you use multithreading with pg_restore. Almost all of the time is spent rebuilding indexes so it's not the data itself that takes the time. We create a lot of indexes because, as a public facing site, we need the database queries to run as fast as possible. @akmiller01 I'm wondering if most of the 30 hours import you experienced was from slow disk access.
|
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
Is this worth importing so that we have access to all non-transaction data?
From Shi on Discord
Would a database dump like this be useful to anyone? It's about 3GB in size and is the entire d-portal database which you have live access to via http://d-portal.org/dquery/ It can be imported locally and queried using the same sql code that is used on the web interface so you can run large queries on it without clogging up d-portal.
http://d-portal.org/db/dstore.sql.gz
This is currently a test file but could be updated nightly after each import. We've been getting lots of specific (and expensive) queries from people about getting data out of d-portal in many unique ways (like IATI/D-Portal#589) and thought this might be useful.
Let us know if it is and we can make it better o/
This is a PostgreSQL dump though and uses the json data type in postgresql. Technically, you could spin up a postgresql server and import it just as is and start querying.
@akmiller01 @dean-breed @wakibi @k8hughes
Beta Was this translation helpful? Give feedback.
All reactions