-
Notifications
You must be signed in to change notification settings - Fork 3
Total_Scrim_Stats_90d #69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
|
||
| WHERE sm."createdAt" >= current_date - interval '90 days' | ||
|
|
||
| GROUP BY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SQL Error [42601]: ERROR: syntax error at or near "GROUP"
Position: 1
Error position: line: 81
Please make sure your queries actually run as part of the PR process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original error was caused by an incomplete WHERE clause.
I’ve fixed it. It now uses
WHERE sm."createdAt" >= (NOW() - INTERVAL '90 days').
This version should actually run cleanly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I actually request that we modify this to look back 365 days instead of 90? I ran some quick numbers on it using an interval of 365 instead:
rows: 37174
scrims: 8846
Obviously this included an extended downtime, but even if we double the row count we're going to be in under 100k rows and I'd imagine that's fine? @gankoji this is assuming we don't have OOM errors anymore (which I thought you said we won't now that we're on cloud).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
365d would work for my project as well. 37k rows is manageable in sheets compared to the old 490k+ rows. Even if that number grows considerably due to no extended down times it would not impact me on my end. Whatever the maintainers think is best I'll be happy with <3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're not on "the cloud". Well, I guess in some sense we are: this job runs as a GitHub action, right here in the repository. But it's not on a cloud VM or other SaaS product we pay for through the normal cloud compute providers like Azure.
Anyway, we don't currently hit any sort of OOM or disk space issues with the job, so I don't see the harm in extending the lookback period.
| sm."createdAt" AS scrim_created_at, | ||
|
|
||
| -- Privacy-friendly stable scrim identifier (hashed) | ||
| md5(sm.id::text) AS scrim_key, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to hash this for privacy? Leaving the id as normal allows us to join the table for Evidence queries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry, in my prompt I expressed a concern for sensitive information. I can update the sql, remove the hashing and push the look back to 365d from 90d if you all would like.
Just a spiritual successor to the old Total_Scrim_Stats dataset. I did not make this, I had ChatGPT do it. Please let me know if there are issues.