Skip to content

Conversation

@CornerJockeys
Copy link

Just a spiritual successor to the old Total_Scrim_Stats dataset. I did not make this, I had ChatGPT do it. Please let me know if there are issues.


WHERE sm."createdAt" >= current_date - interval '90 days'

GROUP BY
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQL Error [42601]: ERROR: syntax error at or near "GROUP"
Position: 1

Error position: line: 81

Please make sure your queries actually run as part of the PR process.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original error was caused by an incomplete WHERE clause.
I’ve fixed it. It now uses
WHERE sm."createdAt" >= (NOW() - INTERVAL '90 days').
This version should actually run cleanly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I actually request that we modify this to look back 365 days instead of 90? I ran some quick numbers on it using an interval of 365 instead:

rows: 37174

scrims: 8846

Obviously this included an extended downtime, but even if we double the row count we're going to be in under 100k rows and I'd imagine that's fine? @gankoji this is assuming we don't have OOM errors anymore (which I thought you said we won't now that we're on cloud).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

365d would work for my project as well. 37k rows is manageable in sheets compared to the old 490k+ rows. Even if that number grows considerably due to no extended down times it would not impact me on my end. Whatever the maintainers think is best I'll be happy with <3

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're not on "the cloud". Well, I guess in some sense we are: this job runs as a GitHub action, right here in the repository. But it's not on a cloud VM or other SaaS product we pay for through the normal cloud compute providers like Azure.

Anyway, we don't currently hit any sort of OOM or disk space issues with the job, so I don't see the harm in extending the lookback period.

sm."createdAt" AS scrim_created_at,

-- Privacy-friendly stable scrim identifier (hashed)
md5(sm.id::text) AS scrim_key,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to hash this for privacy? Leaving the id as normal allows us to join the table for Evidence queries.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, in my prompt I expressed a concern for sensitive information. I can update the sql, remove the hashing and push the look back to 365d from 90d if you all would like.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants