Skip to content

bci_repo_publish.py: Return 1 if the last published build is over a week old#2998

Open
Vogtinator wants to merge 1 commit into
openSUSE:masterfrom
Vogtinator:bcirepopublisher
Open

bci_repo_publish.py: Return 1 if the last published build is over a week old#2998
Vogtinator wants to merge 1 commit into
openSUSE:masterfrom
Vogtinator:bcirepopublisher

Conversation

@Vogtinator

Copy link
Copy Markdown
Member

This way it's easy to spot whether a manual look is required.

I'm not entirely sure whether this is a good idea. Ideally someone looks directly when builds or tests fail and fixes this before a week expires, but as additional safe guard it's probably worth it.

@Vogtinator Vogtinator requested a review from dirkmueller July 26, 2023 07:52
@codecov-commenter

codecov-commenter commented Jul 26, 2023

Copy link
Copy Markdown

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (master@2a0b30a). Learn more about missing BASE report.
Report is 447 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##             master    #2998   +/-   ##
=========================================
  Coverage          ?   28.45%           
=========================================
  Files             ?       85           
  Lines             ?    14716           
  Branches          ?        0           
=========================================
  Hits              ?     4188           
  Misses            ?    10528           
  Partials          ?        0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…eek old

This way it's easy to spot whether a manual look is required.
@dirkmueller

Copy link
Copy Markdown
Member

I think we should simply stop publishing when there are failed builds? I'm not sure what hte cutoff of 7 days is solving?

@Vogtinator

Copy link
Copy Markdown
Member Author

I'm not sure what hte cutoff of 7 days is solving?

Currently the bot fails only if some API call actually fails, but not if a build failed or openQA test failed, as this is just expected behaviour. This means that if the repo is unresolvable for a month, the only sign is that there haven't been any openQA runs recently. Or if openQA fails for a month because nobody had a look, the jobs are still green.

With this change it's visible in botmaster as a pipeline failure if there hasn't been a new build published for a week.

Comment thread gocd/bci_repo_publish.py
# Trigger publishing
if token is None:
self.logger.warning('Would publish now, but no token specified')
self.last_publish_mtime = packages[0]['built_mtime']

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we setting last_publish_mtime in this error case?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not really an error case, more of a dry run case

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, but there are plenty of other returns before hand that are not setting last_publish_mtime. so this is a bit inconsistent.

Comment thread gocd/bci_repo_publish.py
self.logger.info(f'{build_prj}/images not successfully built')
pkg = packages[0]
self.last_publish_mtime = \
self.mtime_of_product(pkg['publish_prj'], pkg['name'], 'images', 'local')

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here? we're not actually publishing, we're aborting because it wasn't successfully built?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, which means we have to fetch the last published mtime from OBS to know how long ago that was

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as a nit pick, people usually avoid escaping newlines in python. you can write this in a more pythonic way:

     self.last_publish_mtime = self.mtime_of_product(
           pkg['publish_prj'], pkg['name'], 'images', 'local')

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO harder to read and flake8 didn't complain 🤷

@dirkmueller

Copy link
Copy Markdown
Member

so if I understand this correctly. the only goal of the whole logic is to make the checkbox "red" rather than "green" in the botmaster listing/graph.

how about we change the method to return True/False on whether it actually did a publish or not and then simply convert that to an exit code of 0 / 1?

then it would be red if it had something to publish but didn't publish (because tests failed or no matching run found or publish failed), and otherwise green?

does the UI support tripple state? then we could encode the "still waiting" states with another color..

@Vogtinator

Copy link
Copy Markdown
Member Author

so if I understand this correctly. the only goal of the whole logic is to make the checkbox "red" rather than "green" in the botmaster listing/graph.

Yes.

how about we change the method to return True/False on whether it actually did a publish or not and then simply convert that to an exit code of 0 / 1?
then it would be red if it had something to publish but didn't publish (because tests failed or no matching run found or publish failed), and otherwise green?

This wouldn't be able to catch if there was no new build for a week, e.g. due to unresolvable or blocked.

does the UI support tripple state? then we could encode the "still waiting" states with another color..

There's only orange for "pipeline is running"

@dirkmueller

Copy link
Copy Markdown
Member

then it would be red if it had something to publish but didn't publish (because tests failed or no matching run found or publish failed), and otherwise green?
This wouldn't be able to catch if there was no new build for a week, e.g. due to unresolvable or blocked.

well, unresolvable or build failed would lead to an exit 1, as well as a missing openqa run. unfinished openqa run or unclean build status would not.

does the UI support tripple state? then we could encode the "still waiting" states with another color..
There's only orange for "pipeline is running"

ok, too bad.

Well, I don't have strong feelings either way. I think the 7 days to wait is arbitrary and feels a bit too long. maybe 3 days?
Other than that I still think trying to determine the status "this looks erratic" or "this can still be normal" and then just exiting 1/0 right away is the better approach. we don't want to wait for a week to notice that openqa has lost an event or the like. that is not gonna help anyone.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants