Fix SIGFPE crash when Cluster directive used with DSCONV layers #40
Open
velcroapple wants to merge 1 commit into
- Skip pushing empty iter_state_list for Cluster directives in DFA_iteration-analysis.hpp to prevent num_total_cases=0
- Add zero-guard for computation_delay before division in CA_cost-analysis-engine.hpp line 344
- Reproducer: ./maestro with Resnet50_rs.m mapping crashes on DSCONV layers that use Cluster(3,P) with K=1 dimensions
Bug
Running MAESTRO with mappings that apply a Cluster directive to DSCONV
layers (e.g. Resnet50_rs.m) causes a floating-point exception (SIGFPE)
and a core dump.
Root Cause
Two issues work together to cause the crash:
In DFA_iteration-analysis.hpp, Cluster directives push an empty iter_state_list into valid_iteration_states_ because there is no handler for DirectiveClass::Cluster. This causes num_total_cases = 0 and empty sub-cluster results.
In CA_cost-analysis-engine.hpp, computation_delay is used as a divisor on line 344, before the existing zero-guard on line 365 takes effect. When sub-cluster results are empty, computation_delay stays 0, causing a division by zero.
Note: Cluster::GetOfs() returns 0 by design in DFA_directives.hpp, and the else branch in DFA_cluster-unit.hpp has a // TODO: Handle this error comment, indicating that this case was known but unhandled.
Fix
- Skip pushing an empty iter_state_list for Cluster directives in DFA_iteration-analysis.hpp
- Add a zero-guard for computation_delay before the first division in CA_cost-analysis-engine.hpp
Reproducer
./maestro --HW_file='data/hw/accelerator_1.m' \
          --Mapping_file='data/mapping/Resnet50_rs.m' \
          --print_res=true
Without this fix, the run crashes with SIGFPE; with it, the run completes cleanly.