Code Review Questions
The following questions and observations were raised during automated review of bes-dev/MobileStyleGAN.pytorch.
- In `train.py` (~line 46), `raise "Unknown export format."` raises a plain string rather than an exception. In Python 3 this fails with `TypeError: exceptions must derive from BaseException`, so the intended message never reaches the user. It should be `raise ValueError(...)`.
- In `core/models/synthesis_network.py`, `SynthesisNetwork.forward` computes `_style = style[..., 3*i+1:3*i+4, :]` but then calls `m(hidden, style, _noise)` with the full `style` tensor. The per-layer slice is never used, breaking W+ style mixing in the teacher network.
- In `core/models/utils.py`, `NoiseManager.__init__` uses the guard `if not None in noise`, which parses as `not (None in noise)`: it checks whether any element is `None` and, if so, skips the entire list. The intent was likely a per-element check (`if noise[i] is not None`); as written, all cached noise is silently dropped whenever one entry is `None`.
- In `core/distiller.py`, `validation_step` calls `self.student(style, noise=gt["noise"])` twice: the first result is discarded and the student is run again for the loss. One forward pass is redundant and doubles the student's inference cost during validation.
- In `core/distiller.py`, `validation_epoch_end` computes KID by concatenating local-device outputs only, next to a `# TODO: add all_gather for distributed mode` comment. Under multi-GPU DDP training each rank therefore computes the metric on its own share of the outputs rather than on the full validation set, and the TODO has no associated tracking issue.
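- In `core/distiller.py`, the teacher networks `mapping_net` and `synthesis_net` are only set to `.eval()` and never have `requires_grad_(False)` applied. `.eval()` changes the behaviour of layers such as dropout and batch norm but does not disable autograd, so unless the teacher forward pass runs under `torch.no_grad()`, gradients can still be computed for teacher parameters during student training, wasting memory and compute.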
- In `core/distiller.py`, `compute_mean_style` accepts a `batch_size` parameter but ignores it: `torch.randn(4096, ...)` is hardcoded, so the method always uses 4096 samples regardless of the argument.
- In `core/loss/distiller_loss.py`, `loss_g` does not detach the teacher tensors `gt["img"]` and `gt["freq"]` before computing the L1/L2/perceptual losses, potentially letting gradients flow into the teacher synthesis network, which is meant to stay frozen.
- In `core/models/modules/modulated_conv2d.py`, `ModulatedConv2d.get_demodulation` uses `self.style_inv` (a randomly initialised buffer that is never updated) instead of the actual style vector. Demodulation is therefore style-independent, diverging from the StyleGAN2 weight-demodulation scheme.
- In `core/model_zoo.py`, `json.load(open(zoo_path))` opens a file handle that is never explicitly closed. It should use `with open(zoo_path) as f: json.load(f)` to avoid resource leaks.
- In `core/models/modules/idwt_upsample.py`, the class is named `IDWTUpsaplme`, a typo with transposed letters (presumably for `IDWTUpsample`). It is referenced by this name in `mobile_synthesis_block.py`, making refactoring error-prone.
- In `core/loss/non_saturating_gan_loss.py`, `loss_d` calls both `fake.detach()` and `real.detach()`, even though `fake` is already detached by the caller in `Distiller.discriminator_step`. The second `.detach()` on `fake` is redundant, and the `.detach()` on `real` could silently prevent R1 gradient computation if `reg_d` is mistakenly routed through `loss_d`.
- In `core/distiller.py`, the `else` branch of `make_sample` (the pure-noise path) constructs `style = self.mapping_net(var).view(1, self.wsize, ...)` with a batch size of 1 unconditionally, regardless of `self.cfg.batch_size`, so training occasionally runs with mismatched batch sizes without raising an error.
- In `core/models/modules/noise_injection.py`, `NoiseInjection.forward` registers the noise buffer on the first call only (`if not hasattr(self, "noise") and self.trace_model`). The buffer is never reset between exports, so subsequent CoreML/ONNX traces reuse the stale noise, risking silent reuse of wrong-shape noise across multiple export calls.
- In `requirements.txt`, `pytorch-lightning==1.0.2` is pinned to a 2020 release. The `filepath` kwarg of `ModelCheckpoint` was deprecated and then removed in later PL versions, so code that relies on it cannot be upgraded without changes; the pinned version also predates later security fixes. There is no upper bound or lock file to prevent breakage from transitive dependency updates.
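
The `train.py` observation can be reproduced without the repository. The following is a minimal sketch, assuming a hypothetical export dispatch and format names (`onnx`, `coreml`) that are not taken from the project. It illustrates that raising a string produces a `TypeError` whose message does not carry the intended text, whereas `ValueError` preserves it.

```python
def export_broken(fmt):
    # Hypothetical stand-in for the export dispatch in train.py.
    if fmt not in ("onnx", "coreml"):
        # Python 3 rejects this: exceptions must derive from BaseException.
        raise "Unknown export format."
    return fmt


def export_fixed(fmt):
    # Same hypothetical dispatch, raising a real exception instead.
    if fmt not in ("onnx", "coreml"):
        raise ValueError(f"Unknown export format: {fmt!r}")
    return fmt


def error_of(fn, arg):
    """Return (exception class name, message) raised by fn(arg), or (None, None)."""
    try:
        fn(arg)
    except Exception as exc:
        return type(exc).__name__, str(exc)
    return None, None
```

Calling `error_of(export_broken, "tflite")` reports a `TypeError`, while `error_of(export_fixed, "tflite")` reports a `ValueError` that names the offending format.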
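
The `NoiseManager.__init__` observation comes down to operator precedence: `not None in noise` is evaluated as `not (None in noise)`. The sketch below uses plain strings as stand-ins for noise tensors, and both function names are hypothetical. How the real `NoiseManager` indexes its entries is not shown in this report, so the per-element variant is only one possible reading of the intended behaviour.

```python
def keep_all_or_nothing(noise):
    # Mirrors the reviewed guard: one None entry discards the whole list.
    cached = []
    if not None in noise:  # parsed as: not (None in noise)
        cached.extend(noise)
    return cached


def keep_per_element(noise):
    # Per-element check: keep every entry that is not None.
    return [n for n in noise if n is not None]
```

With `["a", None, "c"]` as input, the first function returns an empty list and the second returns `["a", "c"]`.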
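
For the `core/model_zoo.py` observation, a context manager is the usual way to guarantee the handle is closed. This is a small sketch rather than the project's code: `load_zoo` is a hypothetical helper name, and the JSON content is a made-up placeholder, not the real zoo file.

```python
import json
import os
import tempfile


def load_zoo(zoo_path):
    # The with-block closes the file even if json.load raises.
    with open(zoo_path) as f:
        return json.load(f)


# Usage with a throwaway file; the entry name and value are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "zoo.json")
    with open(path, "w") as f:
        json.dump({"example_model": {"url": "placeholder"}}, f)
    zoo = load_zoo(path)
```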
Generated by automated code review pipeline.