Code Review Questions
Automated code review findings for bes-dev/MobileStyleGAN.pytorch.
train.py:47 — raise "Unknown export format." raises a string literal rather than an exception instance; in Python 3 this causes a TypeError instead of the intended error.
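A minimal sketch of this failure mode and one possible fix. The function names and the set of supported formats are hypothetical, and the choice of ValueError is an assumption rather than what the repository necessarily intends.

```python
# Hypothetical set of supported formats, used only for illustration.
SUPPORTED_FORMATS = ("onnx", "coreml")


def export_broken(fmt):
    # Mirrors the reported pattern: a plain string is not an exception,
    # so Python 3 raises TypeError instead of reporting the message.
    if fmt not in SUPPORTED_FORMATS:
        raise "Unknown export format."
    return fmt


def export_fixed(fmt):
    # Raise a real exception instance and include the offending value.
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unknown export format: {fmt!r}")
    return fmt
```

With the fixed variant, callers see the intended message and can catch ValueError specifically.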
train.py:22 — ModelCheckpoint is constructed with the filepath keyword argument, which PyTorch Lightning deprecated in 1.0 and removed in 1.2 in favour of dirpath and filename, so the call fails on current library versions.
core/distiller.py:76 — In validation_step, self.student is called twice with identical inputs; the first call's result is immediately discarded, so each validation step pays for two student forward passes instead of one.
core/distiller.py:121 — compute_mean_style accepts a batch_size parameter but always uses the hardcoded literal 4096 when sampling, so passing a different value has no effect.
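A rough sketch of what honouring the parameter could look like, written in plain Python so it runs without PyTorch. The mapping callable, the seed argument, and the list-of-lists representation are assumptions made for illustration; the repository's method operates on tensors.

```python
import random


def compute_mean_style(mapping, style_dim, batch_size=4096, seed=0):
    """Average mapping(z) over batch_size random latent vectors.

    `mapping` is a hypothetical stand-in for the mapping network: it takes a
    list of latent vectors and returns a list of style vectors.
    """
    rng = random.Random(seed)
    # Use the caller's batch_size rather than a hardcoded literal.
    latents = [
        [rng.gauss(0.0, 1.0) for _ in range(style_dim)]
        for _ in range(batch_size)
    ]
    styles = mapping(latents)
    # Column-wise mean over the batch.
    return [sum(column) / len(styles) for column in zip(*styles)]
```

Passing batch_size=8 here draws eight latent vectors, which is the behaviour a caller of the original signature would expect.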
core/distiller.py:100 — The else branch of make_sample generates a noise tensor of shape (self.wsize, style_dim) reshaped to (1, wsize, style_dim), always producing a batch of exactly 1 regardless of cfg.batch_size, causing a shape mismatch when the generator/discriminator steps expect batch_size samples.
core/distiller.py:84 — A TODO comment acknowledges that validation_epoch_end does not all_gather tensors across GPUs, meaning multi-GPU KID/loss aggregation is silently incorrect in distributed training.
core/models/synthesis_network.py:72 — Inside the layer loop, _style is correctly computed as the per-block W+ style slice, but style (the full un-sliced tensor) is passed to m(hidden, style, _noise) instead of _style, effectively ignoring W+ style mixing for the teacher synthesis network.
core/models/modules/mobile_synthesis_block.py:41 — Both the up (IDWTUpsample) and conv1 layers index the same style slot (style[:, 0, :]), so the upsampling modulation and the first conv share an identical style vector; given wsize() returns 3 this looks like a copy-paste oversight.
core/models/modules/idwt_upsample.py:9 — The class is named IDWTUpsaplme — a typo for IDWTUpsample — which makes the public API confusing and error-prone.
core/models/modules/multichannel_image.py:6 — The class is named MultichannelIamge — a typo for MultichannelImage — propagated throughout imports.
core/models/modules/ops/fused_act.py:30 — raise NotImplemented raises the built-in NotImplemented constant, which is not an exception, instead of NotImplementedError; on devices that are neither CUDA nor CPU this fails with a TypeError (exceptions must derive from BaseException) rather than the intended error.
core/models/modules/ops/upfirdn2d.py:18 — Same raise NotImplemented bug as in fused_act.py: should be raise NotImplementedError().
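The raise NotImplemented pattern flagged in fused_act.py and upfirdn2d.py can be reproduced in isolation. The dispatch functions and kernel names below are hypothetical; they only illustrate the difference between the constant and the exception class.

```python
def dispatch_broken(device_type):
    if device_type == "cpu":
        return "cpu-kernel"
    if device_type == "cuda":
        return "cuda-kernel"
    # NotImplemented is a constant meant for binary-operator fallbacks,
    # not an exception, so this line itself raises TypeError.
    raise NotImplemented


def dispatch_fixed(device_type):
    if device_type == "cpu":
        return "cpu-kernel"
    if device_type == "cuda":
        return "cuda-kernel"
    # NotImplementedError is the exception class intended for this case.
    raise NotImplementedError(f"Unsupported device type: {device_type!r}")
```

Callers that wrap the fixed version in except NotImplementedError get the unsupported-device case reported as intended.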
core/model_zoo.py:6 — json.load(open(zoo_path)) opens the file without a with statement, so the handle is never closed explicitly and stays open until it is garbage-collected, including when json.load raises.
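One way to address this, sketched with a hypothetical load_zoo helper; the explicit UTF-8 encoding is an assumption about the file.

```python
import json


def load_zoo(zoo_path):
    """Load a model zoo JSON file, closing the handle even if parsing fails."""
    # The context manager closes the file on both success and error paths.
    with open(zoo_path, "r", encoding="utf-8") as f:
        return json.load(f)
```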
core/loss/perceptual_loss.py:14 — VGG16 is loaded with the deprecated pretrained=True kwarg; newer torchvision versions require weights=VGG16_Weights.DEFAULT and will emit warnings or break.
core/models/inception_v3.py:118 — models.inception_v3(pretrained=True) and models.inception_v3(pretrained=False, ...) use the pretrained keyword, which torchvision deprecated in 0.13 in favour of weights; the calls emit deprecation warnings and will fail once the keyword is removed.
core/loss/non_saturating_gan_loss.py:54 — real.requires_grad = True mutates the incoming tensor in-place inside reg_d, which can silently corrupt the gradient graph of the caller and cause unexpected side-effects in subsequent training steps.
core/utils.py:11 — In tensor_to_img, if to_numpy=False is passed with the default rgb2bgr=True, the code attempts img[:, :, ::-1] on a PyTorch tensor, which raises a ValueError (step must be greater than zero) because PyTorch tensors do not support negative-step slicing; torch.flip(img, dims=[2]) is the usual replacement.
evaluate_fid.py:102 — A BatchNorm2d layer is trained on the dataset and then used to normalise images before Inception embedding, which is a non-standard normalisation method that will produce FID scores incomparable to any published benchmark.
core/models/modules/noise_injection.py:12 — In trace_model mode, self.register_buffer("noise", noise) runs only while not hasattr(self, "noise") holds. The guard tests existence only, so the buffer registered on the first forward pass is reused on every later call, even when the input has a different spatial size and the cached noise no longer matches it.
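One way to make such a guard shape-aware is to store the shape alongside the cached noise. The NoiseCache class below is a hypothetical plain-Python stand-in, not the repository's module, and regenerating on mismatch is an assumption; raising an error on mismatch may suit traced models better.

```python
import random


class NoiseCache:
    """Hypothetical stand-in for a module that caches a noise buffer."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self.noise = None
        self.noise_shape = None

    def get(self, height, width):
        shape = (height, width)
        # Regenerate when nothing is cached or the cached shape differs,
        # rather than checking only whether a buffer exists.
        if self.noise is None or self.noise_shape != shape:
            self.noise = [
                [self._rng.gauss(0.0, 1.0) for _ in range(width)]
                for _ in range(height)
            ]
            self.noise_shape = shape
        return self.noise
```

Repeated calls with the same size return the same cached object, while a call with a new size replaces it.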
core/distiller.py:155 — In to_onnx, after tracing the mapping network with var of shape (wsize, style_dim) for W+ mode, the Wrapper is initialised with self.student(style_tmp) inside its __init__, which runs a full forward pass (including noise registration) before the ONNX trace starts, potentially capturing stale buffer state.