r""" Decide whether the mini-batch stats should be used for normalization rather than the buffers. Mini-batch stats are used in training mode, and in eval mode when buffers are None. """ if self.training: bn_training = True else: bn_training = (self.running_mean isNone) and (self.running_var isNone)
r""" Buffers are only updated if they are to be tracked and we are in training mode. Thus they only need to be passed when the update should occur (i.e. in training mode when they are tracked), or when buffer stats are used for normalization (i.e. in eval mode when buffers are not None). """ return F.batch_norm( input, # If buffers are not to be tracked, ensure that they won't be updated self.running_mean ifnot self.training or self.track_running_stats elseNone, self.running_var ifnot self.training or self.track_running_stats elseNone, self.weight, self.bias, bn_training, exponential_average_factor, self.eps, )
Simply put, the E(x) and Var(x) that BatchNorm uses can come from three sources:

When you deliberately turn off the update switch (track_running_stats=False): from the current mini-batch.

When you don't do anything special: from a running statistic (the running_mean / running_var buffers), which is smoothly updated over all the mini-batches:

$$\hat{x} \leftarrow (1 - m)\,\hat{x} + m\,x$$

where $m$ is the momentum and $x$ is the statistic computed on the current mini-batch.

When you don't do anything special and you are at inference time: from the running statistics accumulated during training.

So BatchNorm does not simply standardize the data against the current batch; in the usual case it normalizes against a relatively more global mean and variance.
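To make the three cases concrete, here is a small sketch (the layer width, the batch, and the printed values are purely illustrative assumptions) showing where the statistics come from in each mode:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(32, 8) * 3 + 5           # a batch with mean ~5, std ~3

bn = nn.BatchNorm1d(8, momentum=0.1)      # default: track_running_stats=True
print(bn.running_mean[:3])                # buffers start at 0

bn.train()
_ = bn(x)                                 # training: normalize with batch stats...
print(bn.running_mean[:3])                # ...and EMA-update the buffers: ~0.9*0 + 0.1*batch_mean

bn.eval()
y = bn(x)                                 # eval: normalize with the running buffers
print(y.mean().item())                    # clearly not 0, the buffers are still far from the batch stats

bn_free = nn.BatchNorm1d(8, track_running_stats=False)
bn_free.eval()
y2 = bn_free(x)                           # no buffers at all: batch stats are used even in eval mode
print(y2.mean().item())                   # ~0
```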
How we freeze the Backbone

Generally, this is how we do it:
```python
# set lr to 0
opti = Optimizer([
    {
        "params": "<params you want to freeze>",
        "lr": 0,
    }
])

# or set requires_grad, which is more common in practice.
for param in model.parameters():
    param.requires_grad_(False)
```
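As a concrete, purely illustrative example (the ResNet-18 backbone and the learning rate are my own assumptions, not from the original setup), the `requires_grad` approach usually looks like this: freeze everything, re-enable the head, and hand only the trainable parameters to the optimizer.

```python
import torch
import torchvision

# Illustrative backbone choice; any nn.Module is handled the same way.
model = torchvision.models.resnet18(weights=None)

# Freeze every parameter, then re-enable only the classifier head.
for param in model.parameters():
    param.requires_grad_(False)
for param in model.fc.parameters():
    param.requires_grad_(True)

# Pass only the still-trainable parameters to the optimizer.
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad],
    lr=1e-3,
)
```

Note that `requires_grad_(False)` only stops gradient updates to the weights; it does not change `self.training`, so the running-statistics update shown earlier still happens whenever a frozen BatchNorm layer runs in `train()` mode.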