A naive question about negative weights and labeling in ML

Hi All,

During a discussion on treating events with negative weights in ML, I started wondering whether there is an intrinsic connection between event weights and the labeling.

Say I label the signal events that have negative weights as “background”: should I expect results similar to those obtained by labeling all signal events as “signal” with their proper weights (both negative and positive)?

My naive thinking is that it should give results in the same direction, but I am not sure whether this is always the case. I can try things out and see what happens, but I wonder whether there are simple arguments (maybe this is true for certain algorithms but not for ours?).

Any insights or thoughts?

Thanks a lot!

Bing (A machine learning noob)


The short answer is yes, it will do something similar, but not exactly the same.

The event weight is just a multiplicative factor applied to the loss-function gradient before backpropagation, so if we define x as the network output, y as the target value, and y' \equiv 1-y as the inverted target value, you’re asking whether

\nabla_{x} L(x, y') \stackrel{?}{=} - \nabla_{x} L(x, y)

The binary cross-entropy loss for one event is

L(x,y) = - y \log(x) - (1-y) \log(1-x)

and the gradient is

\nabla_{x} L(x, y) = - \frac{y}{x} + \frac{1-y}{1-x}

So your question is if

\nabla_{x} L(x, y') \equiv \nabla_{x} L(x, 1-y) = - \frac{1-y}{x} + \frac{y}{1-x} \stackrel{?}{=} \frac{y}{x} - \frac{1-y}{1-x} = - \nabla_{x} L(x, y)

You can do the same exercise for other loss functions, but in general the answer is no: the two sides above agree only at x = 1/2, so flipping the label is not the same as flipping the sign of the weight.
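For concreteness, here is a quick numerical check of the comparison above (a standalone NumPy sketch, not part of the original exchange): it evaluates the label-flipped gradient and the negated gradient over a grid of network outputs and shows that they only coincide at x = 0.5.

import numpy as np

def bce_grad(x, y):
    # d/dx of L(x, y) = -y*log(x) - (1-y)*log(1-x)
    return -y / x + (1.0 - y) / (1.0 - x)

x = np.linspace(0.05, 0.95, 7)   # network outputs away from 0 and 1
y = 1.0                          # a "signal" event

flipped = bce_grad(x, 1.0 - y)   # relabel the event as background
negated = -bce_grad(x, y)        # keep the label, flip the sign of the weight

for xi, f, n in zip(x, flipped, negated):
    print(f"x={xi:.2f}  grad(label flip)={f:+.3f}  -grad(original)={n:+.3f}")
# The two columns only agree at x = 0.5.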


The bigger question is why you’d want to use this trick, rather than just giving the NN the negative weights directly.

I haven’t tinkered with giving an NN negative weights myself, but it should work: I can convince myself that negative weights do roughly what they are supposed to do by canceling the gradient in regions where the unweighted samples would overestimate the signal-to-background ratio.
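For what it’s worth, here is a minimal sketch of what feeding the negative weights directly to the network could look like, assuming a PyTorch-style setup (the architecture, tensors, and numbers are purely illustrative): the weight is a per-event multiplicative factor on the loss, so a negative weight flips the sign of that event’s gradient contribution.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss(reduction="none")  # keep per-event losses

# Toy batch: features, labels (1 = signal, 0 = background) and generator-style
# weights, one of which is negative.
features = torch.randn(8, 4)
labels = torch.tensor([1., 1., 1., 1., 0., 0., 0., 0.]).unsqueeze(1)
weights = torch.tensor([1., 1., -1., 1., 1., 1., 1., 1.]).unsqueeze(1)

logits = model(features)
per_event = loss_fn(logits, labels)                  # shape (8, 1)
loss = (weights * per_event).sum() / weights.sum()   # weighted mean of the loss

optimizer.zero_grad()
loss.backward()   # the negative-weight event pushes the gradient the other way
optimizer.step()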

Hopefully someone who has actually fought with this problem can tell me why training with negative weights is a terrible idea.


Isn’t this very common due to negative weights from the generator? It seems we are having some trouble with that, and it would be nice to have some suggestions on how to handle them.

It’s a very common problem. I edited my post above to clarify that I’m questioning whether we need to use @biliu’s trick when we can just give the network negative weights.

I’ve heard of a few people who just gave their NN negative weights and saw no big problems, but this is something someone could test: treat the weighting of the training sample as a hyperparameter and see which choice gives the best discrimination on the validation set (the true weights are needed on the validation set, of course, otherwise it’s cheating).
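As an illustration of that kind of scan, here is a self-contained toy sketch (assuming PyTorch; the weighting schemes, the toy data, and all names are mine, not an established recipe): train one model per weighting scheme, then compare a validation figure of merit computed with the original, possibly negative, weights.

import torch
import torch.nn as nn

torch.manual_seed(0)

def make_toy_sample(n):
    # Two overlapping Gaussian blobs with generator-like weights,
    # roughly 10% of which are negative.
    y = torch.randint(0, 2, (n, 1)).float()
    x = torch.randn(n, 2) + 1.5 * y        # signal shifted away from background
    w = torch.ones(n, 1)
    w[torch.rand(n, 1) < 0.1] = -1.0
    return x, y, w

x_train, y_train, w_train = make_toy_sample(2000)
x_val, y_val, w_val = make_toy_sample(2000)

def train(weights):
    model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.BCEWithLogitsLoss(reduction="none")
    for _ in range(200):
        opt.zero_grad()
        per_event = loss_fn(model(x_train), y_train)
        loss = (weights * per_event).sum() / weights.sum()
        loss.backward()
        opt.step()
    return model, loss_fn

# The "hyperparameter": how the training weights are treated.
schemes = {
    "raw":      w_train,                # generator weights as-is
    "absolute": w_train.abs(),          # flip negative weights to positive
    "clipped":  w_train.clamp(min=0.),  # drop negative-weight events
}

for name, w in schemes.items():
    model, loss_fn = train(w)
    with torch.no_grad():
        # The validation metric uses the untouched weights, negative ones
        # included; a weighted BCE stands in for whatever discrimination
        # figure of merit you would actually use.
        val_loss = (w_val * loss_fn(model(x_val), y_val)).sum() / w_val.sum()
    print(f"{name:9s} validation weighted BCE = {val_loss.item():.4f}")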

Has anybody tested this?

I have tried training with several setups of weights in my neural network.
In particular, using absolute weights yields good results.
Using negative weights in validation leads to terrible agreement, though.
I have very few signal events, and I assume that using negative weights when plotting the response can have a significant impact.
Did anybody have the same experience, or has anyone found a recommendable way to handle negative weights for model training and validation?

You might want to open a new post for this specific problem.

It might also help to clarify which setups yield what kind of results: you have a training and validation sample, and you have a few ways to treat the weights in both.

Anyway, a sample with absolute weights models different physics from one with the generator-assigned weights, so I’d expect poor agreement if you aren’t looking at the same signal between training and validation. But even if you use absolute weights consistently, I doubt that’s actually the signal you’re looking for: if you “cheat” by training and validating the NN on a signal that is easier to classify, you’re going to end up with a lousy classifier for the real signal.

Thank you!

I will open a new post