CrossEntropyLoss vs BCELoss

Machine Learning/ETC 2022. 11. 1. 23:00

1. Difference in Purpose

CrossEntropyLoss is generally used for multi-class classification, but it can also be used for binary classification.
BCE (Binary Cross Entropy) is used for binary classification.
So should we simply use CrossEntropyLoss in every case? The answer is no (see section 3).

2. Difference in Detailed Implementation

When CrossEntropyLoss is used for binary classification, it requires 2 output features (one logit per class).

Logits=[-2.34, 3.45], Argmax(logits) -> class 1

When BCELoss is used for binary classification, it requires 1 output feature.

logits=[-2.34] < 0 -> class 0
logits=[3.45] > 0 -> class 1
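A minimal PyTorch sketch of the two set-ups, assuming hypothetical 8-dimensional features and hypothetical head names `ce_head` / `bce_head`: the CrossEntropyLoss head has 2 outputs decoded with argmax, while the BCE head has 1 output decoded by the sign of the logit (logit > 0 is the same as sigmoid(logit) > 0.5).

```python
import torch
import torch.nn as nn

# Hypothetical classifier heads on top of 8-dim features (the size 8 is an assumption).
ce_head = nn.Linear(8, 2)   # 2 output features -> CrossEntropyLoss
bce_head = nn.Linear(8, 1)  # 1 output feature  -> BCEWithLogitsLoss / BCELoss

x = torch.randn(4, 8)       # a batch of 4 random feature vectors
print(ce_head(x).shape)     # torch.Size([4, 2])
print(bce_head(x).shape)    # torch.Size([4, 1])

# Decoding the logits used in the text above.
ce_logits = torch.tensor([[-2.34, 3.45]])
ce_pred = ce_logits.argmax(dim=1)               # index of the larger logit

bce_logits = torch.tensor([[-2.34], [3.45]])
bce_pred = (bce_logits > 0).long().squeeze(1)   # logit > 0  <=>  sigmoid(logit) > 0.5

print(ce_pred.tolist())   # [1]
print(bce_pred.tolist())  # [0, 1]
```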

3. Difference in Purpose (in Practice)

The biggest difference is that if you want the output in the form of a probability, you should use BCE.
- This is because, with BCELoss as the loss function, the single output can be passed through a sigmoid to give a probability.

σ(-2.34) = $p_1$ = 8.79% (probability of the output being class 1)
$p_0$ = 1 - 8.79% = 91.21%
-> likely class 0

σ(3.45) = $p_1$ = 96.9% (likely class 1)
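The two calculations above can be reproduced with a few lines of plain Python. This is only a sketch of the arithmetic: `sigmoid` is a local helper, not a library function.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid: turns one logit into P(class 1)."""
    return 1.0 / (1.0 + math.exp(-z))

for logit in (-2.34, 3.45):
    p1 = sigmoid(logit)        # probability of class 1
    p0 = 1.0 - p1              # probability of class 0
    label = 1 if p1 > 0.5 else 0
    print(f"logit={logit:+.2f}  p1={p1:.2%}  p0={p0:.2%}  -> class {label}")
```

For the two logits in the text this gives $p_1$ of about 8.79% and 96.92%, matching the hand calculation.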

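With CrossEntropyLoss, applying sigmoid to the 2 output features does not give meaningful probabilities.

σ([-2.34, 3.45]) = [8.79%, 96.9%] -> does not make sense, since the two values do not sum to 100%.
This means neither $p_0$ = 8.79% (in which case $p_1$ would be 1 - 8.79% = 91.21%) nor $p_1$ = 96.9%.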
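A quick pure-Python check of this point: applying sigmoid to each of the two CrossEntropyLoss logits independently gives values that do not add up to 1, so they cannot be read as $p_0$ and $p_1$. The helper `sigmoid` is defined locally for illustration.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

ce_logits = [-2.34, 3.45]                      # the two outputs of a CrossEntropyLoss head
per_output = [sigmoid(z) for z in ce_logits]   # sigmoid applied to each output independently
total = sum(per_output)

print([f"{p:.2%}" for p in per_output], f"sum = {total:.2%}")
# The values sum to about 105.7%, not 100%, so they are not P(class 0) and P(class 1).
```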

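With CrossEntropyLoss, softmax is the appropriate way to obtain probabilities. In binary classification, however, there are only 2 outputs, and the softmax result depends only on the difference between them: softmax([a, b]) = [σ(a - b), σ(b - a)]. One of the two outputs is therefore redundant, and for the same logits the result can look much more confident than the single-output sigmoid (99.7% here versus 96.9% for σ(3.45)).

softmax([-2.34, 3.45]) ≈ [0.3%, 99.7%].

Therefore, softmax is best reserved for multi-class classification (not multi-label).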
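The two-output softmax can be checked numerically with a small pure-Python sketch. It uses the logits from the example above and compares the softmax probability of class 1 with the sigmoid of the logit difference; `softmax` and `sigmoid` are local helpers, not library calls.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits: list[float]) -> list[float]:
    m = max(logits)                              # shift by the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

a, b = -2.34, 3.45
probs = softmax([a, b])
print([f"{p:.2%}" for p in probs])   # roughly ['0.30%', '99.70%']
print(f"{sigmoid(b - a):.2%}")       # same as probs[1]: softmax([a, b])[1] == sigmoid(b - a)
```

Because a two-output softmax is a reparametrization of one sigmoid applied to b - a, a 1-output BCE head can express the same binary probability with one output instead of two.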

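Conclusion: Sigmoid vs Softmax

Applying sigmoid to the output is suitable for binary classification (including multi-label classification, where each label is an independent binary decision).

Applying softmax to the output is suitable for multi-class classification.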
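A short PyTorch sketch of the difference, assuming hypothetical logits for one sample and 3 classes: sigmoid scores each class independently, so several labels can be active and the probabilities need not sum to 1, while softmax produces a single distribution over mutually exclusive classes.

```python
import torch

logits = torch.tensor([2.0, 1.0, -1.0])  # one sample, 3 classes (hypothetical values)

# Multi-label: each class is an independent yes/no question -> sigmoid per class.
multi_label_probs = torch.sigmoid(logits)
predicted_labels = (multi_label_probs > 0.5).nonzero().flatten()  # may contain several classes

# Multi-class: exactly one class is correct -> softmax over all classes.
multi_class_probs = torch.softmax(logits, dim=0)
predicted_class = multi_class_probs.argmax()

print(multi_label_probs, multi_label_probs.sum())  # the sum need not be 1
print(predicted_labels.tolist())                   # [0, 1]
print(multi_class_probs, multi_class_probs.sum())  # sums to 1
print(predicted_class.item())                      # 0
```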

For BCE, use BCEWithLogitsLoss(); for CE, use CrossEntropyLoss()

Use BCEWithLogitsLoss() instead of BCELoss(). BCEWithLogitsLoss() contains a sigmoid layer internally, so logits can be passed to the loss directly.
- Technical reason, quoted from the PyTorch documentation [reference]:

This version is more numerically stable than using a plain Sigmoid followed by a BCELoss as, by combining the operations into one layer, we take advantage of the log-sum-exp trick for numerical stability.
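CrossEntropyLoss() likewise applies softmax internally (it combines LogSoftmax and NLLLoss), so it should also receive raw logits rather than probabilities.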
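A small PyTorch check of both statements, using arbitrary example logits. For moderate logits, BCEWithLogitsLoss on logits matches BCELoss on sigmoid outputs. For an extreme logit the sigmoid underflows to 0 in float32, and BCELoss falls back on its documented clamp of log outputs at -100, while BCEWithLogitsLoss still returns the correct loss of about 200. The last part checks that CrossEntropyLoss equals log_softmax followed by nll_loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([-2.34, 3.45, 0.5])
targets = torch.tensor([0.0, 1.0, 1.0])

# BCEWithLogitsLoss takes raw logits; BCELoss expects probabilities.
with_logits = nn.BCEWithLogitsLoss()(logits, targets)
plain = nn.BCELoss()(torch.sigmoid(logits), targets)
print(with_logits.item(), plain.item())   # same value for moderate logits

# Extreme logit: sigmoid(-200) underflows to 0 in float32.
big_logit = torch.tensor([-200.0])
one = torch.tensor([1.0])
stable = nn.BCEWithLogitsLoss()(big_logit, one)      # about 200, the true -log(sigmoid(-200))
naive = nn.BCELoss()(torch.sigmoid(big_logit), one)  # 100, because BCELoss clamps log outputs at -100
print(stable.item(), naive.item())

# CrossEntropyLoss = log_softmax + negative log-likelihood, applied to raw logits.
ce_logits = torch.tensor([[-2.34, 3.45]])
ce_target = torch.tensor([1])
ce = nn.CrossEntropyLoss()(ce_logits, ce_target)
manual = F.nll_loss(F.log_softmax(ce_logits, dim=1), ce_target)
print(ce.item(), manual.item())
```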

pos_weight

torch.nn.BCEWithLogitsLoss(weight=None, size_average=None, reduce=None, reduction='mean', pos_weight=None)

pos_weight: the weight of positive examples.
$p_c > 1$ increases recall and $p_c < 1$ increases precision, where $p_c$ is the pos_weight of class $c$.
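For reference, the per-sample, per-class term that BCEWithLogitsLoss computes when pos_weight is set, as written in the PyTorch documentation ($x_{n,c}$ is the logit, $y_{n,c}$ the target, $w_{n,c}$ the optional `weight`, and $\sigma$ the sigmoid):

```latex
l_{n,c} = -w_{n,c}\left[p_c\, y_{n,c}\log\sigma(x_{n,c}) + (1 - y_{n,c})\log\bigl(1 - \sigma(x_{n,c})\bigr)\right]
```

$p_c$ multiplies only the positive-target term, so raising it makes missed positives more expensive (higher recall) and lowering it makes them cheaper (higher precision).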
For example, if a class has 100 positive and 300 negative samples, a suitable pos_weight for that class is $\frac{300}{100} = 3$. The loss then behaves as if the dataset contained 3 × 100 = 300 positive samples.
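A minimal sketch of the 100/300 example, assuming a single hypothetical logit of 0.2: with pos_weight = 3, the loss of a positive sample is exactly three times the unweighted loss, while the loss of a negative sample is unchanged.

```python
import torch
import torch.nn as nn

num_pos, num_neg = 100, 300
pos_weight = torch.tensor([num_neg / num_pos])   # 300 / 100 = 3.0

logit = torch.tensor([0.2])                      # hypothetical logit for one sample
pos, neg = torch.tensor([1.0]), torch.tensor([0.0])

weighted = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
plain = nn.BCEWithLogitsLoss()

# A positive sample's loss is multiplied by pos_weight ...
print(weighted(logit, pos).item(), 3 * plain(logit, pos).item())
# ... while a negative sample's loss is unchanged.
print(weighted(logit, neg).item(), plain(logit, neg).item())
```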

Eg. 1.
>>> import torch
>>> target = torch.ones([10, 64], dtype=torch.float32)  # 64 classes, batch size = 10
>>> output = torch.full([10, 64], 1.5)  # A prediction (logit)
>>> pos_weight = torch.ones([64])  # All weights are equal to 1
>>> criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
>>> criterion(output, target)  # -log(sigmoid(1.5))
tensor(0.20...)

Eg. 2.
import torch.nn as nn
criterion_weighted = nn.BCEWithLogitsLoss(pos_weight=(y == 0.).sum() / y.sum())  # y: float tensor of 0/1 targets, assumed defined
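Eg. 2 sums over the whole target tensor, so it produces one scalar weight shared by all classes. For multi-label targets of shape [N, C], summing over dim=0 instead gives one weight per class, matching the per-class [64] shape used in Eg. 1. A sketch with a small hypothetical target matrix; the clamp guarding classes with no positives is an added assumption, not part of the original example.

```python
import torch
import torch.nn as nn

# Hypothetical multi-label targets: 6 samples, 3 classes (0/1 floats).
y = torch.tensor([
    [1., 0., 0.],
    [0., 0., 1.],
    [1., 0., 0.],
    [0., 1., 0.],
    [0., 0., 1.],
    [1., 0., 1.],
])

num_pos = y.sum(dim=0)            # positives per class: [3, 1, 3]
num_neg = (y == 0).sum(dim=0)     # negatives per class: [3, 5, 3]
pos_weight = num_neg / num_pos.clamp(min=1)   # avoid division by zero for classes with no positives

criterion_weighted = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
logits = torch.zeros_like(y)      # dummy logits, only to show the call
loss = criterion_weighted(logits, y)
print(pos_weight)                 # one weight per class
```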

Reference

https://medium.com/dejunhuang/learning-day-57-practical-5-loss-function-crossentropyloss-vs-bceloss-in-pytorch-softmax-vs-bd866c8a0d23

https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html

About me: 엘사 테크 블로그 (Elsa Tech Blog)
