【AI】Softmax函数

 

Softmax

Advantage: Map the original output vector to a space of [0,1] Key word: Activation Function Usage: Normalize the output of a network to a probability distribution over predicted output classes. Used in multiclass classification problems.

  • Form
\[\begin{equation} P(y=j|x) = \frac{e^{x^Tw_j}}{\sum^K_{k=1}e^{x^Tw_k}} \end{equation}\]

This is a fully-connected layer. Output would be a probability of a category.

The reason to use it instead of sigmoid is that sigmoid would have a higher probability to classify the problem to either end.