
Tensorflow Sampled Softmax Loss Correct Usage

In a classification problem with many classes, the TensorFlow docs suggest using sampled_softmax_loss instead of a plain softmax to reduce training runtime. According to the docs and source...

Solution 1:

In your softmax layer you are multiplying your network predictions, which have dimension (num_classes,), by your w matrix, which has dimension (num_classes, num_hidden_1). You therefore end up trying to compare your target labels of size (num_classes,) against something of size (num_hidden_1,). Change your tiny perceptron to output layer_1 instead, then change the definition of your cost. The code below should do the trick.

import tensorflow as tf

def tiny_perceptron(x, weights, biases):
    # Return the hidden-layer activations of shape (batch_size, num_hidden_1)
    # instead of projecting all the way to (batch_size, num_classes).
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    return layer_1

layer_1 = tiny_perceptron(x, weights, biases)

# tf.nn.sampled_softmax_loss expects:
#   weights: (num_classes, dim), biases: (num_classes,),
#   inputs:  (batch_size, dim),  labels: (batch_size, num_true) class indices,
# where dim matches the last dimension of layer_1.
loss_function = tf.reduce_mean(tf.nn.sampled_softmax_loss(
                     weights=weights['h1'],
                     biases=biases['b1'],
                     labels=labels,
                     inputs=layer_1,
                     num_sampled=num_sampled,
                     num_true=num_true,
                     num_classes=num_classes))

When you train your network, tell the optimizer to minimize loss_function; this should adjust both sets of weights and biases.
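For completeness, a minimal training-op sketch under the same assumptions; the learning_rate value and the choice of GradientDescentOptimizer are placeholders, not part of the original answer:

# Hypothetical training step: any tf.train optimizer works here.
learning_rate = 0.01
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train_op = optimizer.minimize(loss_function)  # updates both sets of weights and biases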


Solution 2:

The key point is to pass the right shapes for the weights, biases, inputs, and labels. The weight matrix passed to sampled_softmax_loss is not the same shape as in the usual setup. For example, if logits = xw + b, call it as sampled_softmax_loss(weights=tf.transpose(w), biases=b, inputs=x, ...), NOT as sampled_softmax_loss(weights=w, biases=b, inputs=logits)! Also, the labels are not one-hot: if your labels are one-hot encoded, pass labels=tf.reshape(tf.argmax(labels_one_hot, 1), [-1, 1]).
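A minimal sketch of that calling convention; the shape values, placeholder names, and the single linear layer are illustrative assumptions, not code from the original question:

import tensorflow as tf

num_classes, dim = 10000, 128    # hypothetical sizes
num_sampled = 64

x = tf.placeholder(tf.float32, [None, dim])               # inputs, shape (batch, dim)
labels_one_hot = tf.placeholder(tf.float32, [None, num_classes])

w = tf.Variable(tf.random_normal([dim, num_classes]))     # the usual logits = xw + b
b = tf.Variable(tf.zeros([num_classes]))
logits = tf.matmul(x, w) + b       # full logits, for evaluation with a regular softmax

# Training loss: pass the TRANSPOSED weights and the raw inputs (not the logits),
# and convert one-hot labels to class indices of shape (batch, 1).
train_labels = tf.reshape(tf.argmax(labels_one_hot, 1), [-1, 1])
loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(
    weights=tf.transpose(w),       # shape (num_classes, dim)
    biases=b,                      # shape (num_classes,)
    labels=train_labels,
    inputs=x,                      # shape (batch, dim)
    num_sampled=num_sampled,
    num_classes=num_classes))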

