Derivatives Of N-dimensional Function In Keras
Solution 1:
The main issue here is theoretical.
You're trying to minimize doutput_tensor/dx + d²output_tensor/dx². Your network just combines the inputs linearly, apart from the relu and softplus activations. Softplus adds a bit of a twist, but it too has a monotonically increasing derivative. Therefore, to make the derivative as small as possible (that is, a really large negative number), the network will simply scale the input by ever larger weights with a negative overall sign, at some point reaching NaN. I reduced the first layer to 5 neurons and ran the model for 2 epochs, and the weights became:
('dense_1', [array([[ 1.0536456 , -0.32706773, 0.0072904 , 0.01986691, 0.9854533 ], [-0.3242108 , -0.56753945, 0.8098554 , -0.7545874 , 0.2716419 ]], dtype=float32), array([ 0.01207507, 0.09927677, -0.01768671, -0.12874101, 0.0210707 ], dtype=float32)])
('dense_2', [array([[-0.4332278 ], [ 0.6621602 ], [-0.07802075], [-0.5798264 ], [-0.40561703]], dtype=float32), array([0.11167384], dtype=float32)])
You can see that the second layer has a negative sign wherever the first has a positive one, and vice versa. (The biases get almost no gradient because they barely contribute to the derivative; that is not exactly true because of the softplus, but more or less.)
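To see why, spell out the input-derivative of this two-layer softplus network (my own notation, not from the original post; \sigma is the logistic sigmoid, i.e. the derivative of softplus):

\frac{\partial y}{\partial x_j} = \sigma\big(h^\top W_2 + b_2\big) \sum_i (W_2)_i \, \sigma\big((x^\top W_1 + b_1)_i\big) \, (W_1)_{j,i}, \qquad h = \mathrm{softplus}(x^\top W_1 + b_1)

The weights enter through the unbounded products (W_1)_{j,i}(W_2)_i, while the biases only appear inside the sigmoid factors, which are bounded between 0 and 1. That is why scaling the two layers up with opposite signs pushes the derivative towards minus infinity, and why the biases receive hardly any gradient.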
So you have to come up with a loss function that does not diverge towards extreme parameter values; otherwise the model is not trainable, it will just keep increasing the weights until they reach NaN.
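For example, one possibility that is bounded from below is to minimize the squared sum of the derivatives rather than the sum itself. This is only a sketch (my own variation on the loss, using K.gradients directly instead of the Lambda-wrapped grad helper in the code below; whether it is what you actually want depends on your problem):

from keras import backend as K

def squared_custom_loss(input_tensor, output_tensor):
    def loss(y_true, y_pred):
        # First and second derivatives of the output w.r.t. the input.
        df1 = K.gradients(output_tensor, input_tensor)[0]
        df2 = K.gradients(df1, input_tensor)[0]
        # The squared residual is >= 0, so blowing the weights up no longer
        # drives the loss towards minus infinity.
        return K.mean(K.square(df1 + df2))
    return loss

The exact form is just illustrative; the point is that the infimum of the loss over the weights has to be finite.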
This was the version I ran:
import numpy as np
import tensorflow as tf
from keras.models import *
from keras.layers import *
from keras import backend as K

# First derivative of f with respect to x, wrapped in a Lambda layer.
def grad(f, x):
    return Lambda(lambda u: K.gradients(u[0], u[1]), output_shape=[2])([f, x])

# n-th derivative of f with respect to x (not used below, but handy).
def ngrad(f, x, n):
    if 0 == n:
        return f
    else:
        return Lambda(lambda u: K.gradients(u[0], u[1]), output_shape=[2])([ngrad(f, x, n - 1), x])

def custom_loss(input_tensor, output_tensor):
    def loss(y_true, y_pred):
        # First and second derivatives of the output w.r.t. the input,
        # printed at each step via tf.Print.
        _df1 = grad(output_tensor, input_tensor)
        df1 = tf.Print(_df1, [_df1], message="df1")
        _df2 = grad(df1, input_tensor)
        df2 = tf.Print(_df2, [_df2], message="df2")
        df = tf.add(df1, df2)
        return df
    return loss

input_tensor = Input(shape=(2,))
hidden_layer = Dense(5, activation='softplus')(input_tensor)
output_tensor = Dense(1, activation='softplus')(hidden_layer)

model = Model(input_tensor, output_tensor)
model.compile(loss=custom_loss(input_tensor, output_tensor), optimizer='sgd')

# Grid of 2D sample points covering [-3, 3) x [-3, 3).
xy = np.mgrid[-3.0:3.0:0.1, -3.0:3.0:0.1].reshape(2, -1).T
#print( xy )
model.fit(x=xy, y=xy, batch_size=10, epochs=2, verbose=2)

for layer in model.layers:
    print(layer.get_config()['name'], layer.get_weights())
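If you want to inspect what the trained network's derivative actually looks like, you can build a backend function that returns the output and its input-gradient at a few sample points (a quick check of my own, not part of the run above):

# Evaluate the model output and d(output)/d(input) at a few sample points.
dfdx = K.gradients(output_tensor, input_tensor)[0]
eval_fn = K.function([input_tensor], [output_tensor, dfdx])
pts = np.array([[0.0, 0.0], [1.0, -1.0], [2.0, 2.0]])
y_val, dy_val = eval_fn([pts])
print("output:", y_val)
print("d(output)/d(input):", dy_val)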