Hi~
I'm reading your code and noticed that the MLP's hidden activation function is tanh and the output activation is softplus:
pred_fun, loglike_fun, parser = build_mlp(layer_specs, output_activation=softplus)
def build_mlp(layer_sizes, activation=np.tanh, output_activation=lambda x: x):
    ......
    def predict(weights, X):
        cur_X = copy(X.T)
        for layer in range(len(layer_sizes) - 1):
            cur_W = parser.get(weights, ('weights', layer))
            cur_B = parser.get(weights, ('biases', layer))
            cur_Z = np.dot(cur_X, cur_W) + cur_B
            cur_X = activation(cur_Z)
        return output_activation(cur_Z.T)
    def log_likelihood(weights, X, y):
        y_hat = predict(weights, X)
        return mse(y.T, y_hat.T)
The output of tanh ranges from -1 to 1, so it looks to me like the final output after the output activation (softplus) is also bounded, at no greater than about 1. But can't the average path length sometimes be greater than 1?
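To sanity-check the ranges, here is a minimal sketch in plain NumPy. The `softplus` below is my own assumed definition, log(1 + exp(z)), and `z` is a hypothetical grid of last-layer pre-activations; neither is taken from the repository. It compares softplus applied to the tanh output with softplus applied to `cur_Z`, which is what the `return` line of `predict` appears to pass in.

```python
import numpy as np

def softplus(z):
    # Assumed definition: log(1 + exp(z)), computed in a numerically stable way.
    return np.logaddexp(0.0, z)

# Hypothetical pre-activations of the last layer (not real model outputs).
z = np.linspace(-20.0, 20.0, 2001)

# Case A: softplus applied to the tanh output, as the question assumes.
bounded = softplus(np.tanh(z))

# Case B: softplus applied to the pre-activation, as in output_activation(cur_Z.T).
unbounded = softplus(z)

print("softplus(tanh(z)) range:", bounded.min(), bounded.max())
print("softplus(z) max:", unbounded.max())
```

If this reading is right, softplus(tanh(z)) would stay below log(1 + e), which is about 1.31, while softplus(cur_Z) has no upper bound and grows roughly linearly for large positive inputs.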