What Are _get_hyper And _set_hyper In TensorFlow Optimizers?
Solution 1:
They enable setting and getting Python literals (int, str, etc), callables, and tensors. Usage is for convenience and consistency: anything set via _set_hyper can be retrieved via _get_hyper, avoiding repeating boilerplate code. I've implemented Keras AdamW in all major TF & Keras versions, and will use it as reference.
- t_cur is a tf.Variable. Each time we "set" it, we must invoke K.set_value; if we do self.t_cur = 5, this will destroy the tf.Variable and wreck optimizer functionality. If instead we used model.optimizer._set_hyper('t_cur', 5), it'd set it appropriately - but this requires that t_cur was previously defined via _set_hyper.
- Both _get_hyper & _set_hyper enable programmatic treatment of attributes - e.g., we can make a for-loop with a list of attribute names to get or set using just _get_hyper and _set_hyper, whereas otherwise we'd need to code conditionals and typechecks. Also, _get_hyper(name) requires that name was previously set via _set_hyper.
- _get_hyper enables typecasting via dtype=. Ex: beta_1_t in default Adam is cast to the same numeric type as var (e.g. a layer weight), which is required for some ops. Again a convenience, as we could typecast manually (math_ops.cast).
- _set_hyper enables the use of _serialize_hyperparameter, which retrieves the Python values (int, float, etc) of callables, tensors, or already-Python values. The name stems from the need to convert tensors and callables to Python values for e.g. pickling or json-serializing - but it can also be used as a convenience for seeing tensor values in Graph execution.
- Lastly, everything instantiated via _set_hyper gets assigned to the optimizer._hyper dictionary, which is then iterated over in _create_hypers. The else in the loop casts all Python numerics to tensors - so _set_hyper will not create int, float, etc attributes. Worth noting is the aggregation= kwarg, whose documentation reads: "Indicates how a distributed variable will be aggregated". This is the part that is a bit more than "for convenience" (lots of code to replicate).
  - _set_hyper has a limitation: it does not allow specifying dtype. If the add_weight approach in _create_hypers is desired with a dtype, then add_weight should be called directly.
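The bookkeeping described above can be sketched in plain Python. This is a simplified toy model, not TensorFlow's implementation: ToyVariable and ToyOptimizer are hypothetical stand-ins for tf.Variable and the optimizer base class, and dtype is modeled as an ordinary Python type rather than a TensorFlow dtype. It only aims to show why setting through _set_hyper keeps the underlying variable alive, how numerics get promoted in _create_hypers, and how names can be handled in a loop.

```python
class ToyVariable:
    """Hypothetical stand-in for tf.Variable: a mutable box holding one number."""

    def __init__(self, value):
        self.value = value

    def assign(self, value):
        self.value = value


class ToyOptimizer:
    """Simplified model of the _hyper bookkeeping; not TensorFlow's code."""

    def __init__(self):
        self._hyper = {}
        self._hypers_created = False

    def _set_hyper(self, name, value):
        prev = self._hyper.get(name)
        if isinstance(prev, ToyVariable):
            # Already variable-backed: update in place (the K.set_value role),
            # so existing references to the variable stay valid.
            prev.assign(value)
        else:
            self._hyper[name] = value

    def _create_hypers(self):
        if self._hypers_created:
            return
        for name, value in sorted(self._hyper.items()):
            if isinstance(value, ToyVariable) or callable(value):
                continue
            # The "else" case: plain Python numerics are promoted to variables.
            self._hyper[name] = ToyVariable(value)
        self._hypers_created = True

    def _get_hyper(self, name, dtype=None):
        self._create_hypers()
        value = self._hyper[name]  # KeyError if never set via _set_hyper
        if callable(value):
            value = value()
        if dtype is not None:
            raw = value.value if isinstance(value, ToyVariable) else value
            return dtype(raw)
        return value

    def _serialize_hyperparameter(self, name):
        value = self._hyper[name]
        if callable(value):
            return value()
        if isinstance(value, ToyVariable):
            return value.value
        return value


opt = ToyOptimizer()
opt._set_hyper("t_cur", 0)
opt._set_hyper("beta_1", 0.9)
opt._set_hyper("decay", lambda: 0.01)

t_cur_var = opt._get_hyper("t_cur")  # first access triggers _create_hypers
opt._set_hyper("t_cur", 5)           # in-place update, variable is preserved
same_object = opt._get_hyper("t_cur") is t_cur_var

t_cur_as_float = opt._get_hyper("t_cur", dtype=float)

# Programmatic treatment: one loop, no per-type conditionals at the call site.
names = ("t_cur", "beta_1", "decay")
config = {name: opt._serialize_hyperparameter(name) for name in names}
```

In this sketch, assigning opt._hyper["t_cur"] = 5 directly would replace the ToyVariable with a bare int, which is the toy analogue of self.t_cur = 5 destroying the tf.Variable.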
When to use vs. not use: use _set_hyper if the attribute is used by the optimizer via TensorFlow ops - i.e. if it needs to be a tf.Variable. By contrast, epsilon is set as a regular Python attribute, as it's never needed as a tensor variable.
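A minimal sketch of that split, again in plain Python rather than TensorFlow: SketchOptimizer is a hypothetical class, its _set_hyper and _serialize_hyperparameter are reduced to the bare minimum, and the default values are illustrative assumptions. The point is only where each value lives and how each ends up in the config.

```python
class SketchOptimizer:
    """Hypothetical optimizer skeleton; not TensorFlow's code."""

    def __init__(self, learning_rate=0.001, beta_1=0.9, epsilon=1e-7):
        self._hyper = {}
        # Values the update step would consume as variables go through _set_hyper.
        for name, value in (("learning_rate", learning_rate), ("beta_1", beta_1)):
            self._set_hyper(name, value)
        # epsilon is never needed as a variable, so it stays a regular attribute.
        self.epsilon = epsilon

    def _set_hyper(self, name, value):
        self._hyper[name] = value

    def _serialize_hyperparameter(self, name):
        value = self._hyper[name]
        return value() if callable(value) else value

    def get_config(self):
        # Hypers are serialized by name; the plain attribute is read directly.
        config = {name: self._serialize_hyperparameter(name) for name in self._hyper}
        config["epsilon"] = self.epsilon
        return config


opt = SketchOptimizer(learning_rate=lambda: 0.01)
config = opt.get_config()
```

Here learning_rate can be a callable and still serialize to a number, while epsilon needs no such handling because it is always a plain float.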