Inspecting gradients

This example assumes you’ve read advanced.ipynb, and covers:

  • Inspecting gradients per layer
  • Estimating a good gradient clipping threshold
[1]:
import deeptrain
deeptrain.util.misc.append_examples_dir_to_sys_path()

from utils import make_autoencoder, init_session
from utils import AE_CONFIGS as C

from tensorflow.keras.optimizers import Adam
import numpy as np

Configure training

[2]:
C['traingen']['iter_verbosity'] = 0  # silence iteration printing since currently irrelevant
tg = init_session(C, make_autoencoder)
WARNING: multiple file extensions found in `path`; only .npy will be used
Discovered 48 files with matching format
48 set nums inferred; if more are expected, ensure file names contain a common substring w/ a number (e.g. 'train1.npy', 'train2.npy', etc)
DataGenerator initiated

WARNING: multiple file extensions found in `path`; only .npy will be used
Discovered 36 files with matching format
36 set nums inferred; if more are expected, ensure file names contain a common substring w/ a number (e.g. 'train1.npy', 'train2.npy', etc)
DataGenerator initiated

NOTE: will exclude `labels` from saving when `input_as_labels=True`; to keep 'labels', add '{labels}' to `saveskip_list` instead
Preloading superbatch ... WARNING: multiple file extensions found in `path`; only .npy will be used
Discovered 48 files with matching format
................................................ finished, w/ 6144 total samples
Train initial data prepared
Preloading superbatch ... WARNING: multiple file extensions found in `path`; only .npy will be used
Discovered 36 files with matching format
.................................... finished, w/ 4608 total samples
Val initial data prepared
Logging ON; directory (new): C:\deeptrain\examples\dir\logs\M3__model-nadam__min999.000

Expected gradient norm estimation

We iterate over the entire train dataset, gathering gradients from every fit and computing and storing their L2-norms; a rough sketch of the per-batch computation follows the output below.

[3]:
grad_norms, *_ = tg.gradient_norm_over_dataset()
Computing gradient l2-norm over datagen batches, in inference mode
WARNING: datagen states will be reset
'.' = slice processed, '|' = batch processed

Data set_nums shuffled

||||||||||10||||||||||20||||||||||30||||||||||40||||||||
Data set_nums shuffled


GRADIENT L2-NORM (AVG, MAX) = (0.003, 0.003), computed over 48 batches, 48 datagen updates
../../_images/examples_introspection_gradients_5_1.png
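
For reference, here is a minimal sketch of what the per-batch computation might look like with `tf.GradientTape`; the helper name, the `mse` loss, and the `training=False` call are illustrative assumptions, not DeepTrain's actual internals:

import tensorflow as tf

def batch_gradient_l2norm(model, x, y, loss_fn=tf.keras.losses.mse):
    # Hypothetical helper: global L2-norm of one batch's gradients,
    # computed in inference mode (training=False), as the log above notes.
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(loss_fn(y, model(x, training=False)))
    grads = tape.gradient(loss, model.trainable_weights)
    return tf.linalg.global_norm(grads).numpy()

Since the autoencoder is trained with `input_as_labels=True`, `y` here would simply be `x`.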

We can now restart training with the optimizer's `clipnorm` set to 1.5x the average gradient norm, clipping extreme gradients while leaving most typical gradients untouched.

[4]:
C['model']['optimizer'] = Adam(clipnorm=1.5 * np.mean(grad_norms))
tg = init_session(C, make_autoencoder)
tg.epochs = 1  # train just for demo
tg.train()
WARNING: multiple file extensions found in `path`; only .npy will be used
Discovered 48 files with matching format
48 set nums inferred; if more are expected, ensure file names contain a common substring w/ a number (e.g. 'train1.npy', 'train2.npy', etc)
DataGenerator initiated

WARNING: multiple file extensions found in `path`; only .npy will be used
Discovered 36 files with matching format
36 set nums inferred; if more are expected, ensure file names contain a common substring w/ a number (e.g. 'train1.npy', 'train2.npy', etc)
DataGenerator initiated

NOTE: will exclude `labels` from saving when `input_as_labels=True`; to keep 'labels', add '{labels}' to `saveskip_list` instead
Preloading superbatch ... WARNING: multiple file extensions found in `path`; only .npy will be used
Discovered 48 files with matching format
................................................ finished, w/ 6144 total samples
Train initial data prepared
Preloading superbatch ... WARNING: multiple file extensions found in `path`; only .npy will be used
Discovered 36 files with matching format
.................................... finished, w/ 4608 total samples
Val initial data prepared
Logging ON; directory (new): C:\deeptrain\examples\dir\logs\M4__model-Adam__min999.000

Data set_nums shuffled


_____________________
 EPOCH 1 -- COMPLETE 



Validating...
TrainGenerator state saved
Model report generated and saved
Best model saved to C:\deeptrain\examples\dir\models\M4__model-Adam__min.152
TrainGenerator state saved
Model report generated and saved
../../_images/examples_introspection_gradients_7_1.png
Training has concluded.

Complete gradient sum

This time we take a cumulative sum over the actual gradient tensors, preserving and returning their shapes so gradients can be visualized per weight; a rough sketch of the accumulation follows the output below.

[5]:
plot_kw = {'h': 2}  # double the default height, since we expect many weights
grads_sum, *_ = tg.gradient_sum_over_dataset(plot_kw=plot_kw)
Computing gradients sum over datagen batches, in inference mode
WARNING: datagen states will be reset
'.' = slice processed, '|' = batch processed

Data set_nums shuffled

||||||||||10||||||||||20||||||||||30||||||||||40||||||||
Data set_nums shuffled


GRADIENTS SUM computed over 48 batches, 48 datagen updates:
../../_images/examples_introspection_gradients_9_1.png
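
A rough sketch of the accumulation idea, again assuming a `tf.GradientTape` approach (the function name and the `batches` iterable are hypothetical, and the actual return structure of `gradient_sum_over_dataset` may differ):

import tensorflow as tf

def gradient_sum(model, batches, loss_fn=tf.keras.losses.mse):
    # Accumulate per-weight gradient tensors over (x, y) batches,
    # preserving each tensor's shape for per-weight visualization.
    total = None
    for x, y in batches:
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean(loss_fn(y, model(x, training=False)))
        grads = [g.numpy() for g in tape.gradient(loss, model.trainable_weights)]
        total = grads if total is None else [t + g for t, g in zip(total, grads)]
    return total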

We can similarly use the mean absolute value of `grads_sum` to set `clipvalue` instead of `clipnorm`.
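
As a sketch, assuming `grads_sum` is a list (or dict) of per-weight arrays; the 1.5x factor mirrors the `clipnorm` heuristic above:

# Hypothetical: flatten all per-weight gradient sums into one vector;
# adjust if gradient_sum_over_dataset returns a different structure.
values = grads_sum.values() if isinstance(grads_sum, dict) else grads_sum
flat = np.concatenate([np.asarray(g).ravel() for g in values])
C['model']['optimizer'] = Adam(clipvalue=1.5 * np.mean(np.abs(flat)))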