Framework Comparison

DeepTrain vs PyTorch Lightning

Major

+ Resumable/interruptible
• PL cannot be stopped and resumed mid-epoch without disrupting the train/val loop & callbacks; DT can.
KeyboardInterrupt at any time, inspect model & train state as needed, and resume (sketch below).
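
A minimal sketch of the workflow, using TrainGenerator / DataGenerator per DT's quickstart (paths and kwargs illustrative; `model` is a compiled Keras model; verify names against your DT version):

```python
from deeptrain import TrainGenerator, DataGenerator

dg     = DataGenerator(data_path="data/train", labels_path="data/train/labels.h5")
val_dg = DataGenerator(data_path="data/val",   labels_path="data/val/labels.h5")
tg     = TrainGenerator(model, dg, val_dg, epochs=8)

tg.train()   # Ctrl-C (KeyboardInterrupt) at any point...
# ...drops back to the interpreter with `tg` intact; inspect tg.model,
# metrics, generator state, etc., then call
tg.train()   # again to resume where training left off
```
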
+ Tracking state
Much finer-grained tracking and control of internal train & data states
+ Flexible batch_size
Set batch_size to an integer or fractional multiple of the batch size stored on file
+ Faster SSD loading
Due to flexible batch_size: data can be saved in large on-file batches for fewer, larger reads, independently of the training batch_size (sketch below)
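
A sketch of the idea, assuming data was written to disk in batches of 128 (batch sizes and paths illustrative):

```python
from deeptrain import DataGenerator

# On-file batch size is 128; DT splits or joins on-file batches to match:
dg_small = DataGenerator("data/train", "data/train/labels.h5", batch_size=32)   # 128 / 4
dg_large = DataGenerator("data/train", "data/train/labels.h5", batch_size=256)  # 128 * 2
```

Saving fewer, larger files and splitting them at load time means fewer read operations, which is where the SSD speedup comes from.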
+ Reproducibility
DT's builtin callback sets random seeds periodically, enabling reproducible training at the epoch or batch level, rather than only from the very beginning of training (as with PL); see the sketch below
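
A sketch of the idea behind the callback (plain Python, not DT's exact API): re-seed all RNGs on a schedule, so a run restarted at epoch N replays the original from that point:

```python
import random
import numpy as np
import tensorflow as tf

def set_seeds(seed):
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)

base_seed, n_epochs = 0, 8
for epoch in range(n_epochs):
    set_seeds(base_seed + epoch)  # deterministic per-epoch seed
    # ... train one epoch ...
```
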
+ Custom batch logic
• Feed a transformed batch to the model an arbitrary number of times before moving on to the next (sketch below)
• Control when "next" is
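
A plain-Keras sketch of the pattern (DT exposes this as configuration; `noisy` is a stand-in transform):

```python
import numpy as np

def noisy(x):
    # toy augmentation; any transform works
    return x + np.random.normal(0, 0.01, x.shape).astype(x.dtype)

def train_repeated(model, batches, repeats=4):
    # `batches` yields (inputs, labels) arrays
    for x, y in batches:
        for _ in range(repeats):               # same batch, `repeats` times
            model.train_on_batch(noisy(x), y)
        # only now move on to the next batch
```
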
+ Fully configurable saving & loading of model, optimizer, traingen, data generator
PL lacks attribute-specific configuration
+ Windowed timeseries (sketch below)
• Variable number of windows per batch
• Window start increments
• Window overlaps
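
A numpy sketch of the windowing itself (parameter names illustrative, not DT's exact options):

```python
import numpy as np

def windows(batch, window_size=64, start_increment=32):
    # batch: (samples, timesteps, channels); start_increment < window_size
    # yields overlapping windows
    n_timesteps = batch.shape[1]
    for start in range(0, n_timesteps - window_size + 1, start_increment):
        yield batch[:, start:start + window_size]

batch = np.random.randn(16, 256, 8)
print(sum(1 for _ in windows(batch)))  # 7 windows per batch
```
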

Misc

+ Model naming, image report generation -- ex1, ex2
+ class_weights support
+ Print metrics at the batch level
PL logs cumulatively, at the epoch level
+ Shuffling options
Shuffle batches, and shuffle samples within or across batches (docs)
+ Callbacks
on_save and on_load options for saving/loading callback object states (sketch below)
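
A sketch of the pattern (method names per the feature above; the class itself is illustrative, not DT's base class):

```python
class RunningLossCallback:
    """Stateful callback whose state survives checkpointing."""
    def __init__(self):
        self.loss_sum, self.n_batches = 0.0, 0

    def on_save(self):
        # return whatever should be written with the checkpoint
        return {"loss_sum": self.loss_sum, "n_batches": self.n_batches}

    def on_load(self, state):
        # restore state when the checkpoint is loaded
        self.loss_sum  = state["loss_sum"]
        self.n_batches = state["n_batches"]
```
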
+ Binary classifier threshold calibration
(DT finds the optimal threshold automatically; sketch below)
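
A sketch of what such calibration amounts to (DT's internal criterion may differ; F1 here is illustrative):

```python
import numpy as np
from sklearn.metrics import f1_score

def best_threshold(y_true, y_prob, candidates=np.linspace(0.05, 0.95, 19)):
    # sweep candidate thresholds on validation predictions, keep the best
    scores = [f1_score(y_true, (y_prob >= t).astype(int)) for t in candidates]
    return candidates[int(np.argmax(scores))]
```
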
+ Best validation batch subset search (e.g. for ensembling)
+ Documentation
Methods and attributes are generally documented in greater scope & detail, with references to where each is used and what purpose they serve.

PL advantages

- TPU/Multi-GPU support
DT lacks builtin support (e.g. auto-conversion), but models coded for these can still run (sketch below)
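
For example, a model built under a tf.distribute strategy can still be handed to DT (a sketch; the data pipeline may need adapting, since DT does no auto-conversion):

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # data parallelism across visible GPUs
with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation="sigmoid")])
    model.compile("adam", "binary_crossentropy")
# pass `model` to TrainGenerator as usual
```
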
- Gradient accumulation
DT lacks builtin support, but it can be implemented by overriding fit_fn (sketch below)
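
A plain-TF sketch of the accumulation logic one would wire into an overridden fit_fn (the override mechanics themselves are omitted):

```python
import tensorflow as tf

def accumulating_fit(model, batches, loss_fn, accum_steps=4):
    # e.g. loss_fn = tf.keras.losses.BinaryCrossentropy()
    accum = [tf.zeros_like(v) for v in model.trainable_variables]
    for step, (x, y) in enumerate(batches, 1):
        with tf.GradientTape() as tape:
            loss = loss_fn(y, model(x, training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        accum = [a + g for a, g in zip(accum, grads)]
        if step % accum_steps == 0:  # apply the mean gradient every accum_steps batches
            model.optimizer.apply_gradients(
                zip([a / accum_steps for a in accum], model.trainable_variables))
            accum = [tf.zeros_like(v) for v in model.trainable_variables]
```
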
- Learning rate finder
- Learning rate scheduler
• This was a design decision: updating the LR externally is slower than coding it into the optimizer's own update loop
• It can still be updated externally via callbacks (sketch below)
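
A Keras-style sketch of the external route (hook names differ in DT; the extra per-update variable assignment is the overhead referred to above):

```python
import tensorflow as tf

class HalveLREveryN(tf.keras.callbacks.Callback):
    def __init__(self, n=10):
        super().__init__()
        self.n = n

    def on_epoch_end(self, epoch, logs=None):
        if (epoch + 1) % self.n == 0:
            lr = self.model.optimizer.learning_rate
            lr.assign(lr * 0.5)  # external update, outside the optimizer's own loop
```
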
- Examples scope
PL showcases more examples across various domains
- ONNX export
DT lacks builtin support; export manually (sketch below)
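
Manual export remains straightforward with the third-party tf2onnx package (path illustrative):

```python
import tensorflow as tf
import tf2onnx

model = tf.keras.models.load_model("best_model.h5")  # a DT-trained Keras model
model_proto, _ = tf2onnx.convert.from_keras(model, output_path="model.onnx")
```
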
- Support community
I am one, they are many. Bug reports, feature requests, etc. will be handled more slowly. Collaborators welcome.