Framework Comparison

DeepTrain vs PyTorch Lightning

Major

+ Resumable/interruptible
• PL cannot be stopped and resumed mid-epoch without disrupting the train/val loop & callbacks; DT can.
KeyboardInterrupt at any time, inspect model & train state as needed, and resume (sketch below).
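
A minimal sketch of the workflow, using TrainGenerator / DataGenerator per DT's quickstart (paths and kwargs illustrative; `model` is a compiled Keras model; verify names against your DT version):

```python
from deeptrain import TrainGenerator, DataGenerator

dg     = DataGenerator(data_path="data/train", labels_path="data/train/labels.h5")
val_dg = DataGenerator(data_path="data/val",   labels_path="data/val/labels.h5")
tg     = TrainGenerator(model, dg, val_dg, epochs=8)

tg.train()   # Ctrl-C (KeyboardInterrupt) at any point...
# ...drops back to the interpreter with `tg` intact; inspect tg.model,
# metrics, generator state, etc., then call
tg.train()   # again to resume where training left off
```
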
+ Tracking state
Much finer-grained tracking and control of internal train & data states
+ Flexible batch_size
Set batch_size to an integer or fractional multiple of the batch size stored on file
+ Faster SSD loading
Due to flexible batch_size: data can be saved in large on-file batches for fewer, larger reads, independently of the training batch_size (sketch below)
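
A sketch of the idea, assuming data was written to disk in batches of 128 (batch sizes and paths illustrative):

```python
from deeptrain import DataGenerator

# On-file batch size is 128; DT splits or joins on-file batches to match:
dg_small = DataGenerator("data/train", "data/train/labels.h5", batch_size=32)   # 128 / 4
dg_large = DataGenerator("data/train", "data/train/labels.h5", batch_size=256)  # 128 * 2
```

Saving fewer, larger files and splitting them at load time means fewer read operations, which is where the SSD speedup comes from.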
+ Reproducibility
DT's builtin callback sets random seeds periodically, enabling reproducible training at the epoch or batch level, rather than only from the very beginning of training (as with PL); see the sketch below
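
A sketch of the idea behind the callback (plain Python, not DT's exact API): re-seed all RNGs on a schedule, so a run restarted at epoch N replays the original from that point:

```python
import random
import numpy as np
import tensorflow as tf

def set_seeds(seed):
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)

base_seed, n_epochs = 0, 8
for epoch in range(n_epochs):
    set_seeds(base_seed + epoch)  # deterministic per-epoch seed
    # ... train one epoch ...
```
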
+ Custom batch logic
• Feed a transformed batch to the model an arbitrary number of times before moving on to the next (sketch below)
• Control when "next" is
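
A plain-Keras sketch of the pattern (DT exposes this as configuration; `noisy` is a stand-in transform):

```python
import numpy as np

def noisy(x):
    # toy augmentation; any transform works
    return x + np.random.normal(0, 0.01, x.shape).astype(x.dtype)

def train_repeated(model, batches, repeats=4):
    # `batches` yields (inputs, labels) arrays
    for x, y in batches:
        for _ in range(repeats):               # same batch, `repeats` times
            model.train_on_batch(noisy(x), y)
        # only now move on to the next batch
```
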
+ Fully configurable saving & loading of model, optimizer, traingen, data generator
PL lacks attribute-specific configuration
+ Windowed timeseries (sketch below)
• Variable number of windows per batch
• Window start increments
• Window overlaps
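
A numpy sketch of the windowing itself (parameter names illustrative, not DT's exact options):

```python
import numpy as np

def windows(batch, window_size=64, start_increment=32):
    # batch: (samples, timesteps, channels); start_increment < window_size
    # yields overlapping windows
    n_timesteps = batch.shape[1]
    for start in range(0, n_timesteps - window_size + 1, start_increment):
        yield batch[:, start:start + window_size]

batch = np.random.randn(16, 256, 8)
print(sum(1 for _ in windows(batch)))  # 7 windows per batch
```
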

Misc

+ Model naming, image report generation -- ex1, ex2
+ class_weights support
+ Print metrics at the batch level
PL logs cumulatively, at the epoch level
+ Shuffling options
Shuffle batches, and shuffle samples within or across batches (docs)
+ Callbacks
on_save and on_load options for saving/loading callback object states (sketch below)
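
A sketch of the pattern (method names per the feature above; the class itself is illustrative, not DT's base class):

```python
class RunningLossCallback:
    """Stateful callback whose state survives checkpointing."""
    def __init__(self):
        self.loss_sum, self.n_batches = 0.0, 0

    def on_save(self):
        # return whatever should be written with the checkpoint
        return {"loss_sum": self.loss_sum, "n_batches": self.n_batches}

    def on_load(self, state):
        # restore state when the checkpoint is loaded
        self.loss_sum  = state["loss_sum"]
        self.n_batches = state["n_batches"]
```
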
+ Binary classifier threshold calibration
(DT finds the optimal threshold automatically; sketch below)
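
A sketch of what such calibration amounts to (DT's internal criterion may differ; F1 here is illustrative):

```python
import numpy as np
from sklearn.metrics import f1_score

def best_threshold(y_true, y_prob, candidates=np.linspace(0.05, 0.95, 19)):
    # sweep candidate thresholds on validation predictions, keep the best
    scores = [f1_score(y_true, (y_prob >= t).astype(int)) for t in candidates]
    return candidates[int(np.argmax(scores))]
```
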
+ Best validation batch subset search (e.g. for ensembling)
+ Documentation
Methods and attributes are generally documented in greater scope & detail, with references to where each is used and what purpose they serve.

PL advantages

- TPU/Multi-GPU support
DT lacks builtin support (e.g. auto-conversion), but models coded for these can still run (sketch below)
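
For example, a model built under a tf.distribute strategy can still be handed to DT (a sketch; the data pipeline may need adapting, since DT does no auto-conversion):

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # data parallelism across visible GPUs
with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation="sigmoid")])
    model.compile("adam", "binary_crossentropy")
# pass `model` to TrainGenerator as usual
```
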
- Gradient accumulation
DT lacks builtin support, but it can be implemented by overriding fit_fn (sketch below)
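
A plain-TF sketch of the accumulation logic one would wire into an overridden fit_fn (the override mechanics themselves are omitted):

```python
import tensorflow as tf

def accumulating_fit(model, batches, loss_fn, accum_steps=4):
    # e.g. loss_fn = tf.keras.losses.BinaryCrossentropy()
    accum = [tf.zeros_like(v) for v in model.trainable_variables]
    for step, (x, y) in enumerate(batches, 1):
        with tf.GradientTape() as tape:
            loss = loss_fn(y, model(x, training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        accum = [a + g for a, g in zip(accum, grads)]
        if step % accum_steps == 0:  # apply the mean gradient every accum_steps batches
            model.optimizer.apply_gradients(
                zip([a / accum_steps for a in accum], model.trainable_variables))
            accum = [tf.zeros_like(v) for v in model.trainable_variables]
```
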
- Learning rate finder
- Learning rate scheduler
• This was a design decision: updating the LR externally is slower than coding it into the optimizer's own update loop
• It can still be updated externally via callbacks (sketch below)
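
A Keras-style sketch of the external route (hook names differ in DT; the extra per-update variable assignment is the overhead referred to above):

```python
import tensorflow as tf

class HalveLREveryN(tf.keras.callbacks.Callback):
    def __init__(self, n=10):
        super().__init__()
        self.n = n

    def on_epoch_end(self, epoch, logs=None):
        if (epoch + 1) % self.n == 0:
            lr = self.model.optimizer.learning_rate
            lr.assign(lr * 0.5)  # external update, outside the optimizer's own loop
```
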
- Examples scope
PL showcases more examples across various domains
- ONNX export
DT lacks builtin support; export manually (sketch below)
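
Manual export remains straightforward with the third-party tf2onnx package (path illustrative):

```python
import tensorflow as tf
import tf2onnx

model = tf.keras.models.load_model("best_model.h5")  # a DT-trained Keras model
model_proto, _ = tf2onnx.convert.from_keras(model, output_path="model.onnx")
```
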
- Support community
I am one, they are many. Bug reports, feature requests, etc. will be handled more slowly. Collaborators welcome.