v0.3.0 - massive memory and throughput improvements
- reimplemented the Hugging Face processor with a focus on memory reduction and throughput saturation; it can hit 5000 captions/sec
- reimplemented the WebDataset processor on top of the webshart library, whose spicy Rust implementation gives a massive throughput boost and lower memory use
Overall, the orchestrator and worker now both run in about 0.5 GiB of memory, down from several GiB.
Added a mock_results mode for the dataset loaders and caption generator to speed up development iteration.
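
To illustrate the idea behind a mock-results mode, here is a minimal sketch. It is not this project's code: the `CaptionGenerator` class, its `model_fn` field, and the `caption` method are hypothetical names invented for the example, and only the `mock_results` flag name comes from the note above. The point is that with the flag set, the expensive inference call is skipped and a placeholder result is returned, so the rest of the pipeline can be exercised quickly.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CaptionGenerator:
    """Hypothetical generator used for illustration; not the project's actual class."""

    model_fn: Callable[[bytes], str]  # the expensive inference call (assumed shape)
    mock_results: bool = False        # flag name borrowed from the release note

    def caption(self, key: str, image: bytes) -> str:
        if self.mock_results:
            # Skip inference entirely and return a deterministic placeholder.
            return f"mock caption for {key}"
        return self.model_fn(image)


def fail_if_called(image: bytes) -> str:
    # Stand-in for a real model; raising proves mock mode never reaches it.
    raise RuntimeError("model should not run in mock mode")


gen = CaptionGenerator(model_fn=fail_if_called, mock_results=True)
captions = [gen.caption(k, b"") for k in ("img-0", "img-1")]
```

Deterministic placeholders keep downstream storage and orchestration code testable without a GPU or model weights.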
What's Changed
Full Changelog: v0.2.4...v0.3.0