v0.3.0 - massive memory and throughput improvements

@bghira bghira released this 04 Sep 01:56
· 143 commits to main since this release
2530bbd
  • reimplemented the huggingface processor with a focus on reducing memory use and saturating throughput; it can hit 5000 captions/sec
  • reimplemented the webdatasets processor on top of the webshart library, for a massive throughput boost and lower memory use thanks to its spicy Rust implementation

overall, the orchestrator and worker now both use about 0.5GiB of memory to run, down from several GiB.

added a mock_results mode for the dataset loaders and caption generator to assist in rapid development iteration.
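
as a rough illustration of what a mock-results switch buys during development, here is a minimal sketch. only the `mock_results` name comes from this release; the `CaptionGenerator` class, its `model_fn` field, and the `caption` method are hypothetical and are not the project's actual API. the idea is that with the flag on, the generator returns placeholder captions without loading or calling a model, so the rest of the pipeline can be exercised quickly.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CaptionGenerator:
    """Hypothetical caption generator, for illustration only."""

    # Assumed inference callable: takes image bytes, returns a caption.
    model_fn: Optional[Callable[[bytes], str]] = None
    # When True, skip inference and return placeholder captions.
    mock_results: bool = False

    def caption(self, images: List[bytes]) -> List[str]:
        if self.mock_results:
            # No model is touched here, so this path is fast and
            # deterministic, which suits rapid iteration and tests.
            return [
                f"mock caption {i} ({len(img)} bytes)"
                for i, img in enumerate(images)
            ]
        if self.model_fn is None:
            raise ValueError("model_fn is required when mock_results is False")
        return [self.model_fn(img) for img in images]


if __name__ == "__main__":
    gen = CaptionGenerator(mock_results=True)
    for line in gen.caption([b"abc", b"de"]):
        print(line)
```

keeping the mock path behind a single flag means downstream code (storage, orchestration, progress reporting) sees the same return shape either way.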

What's Changed

  • hf URL-based datasets memory leak and performance fix for very-large datasets by @bghira in #36

Full Changelog: v0.2.4...v0.3.0