scott_s 5 days ago [-]
For disclosure, I've worked on TorchCodec. I'm happy to answer any questions!
weitendorf 11 hours ago [-]
> TorchCodec now has a dedicated WavDecoder for decoding WAV files. It bypasses FFmpeg entirely and reads WAV data directly, resulting in significantly faster decoding.
I've been working in this area recently and am very keen to use this given the claimed performance benefits, but I tried all your links and didn't see any actual performance numbers. Do you have any to share?
IMO a fair performance benchmark for those not tied to the full pytorch stack would have ffmpeg and the wav already loaded into memory before execution. Given that torchcodec relies on the user-supplied ffmpeg installation I suspect that may not be the case for ffmpeg already, at least not by default.
I understand why Meta wouldn't want to do this (then you are inevitably distributing exploitable security vulnerabilities in pytorch, because ffmpeg will probably always have them) but I've been statically linking ffmpeg and keeping the binary in-memory while still using separate processes for different batches of audio, with I/O through UDS between the parent and ffmpeg; then the parent does VAD on the PCM on CPU before any further inference. My implementation for static linking is similar to the pattern in https://github.com/amenzhinsky/go-memexec#static-binary - would be interesting to see if this is possible in the pytorch/python ecosystem, or maybe it's already been done.
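[Editor's sketch] The comment above doesn't say which VAD is used, so the following is only an illustration of what "VAD on the PCM on CPU" could look like in its simplest form: a per-frame RMS energy threshold over 16-bit samples. The function names, the 20 ms frame size, and the threshold value are all assumptions made for this example, not the commenter's implementation.

```python
import array
import math


def frame_rms(samples, start, length):
    """Root-mean-square amplitude of samples[start:start + length]."""
    chunk = samples[start:start + length]
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


def energy_vad(samples, sample_rate=16000, frame_ms=20, threshold=500.0):
    """Return one bool per full frame: True where the frame's RMS exceeds
    the threshold. The threshold is an arbitrary, hypothetical value."""
    frame_len = sample_rate * frame_ms // 1000
    flags = []
    # Step through non-overlapping frames; a trailing partial frame is dropped.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        flags.append(frame_rms(samples, start, frame_len) > threshold)
    return flags


# One silent 20 ms frame followed by one loud 20 ms frame at 16 kHz.
pcm = array.array("h", [0] * 320 + [10000, -10000] * 160)
speech_flags = energy_vad(pcm)
```

A production VAD would normally be model-based or at least adaptive; the point here is only that this stage runs on plain PCM samples with no decoder involved.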
NicolasHug 7 hours ago [-]
We tend to be conservative with the benchmark results that we make public, because all benchmarks are wrong and unfair - they depend too much on the machine capabilities, on software versions, and on the actual decoding patterns that are relevant for the user - none of which can be controlled or fairly captured in a benchmark.
That being said, we've got some benchmarks here, with a script that users can run on their own: https://github.com/meta-pytorch/torchcodec/pull/1474.
Note that TorchCodec relies on the FFmpeg libraries, not the FFmpeg binary itself. The new WavDecoder is faster because it bypasses the FFmpeg library code, not because it bypasses loading the FFmpeg binary in memory.
Regarding static linking: we stick to dynamic linking to honor the LGPL license of the FFmpeg libraries. TorchCodec is BSD-licensed, and statically linking against the LGPL FFmpeg libs would not be compliant. Some libraries dynamically link against FFmpeg while still bundling the FFmpeg libraries as .so files in the Python wheel - whether that's still compliant is honestly unclear to me, so we prefer leaving it up to the user to supply their own FFmpeg via pure dynamic linking.
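[Editor's sketch] To make "reads WAV data directly" concrete: a PCM WAV file is a small RIFF header followed by raw samples, so it can be parsed without any FFmpeg code. The sketch below uses Python's standard `wave` module, not TorchCodec's WavDecoder (whose API isn't shown in this thread), and times the decode with the file already in memory, which is the setup asked for upthread. The helper names and the 440 Hz test tone are assumptions for the example.

```python
import array
import io
import math
import sys
import time
import wave


def make_wav_bytes(seconds=1.0, sample_rate=16000, freq=440.0):
    """Build a mono 16-bit PCM WAV file in memory (hypothetical test signal)."""
    n = int(seconds * sample_rate)
    samples = array.array(
        "h",
        (int(16383 * math.sin(2 * math.pi * freq * i / sample_rate)) for i in range(n)),
    )
    if sys.byteorder == "big":
        samples.byteswap()  # WAV PCM data is little-endian
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # bytes per sample
        w.setframerate(sample_rate)
        w.writeframes(samples.tobytes())
    return buf.getvalue()


def decode_wav_bytes(data):
    """Parse the RIFF header and return (sample_rate, channels, int16 samples)."""
    with wave.open(io.BytesIO(data), "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        sample_rate, channels = w.getframerate(), w.getnchannels()
        frames = w.readframes(w.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()
    return sample_rate, channels, samples


def time_decode(data, repeats=100):
    """Average seconds per decode, with the WAV already resident in memory."""
    start = time.perf_counter()
    for _ in range(repeats):
        decode_wav_bytes(data)
    return (time.perf_counter() - start) / repeats
```

For real numbers, the benchmark script in the PR linked above is the thing to run; this only shows why a dedicated WAV path has so little work to do compared with a general-purpose demuxer and decoder.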
antixk 15 hours ago [-]
Hi,
In the past I have used NVVideoCodec and VPI for GPU-accelerated decoding and processing. What would be torchcodec's appeal here? VPI already provides a zero-copy interface with pytorch.
Thanks!
scott_s 8 hours ago [-]
1. A higher-level API that better integrates into the PyTorch ecosystem.
2. Ease of going back-and-forth between CPU and GPU; in our experience, there are still a lot of scenarios where CPU decoding makes sense.
What version of ffmpeg does this use? Last I tried, torch tools used a really outdated version of ffmpeg at the time of their release.
scott_s 8 hours ago [-]
The one you have installed. :) We don't distribute FFmpeg and instead find your installed version at runtime. We support versions 4 through 8.
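[Editor's sketch] One way to see which FFmpeg a dynamically linked program would pick up is to load libavcodec yourself and ask it for its version. This is not how TorchCodec does its lookup (the thread doesn't describe that); it only illustrates runtime discovery with `ctypes`. `avcodec_version()` is a real libavcodec function that returns a packed integer; the mapping from libavcodec major version to FFmpeg release series is, to the best of my knowledge, 58 through 62 for FFmpeg 4 through 8, and should be treated as an assumption.

```python
import ctypes
import ctypes.util

# Assumed mapping: libavcodec major version -> FFmpeg release series.
LIBAVCODEC_TO_FFMPEG = {58: 4, 59: 5, 60: 6, 61: 7, 62: 8}


def split_version(packed):
    """Unpack FFmpeg's version layout: major << 16 | minor << 8 | micro."""
    return (packed >> 16) & 0xFFFF, (packed >> 8) & 0xFF, packed & 0xFF


def installed_ffmpeg_major():
    """Return the FFmpeg series (4..8) found via the system's library search,
    or None if libavcodec is missing or its version is outside the table."""
    name = ctypes.util.find_library("avcodec")
    if name is None:
        return None
    try:
        lib = ctypes.CDLL(name)
    except OSError:
        return None
    lib.avcodec_version.restype = ctypes.c_uint
    major, _minor, _micro = split_version(lib.avcodec_version())
    return LIBAVCODEC_TO_FFMPEG.get(major)


if __name__ == "__main__":
    print("FFmpeg series found at runtime:", installed_ffmpeg_major())
```

If this prints None, no libavcodec was found on the default search path, which is worth checking before debugging anything else.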
alphatozeta 17 hours ago [-]
It's really fast and the performance is great, but it's really unfortunate that it requires torch>=2.11.
Too many NVIDIA libraries are still using 2.10 or an alpha version of 2.11 that doesn't have the C++ methods used by torchcodec's underlying C++ code, like use_blob and a few others. I had to fall back to ffmpeg-python, unfortunately.
scott_s 8 hours ago [-]
You can get older versions of TorchCodec that work with older versions of PyTorch, but they unfortunately will not have the new features (HDR video decoding; fast WAV decoding) in the latest release. See the compatibility matrix: https://github.com/meta-pytorch/torchcodec#compatibility-wit...
3. Audio decoding support.
Please take a look at our tutorials to get a feel for what TorchCodec can do: https://meta-pytorch.org/torchcodec/stable/generated_example...
Until recently, each TorchCodec release worked with one-and-only-one version of PyTorch. PyTorch did not have a stable ABI, so we needed to pin TorchCodec releases to PyTorch releases. But! PyTorch now has an excellent Stable ABI (https://github.com/meta-pytorch/torchcodec#compatibility-wit..., https://www.youtube.com/watch?v=HNdEmnvMvGE&t=1s) and TorchCodec has been taking advantage of that since version 0.12.