Description
There is a roughly 80x performance regression in System.Numerics.Tensors TensorSpan<T>.FlattenTo (and ReadOnlyTensorSpan<T>.FlattenTo).
Prior to 10.0.0-preview.4.25258.110, the implementation copied to the destination in blocks; it now copies element by element.
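To illustrate the difference between the two copy strategies, here is a minimal sketch over a row-major 2-D view with a row stride. It is not the library's code: `FlattenSketch`, `FlattenPerElement`, `FlattenBlocks`, and their parameters are hypothetical names, and only `Span<T>.Slice` and `Span<T>.CopyTo` from the BCL are used.

```csharp
using System;

// Hypothetical helper type for illustration only; not part of System.Numerics.Tensors.
static class FlattenSketch
{
    // Per-element copy: every element is indexed and stored individually.
    public static void FlattenPerElement<T>(
        ReadOnlySpan<T> source, int rows, int cols, int rowStride, Span<T> destination)
    {
        int d = 0;
        for (int r = 0; r < rows; r++)
        {
            for (int c = 0; c < cols; c++)
            {
                destination[d++] = source[r * rowStride + c];
            }
        }
    }

    // Block copy: each row is contiguous, so it can be copied with a single CopyTo.
    public static void FlattenBlocks<T>(
        ReadOnlySpan<T> source, int rows, int cols, int rowStride, Span<T> destination)
    {
        if (rowStride == cols)
        {
            // Fully dense view: the whole buffer is one contiguous block.
            source.Slice(0, rows * cols).CopyTo(destination);
            return;
        }

        for (int r = 0; r < rows; r++)
        {
            source.Slice(r * rowStride, cols).CopyTo(destination.Slice(r * cols, cols));
        }
    }
}
```

Both methods produce the same output; the block version simply lets `CopyTo` move a whole contiguous run at once, which is the kind of behavior the earlier implementation is described as having.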
Reproduction + benchmark solution here
Run FlattenTo.Before and FlattenTo.After console apps to benchmark against each package version.
Regression?
This performance issue was not present in the stable 9.0.10 release; however, the package was marked Experimental before .NET 10.
10.0.0-preview.3.25171.5 is the last version before the rewrite #114927 that introduced TensorOperation.
Data
| Method | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|
| FlattenToBefore | 25.16 μs | 0.136 μs | 0.120 μs | 48 B |
| FlattenToAfter | 1.936 ms | 0.0013 ms | 0.0011 ms | - |
Analysis
release/9.0, line 768 in 9c2fb4b:
`public void FlattenTo(scoped Span<T> destination)`
release/10.0-rc2, line 352 in 9a6bee3:
`public void FlattenTo(scoped Span<T> destination)`
runtime/src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/TensorOperation.cs, line 156 in 9a6bee3:
`public static void Invoke<TOperation, TArg, TResult>(in ReadOnlyTensorSpan<TArg> x, in Span<TResult> destination)`
cc: @tannergooding