`TensorSpan<T>.FlattenTo` is 80x slower with the .NET 10 package #121463

### Description

There is a roughly 80x performance regression in `System.Numerics.Tensors` `TensorSpan<T>.FlattenTo` (and `ReadOnlyTensorSpan<T>.FlattenTo`).

Prior to `10.0.0-preview.4.25258.110`, the implementation copied to the destination in blocks; it now copies element by element.
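To illustrate the difference between the two copy strategies, here is a minimal sketch using plain `Span<T>` only. It is not the library's actual code: `FlattenSketch`, `CopyPerElement`, `CopyInBlocks`, and the `rowStride` parameter are hypothetical names chosen for illustration, and the example assumes a 2D row-major source whose rows are each contiguous.

```csharp
using System;

// Illustrative only: hypothetical helpers, not the System.Numerics.Tensors implementation.
public static class FlattenSketch
{
    // Per-element strategy: one indexed read and one indexed write per value.
    public static void CopyPerElement(
        ReadOnlySpan<float> source, int rows, int cols, int rowStride, Span<float> destination)
    {
        for (int r = 0; r < rows; r++)
        {
            for (int c = 0; c < cols; c++)
            {
                destination[r * cols + c] = source[r * rowStride + c];
            }
        }
    }

    // Block strategy: each contiguous row is copied with a single Span.CopyTo call,
    // which can use a bulk memory copy instead of a scalar loop.
    public static void CopyInBlocks(
        ReadOnlySpan<float> source, int rows, int cols, int rowStride, Span<float> destination)
    {
        for (int r = 0; r < rows; r++)
        {
            source.Slice(r * rowStride, cols).CopyTo(destination.Slice(r * cols, cols));
        }
    }
}
```

Both helpers produce the same flattened output; the only difference is how many values each loop iteration moves, which is the kind of change that could plausibly account for a gap like the one measured here.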

A reproduction and benchmark solution is attached [here](https://github.com/user-attachments/files/23431014/FlattenToRegression.zip).
Run the `FlattenTo.Before` and `FlattenTo.After` console apps to benchmark each package version.

### Regression?

This performance issue is not present in the stable `9.0.10` release; however, the package was marked `Experimental` before .NET 10.

`10.0.0-preview.3.25171.5` is the last version before the rewrite in #114927, which introduced `TensorOperation`.

### Data

| Method          | Mean     | Error     | StdDev    | Allocated |
|---------------- |---------:|----------:|----------:|----------:|
| FlattenToBefore | 25.16 μs |  0.136 μs |  0.120 μs |      48 B |
| FlattenToAfter  | 1.936 ms | 0.0013 ms | 0.0011 ms |         - |

[Before-Benchmarks.log](https://github.com/user-attachments/files/23431039/Before-Benchmarks-20251108-203558.log)

[After-Benchmarks.log](https://github.com/user-attachments/files/23431040/After-Benchmarks-20251108-203631.log)

### Analysis

#### release/9.0
https://github.com/dotnet/runtime/blob/9c2fb4b7f335da4aef8880786c238649a692dbb3/src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/ReadOnlyTensorSpan.cs#L768


#### release/10.0-rc2
https://github.com/dotnet/runtime/blob/9a6bee3849d40602eb6bc99a0d8f2203ea97044a/src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/ReadOnlyTensorSpan_1.cs#L352
https://github.com/dotnet/runtime/blob/9a6bee3849d40602eb6bc99a0d8f2203ea97044a/src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/TensorOperation.cs#L156

cc: @tannergooding 
