
S-B lost connection to cluster #355

@timtimb0t

Description


Argus

Scylla version: 2026.1.0~dev-20251024.8642629e8eab with build-id a6c027ab7126e113eee8cf7ea569e51eab00fb7b

scylla-bench (S-B) repeatedly reported the following error during the GCE run:

< t:2025-10-26 03:33:34,905 f:base.py         l:235  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.15.241>: 2025/10/26 03:33:34 BATCH >>> [query statement="INSERT INTO scylla_bench.test (pk, ck, v) VALUES (?, ?, ?)" pk=1010 cks=91930..91939 avgValueSize=3581 consistency=QUORUM] || ERROR: gocql: host does not have a pool (potentially executed: false) attempts applied: 10

There were also many retries:

< t:2025-10-26 03:33:52,020 f:base.py         l:235  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.142.15.241>: 2025/10/26 03:33:51 BATCH >>> [query statement="INSERT INTO scylla_bench.test (pk, ck, v) VALUES (?, ?, ?)" pk=1150 cks=95470..95479 avgValueSize=2519 consistency=QUORUM] || retry: attempt №8, sleep for 1s
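To gauge how many batches ultimately failed versus were only retried, the S-B output can be tallied line by line. Below is a minimal Go sketch that assumes every relevant line has the same `BATCH >>> [...] || ...` shape as the two samples above; `parseBatch` and `BatchEvent` are hypothetical helpers written for this illustration and are not part of scylla-bench.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// batchLine matches the BATCH log lines quoted in this report.
// Assumption: the pattern is derived only from the two sample lines above.
var batchLine = regexp.MustCompile(`pk=(\d+) .*\|\| (.*)$`)

// retryAttempt extracts N from a "retry: attempt №N, sleep for ..." notice.
var retryAttempt = regexp.MustCompile(`retry: attempt №(\d+)`)

// BatchEvent is a hypothetical summary of one BATCH log line.
type BatchEvent struct {
	PK      int
	Retry   int    // attempt number; 0 if the line is not a retry notice
	Failure string // text after "||" when it is not a retry notice
}

// parseBatch returns the parsed event and false if the line is not a BATCH line.
func parseBatch(line string) (BatchEvent, bool) {
	m := batchLine.FindStringSubmatch(line)
	if m == nil {
		return BatchEvent{}, false
	}
	pk, err := strconv.Atoi(m[1])
	if err != nil {
		return BatchEvent{}, false
	}
	ev := BatchEvent{PK: pk}
	if r := retryAttempt.FindStringSubmatch(m[2]); r != nil {
		ev.Retry, _ = strconv.Atoi(r[1])
	} else {
		// Anything after "||" that is not a retry notice is treated as a failure.
		ev.Failure = m[2]
	}
	return ev, true
}

func main() {
	lines := []string{
		`2025/10/26 03:33:34 BATCH >>> [query statement="INSERT INTO scylla_bench.test (pk, ck, v) VALUES (?, ?, ?)" pk=1010 cks=91930..91939 avgValueSize=3581 consistency=QUORUM] || ERROR: gocql: host does not have a pool (potentially executed: false) attempts applied: 10`,
		`2025/10/26 03:33:51 BATCH >>> [query statement="INSERT INTO scylla_bench.test (pk, ck, v) VALUES (?, ?, ?)" pk=1150 cks=95470..95479 avgValueSize=2519 consistency=QUORUM] || retry: attempt №8, sleep for 1s`,
	}
	failures, retries := 0, 0
	for _, l := range lines {
		ev, ok := parseBatch(l)
		if !ok {
			continue
		}
		if ev.Failure != "" {
			failures++
		} else {
			retries++
		}
		fmt.Printf("pk=%d retry=%d failure=%q\n", ev.PK, ev.Retry, ev.Failure)
	}
	fmt.Printf("failures=%d retries=%d\n", failures, retries)
}
```

Feeding the full loader log through `parseBatch` instead of the two hard-coded samples would show whether the `host does not have a pool` failures cluster around particular partitions or time windows.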

After ~20 min, the loader node reported:

2025-10-25T03:58:17.257 longevity-large-partitions-200k-pks-loader-node-cd917703-0-1  !INFO | sshd[5575] Timeout, client not responding from user scylla-test 10.142.0.119 port 40896

Also, according to the logs, the main `stress_cmd` did not start.

Kernel Version: 6.14.0-1017-gcp

Extra information

Installation details

Cluster size: 6 nodes (n2-highmem-16)

Scylla Nodes used in this run:

- longevity-large-partitions-200k-pks-db-node-cd917703-0-9 (34.26.31.218 | 10.142.0.83) (shards: 14)
- longevity-large-partitions-200k-pks-db-node-cd917703-0-8 (35.231.131.127 | 10.142.0.11) (shards: 14)
- longevity-large-partitions-200k-pks-db-node-cd917703-0-7 (35.231.131.127 | 10.142.0.3) (shards: 14)
- longevity-large-partitions-200k-pks-db-node-cd917703-0-6 (34.148.129.36 | 10.142.15.229) (shards: 14)
- longevity-large-partitions-200k-pks-db-node-cd917703-0-5 (34.23.238.105 | 10.142.15.221) (shards: 14)
- longevity-large-partitions-200k-pks-db-node-cd917703-0-4 (34.74.13.142 | 10.142.15.217) (shards: -1)
- longevity-large-partitions-200k-pks-db-node-cd917703-0-3 (34.138.245.110 | 10.142.15.215) (shards: 14)
- longevity-large-partitions-200k-pks-db-node-cd917703-0-2 (34.26.58.231 | 10.142.15.211) (shards: 14)
- longevity-large-partitions-200k-pks-db-node-cd917703-0-12 (34.26.123.61 | 10.142.0.43) (shards: 14)
- longevity-large-partitions-200k-pks-db-node-cd917703-0-11 (35.190.151.243 | 10.142.0.85) (shards: 14)
- longevity-large-partitions-200k-pks-db-node-cd917703-0-10 (34.138.61.29 | 10.142.0.84) (shards: 14)
- longevity-large-partitions-200k-pks-db-node-cd917703-0-1 (35.237.220.23 | 10.142.15.205) (shards: 14)

OS / Image: https://www.googleapis.com/compute/v1/projects/scylla-images/global/images/scylla-2026-1-0-dev-x86-64-2025-10-25t01-39-36 (gce: N/A)

Test: longevity-large-partition-200k-pks-4days-gce-test
Test id: cd917703-9c0a-488e-ba7d-1dad94cbb122
Test name: scylla-master/tier1/longevity-large-partition-200k-pks-4days-gce-test

Test method: `longevity_large_partition_test.LargePartitionLongevityTest.test_large_partition_longevity`

Test config file(s):

Logs:

Jenkins job URL
