
Large files are very slow to read locally #219


Description

@yogevyuval

Reading a 15 MB gzipped file (200 MB uncompressed) with Logstash 7.9.2 shows strange behaviour. I tried reading plain (uncompressed) files as well, and it doesn't seem to make a difference:

  1. Logstash takes about a minute to read the file entirely (running on an EC2 machine in the same region as the bucket).
  2. Reading the same file locally with Python or bash takes about 5 seconds (a baseline timing sketch follows this list).
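
For reference, a minimal baseline read outside Logstash, sketched here in Ruby to match the plugin's language (the test above used Python/bash; the file path is illustrative):

```ruby
# Hypothetical baseline: time a plain line-by-line read of the gzipped
# file, which is essentially the work the plugin has to do per line.
require 'zlib'
require 'benchmark'

path = '/tmp/logstash/sample1.gz'  # illustrative local copy of the file
lines = 0
elapsed = Benchmark.realtime do
  Zlib::GzipReader.open(path) do |gz|
    gz.each_line { lines += 1 }
  end
end
puts "read #{lines} lines in #{elapsed.round(2)}s"
```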

After looking at the debug logs, the download itself is fast; it is the local processing that is slow. Note the roughly 50-second gap between "Processing file" for sample1.gz and the start of sample2.gz:

[2021-01-12T14:45:50,258][DEBUG][logstash.inputs.s3] Processing {:bucket=>"test", :key=>"logs/sample1.gz"}
[2021-01-12T14:45:50,259][DEBUG][logstash.inputs.s3] Downloading remote file {:remote_key=>"logs/sample1.gz", :local_filename=>"/tmp/logstash/sample1.gz"}
[2021-01-12T14:45:50,572][DEBUG][logstash.inputs.s3] Processing file {:filename=>"/tmp/logstash/sample1.gz"}
[2021-01-12T14:46:40,435][DEBUG][logstash.inputs.s3] Processing {:bucket=>"test", :key=>"logs/sample2.gz"}

I looked into the plugin's source code and removed parts of it to pinpoint the problem. Eventually I stripped almost every line out of the process_local_log function, keeping only the codec decoding (the codec is plain by default) and the queue << event line. That push to the queue appears to be where most of the time is spent.
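
After stripping, the function was roughly the following shape (a paraphrased sketch, not the plugin's exact code; read_file, @codec, and queue are the plugin's own members):

```ruby
# Paraphrased: process_local_log reduced to codec decoding plus the
# queue push, which is where the time appeared to be spent.
def process_local_log(queue, filename, object)
  read_file(filename) do |line|
    @codec.decode(line) do |event|
      queue << event  # this push is the reported slow step
    end
  end
end
```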

Any idea what could be causing this? It makes the plugin almost unusable for high-volume use cases, for example forwarding flow logs.
