Description
Trying to read a 15M gzipped file (200M uncompressed) shows strange behaviour with Logstash 7.9.2. I also tried reading plain (uncompressed) files and it doesn't seem to make a difference.
- Logstash takes about a minute to read the file entirely (running on an EC2 machine in the same region as the bucket)
- Reading the same file locally with python or bash takes about 5 seconds (a minimal check along the lines of the sketch below)
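I originally timed this with python/bash; for reference, a Ruby equivalent of that local check would be something like the following (the path is the one the plugin downloads to, taken from the debug logs below):

```ruby
require 'zlib'
require 'benchmark'

# Same file the plugin downloads to /tmp/logstash (path from the debug logs)
path = '/tmp/logstash/sample1.gz'

elapsed = Benchmark.realtime do
  Zlib::GzipReader.open(path) do |gz|
    gz.each_line { |_line| } # decompress and discard every line
  end
end
puts "read #{path} in #{elapsed.round(2)}s"
```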
After looking at the debug logs, it seems that the download itself is fast and the local processing is what is slow. Note the roughly 50-second gap between "Processing file" for sample1.gz and "Processing" for sample2.gz:
[2021-01-12T14:45:50,258][DEBUG][logstash.inputs.s3] Processing {:bucket=>"test", :key=>"logs/sample1.gz"}
[2021-01-12T14:45:50,259][DEBUG][logstash.inputs.s3] Downloading remote file {:remote_key=>"logs/sample1.gz", :local_filename=>"/tmp/logstash/sample1.gz"}
[2021-01-12T14:45:50,572][DEBUG][logstash.inputs.s3] Processing file {:filename=>"/tmp/logstash/sample1.gz"}
[2021-01-12T14:46:40,435][DEBUG][logstash.inputs.s3] Processing {:bucket=>"test", :key=>"logs/sample2.gz"}
I tried looking into the source code of the plugin, removing code piece by piece to pinpoint the problem. Eventually I had stripped almost everything out of the process_local_log function, keeping only the codec decoding (plain by default) and the queue << event line. That line appears to be where most of the time is spent; roughly what was left is sketched below.
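This is a simplified sketch of the stripped-down function, not the exact plugin source; read_file and @codec are the plugin's own helpers, and the names here are approximate:

```ruby
# Stripped-down version of process_local_log used while bisecting.
# Everything else (metadata handling, event decoration, etc.) was removed.
def process_local_log(queue, filename)
  read_file(filename) do |line|
    @codec.decode(line) do |event|
      queue << event # almost all of the elapsed time is spent here
    end
  end
end
```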
Any idea what could be causing this? It makes the plugin almost unusable for high-volume use cases, for example flow log forwarding.