WARNING: politessDelay unset, returning default 5000 #553

@cgr71ii

Hi!

I'm crawling with a somewhat less polite configuration than the default, and I frequently get the following warning (1971 times in the 12 hours I've been crawling):

Mar 27, 2023 10:43:53 AM org.archive.modules.CrawlURI getPolitenessDelay
WARNING: politessDelay unset, returning default 5000 for https://www.unidavi.edu.br/fiqueAtento/2023/3/pedidos-vagas-1-2023-fora-do-prazo-07 (in thread 'ToeThread #163: https://www.unidavi.edu.br/fiqueAtento/2023/3/pedidos-vagas-1-2023-fora-do-prazo-07')

Is this expected? The politeness-related settings I've changed from their defaults are listed below (the original defaults are left in as comments):

 <bean id="fetchHttp" class="org.archive.modules.fetcher.FetchHTTP">
  <!-- <property name="timeoutSeconds" value="1200" /> -->
  <property name="timeoutSeconds" value="300" /> <!-- 5 min -->
 </bean>

 <bean id="disposition" class="org.archive.crawler.postprocessor.DispositionProcessor">
  <!-- <property name="delayFactor" value="5.0" /> -->
  <property name="delayFactor" value="2.0" />
  <!-- <property name="minDelayMs" value="3000" /> -->
  <property name="minDelayMs" value="1000" /> <!-- 1 sec -->
  <!-- <property name="respectCrawlDelayUpToSeconds" value="300" /> -->
  <property name="respectCrawlDelayUpToSeconds" value="100" />
  <!-- <property name="maxDelayMs" value="30000" /> -->
  <property name="maxDelayMs" value="10000" /> <!-- 10 sec -->
 </bean>

 <bean id="frontier" 
   class="org.archive.crawler.frontier.BdbFrontier">
  <!-- <property name="snoozeLongMs" value="300000" /> -->
  <property name="snoozeLongMs" value="250000" /> <!-- 250 s, about 4.2 min -->
  <!-- <property name="retryDelaySeconds" value="900" /> -->
  <property name="retryDelaySeconds" value="300" /> <!-- 5 min -->
  <!-- <property name="maxRetries" value="30" /> -->
  <property name="maxRetries" value="3" /> <!-- It should be increased for large crawls (e.g. months) -->
 </bean>
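
For reference, here is a minimal sketch of how the `disposition` settings above are generally understood to combine into a per-host wait time. It is an illustration under assumptions, not Heritrix's actual code: `PolitenessDelaySketch` and `delayMs` are hypothetical names, and it assumes the wait is the fetch duration times `delayFactor`, clamped to `minDelayMs`/`maxDelayMs`, and then possibly raised to a robots.txt Crawl-delay that is honoured up to `respectCrawlDelayUpToSeconds`.

```java
public class PolitenessDelaySketch {

    /**
     * Hypothetical helper (not a Heritrix API): estimates the wait before the
     * next fetch from the same host, using the settings from the config above.
     * robotsCrawlDelaySeconds is 0 when robots.txt has no Crawl-delay.
     */
    static long delayMs(long fetchDurationMs, double robotsCrawlDelaySeconds,
                        double delayFactor, long minDelayMs, long maxDelayMs,
                        long respectCrawlDelayUpToSeconds) {
        // Assumed base rule: wait delayFactor times as long as the fetch took.
        long delay = (long) (delayFactor * fetchDurationMs);

        // Clamp into [minDelayMs, maxDelayMs].
        delay = Math.max(delay, minDelayMs);
        delay = Math.min(delay, maxDelayMs);

        // Assumed robots rule: a Crawl-delay may raise the wait, capped at
        // respectCrawlDelayUpToSeconds (and so possibly above maxDelayMs).
        long respectCapMs = respectCrawlDelayUpToSeconds * 1000L;
        if (delay < respectCapMs) {
            long crawlDelayMs = Math.min((long) (robotsCrawlDelaySeconds * 1000), respectCapMs);
            delay = Math.max(delay, crawlDelayMs);
        }
        return delay;
    }

    public static void main(String[] args) {
        // delayFactor=2.0, minDelayMs=1000, maxDelayMs=10000,
        // respectCrawlDelayUpToSeconds=100, as in the config above.
        System.out.println(delayMs(200, 0, 2.0, 1000, 10000, 100));   // fast fetch: minDelayMs applies
        System.out.println(delayMs(2000, 0, 2.0, 1000, 10000, 100));  // 2 s fetch: factor applies
        System.out.println(delayMs(8000, 0, 2.0, 1000, 10000, 100));  // slow fetch: maxDelayMs applies
        System.out.println(delayMs(200, 30, 2.0, 1000, 10000, 100));  // robots Crawl-delay of 30 s
    }
}
```

Under these assumptions, none of the modified values would produce a wait of exactly 5000 ms on their own, which is why the "returning default 5000" in the warning looks like a fallback rather than something derived from this configuration.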

Thank you!
