Description
Describe the bug
FSCrawler stops working after it receives an HTTP 429 Too Many Requests response from Elasticsearch.
The JVM keeps running.
**Job Settings**
```yml
---
name: "fscrawler"
fs:
  ignore_above: "10mb"
  url: "/REDACTED/tempdirs"
  update_rate: "5m"
  excludes:
  - "*/~*"
  json_support: false
  filename_as_id: false
  add_filesize: true
  remove_deleted: false
  add_as_inner_object: false
  store_source: true
  index_content: true
  attributes_support: false
  raw_metadata: false
  xml_support: false
  index_folders: false
  lang_detect: false
  continue_on_error: true
  ocr:
    enabled: false
    language: "eng"
    pdf_strategy: "ocr_and_text"
  follow_symlinks: true
  indexed_chars: "5000000"
elasticsearch:
  push_templates: false
  nodes:
  - url: http://REDACTED:9200
  bulk_size: 50
  flush_interval: "5s"
  byte_size: "10mb"
  ssl_verification: false
  username: fscrawler
  password: REDACTED
```
The tempdirs directory is a collection of symlinks to various places on one filesystem.
Logs
Jul 23 16:15:51 fscrawler[2542166]: jakarta.ws.rs.ClientErrorException: HTTP 429 Too Many Requests
Jul 23 16:15:51 fscrawler[2542166]: at org.glassfish.jersey.client.JerseyInvocation.createExceptionForFamily(JerseyInvocation.java:1002) ~[jersey-client-3.1.7.jar:?]
Jul 23 16:15:51 fscrawler[2542166]: at org.glassfish.jersey.client.JerseyInvocation.convertToException(JerseyInvocation.java:984) ~[jersey-client-3.1.7.jar:?]
Jul 23 16:15:51 fscrawler[2542166]: at org.glassfish.jersey.client.JerseyInvocation.translate(JerseyInvocation.java:770) ~[jersey-client-3.1.7.jar:?]
Jul 23 16:15:51 fscrawler[2542166]: at org.glassfish.jersey.client.JerseyInvocation.lambda$invoke$1(JerseyInvocation.java:687) ~[jersey-client-3.1.7.jar:?]
Jul 23 16:15:51 fscrawler[2542166]: at org.glassfish.jersey.client.JerseyInvocation.call(JerseyInvocation.java:709) ~[jersey-client-3.1.7.jar:?]
Jul 23 16:15:51 fscrawler[2542166]: at org.glassfish.jersey.client.JerseyInvocation.lambda$runInScope$3(JerseyInvocation.java:703) ~[jersey-client-3.1.7.jar:?]
Jul 23 16:15:51 fscrawler[2542166]: at org.glassfish.jersey.internal.Errors.process(Errors.java:292) ~[jersey-common-3.1.7.jar:?]
Jul 23 16:15:51 fscrawler[2542166]: at org.glassfish.jersey.internal.Errors.process(Errors.java:274) ~[jersey-common-3.1.7.jar:?]
Jul 23 16:15:51 fscrawler[2542166]: at org.glassfish.jersey.internal.Errors.process(Errors.java:205) ~[jersey-common-3.1.7.jar:?]
Jul 23 16:15:51 fscrawler[2542166]: at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:391) ~[jersey-common-3.1.7.jar:?]
Jul 23 16:15:51 fscrawler[2542166]: at org.glassfish.jersey.client.JerseyInvocation.runInScope(JerseyInvocation.java:703) ~[jersey-client-3.1.7.jar:?]
Jul 23 16:15:51 fscrawler[2542166]: at org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:686) ~[jersey-client-3.1.7.jar:?]
Jul 23 16:15:51 fscrawler[2542166]: at org.glassfish.jersey.client.JerseyInvocation$Builder.method(JerseyInvocation.java:450) ~[jersey-client-3.1.7.jar:?]
Jul 23 16:15:51 fscrawler[2542166]: at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchClient.httpCall(ElasticsearchClient.java:926) ~[fscrawler-elasticsearch-client-2.10-SNAPSHOT.jar:
Jul 23 16:15:51 fscrawler[2542166]: at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchClient.httpPost(ElasticsearchClient.java:902) ~[fscrawler-elasticsearch-client-2.10-SNAPSHOT.jar:
Jul 23 16:15:51 fscrawler[2542166]: at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchClient.bulk(ElasticsearchClient.java:823) ~[fscrawler-elasticsearch-client-2.10-SNAPSHOT.jar:?]
Jul 23 16:15:51 fscrawler[2542166]: at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchEngine.bulk(ElasticsearchEngine.java:82) ~[fscrawler-elasticsearch-client-2.10-SNAPSHOT.jar:?]
Jul 23 16:15:51 fscrawler[2542166]: at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchEngine.bulk(ElasticsearchEngine.java:31) ~[fscrawler-elasticsearch-client-2.10-SNAPSHOT.jar:?]
Jul 23 16:15:51 fscrawler[2542166]: at fr.pilato.elasticsearch.crawler.fs.framework.bulk.FsCrawlerBulkProcessor.execute(FsCrawlerBulkProcessor.java:150) [fscrawler-framework-2.10-SNAPSHOT.jar
Jul 23 16:15:51 fscrawler[2542166]: at fr.pilato.elasticsearch.crawler.fs.framework.bulk.FsCrawlerBulkProcessor.executeIfNeeded(FsCrawlerBulkProcessor.java:130) [fscrawler-framework-2.10-SNAP:?]
Jul 23 16:15:51 fscrawler[2542166]: at fr.pilato.elasticsearch.crawler.fs.framework.bulk.FsCrawlerBulkProcessor.add(FsCrawlerBulkProcessor.java:117) [fscrawler-framework-2.10-SNAPSHOT.jar:?]
Jul 23 16:15:51 fscrawler[2542166]: at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchClient.indexRawJson(ElasticsearchClient.java:426) [fscrawler-elasticsearch-client-2.10-SNAPSHOT.j
Jul 23 16:15:51 fscrawler[2542166]: at fr.pilato.elasticsearch.crawler.fs.service.FsCrawlerDocumentServiceElasticsearchImpl.indexRawJson(FsCrawlerDocumentServiceElasticsearchImpl.java:82) [fscore-2.10-SNAPSHOT.jar:?]
Jul 23 16:15:51 fscrawler[2542166]: at fr.pilato.elasticsearch.crawler.fs.service.FsCrawlerDocumentServiceElasticsearchImpl.index(FsCrawlerDocumentServiceElasticsearchImpl.java:76) [fscrawler10-SNAPSHOT.jar:?]
Jul 23 16:15:51 fscrawler[2542166]: at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.indexFile(FsParserAbstract.java:449) [fscrawler-core-2.10-SNAPSHOT.jar:?]
Jul 23 16:15:51 fscrawler[2542166]: at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:276) [fscrawler-core-2.10-SNAPSHOT.jar:?]
Jul 23 16:15:51 fscrawler[2542166]: at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:303) [fscrawler-core-2.10-SNAPSHOT.jar:?]
Jul 23 16:15:51 fscrawler[2542166]: at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.run(FsParserAbstract.java:149) [fscrawler-core-2.10-SNAPSHOT.jar:?]
Jul 23 16:15:51 fscrawler[2542166]: at java.base/java.lang.Thread.run(Thread.java:833) [?:?]
Expected behavior
FSCrawler should not crash. It should wait and retry the request later.
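To illustrate the kind of behavior I mean, here is a minimal sketch of retrying with exponential backoff when a call fails with HTTP 429. This is not FSCrawler code: `BackoffRetry`, `HttpStatusException`, and `Sleeper` are hypothetical names made up for this example. In FSCrawler the exception in the log is `jakarta.ws.rs.ClientErrorException`, so a real fix would presumably check the status of that exception's response instead.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of FSCrawler): retries a call that fails with HTTP 429. */
final class BackoffRetry {

    /** Hypothetical stand-in for the exception the HTTP client throws on an error status. */
    static final class HttpStatusException extends RuntimeException {
        final int status;

        HttpStatusException(int status) {
            super("HTTP " + status);
            this.status = status;
        }
    }

    /** Abstraction over Thread.sleep so the wait can be replaced in tests. */
    interface Sleeper {
        void sleep(long millis) throws InterruptedException;
    }

    /**
     * Runs the call, retrying on HTTP 429 up to maxAttempts times in total.
     * The wait starts at initialDelayMs and doubles after each retry, capped at maxDelayMs.
     * Any other status, or a 429 on the last attempt, is rethrown to the caller.
     */
    static <T> T callWithBackoff(Callable<T> call, int maxAttempts, long initialDelayMs,
                                 long maxDelayMs, Sleeper sleeper) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (HttpStatusException e) {
                // Only 429 is treated as retryable in this sketch.
                if (e.status != 429 || attempt >= maxAttempts) {
                    throw e;
                }
                sleeper.sleep(delay);
                delay = Math.min(delay * 2, maxDelayMs);
            }
        }
    }
}
```

A bulk request could then be wrapped as `BackoffRetry.callWithBackoff(() -> sendBulk(request), 5, 500, 30_000, Thread::sleep)`, where `sendBulk` is a placeholder for whatever actually sends the request and the numbers are arbitrary. Capping the delay keeps the crawler from sleeping indefinitely, and rethrowing after the last attempt keeps the failure visible in the logs.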
Versions:
- OS: Oracle Linux 8.10
- Version: fscrawler-distribution-2.10-SNAPSHOT
Attachment
If the bug is related to a given file, please share this file so we can reuse it in tests
to reproduce the problem and maybe use it in our integration tests.
- threaddump