python/dataproc_templates/elasticsearch/README.md
Lines changed: 72 additions & 5 deletions
@@ -363,7 +363,8 @@ This template has been tested with the following versions of the above mentioned
 - `es.bq.input.api.key`: API Key for Elasticsearch Authorization
 - `es.bq.output.dataset`: BigQuery dataset id (format: Dataset_id)
 - `es.bq.output.table`: BigQuery table name (format: Table_name)
-- `es.bq.temp.bucket.name`: Temporary bucket for the Spark BigQuery connector
+- `es.bq.output.temporarygcsbucket`: The GCS bucket that temporarily holds the data before it is loaded to BigQuery
+- `es.bq.output.persistentgcsbucket`: The GCS bucket that holds the data before it is loaded to BigQuery. If set, the data is not deleted after it is written to BigQuery.

 #### Optional Arguments

@@ -414,9 +415,26 @@ This template has been tested with the following versions of the above mentioned
 - `es.bq.flatten.struct.fields`: Flatten the struct fields
 - `es.bq.flatten.array.fields`: Flatten the n-D array fields to 1-D array fields; requires the `es.bq.flatten.struct.fields` option to be passed
 - `es.bq.output.mode`: Output write mode (one of: append, overwrite, ignore, errorifexists) (Defaults to append)
+- `es.bq.output.bigquerytablelabel`: Adds labels to the table when writing to it. Multiple labels can be set.
+- `es.bq.output.createdisposition`: Specifies whether the job is allowed to create new tables.
+- `es.bq.output.persistentgcspath`: The GCS path that holds the data before it is loaded to BigQuery. Used only with `es.bq.output.persistentgcsbucket`.
+- `es.bq.output.datepartition`: The date partition the data is going to be written to.
+- `es.bq.output.partitionfield`: If specified, the table is partitioned by this field.
+- `es.bq.output.partitionexpirationms`: Number of milliseconds for which to keep the storage for partitions in the table.
+- `es.bq.output.partitiontype`: Specifies time partitioning. Supported types are: HOUR, DAY, MONTH, YEAR. This option is mandatory for a target table to be time partitioned. Defaults to DAY if `es.bq.output.partitionfield` is specified.
+- `es.bq.output.partitionrangestart`: Specifies integer-range partitioning. This option is mandatory for a target table to be integer-range partitioned. Pass `es.bq.output.partitionrangeend` and `es.bq.output.partitionrangeinterval` along with this option.
+- `es.bq.output.partitionrangeend`: Specifies integer-range partitioning. This option is mandatory for a target table to be integer-range partitioned. Pass `es.bq.output.partitionrangestart` and `es.bq.output.partitionrangeinterval` along with this option.
+- `es.bq.output.partitionrangeinterval`: Specifies integer-range partitioning. This option is mandatory for a target table to be integer-range partitioned. Pass `es.bq.output.partitionrangestart` and `es.bq.output.partitionrangeend` along with this option.
+- `es.bq.output.clusteredfields`: A comma-separated string of non-repeated, top-level columns.
+- `es.bq.output.allowfieldaddition`: Adds the ALLOW_FIELD_ADDITION SchemaUpdateOption to the BigQuery LoadJob. Allowed values are true and false. Defaults to false.
+- `es.bq.output.allowfieldrelaxation`: Adds the ALLOW_FIELD_RELAXATION SchemaUpdateOption to the BigQuery LoadJob. Allowed values are true and false.
+- `es.bq.output.bignumericdefaultprecision`: An alternative default precision for BigNumeric fields, as the BigQuery default is too wide for Spark. Values can be between 1 and 38.
+- `es.bq.output.bignumericdefaultscale`: An alternative default scale for BigNumeric fields. Values can be between 0 and 38, and less than bigNumericFieldsPrecision. This default is used only when the field has an unparameterized BigNumeric type.

 **Note:** Make sure that either ```es.bq.input.api.key``` or both ```es.bq.input.user``` and ```es.bq.input.password``` are provided. Setting all three properties, or none of them, throws an error.

+Pass either ```es.bq.output.temporarygcsbucket``` or ```es.bq.output.persistentgcsbucket```.
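The two "pass either" rules in this diff (the authentication note and the GCS bucket choice) could be checked before a job is submitted. The following is a minimal sketch only: `validate_es_bq_args` and the dict-shaped `args` are hypothetical and are not taken from the template's code. It also assumes that "either" for the buckets means "at least one", since the README does not say whether passing both is rejected.

```python
from typing import Mapping, Optional


def validate_es_bq_args(args: Mapping[str, Optional[str]]) -> None:
    """Raise ValueError when the documented argument rules are violated.

    Hypothetical helper for illustration; not part of the template itself.
    """
    has_api_key = bool(args.get("es.bq.input.api.key"))
    has_user_and_password = bool(args.get("es.bq.input.user")) and bool(
        args.get("es.bq.input.password")
    )
    # Exactly one authentication method: the API key, or user plus password.
    if has_api_key == has_user_and_password:
        raise ValueError(
            "Provide either es.bq.input.api.key or both "
            "es.bq.input.user and es.bq.input.password"
        )

    has_bucket = bool(args.get("es.bq.output.temporarygcsbucket")) or bool(
        args.get("es.bq.output.persistentgcsbucket")
    )
    # Assumption: "either" is read as "at least one of the two buckets".
    if not has_bucket:
        raise ValueError(
            "Provide es.bq.output.temporarygcsbucket or "
            "es.bq.output.persistentgcsbucket"
        )
```

Calling the helper with only an API key and a temporary bucket returns without error, while passing all three credentials, or no bucket, raises `ValueError`.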
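The partitioning options added in this diff carry two stated rules: `es.bq.output.partitiontype` defaults to DAY when `es.bq.output.partitionfield` is given, and the three integer-range options are passed together. A rough sketch of both rules follows. The function names and the dict-shaped `args` are hypothetical, the two checks are kept independent because the README does not describe how time and integer-range partitioning interact, and the exact-match comparison on the type name is an assumption.

```python
from typing import Mapping, Optional

TIME_PARTITION_TYPES = ("HOUR", "DAY", "MONTH", "YEAR")
RANGE_KEYS = (
    "es.bq.output.partitionrangestart",
    "es.bq.output.partitionrangeend",
    "es.bq.output.partitionrangeinterval",
)


def resolve_partition_type(args: Mapping[str, Optional[str]]) -> Optional[str]:
    """Return the time partition type implied by the arguments, if any."""
    partition_type = args.get("es.bq.output.partitiontype")
    if partition_type is None:
        # Documented default: DAY when a partition field is specified.
        return "DAY" if args.get("es.bq.output.partitionfield") else None
    if partition_type not in TIME_PARTITION_TYPES:
        raise ValueError(f"Unsupported partition type: {partition_type}")
    return partition_type


def uses_range_partitioning(args: Mapping[str, Optional[str]]) -> bool:
    """Return True if all three range options are set, False if none are."""
    given = [key for key in RANGE_KEYS if args.get(key) is not None]
    if not given:
        return False
    if len(given) < len(RANGE_KEYS):
        missing = [key for key in RANGE_KEYS if key not in given]
        raise ValueError("Also pass: " + ", ".join(missing))
    return True
```

A partial set of range options fails early with a message naming the missing ones, which mirrors the "pass ... along with this option" wording in the option descriptions.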