Creates a new job to analyze a dataset and create its data profile.
See also: AWS API Documentation
See ‘aws help’ for descriptions of global parameters.
create-profile-job
--dataset-name <value>
[--encryption-key-arn <value>]
[--encryption-mode <value>]
--name <value>
[--log-subscription <value>]
[--max-capacity <value>]
[--max-retries <value>]
--output-location <value>
--role-arn <value>
[--tags <value>]
[--timeout <value>]
[--job-sample <value>]
[--cli-input-json | --cli-input-yaml]
[--generate-cli-skeleton <value>]
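As a minimal sketch of the synopsis above, the following builds an invocation that uses only the four required options. It assumes the command is run under the databrew service namespace of the AWS CLI. The job, dataset, bucket, and role names are hypothetical placeholders. The script prints the command instead of submitting it, so it can be reviewed first.

```shell
# Hypothetical values; replace with your own resources.
job_name="sales-profile-job"
dataset_name="sales-dataset"
output_location="Bucket=example-databrew-output,Key=profiles/sales/"
role_arn="arn:aws:iam::111122223333:role/ExampleDataBrewRole"

# Collect the subcommand and its four required options as positional parameters.
set -- databrew create-profile-job \
  --name "$job_name" \
  --dataset-name "$dataset_name" \
  --output-location "$output_location" \
  --role-arn "$role_arn"

# Print the command for review. To submit the job, run: aws "$@"
cmd="aws $*"
echo "$cmd"
```

Keeping the arguments in the positional parameters means the same list can be printed for review and then passed unchanged to aws "$@".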
--dataset-name
(string)
The name of the dataset that this job is to act upon.
--encryption-key-arn
(string)
The Amazon Resource Name (ARN) of an encryption key that is used to protect the job.
--encryption-mode
(string)
The encryption mode for the job, which can be one of the following:
SSE-KMS: Server-side encryption with AWS KMS-managed keys.
SSE-S3: Server-side encryption with keys managed by Amazon S3.
Possible values:
SSE-KMS
SSE-S3
--name
(string)
The name of the job to be created. Valid characters are alphanumeric (A-Z, a-z, 0-9), hyphen (-), period (.), and space.
--log-subscription
(string)
Enables or disables Amazon CloudWatch logging for the job. If logging is enabled, CloudWatch writes one log stream for each job run.
Possible values:
ENABLE
DISABLE
--max-capacity
(integer)
The maximum number of nodes that DataBrew can use when the job processes data.
--max-retries
(integer)
The maximum number of times to retry the job after a job run fails.
--output-location
(structure)
An Amazon S3 location (bucket name and object key) where DataBrew can read input data, or write output from a job.
Bucket -> (string)
The S3 bucket name.
Key -> (string)
The unique name of the object in the bucket.
Shorthand Syntax:
Bucket=string,Key=string
JSON Syntax:
{
"Bucket": "string",
"Key": "string"
}
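The two syntaxes above describe the same structure. The sketch below builds both forms from the same pair of values, so either string can be passed as the value of --output-location. The bucket name and key are hypothetical.

```shell
bucket="example-databrew-output"   # hypothetical bucket name
key="profiles/sales/"              # hypothetical object key

# Shorthand form: comma-separated Name=value pairs.
shorthand="Bucket=${bucket},Key=${key}"

# JSON form: the same structure as a JSON object.
json=$(printf '{"Bucket": "%s", "Key": "%s"}' "$bucket" "$key")

echo "--output-location $shorthand"
echo "--output-location '$json'"
```

The JSON form must be quoted as a single shell argument (for example "$json"), because it contains spaces and double quotes.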
--role-arn
(string)
The Amazon Resource Name (ARN) of the AWS Identity and Access Management (IAM) role to be assumed when DataBrew runs the job.
--tags
(map)
Metadata tags to apply to this job.
key -> (string)
value -> (string)
Shorthand Syntax:
KeyName1=string,KeyName2=string
JSON Syntax:
{
"string": "string"
...
}
--timeout
(integer)
The job’s timeout in minutes. A job that attempts to run longer than this timeout period ends with a status of TIMEOUT.
--job-sample
(structure)
Sample configuration for profile jobs only. Determines the number of rows on which the profile job will be executed. If a JobSample value is not provided, the default value will be used. The default value is CUSTOM_ROWS for the mode parameter and 20000 for the size parameter.
Mode -> (string)
Determines whether the profile job will be executed on the entire dataset or on a specified number of rows. Must be one of the following:
FULL_DATASET: Profile job will be executed on the entire dataset.
CUSTOM_ROWS: Profile job will be executed on the number of rows specified in the Size parameter.
Size -> (long)
Size parameter is only required when the mode is CUSTOM_ROWS. Profile job will be executed on the specified number of rows. The maximum value for size is Long.MAX_VALUE.
Long.MAX_VALUE = 9223372036854775807
Shorthand Syntax:
Mode=string,Size=long
JSON Syntax:
{
"Mode": "FULL_DATASET"|"CUSTOM_ROWS",
"Size": long
}
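The rule that Size is needed only for CUSTOM_ROWS can be sketched as a small helper that builds the shorthand value for --job-sample. The function name job_sample is hypothetical and is not part of the AWS CLI. It only assembles a string and rejects combinations that the description above rules out.

```shell
# Build a --job-sample shorthand value from a mode and an optional size.
job_sample() {
  mode="$1"
  size="${2:-}"
  case "$mode" in
    FULL_DATASET)
      # Size is not needed when the whole dataset is profiled.
      printf 'Mode=FULL_DATASET\n'
      ;;
    CUSTOM_ROWS)
      if [ -z "$size" ]; then
        echo "Size is required when Mode is CUSTOM_ROWS" >&2
        return 1
      fi
      printf 'Mode=CUSTOM_ROWS,Size=%s\n' "$size"
      ;;
    *)
      echo "Unknown mode: $mode" >&2
      return 1
      ;;
  esac
}

job_sample CUSTOM_ROWS 50000   # prints Mode=CUSTOM_ROWS,Size=50000
job_sample FULL_DATASET        # prints Mode=FULL_DATASET
```

The result can be passed directly, for example --job-sample "$(job_sample CUSTOM_ROWS 50000)".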
--cli-input-json | --cli-input-yaml
(string)
Reads arguments from the JSON string provided. The JSON string follows the format provided by --generate-cli-skeleton. If other arguments are provided on the command line, those values will override the JSON-provided values. It is not possible to pass arbitrary binary values using a JSON-provided value as the string will be taken literally. This may not be specified along with --cli-input-yaml.
--generate-cli-skeleton
(string)
Prints a JSON skeleton to standard output without sending an API request. If provided with no value or the value input, prints a sample input JSON that can be used as an argument for --cli-input-json. Similarly, if provided yaml-input it will print a sample input YAML that can be used with --cli-input-yaml. If provided with the value output, it validates the command inputs and returns a sample output JSON for that command.
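A sketch of how the JSON input options might be combined: an input file is written by hand, then referenced with the file:// prefix, with one value overridden on the command line. The top-level key names (Name, DatasetName, and so on) are an assumption that they mirror the option names in PascalCase, so compare them with the output of --generate-cli-skeleton before relying on them. All resource names are hypothetical, and the command is printed instead of being run.

```shell
# Hypothetical input file; key names are assumed, verify with --generate-cli-skeleton.
input_file="${TMPDIR:-/tmp}/profile-job.json"

cat > "$input_file" <<'EOF'
{
  "Name": "sales-profile-job",
  "DatasetName": "sales-dataset",
  "OutputLocation": {"Bucket": "example-databrew-output", "Key": "profiles/sales/"},
  "RoleArn": "arn:aws:iam::111122223333:role/ExampleDataBrewRole",
  "MaxRetries": 0,
  "JobSample": {"Mode": "CUSTOM_ROWS", "Size": 20000}
}
EOF

# --max-retries on the command line overrides the MaxRetries value in the file.
cmd="aws databrew create-profile-job --cli-input-json file://$input_file --max-retries 2"
echo "$cmd"
```

Keeping the job definition in a file makes it easier to review and reuse, while per-run adjustments stay on the command line.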