Partials
Authentication
The Authentication map defines the parameters used to authenticate with a remote service such as HDFS or Blob Storage.
Parameters
| Attribute | Type | Required | Description |
|---|---|---|---|
| method | String | true | The method used to authenticate with the remote service. One of AzureSharedKey, AzureSharedAccessSignature, AzureDataLakeStorageToken, AzureDataLakeStorageGen2AccountKey, AzureDataLakeStorageGen2OAuth, AmazonAccessKey or GoogleCloudStorageKeyFile. |
| accountName | String | false* | Required for AzureSharedKey and AzureSharedAccessSignature. |
| signature | String | false* | Required for AzureSharedKey. |
| container | String | false* | Required for AzureSharedAccessSignature. |
| token | String | false* | Required for AzureSharedAccessSignature. |
| clientID | String | false* | Required for AzureDataLakeStorageToken. |
| refreshToken | String | false* | Required for AzureDataLakeStorageToken. |
| accountName | String | false* | Required for AzureDataLakeStorageGen2AccountKey. |
| accessKey | String | false* | Required for AzureDataLakeStorageGen2AccountKey. |
| clientID | String | false* | Required for AzureDataLakeStorageGen2OAuth. |
| secret | String | false* | Required for AzureDataLakeStorageGen2OAuth. |
| directoryID | String | false* | Required for AzureDataLakeStorageGen2OAuth. |
| accessKeyID | String | false* | Required for AmazonAccessKey. |
| secretAccessKey | String | false* | Required for AmazonAccessKey. |
| projectID | String | false* | Required for GoogleCloudStorageKeyFile. |
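| keyFilePath | String | false* | Required for GoogleCloudStorageKeyFile. |

Attributes marked false* are conditionally required: they must be supplied when method is set to the value named in the Description column.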
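The table above can be read as a mapping from each method to the attributes it needs. The following Python sketch encodes that mapping and reports which attributes an authentication map is missing. REQUIRED_KEYS and missing_keys are hypothetical names used only for illustration; this is not Arc's own validation logic.

```python
# Hypothetical sketch: required attributes per authentication method,
# transcribed from the parameter table. Not Arc's own validation code.
REQUIRED_KEYS = {
    "AzureSharedKey": {"accountName", "signature"},
    "AzureSharedAccessSignature": {"accountName", "container", "token"},
    "AzureDataLakeStorageToken": {"clientID", "refreshToken"},
    "AzureDataLakeStorageGen2AccountKey": {"accountName", "accessKey"},
    "AzureDataLakeStorageGen2OAuth": {"clientID", "secret", "directoryID"},
    "AmazonAccessKey": {"accessKeyID", "secretAccessKey"},
    "GoogleCloudStorageKeyFile": {"projectID", "keyFilePath"},
}


def missing_keys(authentication: dict) -> list:
    """Return, sorted, the attributes the chosen method needs but the map lacks."""
    method = authentication.get("method")
    if method not in REQUIRED_KEYS:
        raise ValueError(f"unknown authentication method: {method!r}")
    # Set difference: required attributes minus the keys actually present.
    return sorted(REQUIRED_KEYS[method] - set(authentication))


example = {"method": "AzureSharedKey", "accountName": "myaccount"}
print(missing_keys(example))  # ['signature']
```

A map containing every attribute its method needs yields an empty list; an unrecognised method raises ValueError.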
Examples
{
"type": "DelimitedExtract",
...
"authentication": {
"method": "AzureSharedKey",
"accountName": "myaccount",
"signature": "ctzMq410TV3wS7upTBcunJTDLEJwMAZuFPfr0mrrA08="
}
...
}
{
"type": "DelimitedExtract",
...
"authentication": {
"method": "AzureSharedAccessSignature",
"accountName": "myaccount",
"container": "mycontainer",
"token": "sv=2015-04-05&st=2015-04-29T22%3A18%3A26Z&se=2015-04-30T02%3A23%3A26Z&sr=b&sp=rw&sip=168.1.5.60-168.1.5.70&spr=https&sig=Z%2FRHIX5Xcg0Mq2rqI3OlWTjEg2tYkboXr1P9ZUXDtkk%3D"
}
...
}
{
"type": "DelimitedExtract",
...
"authentication": {
"method": "AmazonAccessKey",
"accessKeyID": "AKIAIOSFODNN7EXAMPLE",
"secretAccessKey": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
}
...
}
Environments
The Environments list specifies the environments in which a stage will be executed. For the stage to run, the list must contain the value of either the ETL_CONF_ENV environment variable or the etl.config.environment spark-submit argument.
Examples
If a stage should be executed in both production and testing, list both environments. With the configuration below, the DelimitedExtract stage is executed when the ETL_CONF_ENV environment variable is set to production or test. If ETL_CONF_ENV is set to anything else, such as user_acceptance_testing, the stage is not executed and a warning message is logged.
{
"type": "DelimitedExtract",
...
"environments": ["production", "test"],
...
}
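The selection rule can be sketched in Python as follows. should_execute is a hypothetical helper, not Arc's implementation: it assumes the active environment is read from ETL_CONF_ENV when none is passed explicitly (standing in for the etl.config.environment argument), assumes every stage carries an environments list, and logs a warning when a stage is skipped, mirroring the behaviour described above.

```python
import logging
import os
from typing import Optional

logger = logging.getLogger("environments_sketch")


def should_execute(stage: dict, environment: Optional[str] = None) -> bool:
    """Return True if the stage's environments list contains the active environment.

    Hypothetical helper: falls back to the ETL_CONF_ENV environment variable
    when no environment is passed explicitly.
    """
    if environment is None:
        environment = os.environ.get("ETL_CONF_ENV")
    # Assumes every stage carries an "environments" list, as in the example above.
    allowed = stage["environments"]
    if environment in allowed:
        return True
    logger.warning("skipping %s stage: environment %r not in %r",
                   stage["type"], environment, allowed)
    return False


stage = {"type": "DelimitedExtract", "environments": ["production", "test"]}
print(should_execute(stage, "test"))                     # True
print(should_execute(stage, "user_acceptance_testing"))  # False, and a warning is logged
```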
A practical use case is to run additional stages only in testing. If one of these stages fails, the job is prevented from being automatically deployed to production via Continuous Delivery:
{
"type": "ParquetExtract",
"name": "load the manually verified known good set of data from testing",
"environments": ["test"],
"outputView": "known_correct_dataset",
...
},
{
"type": "EqualityValidate",
"name": "ensure the business logic produces the same result as the known good set of data from testing",
"environments": ["test"],
"leftView": "newly_calculated_dataset",
"rightView": "known_correct_dataset",
...
}
User Defined Functions
To help with common data tasks, Arc provides several functions in addition to the built-in Spark SQL functions.
get_json_double_array
Since: 1.0.9
Similar to get_json_object, but extracts a JSON array of doubles from the given path.
SELECT get_json_double_array('[0.1, 1.1]', '$')
get_json_integer_array
Since: 1.0.9
Similar to get_json_object, but extracts a JSON array of integers from the given path.
SELECT get_json_integer_array('[1, 2]', '$')
get_json_long_array
Since: 1.0.9
Similar to get_json_object, but extracts a JSON array of longs from the given path.
SELECT get_json_long_array('[2147483648, 2147483649]', '$')
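The three functions differ only in the element type of the returned array. The Python sketch below approximates their behaviour for the root path '$' only. The function names mirror the SQL functions, but the helpers, the restriction to '$' and the choice to raise ValueError for values that do not fit the target type are assumptions of this sketch rather than documented Arc behaviour. Note that 2147483648 is one more than the largest 32-bit signed integer (2147483647), which is presumably why the get_json_long_array example uses it.

```python
import json

# Signed integer bounds matching Spark's IntegerType (32-bit) and LongType (64-bit).
INT_MIN, INT_MAX = -(2**31), 2**31 - 1
LONG_MIN, LONG_MAX = -(2**63), 2**63 - 1


def _array_at(json_text: str, path: str) -> list:
    """Parse json_text and return the array at path (sketch: root path "$" only)."""
    if path != "$":
        raise NotImplementedError("sketch only supports the root path '$'")
    value = json.loads(json_text)
    if not isinstance(value, list):
        raise ValueError("value at path is not a JSON array")
    return value


def _integral_array(json_text: str, path: str, lo: int, hi: int) -> list:
    """Return the array, rejecting non-integers and values outside [lo, hi]."""
    values = _array_at(json_text, path)
    for v in values:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(v, bool) or not isinstance(v, int) or not lo <= v <= hi:
            raise ValueError(f"{v!r} does not fit the target integer type")
    return values


def get_json_double_array(json_text: str, path: str) -> list:
    return [float(v) for v in _array_at(json_text, path)]


def get_json_integer_array(json_text: str, path: str) -> list:
    return _integral_array(json_text, path, INT_MIN, INT_MAX)


def get_json_long_array(json_text: str, path: str) -> list:
    return _integral_array(json_text, path, LONG_MIN, LONG_MAX)


print(get_json_double_array('[0.1, 1.1]', '$'))              # [0.1, 1.1]
print(get_json_integer_array('[1, 2]', '$'))                 # [1, 2]
print(get_json_long_array('[2147483648, 2147483649]', '$'))  # [2147483648, 2147483649]
```

In this sketch, passing the long example to get_json_integer_array raises ValueError because 2147483648 exceeds INT_MAX.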