How does Pulumi on AWS with Databricks interact?
# aws
The thread "How does Pulumi on AWS with Databricks interact?" mentions that the workspace must already be deployed/created. But I was hoping to create the Databricks workspaces automatically from Pulumi. How would this work instead?
Are you aware of any templates deploying Databricks with Pulumi? (I'm just getting started with it, so I would appreciate any pointers to build from.)
nope, but if you let me know what you’re trying to do I can maybe help
Thanks! That would be awesome. I have 3 AWS accounts (dev/test/prod) and want to create a Databricks workspace in all 3 stacks. Then I want to create an autoscaling cluster in each workspace. Furthermore, I want to create some S3 buckets: a) one for my data outputs, b) any necessary ones for the Databricks workspace to live happily in its account. Any useful information like sign-in URLs or bucket ids/names should be in the stack readme. For the readme, I was following:
```python
import pulumi
from pulumi_aws_native import s3

# Create an AWS resource (S3 Bucket)
bucket = s3.Bucket("my_bucket")


with open('./') as f:
```
The readme, with contents of:

```
- main bucket ${}
- my_bucket ${my_bucket}
```
is neatly uploaded; however, the ${} reference to the variable is never resolved and stays empty. For the Databricks workspace:
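The unresolved `${}` is expected Pulumi behaviour: resource properties such as `bucket.id` are Outputs, i.e. values that only materialize during deployment, so plain string interpolation sees an empty placeholder. A minimal sketch of the resolve-then-write pattern, assuming the `bucket` resource from the snippet above (the helper name and file path are illustrative):

```python
def render_readme(main_bucket: str, my_bucket: str) -> str:
    # Plain helper: builds the readme text once the real names are known.
    return f"- main bucket {main_bucket}\n- my_bucket {my_bucket}\n"

# Inside the actual Pulumi program the wiring would look like:
#
#   import pulumi
#   readme = pulumi.Output.all(bucket.id, bucket.id).apply(
#       lambda ids: render_readme(ids[0], ids[1]))
#   readme.apply(lambda text: open("README.md", "w").write(text))
```

If this is the Pulumi Cloud stack README feature, I believe the template instead references exported stack outputs (so the value must first be `pulumi.export`-ed); worth checking the stack README docs for the exact interpolation syntax.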
```python
workspace = mws.Workspaces("my_workspace",
    account_id="<AccountId>",                             # Your account id
    aws_region="<AWSRegion>",                             # The AWS region for the VPC
    credentials_id="<CredentialsId>",                     # Your credentials id
    storage_configuration_id="<StorageConfigurationId>",  # Your storage configuration id
    network_id="<NetworkId>",                             # Your network id
    workspace_name="<WorkSpaceName>")                     # Name for your workspace
```
- How should I retrieve the AWS region? Should I use the require concept (`name = config.require('name')`), or can it be auto-resolved from a configured AWS CLI?
- How do I set the following: `credentials_id`, `storage_configuration_id`, `network_id`? I think these might require setting up some more S3 buckets or VPCs for Databricks, but I am unsure what to put there.

Regarding the cluster: I think the linked docs already describe how to set up the cluster, but if you could include a mini dummy cluster example, that would be very helpful as well.
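For the region question and the mini dummy cluster, a sketch under assumptions: the region can be read from the stack's own `aws:region` config (set via `pulumi config set aws:region ...`) rather than a custom `require`, and the cluster arguments mirror the Terraform `databricks_cluster` resource. The spark version and node type are illustrative values, untested:

```python
import pulumi
import pulumi_databricks as databricks

# The AWS provider's region comes from stack config, so it can be read
# back without defining a custom config key.
aws_region = pulumi.Config("aws").require("region")

# Minimal autoscaling cluster sketch (illustrative names and values).
cluster = databricks.Cluster(
    "dummy-cluster",
    cluster_name="dummy",
    spark_version="13.3.x-scala2.12",   # assumption: pick a current LTS runtime
    node_type_id="i3.xlarge",           # assumption: any supported AWS node type
    autotermination_minutes=20,
    autoscale=databricks.ClusterAutoscaleArgs(
        min_workers=1,
        max_workers=3,
    ),
)
```

Per-stack values (dev/test/prod) would then just be different `pulumi config` entries in each stack.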
Ideally you could also include how a Unity-enabled workspace can be created, including the metastore.
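For Unity Catalog, the bridged provider mirrors Terraform's `databricks_metastore` and `databricks_metastore_assignment`. A rough sketch, with argument names assumed from the Terraform provider and the bucket, region, and workspace reference purely illustrative:

```python
import pulumi_databricks as databricks

# Sketch: Unity Catalog metastore rooted in an S3 bucket (bucket assumed
# to be created elsewhere in the program).
metastore = databricks.Metastore(
    "unity-metastore",
    name="primary",
    storage_root="s3://my-unity-root-bucket/metastore",  # illustrative
    region="eu-central-1",                               # illustrative
    force_destroy=True,
)

# Attach the metastore to a workspace (`workspace` is assumed to be the
# MwsWorkspaces resource from the snippet above).
assignment = databricks.MetastoreAssignment(
    "unity-assignment",
    metastore_id=metastore.id,
    workspace_id=workspace.workspace_id,
)
```

The exact argument set varies between provider versions, so the API reference for your installed `pulumi_databricks` version is the authority here.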
Also: the linked doc is confusing, as it mentions GCP whereas I am interested in the AWS version.
By the way, the actual import seems to be:

```python
from pulumi_databricks import MwsWorkspaces
```
The Pulumi AI wants me to execute:

```python
from pulumi_databricks.mws import MwsWorkspaces, MwsCredentials, MwsStorageConfigurations
```

but this fails with import errors; the Python package exposes these resources at the top level, not in an `mws` submodule.
I am exploring a bit further together with Pulumi AI. When looking at the `credentials_id`, it seems that something like

```python
creds = MwsCredentials("creds",
    credentials_name="my-credentials",
    role_arn="arn:aws:iam::123456789012:role/my-role")
```

is needed. However, I am unsure which role ARN and which permissions are sufficient here. The linked docs have the manual steps and their Terraform counterparts, though. In particular, for step 3 ("Create a credential configuration in Databricks") they mention certain settings which need to be defined on the Databricks side. Can these also be set automatically via Pulumi?
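Step 3 can in principle be automated as well: create the cross-account IAM role with Pulumi and feed its ARN into `MwsCredentials`. A sketch, assuming the standard Databricks trust relationship (Databricks' own AWS account 414351767826 as principal, your Databricks account id as external id); the access policy body is omitted and all names are illustrative:

```python
import json
import pulumi_aws as aws
import pulumi_databricks as databricks

# Illustrative: your Databricks account id (in practice, read it from
# stack config rather than hardcoding).
databricks_account_id = "<YourDatabricksAccountId>"

# Trust policy for the cross-account role: Databricks' AWS account
# assumes it, with your Databricks account id as the external id.
assume_role_policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::414351767826:root"},
        "Action": "sts:AssumeRole",
        "Condition": {"StringEquals": {"sts:ExternalId": databricks_account_id}},
    }],
})

role = aws.iam.Role("databricks-cross-account",
                    assume_role_policy=assume_role_policy)
# The permissions policy from the Databricks docs (EC2/VPC actions etc.)
# would be attached here via aws.iam.RolePolicy; omitted for brevity.

creds = databricks.MwsCredentials("creds",
    credentials_name="my-credentials",
    role_arn=role.arn)
# Note: depending on the provider version, the Databricks account id is
# configured on the databricks provider itself rather than per resource.
```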
@billowy-army-68599 can you help me further?
unfortunately databricks is not my area of expertise.
just reading the thread now
is this your first foray into databricks with AWS?
I have a lot of AWS experience but no Databricks experience, I'm afraid.
i’ve added creating an example to my todo list, but it could take some time
Understood - I will try to go step-by-step through the linked guide, turning each individual step into a Pulumi automation script.
@billowy-army-68599 but can you answer the question regarding the readme topic, i.e. why the fields do not render in the readme? That part is generic AWS/Pulumi related.