Datacenter
📸 Summary¶
This workflow deploys the basic datacenter infrastructure for Proxmox. This includes the following:

- ipsets: groups of aliases that can be referenced in a single rule rather than individually
- firewall aliases: each node and network defined in the tfvars gets its own alias; the aliases are also added to the default security groups to manage network access
- a custom bootstrapping script that:
    - preinstalls Packer for VM template building
    - installs SSH keys and an SSH banner
    - runs the pve-nag-buster script and sets up the default repositories
    - configures SMTP and default notifications through Proxmox
- predefined ISO and LXC templates to clone into your infrastructure (Ubuntu and Debian)
- global datacenter firewall policies that drop all inbound traffic by default
- a default security group for virtual machines granting basic access such as outbound HTTPS
- configuration of the default bridge interface, with support for passing multiple NICs to the automation
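As a rough illustration of how the aliases and ipsets fit together, the sketch below uses the bpg/proxmox provider's firewall resources. It is a minimal sketch, not the repo's actual code: the resource layout, the ipset name `trusted`, and the way CIDRs are assembled from `var.networks` are assumptions.

```hcl
# Minimal sketch (not the repo's actual code): one alias per network in var.networks,
# plus an ipset grouping the trusted networks so rules can reference a single object.
resource "proxmox_virtual_environment_firewall_alias" "network" {
  for_each = var.networks

  name    = each.key
  cidr    = "${each.value.cidr}/${each.value.netmask}"
  comment = each.value.comment
}

resource "proxmox_virtual_environment_firewall_ipset" "trusted" {
  name    = "trusted" # assumed name
  comment = "networks permitted to reach management services"

  # mgmt and vpn are treated the same, per the notes in the sample tfvars below
  cidr {
    name    = "${var.networks["mgmt"].cidr}/${var.networks["mgmt"].netmask}"
    comment = "management subnet"
  }

  cidr {
    name    = "${var.networks["vpn"].cidr}/${var.networks["vpn"].netmask}"
    comment = "vpn subnet"
  }
}
```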
🗃️ Repo¶
git clone git@gitlab.com:loganmancuso_public/infrastructure/proxmox/datacenter.git
📜 Dependencies¶
Sample TFvars¶
datacenter_name = "env" # this is the prefix for all state files and is used to isolate environments
domain_name = "mydomain.com"
# Nodes use this schema so the automation can always address each node individually.
# Each node gets an identifier ("primary", "worker01", ...), a name, and a designated IP.
# The zfs_pool parameter is optional and just sets up regular snapshotting of the ZFS disks
# on that node; you are welcome to remove it. onboard_nics is a list of available NICs to
# permit VLAN traffic over and bond to the default interface. If you want more bridges, you
# should be able to modify network.tf and add them to the automation, then expose them as
# outputs. The key "primary" is required, but additional nodes do not need to be called
# worker01; they can have any name. The automation just needs a way to reference the nodes,
# so a serial number or GUID works as well.
nodes = {
"primary" = {
name = "node01"
ip = "XXX.XXX.XXX.XXX"
netmask = "/YY"
zfs_pool = null
onboard_nics = ["eth0", "eth1"]
}
"worker01" = {
name = "node02"
ip = "XXX.XXX.XXX.XXX"
netmask = "/YY"
zfs_pool = "pool"
onboard_nics = ["eth0"]
}
}
# I configure my local DNS server (my router) as the authority in my datacenter, and the
# variable accepts a list. I would recommend https://quad9.net; they have been reliable in
# my experience. The "local" key is required, as parts of the automation are hardcoded to
# look for it.
dns_servers = {
"local" = "XXX.XXX.XXX.XXX",
"mypreferreddns" = "XXX.XXX.XXX.XXX",
"mypreferreddns_alt" = "XXX.XXX.XXX.XXX"
}
networks = {
# This subnet is the default management layer; I'd make this your trusted LAN.
"mgmt" = {
cidr = "XXX.XXX.XXX.XXX"
gateway = "XXX.XXX.XXX.XXX"
netmask = "YY"
comment = "management subnet"
subdomain = "mgmt"
vlan = 0
},
# Optional VPN subnet. All this does is add the subnet to the trusted ipset, so each VM
# instance does not need separate rules to allow inbound from mgmt and vpn; I want them
# to be treated the same.
"vpn" = {
cidr = "XXX.XXX.XXX.XXX"
gateway = "XXX.XXX.XXX.XXX"
netmask = "YY"
comment = "vpn subnet"
subdomain = null
vlan = 0
},
# This is the subnet all the nodes sit on. In my current setup I do not have multiple NICs,
# so I cannot do a proper implementation; however, in testing with multiple virtual Proxmox
# machines, you should be able to define another network and modify or add bridges to the
# existing infrastructure to achieve a multi-NIC setup.
"cluster" = {
cidr = "XXX.XXX.XXX.XXX"
gateway = "XXX.XXX.XXX.XXX"
netmask = "YY"
comment = "cluster nodes subnet"
subdomain = "cluster"
vlan = 10
},
}
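For reference, the sample above implies variable shapes roughly like the following. This is a sketch of what the declarations might look like, not the repo's actual variables.tf; types, defaults, and descriptions may differ.

```hcl
# Sketch of the variable shapes implied by the sample tfvars above (not the repo's actual code).
variable "nodes" {
  description = "Cluster nodes keyed by a stable identifier; the key \"primary\" is required."
  type = map(object({
    name         = string           # hostname of the Proxmox node
    ip           = string
    netmask      = string           # e.g. "/24"
    zfs_pool     = optional(string) # null skips ZFS snapshot scheduling
    onboard_nics = list(string)     # NICs bonded to the default bridge for VLAN traffic
  }))
}

variable "networks" {
  description = "Subnets used for firewall aliases, ipsets, and bridge configuration."
  type = map(object({
    cidr      = string
    gateway   = string
    netmask   = string
    comment   = string
    subdomain = optional(string)
    vlan      = number
  }))
}
```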
⚙️ Deployment Instructions¶
🛑 Pre-Deployment¶
Set these secret variables in your environment. I use a file called proxmox.env and run source proxmox.env to load the values into my shell. This is how Terraform will authenticate with your Proxmox API without storing the credentials in source code; the TF_HTTP_* values are the credentials for the GitLab-backed http state backend (a sketch of the matching backend block follows the variable list below).
# Proxmox #
export TF_HTTP_PASSWORD=glpat-XXXXXXXXXXXXXXXXXXXXXXXXXXXX
export TF_HTTP_USERNAME={gitlab_username}
# Hashi Vault #
export VAULT_ADDR='https://localhost:8200'
export VAULT_CAPATH=/etc/ssl/certs/${var.cert_subject.common_name}.pem
export VAULT_DEV_ROOT_TOKEN_ID=hvs.XXXXXXXXXXXXXXXXXXXXXX
export UNSEAL_KEY_1=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
export UNSEAL_KEY_2=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
export UNSEAL_KEY_3=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
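The TF_HTTP_* values are the credentials OpenTofu's http state backend uses, here pointed at GitLab-managed Terraform state. A minimal backend block would look roughly like the sketch below; the project ID and state name are placeholders, and the repo's actual backend configuration may differ.

```hcl
terraform {
  backend "http" {
    # GitLab-managed Terraform state; {project_id} and the state name are placeholders.
    address        = "https://gitlab.com/api/v4/projects/{project_id}/terraform/state/{datacenter_name}-datacenter"
    lock_address   = "https://gitlab.com/api/v4/projects/{project_id}/terraform/state/{datacenter_name}-datacenter/lock"
    unlock_address = "https://gitlab.com/api/v4/projects/{project_id}/terraform/state/{datacenter_name}-datacenter/lock"
    lock_method    = "POST"
    unlock_method  = "DELETE"
    # Username and password are read from TF_HTTP_USERNAME / TF_HTTP_PASSWORD above.
  }
}
```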
- You will also need to set the root password for the Proxmox node to the same value you set in the global secrets.
- In addition to setting the root password, you will need to copy the SSH key to the remote server (this could be automated in the future):
ssh-copy-id -i ~/.ssh/path_to_key root@{node_ip}
The key path should match the Proxmox key path in the global secrets.
🟢 Deployment¶
You will want to look at the variable defaults for this workflow. Some ISO and LXC container templates are downloaded by default; if you do not want these pulled in, change the values to null, or choose your own OS and versions to support in your datacenter.
To deploy this workflow link the environment folder to the root directory.
ln -s env/{env}/* .
tofu init .
tofu plan
tofu apply
🏁 Post-Deployment¶
- It is likely that vmbr0 is already configured and this workflow will error on first execution:
╷
│ Error: Error creating Linux Bridge interface
│
│   with proxmox_virtual_environment_network_linux_bridge.vmbr0,
│   on network.tf line 79, in resource "proxmox_virtual_environment_network_linux_bridge" "vmbr0":
│   79: resource "proxmox_virtual_environment_network_linux_bridge" "vmbr0" {
│
│ Could not create Linux Bridge, unexpected error: failed to create network interface "vmbr0" for node "pve-test": received an HTTP 400 response - Reason: Parameter verification failed. (iface: interface already exists)
- To fix this, you will need to import the virtual bridge into the state to manage it through the automation:
tofu import proxmox_virtual_environment_network_linux_bridge.vmbr0[\"{node_id}\"] {node_name}:vmbr0
A successful import will look like this:
tofu import proxmox_virtual_environment_network_linux_bridge.vmbr0[\"{node_id}\"] {node-name}:vmbr0
data.terraform_remote_state.vault: Reading...
proxmox_virtual_environment_network_linux_bridge.vmbr0: Importing from ID "pve-test:vmbr0"...
data.proxmox_virtual_environment_nodes.available_nodes: Reading...
data.proxmox_virtual_environment_roles.available_roles: Reading...
data.proxmox_virtual_environment_nodes.available_nodes: Read complete after 0s [id=nodes]
proxmox_virtual_environment_network_linux_bridge.vmbr0: Import prepared!
Prepared proxmox_virtual_environment_network_linux_bridge for import
proxmox_virtual_environment_network_linux_bridge.vmbr0: Refreshing state... [id=pve-test:vmbr0]
data.proxmox_virtual_environment_roles.available_roles: Read complete after 0s [id=roles]
data.terraform_remote_state.vault: Read complete after 1s
Import successful!
The resources that were imported are shown above. These resources are now in
your OpenTofu state and will henceforth be managed by OpenTofu.
- Once the datacenter has been deployed, you need to generate an API token for the operations user that was just created.
- Open the Proxmox UI and navigate to Datacenter > Users > API Tokens.
- Create a new token by clicking Add; set the user to operations and the token ID to 'packer', and uncheck the 'Privilege Separation' box.
- Save this token and use it to populate the global secrets workflow so it is stored for future deployments.
📝 Notes¶
Nodes Map¶
In the example above I have a node named "node01"; this will be the hostname of the node, and its ID in Terraform will be "primary". There must be a "primary" key declared in the tfvars, and then any number of worker (or otherwise named) nodes can be declared. Using a fixed key like "primary" for one node lets the automation work from a single variable (var.nodes) rather than one variable for the primary node and a second for all the others, which is a cleaner solution.
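In practice that means the automation can split the map into the primary node and everything else, along the lines of this sketch (the local names are illustrative, not the repo's actual identifiers):

```hcl
locals {
  # The required "primary" key gives a stable handle for the one special node...
  primary_node = var.nodes["primary"]

  # ...while every other entry, whatever its key, is treated as a worker.
  worker_nodes = { for key, node in var.nodes : key => node if key != "primary" }
}
```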
Networks Map¶
In the example above I have declared two networks, "mgmt" and "vpc"; both are required. However, if you are deploying these nodes to a single subnet and not using VLANs, just set both maps to the same CIDR and gateway and the automation will still work. This is how my local DEV system works: I do not have a VLAN on my localhost vnet, so the Proxmox host and the instances that run on it share a single subnet. In production the Proxmox hosts run on a different subnet, with the instances in a separate VLAN for better segregation.
Images¶
LXC and ISO images are centrally controlled in this workflow. Images are a variable map with the key being the operating system, then the version, followed by the URL, checksum, and algorithm. I originally had the ISO as part of the template workflows, but the issue came down to the way Terraform manages lifecycle: you can have one core ISO that many different machines are based on. In my case I have a stable version of the custom machine template and a latest one where I test new changes before migrating to it, and these workflows would fight to control the lifecycle of the underlying ISO file downloaded to the cluster. By instead placing it in the datacenter, the template workflow only ever references the local copy and should not try to re-download the file. This also avoids the problem of an image's URL or checksum changing over time, since these are pulled from public registries. I have added a lifecycle prevent_destroy to keep the files local to the cluster unless you expressly delete them. Lastly, the images can take a long time to download; I have set the timeout to 2 hours and noted that in the code, but it is not uncommon for them to take longer, so adjust accordingly. One copy of each file is downloaded for each node in the cluster.
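The pattern described above corresponds roughly to the provider's download-file resource, sketched below. The `local.node_image_pairs` helper and the `local` datastore name are assumptions for illustration; the repo's actual structure may differ.

```hcl
# Sketch: one download per image per node, protected from accidental deletion.
resource "proxmox_virtual_environment_download_file" "image" {
  for_each = local.node_image_pairs # assumed helper: node/image combinations

  content_type       = "iso"                # or "vztmpl" for LXC templates
  datastore_id       = "local"              # assumed datastore name
  node_name          = each.value.node_name
  url                = each.value.url
  checksum           = each.value.checksum
  checksum_algorithm = each.value.algorithm
  upload_timeout     = 7200                 # 2 hours, as noted above; increase if needed

  lifecycle {
    prevent_destroy = true
  }
}
```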
Datacenter Default Security Group¶
As part of the automation, the firewall and security groups are created and deployed to Proxmox. Most users do not leverage the Proxmox firewall, but this automation opens all the required ports and also sets up a default drop-all-inbound rule for added security. A sketch of the corresponding resource follows the table below.
action | protocol | source | port | destination | comment |
---|---|---|---|---|---|
ACCEPT | tcp | mgmt | 8006 | nodes | webui for proxmox |
ACCEPT | tcp | mgmt | 8800-8810 | nodes | permit packer to run and deploy vms to this node |
ACCEPT | tcp | mgmt | 8800-8810 | vpc | permit packer to host files for the vm being deployed to the node |
ACCEPT | tcp | mgmt | 22 | | permit ssh |
ACCEPT | icmp | mgmt | | nodes | permit icmp to the node running proxmox |
ACCEPT | icmp | mgmt | | vpc | permit icmp from mgmt to the vpc |
ACCEPT | icmp | vpc | | vpc | permit icmp within the vpc |
DROP | icmp | vpc | | mgmt | block icmp from vpc to mgmt network |
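For reference, the rules in the table map onto the provider's cluster security group resource roughly as in the abbreviated sketch below; the group name and the exact rule set are illustrative, and the source/destination values reference the aliases created by the automation.

```hcl
# Abbreviated sketch of the datacenter security group from the table above (not the repo's actual code).
resource "proxmox_virtual_environment_cluster_firewall_security_group" "datacenter" {
  name    = "datacenter"
  comment = "default datacenter rules"

  rule {
    type    = "in"
    action  = "ACCEPT"
    proto   = "tcp"
    source  = "mgmt"
    dport   = "8006"
    comment = "webui for proxmox"
  }

  rule {
    type    = "in"
    action  = "ACCEPT"
    proto   = "tcp"
    source  = "mgmt"
    dport   = "22"
    comment = "permit ssh"
  }

  rule {
    type    = "in"
    action  = "DROP"
    proto   = "icmp"
    source  = "vpc"
    dest    = "mgmt"
    comment = "block icmp from vpc to mgmt network"
  }
}
```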
Node Default Security Group¶
Right now this isn't fully implemented, as the provider does not support node-level security groups. However, at the datacenter level each node will have its own rule to permit ingress traffic to the web UI and API.
action | protocol | source | port | destination | comment |
---|---|---|---|---|---|
ACCEPT | tcp | mgmt | 8006 | nodes | webui for proxmox |
ACCEPT | tcp | mgmt | 8800-8810 | nodes | permit packer to run and deploy vms to this node |
ACCEPT | tcp | mgmt | 8800-8810 | vpc | permit packer to host files for the vm being deployed to the node |
ACCEPT | tcp | mgmt | 22 | | permit ssh |
ACCEPT | icmp | mgmt | | nodes | permit icmp to the node running proxmox |
ACCEPT | icmp | mgmt | | vpc | permit icmp from mgmt to the vpc |
ACCEPT | icmp | vpc | | vpc | permit icmp within the vpc |
DROP | icmp | vpc | | mgmt | block icmp from vpc to mgmt network |
Instance Default Security Group¶
By default, each instance is granted a small set of rules. These rules allow limited SSH and Packer connectivity to the instance for management and deployment of new images.
action | protocol | source | port | destination | comment |
---|---|---|---|---|---|
ACCEPT | tcp | mgmt | 22 | | ssh into instance |
ACCEPT | tcp | mgmt | 8800-8810 | nodes | permit packer to run and deploy vms to this node |
ACCEPT | tcp | mgmt | 8800-8810 | vpc | permit packer to host files for the vm being deployed to the node |
📅 Tasks¶
👎 Known Issues¶