azure locations and file crawling

context

Azure is Microsoft’s cloud offering. Each possible resource that can be deployed in Azure has a location it’s deployed in, such as “East US” or “Italy.” While some resources can be deployed in all locations, other resources have location constraints.

It’s common, when deploying, to have a whole ecosystem of resources that will work together. However, this introduces a problem: which locations work for all resources in a deployment?

Let’s dig in. (Want just the outcome? Check the summary.)

bicep

Bicep is a language for describing Azure resources. A Bicep file sets out a series of resources with preset or parameterized properties in order to deploy said resources.

A minimal Bicep file that creates a resource group might look like this:

param resourceGroupName string = 'myResourceGroup'
param location string = "westus2"

resource resourceGroup 'Microsoft.Resources/resourceGroups@2022-09-01' = {
  name: resourceGroupName
  location: location
}

This is easy to start parsing - I can use grep to find that Microsoft.Resources/resourceGroups@2022-09-01 string and go from there.

nesting

However, minimal is uncommon. As stated above, deployments of multiple resources are much more common.

When working with large deployments, certain resources may be needed more than once. You can repeat your earlier storage account declaration, or, instead, you can template out how to deploy a storage account with given parameters, then reuse that template. This is called a module, and it’s fundamental to organizing Bicep files.

Let’s say this is our file structure. Ignore the lack of parameter files or READMEs, this is just an example.

.
|--infra
   |--env
   |  |--dev
   |  |  |--main.bicep
   |  |--prod
   |     |--main.bicep
   |--modules
      |--rg
      |  |--main.bicep
      |--vm
      |  |--modules
      |  |  |--network.bicep
      |  |  |--virtual-machine.bicep
      |  |--main.bicep
      |--kv
         |--modules
         |  |--role-assignment.bicep
         |  |--key-vault.bicep
         |--main.bicep

Bicep files use relative references for local modules, so infra/env/dev/main.bicep references ../../modules/vm/main.bicep, which references ./modules/network.bicep. While the directory structure in this example could be flattened, my point is: modules can nest, and each module refers relatively to the module(s) it relies on.

finding all resources

Okay, let’s backtrack. From a given Bicep file, we want:

  1. All referenced resource types
  2. All referenced modules

grep

Resources and modules both have patterns in how they are declared. Thankfully, they’re pretty simple regexes. grep will spit out lines in a file that match a given regex.

# this gets us strings like 
# resource resourceGroup 'Microsoft.Resources/resourceGroups@2022-09-01' = {
grep -E "^resource " "$file"

# this gets us strings like
# module vm '../../modules/vm/main.bicep' = {
grep -E "^module " "$file"

cut

From there, let’s use cut to strip off the parts we don’t want.

# this gets us strings like 
# Microsoft.Resources/resourceGroups
grep -E "^resource " "$file" \
   | cut -d "'" -f 2 - \
   | cut -d "@" -f 1 -

# this gets us strings like
# ../../modules/vm/main.bicep
grep -E "^module " "$file" \
   | cut -d "'" -f 2 -

These calls are a little opaque. -d sets a delimiter (what to split on). -f picks a field to return, numbered from 1.

mapfile

We’ll save these values to variables. mapfile reads a file, putting each line into a new array element. -t trims newline characters. The <s do some redirection, and yes, the space between them matters.

mapfile -t resources < <(grep -E "^resource " "$file" \
   | cut -d "'" -f 2 - \
   | cut -d "@" -f 1 -)
mapfile -t modules < <(grep -E "^module " "$file" \
   | cut -d "'" -f 2 -)

dirname (& more)

We can’t just stop there. We need to search each module in turn. Using dirname, we can get the directory of the file we’re searching, then append the relative module path.

get_resources () {
  # ... grep, cut, etc ...
  
  directory=$(dirname "$file")

  for module in "${modules[@]}"
  do
    mapfile -t -O "${#resources[@]}" resources < <(get_resources "$directory/$module")
  done
}

A lot just happened there besides dirname. {modules[@]} is all the array elements (as opposed to just $modules, which evaluates to the first element). ${#modules[@]}, on the other hand - note the pound sign - is the number of elements in the array.

Additionally, mapfile usually writes from index 0 onwards. But with the -O argument, we can specify an origin. By setting the starting point to the length of the array, we append to the array rather than writing over existing data.

Finally, we got some recursion going! get_resources calls get_resources for every module found.

the get_resources function

So far, our code looks like this:

get_resources () {
  mapfile -t resources < <(grep -E "^resource " "$file" \
    | cut -d "'" -f 2 - \
    | cut -d "@" -f 1 -)
  mapfile -t modules < <(grep -E "^module " "$file" \
    | cut -d "'" -f 2 -)

  directory=$(dirname "$file")

  for module in "${modules[@]}"
  do
    mapfile -t -O "${#resources[@]}" resources < <(get_resources "$directory/$module")
  done
  
  for resource in "${resources[@]}"; do; echo "$resource"; done 
}

That last one-liner just returns our results. Note that we don’t just echo "${resources[@]}" - this results in a space-delimited string and it’ll be helpful later to have a newline-delimited string.

finding locations

Now we need to use these resource types to get available locations. First, actually call our function from above. We’ll assume we’re in a directory with a top-level main.bicep file.

mapfile -t resources < <(get_resources "main.bicep")

sort

Does sorting matter? Not really, but sort has a useful feature, -u, which returns unique items (aka, it deduplicates). Looking up the same resource type twice slows us down.

mapfile -t resources < <(get_resources "main.bicep" | sort -u)

sort is one reason it helps to have newlines as delimiters - it expects that.

az

We’ll use az to list all the locations - just to give ourselves a starting point. You could also use the locations for the first resource type.

mapfile -t locations < <(az account list-locations --query "[].displayName" \
  --out tsv)

We can then use an az command to find available locations for a given resource type:

mapfile -t newLocations < <(az provider show --namespace "$namespace" \
  --query "resourceTypes[?resourceType=='$resourceType'].locations | [0]" \
  --out tsv)

--out tsv means we will get a list with no decoration whatsoever - it’s vital for programmatic handling of az command output.

cut (again)

We’ll need to get those $namespace and $resourceType variables. cut comes back in handy:

# remember, $resource is something like Microsoft.Resources/resourceGroups

# this gets us strings like
# Microsoft.Resources
namespace=$(echo "$resource" | cut -d "/" -f 1 -)

# this gets us strings like
# resourceGroups
resourceType=$(echo "$resource" | cut -d "/" -f 2 -)

comm

Okay, we can get locations. How do we handle finding their intersection?

comm to the rescue. It finds common lines between two sorted files. Its default output is three columns - lines only in file 1, lines only in file 2, and lines common to both. We can suppress the first two columns with -12.

comm expects files, so we’ll reuse our redirection < <(someCommand) from earlier.

mapfile -t locations < <(comm -12 \
  <(for location in "${locations[@]}"; do echo "$location"; done) \
  <(for location in "${newLocations[@]}"; do echo "$location"; done) ) 

comm also likes newline-delimited input, so we’re again looping through the array rather than echoing all values at once.

catching errors

With functionality as it is, many deployments will come back with 0 locations available. Turns out some basic resource types, like role assignments, don’t have locations. So let’s filter those.

if [[ ${#newLocations[@]} -eq 0 ]]
then
   # handle
fi

tee

We’ll print the locations to the shell. We can even use tee to print them to a file for good measure:

for location in "${locations[@]}"; do echo "$location"; done | tee locations.txt

the location code

Here’s our code for this section:

mapfile -t resources < <(get_resources "main.bicep" | sort -u)

mapfile -t locations < <(az account list-locations --query "[].displayName" \
  --out tsv)

for resource in "${resources[@]}" 
do
  namespace=$(echo "$resource" | cut -d "/" -f 1 -)
  resourceType=$(echo "$resource" | cut -d "/" -f 2 -)

  mapfile -t newLocations < <(az provider show --namespace "$namespace" \
    --query "resourceTypes[?resourceType=='$resourceType'].locations | [0]" \
    --out tsv)

  if [[ ${#newLocations[@]} -eq 0 ]]
  then
    continue
  fi
  
  mapfile -t locations < <(comm -12 \
    <(for location in "${locations[@]}"; do echo "$location"; done) \
    <(for location in "${newLocations[@]}"; do echo "$location"; done) )
done

for location in "${locations[@]}"; do echo "$location"; done | tee locations.txt

summary

Here’s our final script:

# Recursively crawls bicep files to find all referenced resources
get_resources () {
  mapfile -t resources < <(grep -E "^resource " "$file" \
    | cut -d "'" -f 2 - \
    | cut -d "@" -f 1 -)
  mapfile -t modules < <(grep -E "^module " "$file" \
    | cut -d "'" -f 2 -)

  directory=$(dirname "$file")

  for module in "${modules[@]}"
  do
    mapfile -t -O "${#resources[@]}" resources < <(get_resources "$directory/module")
  done
  
  for resource in "${resources[@]}"; do echo "$resource"; done 
}

# Execution starts here
mapfile -t resources < <(get_resources "main.bicep" | sort -u)

mapfile -t locations < <(az account list-locations --query "[].displayName" \
  --out tsv)

for resource in "${resources[@]}" 
do
  namespace=$(echo "$resource" | cut -d "/" -f 1 -)
  resourceType=$(echo "$resource" | cut -d "/" -f 2 -)

  mapfile -t newLocations < <(az provider show --namespace "$namespace" \
    --query "resourceTypes[?resourceType=='$resourceType'].locations | [0]" \
    --out tsv)

  if [[ ${#newLocations[@]} -eq 0 ]]
  then
    continue
  fi
  
  mapfile -t locations < <(comm -12 \
    <(for location in "${locations[@]}"; do echo "$location"; done) \
    <(for location in "${newLocations[@]}"; do echo "$location"; done) )
done

for location in "${locations[@]}"; do echo "$location"; done | tee locations.txt

category: reference


see more software