Pan-Canadian AI Compute Environment Clusters¶

The PAICE (Pan-Canadian AI Compute Environment) clusters are part of the Digital Research Alliance of Canada (DRAC) infrastructure, dedicated exclusively to AI research and reserved for CIFAR AI Chairs. Unlike general DRAC clusters, PAICE clusters use AIP (Artificial Intelligence Project) allocations and follow a tiered access model.

Note

When publishing research using the Pan-Canadian AI Compute Environment (PAICE), acknowledge the Digital Research Alliance of Canada, your specific regional partner and the AI institute that manages your cluster.

Account creation¶

PAICE accounts are hosted on the DRAC CCDB portal. Account creation follows the same steps as for other DRAC clusters. See the DRAC Clusters guide for the full process.

As a Mila researcher, you can request access to all the resources listed in the Artificial Intelligence tabs under Resources > Access Systems on CCDB.

Access to AIP allocation¶

Before you can submit jobs on PAICE clusters, your professor must add you to their AIP (Artificial Intelligence Project) allocation. To be added, share your CCRI with them.

Connect to the clusters¶

Connecting to PAICE clusters follows the same steps as for other DRAC clusters, including setting up SSH keys and configuring multifactor authentication on the CCDB portal. See the DRAC Clusters guide for instructions.

Renewal¶

Account renewal follows the same annual process as other DRAC clusters. See the DRAC Clusters guide for instructions.

Clusters¶

These clusters use AIP allocations (--account=aip-${PI_NAME}), where ${PI_NAME} is the name of your supervising professor. Regular DRAC allocations and the Mila global allocation will not work.
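To check which AIP account names are associated with your user before submitting, something like the following sketch may help. It assumes `sacctmgr` is available to regular users on these clusters (not confirmed by this guide); the function names are hypothetical.

```shell
# Keep only AIP accounts from a list of account names read on stdin.
filter_aip_accounts() {
    grep '^aip-' | sort -u
}

# List the AIP accounts associated with the current user (requires Slurm).
list_aip_accounts() {
    sacctmgr --noheader --parsable2 show associations user="$USER" format=account \
        | filter_aip_accounts
}
```

Running `list_aip_accounts` on a login node should print one `aip-...` name per line if your professor has already added you to their allocation.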

Access priority is distributed across three tiers based on supervisor affiliation and researcher location. By default, tier 1 and tier 2 together receive 85% of each cluster's resources, while tier 3 receives 10%. Contact your supervisor to determine which tier applies to your research group.

The table below shows the allocation available to a Mila researcher, by default tier, for the period from April 7, 2026 to Spring 2027.

Cluster	CPUs	RGUs allocated	# GPU equiv	Model	Unrestricted internet
TamIA tier1 + tier2	435	1738	143	H100-80G H200	No
Killarney tier3	0	794	75	H100-80G L40S	Yes
Vulcan tier3	0	850	82	L40S	No

Check the current status of the clusters on the DRAC status page.

TamIA¶

Digital Research Alliance of Canada doc

Cluster managed by Mila and Calcul Québec, located at Université Laval. Compute resources on TamIA are assigned to jobs on a per-node basis, not per CPU. No internet access on compute nodes.

Killarney¶

Digital Research Alliance of Canada doc

Cluster managed by Vector and SciNet, located at the University of Toronto.

Vulcan¶

Digital Research Alliance of Canada doc

Cluster managed by the University of Alberta and AMII, located at the University of Alberta. No internet access on compute nodes.

Launching jobs¶

Users must specify the AIP allocation using the flag --account=aip-${PI_NAME}, where ${PI_NAME} is the name of the supervising professor. To launch a CPU-only job:

sbatch --time=1:00:00 --account=aip-${PI_NAME} job.sh

To launch a GPU job:

sbatch --time=1:00:00 --account=aip-${PI_NAME} --gres=gpu:1 job.sh

To get an interactive session:

salloc --time=1:00:00 --account=aip-${PI_NAME} --gres=gpu:1
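As an illustration, the `job.sh` used in the commands above could look like the minimal sketch below. Its contents are an assumption, not a script provided by the clusters: it only reports where the job runs and, if `nvidia-smi` is present, which GPU was allocated. The `describe_job` helper is hypothetical.

```shell
#!/bin/bash
#SBATCH --time=1:00:00
#SBATCH --gres=gpu:1
# --account=aip-${PI_NAME} is given on the sbatch command line here, since
# shell variables are not expanded inside #SBATCH directive lines.

# Print a one-line summary of where the job runs.
describe_job() {
    echo "Job ${SLURM_JOB_ID:-unknown} on $(uname -n), account ${SLURM_JOB_ACCOUNT:-unknown}"
}

describe_job

# Show the allocated GPU(s) when the NVIDIA tools are present on the node.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi || true
fi
```

Options given on the `sbatch` command line take precedence over the `#SBATCH` lines in the script, so the time limit and GPU request can still be overridden at submission.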

TamIA: per-node allocation

On TamIA, compute resources are allocated on a per-node basis, not per-CPU. Refer to the TamIA documentation for node specifications and submission guidelines.

The full documentation for job launching on Alliance clusters can be found here.

Storage¶

Storage	Path	Usage
`$HOME`	`/home/<user>/`	Code, specific libraries
`$HOME/projects`	`/project/<project>`	Compressed raw datasets
`$SCRATCH`	`/scratch/<user>`	Processed datasets, experimental results, logs of experiments
`$SLURM_TMPDIR`	(on compute node)	Temporary job data or results

When a series of experiments is finished, results should be transferred back to Mila servers.
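The storage roles in the table can be sketched as a stage-in / stage-out pattern inside a job script: extract the dataset onto node-local `$SLURM_TMPDIR`, then copy results to `$SCRATCH` before the job ends. The function names, archive name, and directory layout below are hypothetical, and the fallbacks exist only so the sketch can be tried outside Slurm.

```shell
#!/bin/bash
# Fallbacks so the sketch also runs outside a Slurm job (e.g. for a dry run).
SLURM_TMPDIR="${SLURM_TMPDIR:-$(mktemp -d)}"
SCRATCH="${SCRATCH:-$(mktemp -d)}"

# Extract a dataset archive ($1) onto node-local storage.
stage_in() {
    mkdir -p "$SLURM_TMPDIR/data"
    tar -xf "$1" -C "$SLURM_TMPDIR/data"
}

# Copy results from node-local storage to a run directory ($1) under $SCRATCH.
stage_out() {
    mkdir -p "$SCRATCH/$1"
    cp -r "$SLURM_TMPDIR/results/." "$SCRATCH/$1/"
}
```

A job would call `stage_in` on an archive kept under `$HOME/projects`, write its outputs to `$SLURM_TMPDIR/results`, and finish with `stage_out <run-name>`.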

More details on storage can be found in the DRAC Clusters guide or on the DRAC wiki.

Using CometML and Wandb¶

Some compute nodes don't have internet access. Loading the module below lets training scripts reach a limited set of external servers, including those needed by CometML and Wandb ("Weights and Biases").

`module load httpproxy`

More documentation about this can be found here.

Note

Be careful when using Wandb with httpproxy. It does not support sending artifacts, and wandb's logger will hang in the background after your training completes, wasting resources until the job times out. Use wandb's offline mode instead to avoid this waste.
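A minimal sketch of the offline workflow, using wandb's `WANDB_MODE` and `WANDB_DIR` environment variables. The output directory and the `train.py` script name are assumptions, and syncing presumes you run it from a machine with internet access.

```shell
# Record the run locally instead of streaming it to the W&B servers.
export WANDB_MODE=offline

# Optional: choose where the offline run files are written (path is an assumption).
export WANDB_DIR="${SCRATCH:-$PWD}/wandb_runs"
mkdir -p "$WANDB_DIR"

# python train.py   # hypothetical training script

# Later, from a node with internet access, upload the recorded runs:
# wandb sync "$WANDB_DIR"/wandb/offline-run-*
```

Because nothing is sent during training, the job ends as soon as the script finishes, and the upload happens separately when you choose to sync.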
