Batch Execution: CAE Glossary
What is Batch Execution
Definition and Basic Concepts
Batch execution means running a CAE solver from the command line or from a script rather than through a GUI (graphical user interface). The main advantage is that nobody has to sit in front of the screen: you submit a job, walk away, and come back when the results are ready.
Typical batch execution command examples:
# For Abaqus
abaqus job=model_01 cpus=8 interactive
# For OpenFOAM
mpirun -np 16 simpleFoam -parallel > log.simpleFoam 2>&1
# For ANSYS Mechanical
ansys212 -b -np 4 -i input.dat -o output.txt
What's the difference between running batch execution and running analysis normally with a GUI? At my company, everyone just clicks the "Submit" button from the Abaqus/CAE screen...
When you submit from the GUI, it's actually running batch execution behind the scenes. However, with the GUI you need to stay at your PC, and you have to manually operate each case one at a time. By using batch execution directly, you can submit 50 cases overnight and have all the results ready when you come in the morning.
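As a rough illustration of the overnight idea on a single workstation, a short Python driver can run the cases one after another. This is only a sketch: the directory layout (each case directory holding an input file named after the directory) and the helper names `abaqus_command` and `run_cases` are assumptions made for the example, not part of any Abaqus tooling.

```python
import subprocess
from pathlib import Path


def abaqus_command(job_name, cpus):
    """Build the Abaqus command line for one case as an argument list."""
    # "interactive" keeps the solver in the foreground, so cases run one at a time.
    return ["abaqus", f"job={job_name}", f"cpus={cpus}", "interactive"]


def run_cases(case_dirs, cpus=8, runner=subprocess.run):
    """Run each case directory in turn and return {case directory: exit code}."""
    results = {}
    for case_dir in case_dirs:
        # Assumed layout: runs/model_01/model_01.inp, runs/model_02/model_02.inp, ...
        job_name = Path(case_dir).name
        completed = runner(abaqus_command(job_name, cpus), cwd=case_dir)
        results[str(case_dir)] = completed.returncode
    return results
```

Calling `run_cases(["runs/model_01", "runs/model_02"])` before leaving for the day would work through the list in order. The `runner` argument exists only so the loop can be tried out without a solver installed.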
That sounds convenient. But command line seems difficult... Do I need Linux knowledge?
You need at least basic Linux commands—cd to change directories, ls to list files, cat to view file contents. That much is sufficient to get started. In practice, HPC clusters run Linux almost exclusively, so you'll end up learning it sooner or later anyway.
HPC Queue System
At companies and research institutions, CAE analysis is performed on HPC (High Performance Computing) clusters shared by multiple users. In this environment, a job scheduler (queue system) is responsible for fairly distributing compute resources.
Representative job schedulers:
- SLURM (Simple Linux Utility for Resource Management): Currently the most widespread. Submit jobs with sbatch, check status with squeue
- PBS / Torque (Portable Batch System): Classic standard. Submit with qsub, check status with qstat
- LSF (Load Sharing Facility): IBM product, widely adopted by large enterprises. Submit with bsub
- SGE (Sun Grid Engine): Currently maintained as Open Grid Scheduler
I often hear about SLURM and PBS when running analysis on HPC clusters. What are those? How is that different from running on my own PC?
They are software called job schedulers. For example, when 100 engineers use the same cluster, if everyone ran their jobs freely, resources would quickly run out. The scheduler automatically allocates jobs—"this person's job runs on nodes 3-6, that person's job runs on nodes 10-12," and so on.
How exactly do you submit a job?
Using SLURM as an example, you first write a "job script"—a shell script. You specify the required CPUs, memory, maximum execution time, and execution command in it, then submit it with sbatch job.sh. It goes into the queue and runs automatically when resources become available. If the wait is long, you can check your job status with squeue -u username.
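If you would rather check on jobs from a script than read the squeue table by eye, the output can be reduced to job ID and state and parsed. The sketch below assumes a SLURM login node; `parse_squeue` and `my_jobs` are names chosen for this example.

```python
import subprocess


def parse_squeue(text):
    """Turn 'JOBID STATE' lines into a {job_id: state} dictionary."""
    jobs = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            job_id, state = parts
            jobs[job_id] = state
    return jobs


def my_jobs(user):
    """Ask squeue for one user's jobs (only works where squeue is installed)."""
    # -h drops the header line; -o "%i %T" prints just the job ID and its state.
    output = subprocess.run(
        ["squeue", "-u", user, "-h", "-o", "%i %T"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_squeue(output)
```

A job that is still waiting shows up with the state PENDING, and one that has started shows RUNNING.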
How to Write Job Scripts
Basic SLURM job script structure:
#!/bin/bash
#SBATCH --job-name=crash_sim_01
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=32
#SBATCH --time=24:00:00
#SBATCH --partition=general
#SBATCH --output=job_%j.out
#SBATCH --error=job_%j.err
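module load abaqus/2024
cd "$SLURM_SUBMIT_DIR"
# "interactive" keeps abaqus in the foreground so the job script does not exit before the solver finishes
abaqus job=crash_model cpus=$SLURM_NTASKS scratch=$TMPDIR mp_mode=mpi interactive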
PBS job script example:
#!/bin/bash
#PBS -N thermal_analysis
#PBS -l nodes=1:ppn=16
#PBS -l walltime=12:00:00
#PBS -q batch
#PBS -o thermal.log
#PBS -e thermal.err
module load ansys/2024r1
cd "$PBS_O_WORKDIR"
# The launcher name follows the release: 2024 R1 corresponds to ansys241
ansys241 -b -np 16 -i thermal_input.dat -o thermal_output.txt
What happens if I exceed the wall time in #SBATCH --time=24:00:00?
The job gets forcibly terminated. This is one of the biggest gotchas for beginners. Requesting a shorter wall time usually shortens your wait in the queue, but if the limit is too short the job is killed partway through. Start with a generous limit, then tighten it as you learn how long your models actually take.
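One way to make "generous" concrete is to take a runtime estimate and multiply it by a safety factor before writing the --time line. The helper below is a sketch; the function name `slurm_time_limit` and the default factor of 1.5 are arbitrary choices for illustration, not a SLURM recommendation.

```python
def slurm_time_limit(estimated_hours, safety_factor=1.5):
    """Return an HH:MM:SS string for '#SBATCH --time' including a safety margin."""
    total_seconds = int(round(estimated_hours * safety_factor * 3600))
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# A run expected to take about 16 hours gets a 24-hour limit.
print(slurm_time_limit(16))  # 24:00:00
```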
If it gets killed, do I have to start over? That's tough...
Many solvers have a restart capability. For example, Abaqus lets you restart from a previous job with oldjob=previous_job_name. LS-DYNA also has R=rstfile for restart. For large nonlinear or explicit impact analyses, it's standard practice to set frequent restart file output.
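For the Abaqus case, a resubmission script might first confirm that the earlier run actually left restart data and only then build the restart command. This is a sketch under assumptions: it assumes the earlier job wrote a `<old_job>.res` file to the working directory and that the new job's input file is already set up to read the restart data. The helper name `abaqus_restart_command` is made up for the example.

```python
from pathlib import Path


def abaqus_restart_command(new_job, old_job, cpus, workdir="."):
    """Build a restart command, or raise if the old job left no restart file."""
    # Assumption: restart output was requested, so <old_job>.res exists in workdir.
    restart_file = Path(workdir) / f"{old_job}.res"
    if not restart_file.exists():
        raise FileNotFoundError(f"no restart data found: {restart_file}")
    return [
        "abaqus",
        f"job={new_job}",
        f"oldjob={old_job}",
        f"cpus={cpus}",
        "interactive",
    ]
```

Failing early with a clear message is cheaper than waiting in the queue only to have the solver reject the restart.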
Integration with Parametric Studies
Batch execution truly shines when running many cases by varying design parameters—a parametric study (parameter sweep). Running all combinations manually through the GUI—say, 5 thickness levels × 3 materials × 4 load levels = 60 cases—is impractical, but batch execution can automate it.
How do you generate different input files for a parametric study batch run? You can't write 60 cases by hand.
The standard approach is "template + substitution script." First, create an input file template with parameters marked as placeholders like @@THICKNESS@@. Then use Python or a shell script to generate files by replacing placeholders with actual values. Here's the idea:
# Example: Generating parametric input files in Python
import itertools
import os

thicknesses = [1.0, 1.5, 2.0, 2.5, 3.0]
materials = ['steel', 'aluminum', 'titanium']
loads = [1000, 2000, 3000, 4000]

# Read the template once; it contains the @@...@@ placeholders
with open('template.inp') as f:
    template = f.read()

# One directory and one input file per parameter combination (5 x 3 x 4 = 60)
for t, mat, load in itertools.product(thicknesses, materials, loads):
    case_name = f"case_t{t}_mat{mat}_F{load}"
    inp = template.replace('@@THICKNESS@@', str(t))
    inp = inp.replace('@@MATERIAL@@', mat)
    inp = inp.replace('@@LOAD@@', str(load))
    os.makedirs(case_name, exist_ok=True)
    with open(f"{case_name}/model.inp", 'w') as f:
        f.write(inp)
Once the input files are created, can job submission also be automated? I'd rather not manually sbatch 60 times.
SLURM has a job array feature. Write #SBATCH --array=0-59 and 60 jobs are submitted at once, each getting a number from 0-59 as $SLURM_ARRAY_TASK_ID. Inside the script, use this ID to pull parameters, and one script manages all cases. This is a fundamental HPC operation technique.
#!/bin/bash
#SBATCH --job-name=param_sweep
#SBATCH --array=0-59
#SBATCH --ntasks=8
#SBATCH --time=04:00:00
#SBATCH --output=logs/job_%A_%a.out
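# Note: create the logs/ directory before submitting; SLURM does not create it for --output
# Look up this task's case directory (task IDs start at 0, sed line numbers at 1)
CASE_DIR=$(sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" case_list.txt)
cd "$CASE_DIR" || exit 1
module load abaqus/2024
# "interactive" keeps abaqus in the foreground so the script waits for the solver
abaqus job=model cpus=$SLURM_NTASKS scratch=$TMPDIR interactive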
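The array script reads case_list.txt, so that file has to be written when the case directories are generated. A minimal sketch, assuming the same `case_t..._mat..._F...` naming as the generation script; the helper names `case_names` and `write_case_list` are invented for this example.

```python
import itertools
from pathlib import Path


def case_names(thicknesses, materials, loads):
    """Return the case directory names in a fixed, repeatable order."""
    return [
        f"case_t{t}_mat{mat}_F{load}"
        for t, mat, load in itertools.product(thicknesses, materials, loads)
    ]


def write_case_list(names, path="case_list.txt"):
    """Write one case directory per line and return the matching --array range."""
    Path(path).write_text("\n".join(names) + "\n")
    # Array task IDs start at 0, so N cases need the range 0-(N-1).
    return f"0-{len(names) - 1}"
```

With 5 thicknesses, 3 materials, and 4 loads the returned range is `0-59`, which is the value used in the --array line.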
Can post-processing also be automated? Creating graphs for 60 cases by hand is exhausting.
Post-processing automation is where batch execution really shines. Write Python to extract results from ODB files, write max stresses to CSV, then visualize everything at once with pandas and matplotlib. This way you immediately see how results change with parameters. Combined with DOE (Design of Experiments), it's a gateway to optimization.
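The collection step could look roughly like the sketch below. It assumes the maximum stress of each case has already been extracted by a separate step (for example an ODB-reading script run with the solver's own Python) into a dictionary keyed by case name; the names `parse_case_name` and `write_summary` are hypothetical. Only the standard library is used here, and the resulting CSV can then be loaded into pandas for plotting.

```python
import csv
import re

# Matches names such as case_t1.5_matsteel_F2000
CASE_PATTERN = re.compile(
    r"case_t(?P<t>[\d.]+)_mat(?P<mat>[A-Za-z]+)_F(?P<load>\d+)$"
)


def parse_case_name(name):
    """Recover the design parameters encoded in a case directory name."""
    match = CASE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected case name: {name}")
    return {
        "thickness": float(match["t"]),
        "material": match["mat"],
        "load": int(match["load"]),
    }


def write_summary(max_stress_by_case, csv_path):
    """Write one row per case: parameters plus the extracted maximum stress."""
    fields = ["thickness", "material", "load", "max_stress"]
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        for name in sorted(max_stress_by_case):
            row = parse_case_name(name)
            row["max_stress"] = max_stress_by_case[name]
            writer.writerow(row)
```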
Troubleshooting
When a batch job fails, how do I figure out what went wrong? Unlike the GUI, there's no error dialog.
First check the job's standard output and error log files. With SLURM that's job_12345.out and job_12345.err. Then look at solver-specific logs—Abaqus has .msg and .sta files, OpenFOAM has log.simpleFoam. There are three common causes:
- Out of Memory (OOM): Check with sacct -j <jobid> --format=MaxRSS. Solution: increase nodes or coarsen mesh
- Wall Time Exceeded: Check with sacct -j <jobid> --format=Elapsed,Timelimit. Solution: increase time limit or use restart
- License Shortage: Solver license tokens all in use, job can't start. Check with lmstat -a
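When many jobs fail at once, a first triage can be scripted from the job state that sacct reports. The sketch below assumes each line comes from `sacct -j <jobid> -X -n -P --format=State,Elapsed,Timelimit`, which prints one pipe-separated line per job; the function name `diagnose` and the wording of the messages are choices made for this example.

```python
def diagnose(sacct_line):
    """Classify one 'State|Elapsed|Timelimit' line from sacct."""
    state_field, elapsed, timelimit = sacct_line.strip().split("|")
    # States can carry a suffix, e.g. "CANCELLED by 1001"; keep the first word.
    state = state_field.split()[0]
    if state == "COMPLETED":
        return "finished normally"
    if state == "TIMEOUT":
        return f"wall time exceeded ({elapsed} of {timelimit})"
    if state == "OUT_OF_MEMORY":
        return "out of memory: request more memory or nodes, or coarsen the mesh"
    return f"ended in state {state}: check the solver logs"
```

Anything that is neither a timeout nor an out-of-memory kill still needs a look at the solver's own log files, which is where a license failure would typically appear.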
When licenses run out, does the job fail or does it wait in the queue?
It depends on the solver and on how the scheduler is configured. Depending on its license settings, Abaqus may exit with an error instead of waiting when no license is available, so if you submit many jobs at once and the licenses run out, most of them fail. To handle this, you can check license availability in the job script before starting the solver, or use SLURM's license resource management so that jobs stay queued until licenses are free. With open-source solvers like OpenFOAM, you have no license worries at all.
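A pre-submission check could parse the summary line that lmstat prints for each feature. The sketch below assumes the usual "Users of <feature>: (Total of N licenses issued; Total of M licenses in use)" layout and a feature simply called `abaqus`. Both should be verified against your own license server's output, and `free_licenses` is a name invented for this example.

```python
import re


def free_licenses(lmstat_text, feature):
    """Return issued minus in-use licenses for a feature, or None if not listed."""
    pattern = re.compile(
        r"Users of " + re.escape(feature)
        + r":\s+\(Total of (\d+) licenses? issued;"
        + r"\s+Total of (\d+) licenses? in use\)"
    )
    match = pattern.search(lmstat_text)
    if match is None:
        return None
    issued, in_use = int(match.group(1)), int(match.group(2))
    return issued - in_use


sample = "Users of abaqus:  (Total of 50 licenses issued;  Total of 38 licenses in use)\n"
print(free_licenses(sample, "abaqus"))  # 12
```

A wrapper script can compare this number with the tokens a job needs and postpone the submission when too few are free.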
Related Terms
- HPC (High Performance Computing): Large-scale parallel computing environment. Cluster-type HPC is mainstream in CAE
- MPI (Message Passing Interface): Standard inter-process communication library for parallel computing
- DOE (Design of Experiments): Experimental design methodology. Used for efficient parametric study planning
- Job Array: Scheduler feature to run the same script simultaneously with different parameters
- Restart Analysis: Feature to resume interrupted jobs from a checkpoint