Quick Start

This guide walks you through a complete Chiltepin workflow, from setting up an endpoint to submitting tasks.

Overview

Chiltepin is a collection of tools for implementing distributed exascale numerical weather prediction workflows using Parsl and Globus Compute.

Warning

This collection of resources is for research purposes only and is not intended for use in operational production environments.

Prerequisites

Before starting, ensure you have:

Installed Chiltepin (see Installation)
Access to an HPC system (or use local execution for testing)
A Globus account and a web browser for Globus authentication

Complete Workflow Example

This example demonstrates the full workflow: configure an endpoint, start it, and submit tasks.

Step 1: Authenticate

First, log in to Globus services. This should be done on the machine where you want to run tasks:

$ chiltepin login

This opens a browser for authentication or, if one is not available, provides a URL to complete the authentication manually. Follow the prompts to authorize Chiltepin.

Step 2: Configure an Endpoint

Create a new Globus Compute endpoint to which you will submit tasks. This should be done on the machine where you want to run tasks:

$ chiltepin endpoint configure my-endpoint

This creates the endpoint configuration in ~/.globus_compute/my-endpoint/.

Step 3: Start the Endpoint

Launch the endpoint:

$ chiltepin endpoint start my-endpoint

The endpoint will register with Globus Compute and begin accepting tasks.

Step 4: Get the Endpoint UUID

Retrieve your endpoint’s UUID:

$ chiltepin endpoint list

Example output:

my-endpoint  a1b2c3d4-1234-5678-90ab-cdef12345678  Running

Note the UUID (a1b2c3d4-1234-5678-90ab-cdef12345678) for the next step.
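If you script your setup, you can extract the UUID from the listing in code. The sketch below assumes each line of `chiltepin endpoint list` output has the whitespace-separated `name UUID status` layout shown in the example above. `find_endpoint_uuid` is a hypothetical helper, and the real output format may differ between versions.

```python
import uuid
from typing import Optional


def find_endpoint_uuid(listing: str, name: str) -> Optional[str]:
    """Return the UUID listed for endpoint `name`, or None if it is absent.

    Assumes lines shaped like "<name>  <uuid>  <status>" (see example output).
    """
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == name:
            # uuid.UUID raises ValueError if the second field is not a valid UUID
            return str(uuid.UUID(fields[1]))
    return None


if __name__ == "__main__":
    sample = "my-endpoint  a1b2c3d4-1234-5678-90ab-cdef12345678  Running"
    print(find_endpoint_uuid(sample, "my-endpoint"))
```

You could pass it the text captured from running the command, for example `subprocess.run(["chiltepin", "endpoint", "list"], capture_output=True, text=True).stdout`.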

Step 5: Create a Configuration File

Create my_config.yaml with your endpoint UUID:

# Local resource for small tasks
local:
  provider: "localhost"
  init_blocks: 1
  max_blocks: 1

# Remote endpoint for HPC tasks
remote:
  endpoint: "a1b2c3d4-1234-5678-90ab-cdef12345678"  # Use your UUID
  provider: "slurm"
  cores_per_node: 4
  nodes_per_block: 1
  partition: "compute"
  account: "myproject"
  walltime: "00:30:00"
  environment:
    - "module load python/3.11"

Replace the endpoint UUID with your actual UUID from Step 4.
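Before you run a workflow, it helps to catch a leftover placeholder such as `your-endpoint-uuid`. This sketch assumes the parsed configuration is a plain dict that maps resource names to their settings, as the YAML layout suggests. `invalid_endpoints` is a hypothetical helper. It only checks that each `endpoint` value parses as a UUID. It does not check that the endpoint exists or is running.

```python
import uuid


def invalid_endpoints(config: dict) -> list:
    """Return names of resources whose `endpoint` value is not a valid UUID."""
    bad = []
    for name, settings in config.items():
        endpoint = settings.get("endpoint") if isinstance(settings, dict) else None
        if endpoint is None:
            continue  # local resources have no endpoint key
        try:
            uuid.UUID(str(endpoint))
        except ValueError:
            bad.append(name)
    return bad


config = {
    "local": {"provider": "localhost", "init_blocks": 1, "max_blocks": 1},
    "remote": {"endpoint": "a1b2c3d4-1234-5678-90ab-cdef12345678", "provider": "slurm"},
    "mpi-resource-name": {"endpoint": "your-endpoint-uuid", "provider": "slurm"},
}
print(invalid_endpoints(config))  # → ['mpi-resource-name']
```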

Step 6: Write Your Workflow

Create my_workflow.py:

import parsl
import chiltepin.configure
from chiltepin.tasks import bash_task, python_task

# Define tasks
@python_task
def hello_local():
    import platform
    return f"Hello from {platform.node()}"

@bash_task
def hello_remote():
    return "hostname"

@python_task
def compute_task(n):
    """Simple computation task"""
    result = sum(i**2 for i in range(n))
    return result

if __name__ == "__main__":
    # Load configuration
    config_dict = chiltepin.configure.parse_file("my_config.yaml")
    parsl_config = chiltepin.configure.load(
        config_dict,
        include=["local", "remote"],
        run_dir="./runinfo"
    )

    with parsl.load(parsl_config):
        # Run local task on "local" resource
        local_future = hello_local(executor="local")

        # Run remote bash task on "remote" resource (returns exit code: 0 = success)
        remote_future = hello_remote(executor="remote")

        # Run multiple compute tasks on "remote" resource
        futures = [compute_task(i, executor="remote") for i in range(1, 5)]

        # Get the results
        print(f"Local: {local_future.result()}")
        print(f"Remote exit code: {remote_future.result()}")
        print(f"Computation results: {[f.result() for f in futures]}")

        print("All tasks completed!")
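Calling a task does not block. It returns a future right away, and `.result()` waits until the value is available. Parsl's `AppFuture` is a subclass of Python's `concurrent.futures.Future`, so you can reproduce the same submit-then-collect pattern with only the standard library. In this sketch, `ThreadPoolExecutor` stands in for a Chiltepin resource. It is an analogy for illustration, not how Chiltepin actually executes tasks.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_task(n):
    """Same computation as the workflow's compute_task, as a plain function."""
    return sum(i**2 for i in range(n))


with ThreadPoolExecutor(max_workers=4) as pool:
    # submit() returns a Future immediately, much like calling a Chiltepin task
    futures = [pool.submit(compute_task, i) for i in range(1, 5)]
    # result() blocks per future; the list keeps submission order, not completion order
    results = [f.result() for f in futures]

print(results)  # → [0, 1, 5, 14]
```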

Step 7: Run Your Workflow

Execute the workflow:

$ python my_workflow.py

Expected output (the local hostname will differ):

Local: Hello from my-laptop.local
Remote exit code: 0
Computation results: [0, 1, 5, 14]
All tasks completed!
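As a quick check of the computation results, `compute_task(n)` sums `i**2` over `range(n)`, that is, over i = 0 to n - 1. It therefore follows the standard sum-of-squares formula. For n = 4 this gives 3 · 4 · 7 / 6 = 14.

```latex
\sum_{i=0}^{n-1} i^2 = \frac{(n-1)\,n\,(2n-1)}{6},
\qquad n = 1, 2, 3, 4 \;\Rightarrow\; 0,\ 1,\ 5,\ 14
```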

Step 8: Stop the Endpoint

When finished:

$ chiltepin endpoint stop my-endpoint

Note

Endpoints automatically scale down resources after idle periods, so manual stopping is optional.

Local-Only Quickstart

For testing without an HPC system, use local execution:

Configuration File (`local_config.yaml`)

local:
  provider: "localhost"
  init_blocks: 1
  max_blocks: 1

Simple Workflow (`simple_workflow.py`)

import parsl
import chiltepin.configure
from chiltepin.tasks import bash_task, python_task

# Define tasks
@python_task
def multiply(a, b):
    return a * b

@bash_task
def system_info():
    return "echo 'Task completed successfully'"

if __name__ == "__main__":
    # Load configuration
    config_dict = chiltepin.configure.parse_file("local_config.yaml")
    parsl_config = chiltepin.configure.load(config_dict, run_dir="./runinfo")

    with parsl.load(parsl_config):
        result = multiply(6, 7, executor="local").result()
        print(f"6 * 7 = {result}")

        exit_code = system_info(executor="local").result()
        print(f"Bash task exit code: {exit_code}")

Run it:

$ python simple_workflow.py

Working with MPI Tasks

Chiltepin supports MPI applications on HPC systems:

Configuration (`mpi_config.yaml`)

mpi-resource-name:
  endpoint: "your-endpoint-uuid"
  mpi: True
  max_mpi_apps: 2
  mpi_launcher: "srun"
  provider: "slurm"
  cores_per_node: 128
  nodes_per_block: 4
  partition: "compute"
  account: "myproject"
  walltime: "01:00:00"
  environment:
    - "module load openmpi/4.1"
    - "export MPIF90=$MPIF90"
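Here is a rough capacity check for this configuration. One block provides `cores_per_node × nodes_per_block` = 128 × 4 = 512 cores across 4 nodes. The sketch assumes one MPI rank per core and that each concurrently running MPI app occupies whole nodes. Both are simplifying assumptions, and the helper names are hypothetical.

```python
import math


def nodes_needed(ranks: int, cores_per_node: int) -> int:
    """Whole nodes required for `ranks` ranks, assuming one rank per core."""
    return math.ceil(ranks / cores_per_node)


def concurrent_runs_fit(rank_counts, cores_per_node, nodes_per_block, max_mpi_apps):
    """True if the `max_mpi_apps` largest runs could share one block's nodes."""
    largest = sorted(rank_counts, reverse=True)[:max_mpi_apps]
    return sum(nodes_needed(r, cores_per_node) for r in largest) <= nodes_per_block


# Values from mpi_config.yaml and the rank counts used in the MPI workflow
print(nodes_needed(16, 128))                        # → 1
print(concurrent_runs_fit([4, 8, 16], 128, 4, 2))   # → True
print(concurrent_runs_fit([256, 384], 128, 4, 2))   # → False (2 + 3 nodes > 4)
```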

MPI Workflow

import parsl
import chiltepin.configure
from chiltepin.tasks import bash_task

@bash_task
def compile_mpi():
    return "$MPIF90 -o mpi_app mpi_app.f90"

@bash_task
def run_mpi(ranks=4):
    return f"srun -n {ranks} ./mpi_app"

if __name__ == "__main__":
    config_dict = chiltepin.configure.parse_file("mpi_config.yaml")
    parsl_config = chiltepin.configure.load(config_dict, run_dir="./runinfo")

    with parsl.load(parsl_config):
        # Compile MPI application on the MPI resource (returns exit code)
        compile_result = compile_mpi(executor="mpi-resource-name").result()
        print(f"Compilation exit code: {compile_result}")

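        # Submit all runs before waiting on any of them, so they can run
        # concurrently on the MPI resource (up to max_mpi_apps at a time)
        rank_counts = [4, 8, 16]
        futures = [run_mpi(ranks, executor="mpi-resource-name") for ranks in rank_counts]

        # Wait for each run and report its exit code
        for ranks, future in zip(rank_counts, futures):
            print(f"Run with {ranks} ranks exit code: {future.result()}")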

Key Concepts

Resources

Resources define where and how tasks run:

Local: Runs on the current machine
HPC: Submits jobs to schedulers (Slurm, PBS Pro)
Globus Compute: Runs on remote endpoints

See Configuration for detailed resource configuration options.

Task Decorators

Chiltepin provides three task decorators to define workflow tasks:

@python_task: Execute Python functions
@bash_task: Execute shell commands (returns exit code)
@join_task: Coordinate multiple tasks without blocking
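The `@bash_task` examples in this guide return a shell command string, and the task's result is the command's exit code. The sketch below imitates that contract with `subprocess`, so you can try command strings locally. `run_command_string` is a hypothetical helper, not part of Chiltepin. Note one difference: in Parsl's own `bash_app`, a non-zero exit status is raised as an exception when `.result()` is called. Check how your Chiltepin version reports failing commands before relying on non-zero return values.

```python
import subprocess


def run_command_string(make_command) -> int:
    """Hypothetical stand-in for a bash task: build the command, run it, return its exit code."""
    command = make_command()  # like a @bash_task function, this returns a shell command string
    completed = subprocess.run(command, shell=True)
    return completed.returncode


def system_info():
    return "echo 'Task completed successfully'"


print(run_command_string(system_info))        # echo prints its message, then 0 is printed
print(run_command_string(lambda: "exit 3"))   # → 3
```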

When calling a task, use the executor parameter to specify which resource to use:

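from chiltepin.tasks import python_task

@python_task
def my_task():
    return "result"

# Call inside a `with parsl.load(...)` block; "compute" must be a resource
# name in the loaded configuration
result = my_task(executor="compute").result()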

The executor value must match a resource name from your configuration file.

Configuration Loading

The include parameter selects specific resources to load from the configuration:

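import chiltepin.configure

config_dict = chiltepin.configure.parse_file("my_config.yaml")

# Load only specific resources (names must exist in the configuration file)
parsl_config = chiltepin.configure.load(
    config_dict,
    include=["local", "remote"],  # Only these resources
    run_dir="./runinfo"
)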

If include is omitted, all resources in the configuration are loaded.
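The selection rule can be modeled on a plain dict of resources. This is a sketch of the documented behavior, not Chiltepin's implementation. Raising `KeyError` for names that are not in the configuration is this sketch's own choice, and `select_resources` is a hypothetical helper.

```python
from typing import Iterable, Optional


def select_resources(config: dict, include: Optional[Iterable[str]] = None) -> dict:
    """Model of `include`: None keeps every resource, otherwise only the named ones.

    Raising KeyError for unknown names is this sketch's choice, not documented behavior.
    """
    if include is None:
        return dict(config)
    names = list(include)
    missing = [name for name in names if name not in config]
    if missing:
        raise KeyError(f"not in configuration: {missing}")
    return {name: config[name] for name in names}


config = {
    "local": {"provider": "localhost"},
    "remote": {"provider": "slurm"},
}
print(list(select_resources(config)))             # → ['local', 'remote']
print(list(select_resources(config, ["local"])))  # → ['local']
```

The same check explains the `executor` rule: a task can only target a name that survived this selection.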

Directory Structure

After running workflows, you’ll see:

.
├── my_config.yaml               # Configuration file
├── my_workflow.py               # Workflow script
└── runinfo/                     # Parsl runtime directory
    └── 000/                     # Run directory (one per workflow run)
        ├── local/               # Local resource files
        ├── remote/              # Remote resource files
        ├── submit_scripts/      # Job submission scripts
        └── parsl.log            # Parsl log file

The runinfo directory contains execution logs, job scripts, and task outputs.
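Parsl creates a new numbered subdirectory under `runinfo/` for each workflow run (`000`, `001`, and so on), so the most recent run has the highest number. The sketch below finds that directory when you go looking for logs. `latest_run_dir` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def latest_run_dir(run_root: str = "./runinfo") -> Optional[Path]:
    """Return the highest-numbered run directory (e.g. runinfo/003), or None."""
    root = Path(run_root)
    if not root.is_dir():
        return None
    runs = [p for p in root.iterdir() if p.is_dir() and p.name.isdigit()]
    return max(runs, key=lambda p: int(p.name), default=None)


if __name__ == "__main__":
    run = latest_run_dir()
    print(run if run else "no runs yet")
```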

Troubleshooting

Tasks Not Running

Verify that the endpoint is running: chiltepin endpoint list
Check that you’re using the correct endpoint UUID in your configuration file
Review the logs in the runinfo/ directory
Check endpoint logs: ~/.globus_compute/my-endpoint/endpoint.log
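Endpoint logs can grow large, and the most recent lines are usually enough, much as `tail -n 20` would show. The sketch below uses `collections.deque` with `maxlen` to keep only the last lines. The path assumes the `my-endpoint` name used throughout this guide, and `tail` is a hypothetical helper.

```python
from collections import deque
from pathlib import Path


def tail(path, n: int = 20) -> list:
    """Return the last `n` lines of a text file without keeping the whole file in memory."""
    with open(path, errors="replace") as f:
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]


if __name__ == "__main__":
    log = Path.home() / ".globus_compute" / "my-endpoint" / "endpoint.log"
    if log.exists():
        print("\n".join(tail(log, 20)))
```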

Authentication Expired

$ chiltepin logout
$ chiltepin login

Configuration Errors

Validate your YAML syntax:

import yaml
with open("my_config.yaml") as f:
    config = yaml.safe_load(f)
    print(config)
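A common cause of YAML errors is indentation with tab characters, which YAML does not allow. The stdlib-only sketch below reports the lines whose leading whitespace contains a tab, so it works even without PyYAML installed. `tab_indented_lines` is a hypothetical helper and catches only this one class of mistake.

```python
from pathlib import Path


def tab_indented_lines(text: str) -> list:
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            bad.append(number)
    return bad


if __name__ == "__main__":
    path = Path("my_config.yaml")
    if path.exists():
        print(tab_indented_lines(path.read_text()) or "no tab indentation found")
```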

Resource Limits

If jobs fail to start:

Check partition/queue names
Verify account/project is valid
Confirm node/core requests are within limits
The machine may be busy: the resource pool job may still be pending in the queue, or the pool may already be full
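When checking limits, compare the `walltime` you request against the maximum for the partition or queue. The sketch below converts the `HH:MM:SS` strings used in this guide to seconds. It does not handle other scheduler formats such as Slurm's `D-HH:MM:SS`, and the partition limit shown is a made-up value for illustration.

```python
def walltime_seconds(walltime: str) -> int:
    """Convert an "HH:MM:SS" walltime string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return hours * 3600 + minutes * 60 + seconds


requested = walltime_seconds("01:00:00")        # walltime from mpi_config.yaml
partition_limit = walltime_seconds("00:30:00")  # hypothetical partition maximum
print(requested, partition_limit)               # → 3600 1800
print(requested <= partition_limit)             # → False: the request exceeds the limit
```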

Next Steps

Comprehensive task documentation: Tasks
Detailed configuration options: Configuration
Endpoint management: Endpoint Management
Run the test suite: Testing
Set up Docker environment: Docker Container
Explore the API: API Reference