Endpoint Management

Chiltepin provides command-line tools for managing Globus Compute endpoint lifecycles. These tools are light wrappers around the existing Globus Compute CLI, adding convenience features that make it easier to manage endpoints for use with Chiltepin. This page describes how to configure, start, stop, and manage endpoints.

Overview

Globus Compute endpoints enable you to run tasks on remote compute resources. Chiltepin’s CLI simplifies endpoint management by:

  • Managing authentication flows for all required Globus services at the same time

  • Automatically configuring endpoints with appropriate templates matching Chiltepin’s configuration options

  • Automatically starting endpoints in the background

The chiltepin Command

All endpoint management is done through the chiltepin command-line interface.

$ chiltepin --help
usage: chiltepin [-h] {login,logout,endpoint} ...

options:
  -h, --help            show this help message and exit

chiltepin commands:
  {login,logout,endpoint}
                        Chiltepin commands
    login               login to the Chiltepin App
    logout              logout of the Chiltepin App
    endpoint            endpoint commands

Authentication

Before using Globus Compute endpoints, you must authenticate with Globus services.

Login

$ chiltepin login

This will:

  1. Open a browser window (or present a URL) for Globus authentication

  2. Save authentication tokens for future use

Note

Login credentials are cached, so you typically only need to login once per system.

Logout

$ chiltepin logout

This revokes all authentication tokens and clears cached credentials.

Endpoint Commands

All endpoint operations use the chiltepin endpoint subcommand.

$ chiltepin endpoint --help
usage: chiltepin endpoint [-h] [-c CONFIG_DIR]
                         {configure,list,start,stop,delete} ...

Configure an Endpoint

Create and configure a new endpoint:

$ chiltepin endpoint configure my-endpoint

This will:

  1. Create a new endpoint configuration directory (~/.globus_compute/my-endpoint/)

  2. Generate a default endpoint configuration with a template compatible with Chiltepin’s configuration options

  3. Set the endpoint display name

  4. Enable debug logging

  5. Configure the system PATH required for the endpoint environment

What Gets Created

After configuration, you’ll have:

~/.globus_compute/my-endpoint/
├── config.yaml                      # Main endpoint configuration
├── user_config_template.yaml.j2     # Jinja2 template for user configs
└── user_environment.yaml            # PATH configuration for the endpoint process

The config.yaml includes:

  • Display name matching your endpoint name

  • Debug mode enabled

The user_config_template.yaml.j2 includes:

  • User endpoint configuration template with Jinja2 variables for Chiltepin configuration options

The user_environment.yaml includes:

  • System PATH settings required for the endpoint process to find necessary executables

Custom Configuration Directory

By default, endpoints are stored in ~/.globus_compute/. You can specify a custom location:

$ chiltepin endpoint -c /path/to/config configure my-endpoint

List Endpoints

View all configured endpoints:

$ chiltepin endpoint list

With custom configuration directory:

$ chiltepin endpoint -c /path/to/config list

Output format:

endpoint-name    endpoint-uuid                         status
my-endpoint      12345678-1234-1234-1234-123456789abc  Running
test-endpoint    87654321-4321-4321-4321-cba987654321  Stopped

Start an Endpoint

Start a configured endpoint:

$ chiltepin endpoint start my-endpoint

With custom configuration directory:

$ chiltepin endpoint -c /path/to/config start my-endpoint

The endpoint will:

  1. Register with Globus Compute services (first time only)

  2. Start accepting tasks

  3. Run in the background

  4. Automatically scale resources based on task demand

What Happens at Startup

  • Endpoint UUID is registered with Globus Compute (if first start)

  • Background daemon process is started

  • Endpoint begins polling for tasks

  • User endpoint configurations are validated

Tip

After starting an endpoint for the first time, note its UUID from chiltepin endpoint list. You’ll need this UUID to reference the endpoint in your Chiltepin configuration files.

Stop an Endpoint

Stop a running endpoint:

$ chiltepin endpoint stop my-endpoint

With custom configuration directory:

$ chiltepin endpoint -c /path/to/config stop my-endpoint

This will:

  1. Stop accepting new tasks

  2. Wait for running tasks to complete (graceful shutdown)

  3. Terminate the endpoint daemon

Delete an Endpoint

Remove an endpoint configuration:

$ chiltepin endpoint delete my-endpoint

With custom configuration directory:

$ chiltepin endpoint -c /path/to/config delete my-endpoint

This will:

  1. Delete the endpoint configuration directory

  2. Remove all associated files

  3. Deregister the endpoint UUID from Globus Compute services

Warning

This operation is permanent and cannot be undone.

Note

Deleting an endpoint does not stop it if it’s currently running. You must stop the endpoint first before deleting it.

Endpoint Lifecycle

Typical Workflow

  1. Configure - Set up a new endpoint

    chiltepin endpoint configure my-hpc-endpoint
    
  2. Start - Launch the endpoint

    chiltepin endpoint start my-hpc-endpoint
    
  3. Get UUID - Retrieve the endpoint UUID

    chiltepin endpoint list
    # Note the UUID for my-hpc-endpoint
    
  4. Use in Config - Add to your Chiltepin YAML configuration

    my-executor:
      endpoint: "12345678-1234-1234-1234-123456789abc"
      mpi: True
      provider: "slurm"
      # ... additional options
    
  5. Submit Tasks - Run your workflow

  6. Stop - When finished (or let it auto-shutdown)

    chiltepin endpoint stop my-hpc-endpoint
    

Endpoint Auto-Scaling

Globus Compute endpoints automatically scale based on task demand:

  • Idle Shutdown: Endpoints automatically stop after extended idle periods

  • On-Demand Start: Endpoints can be automatically restarted when new tasks arrive

  • Resource Scaling: Blocks (resource pools) scale between min_blocks and max_blocks

The endpoint configuration includes:

  • idle_heartbeats_soft: 120 - Shutdown after ~60 minutes of inactivity (at 30s/heartbeat)

  • idle_heartbeats_hard: 5760 - Force shutdown after ~48 hours even if tasks are stuck

User Endpoint Configuration

When you use an endpoint UUID in your Chiltepin configuration, the configuration options define the properties of a resource pool to be allocated and used by that endpoint. Multiple resource pools can be configured for the same endpoint UUID using different options for different purposes. This allows you to have a single endpoint for a particular HPC system. That single endpoint can run different types of tasks with different resource requirements.

Resource pool properties include:

Your configuration options define the properties of a resource pool to be allocated and used by that endpoint.

Chiltepin Config:

my-executor-1:
  endpoint: "uuid-here"
  mpi: True
  max_mpi_apps: 4
  provider: "slurm"
  partition: "gpu"
  cores_per_node: 128
  nodes_per_block: 8
my-executor-2:
  endpoint: "uuid-here"  # Same endpoint UUID as my-executor-1
  provider: "slurm"
  partition: "service"
  cores_per_node: 1
  nodes_per_block: 1

Troubleshooting

Check Endpoint Status

$ chiltepin endpoint list

If an endpoint shows as “Unknown”, it may not be running or may have crashed.

View Endpoint Logs

Logs are stored in the endpoint directory:

$ ls -la ~/.globus_compute/my-endpoint/

# View the main endpoint log
$ tail -f ~/.globus_compute/my-endpoint/endpoint.log

Authentication Issues

If you encounter authentication errors:

$ chiltepin logout
$ chiltepin login

Tasks Not Running

Verify:

  1. Endpoint is running: chiltepin endpoint list

  2. Correct endpoint UUID in your configuration

  3. Resource limits (walltime, nodes), endpoint resource pool job may be pending in the scheduler

  4. Check endpoint logs for error messages

Python API

You can also manage endpoints programmatically:

import chiltepin.endpoint as endpoint

# Login
clients = endpoint.login()
compute_client = clients["compute"]

# Configure endpoint
endpoint.configure("my-endpoint")

# Start endpoint
endpoint.start("my-endpoint")

# List endpoints
ep_info = endpoint.show()
for name, props in ep_info.items():
    print(f"{name}: {props['id']}")

# Stop endpoint
endpoint.stop("my-endpoint")

# Logout
endpoint.logout()

See Also