Cylc: The Path To Exascale

Hilary Oliver, Oliver Sanders, Dave Matthews
3rd ENES Workshop on Workflows
13 September 2018
Cylc development is part-funded by ESiWACE

Exascale Workflow

What do we mean by this???

Hilary Oliver
NIWA (NZ)

Exascale Computing

Exascale Workflow?

Exascale Workflow 1

(we've got this covered)

  • even if the primary purpose of these machines is massive exascale models, much of their time will be spent running many smaller models (e.g. ensembles, in our business) and processing the vast amounts of data they generate.
  • TERMINOLOGY: "suite" == "workflow"
Exascale Workflow 2
Workflow automation must be at the heart of utilizing these massive resources.
MO Exascale Scope
(Credit Keir Bovis: Met Office exascale programme scope)

Cylc (Re-)Introduction

The Cylc Workflow Engine

Hilary Oliver
NIWA (NZ)


Cycling workflows:
  • what are they?
  • why do we need them?
Then a quick overview of what Cylc is like.
  • domain of applicability
  • workflow definition
  • architecture
  • user interface
  • repeat cycles...
  • ...there's inter-cycle dependence between some tasks...
  • ...which is technically no different to intra-cycle dependence...
  • ...no need to label cycle points (boxes) as if they have global relevance...
  • ... so you can see this is actually an infinite single workflow that happens to be composed of repeating tasks
Animation key
The key to an animation of how Cylc manages such an infinite workflow.
Cylc's dynamic cycling mode.

What's cycling needed for?

  • dynamic cycling is not strictly needed for small, short workflows.
  • historically (in NWP) it was achieved by running whole cycles sequentially.
Catching up after delays is much faster if inter-cycle dependence is explicitly managed.

aside: pipelines

%% Mermaid source for the pipeline animation: successive frames highlight
%% process A, then process B, then process C, showing work flowing through the pipeline.
graph LR
   A(process A) --> B(process B)
   B --> C(process C)

[scheduling]
   [[dependencies]]
      [[[P1]]]
         graph = """A => B => C
            A[-P1] => A
            B[-P1] => B
            C[-P1] => C"""
      

[scheduling]
   [[dependencies]]
      [[[P1]]]
         graph = """A => B => C"""
      

What Cylc Is Like

Not currently well suited to very large numbers of small quick-running jobs (later...)

A workflow is primarily a configuration of the workflow engine, and a config file is easier for most users and most use cases than programming to a Python API. However, ...!

ETC.: event handling, checkpointing, extreme restart, ...

distributed architecture: ad hoc server per workflow
Plus comprehensive CLI.

# Hello World! Plus
[scheduling]
   [[dependencies]]
       graph = "hello => farewell & goodbye"
      

# Hello World! Plus
[scheduling]
   [[dependencies]]
       graph = "hello => farewell & goodbye"
[runtime]
   [[hello]]
       script = echo "Hello World!"
      

# Hello World! Plus
[scheduling]
   [[dependencies]]
       graph = "hello => farewell & goodbye"
[runtime]
   [[hello]]
       script = echo "Hello World!"
       [[[environment]]]
           # ...
       [[[remote]]]
           host = hpc1.niwa.co.nz
       [[[job]]]
           batch system = PBS
           # ...
       # ...
   # ...
      
Plus:
  • [runtime] is an inheritance hierarchy for efficient sharing of common settings (see the sketch below)
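For illustration, a minimal sketch of [runtime] inheritance, reusing the remote and batch settings from the earlier example via a hypothetical HPC family:

[runtime]
    [[HPC]]  # family: settings shared by all tasks that inherit from it
        [[[remote]]]
            host = hpc1.niwa.co.nz
        [[[job]]]
            batch system = PBS
    [[hello]]
        inherit = HPC  # picks up the family's remote and job settings
        script = echo "Hello World!"
    [[goodbye]]
        inherit = HPC
        script = echo "Goodbye!"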

Plus Jinja2 templating for programmatic suite configuration:

#!Jinja2
{% set SAY_BYE = false %}
[scheduling]
   [[dependencies]]
      graph = """hello
{% if SAY_BYE %}
         => goodbye & farewell
{% endif %}
               """
[runtime]
   # ...
[[dependencies]]
   graph = "pre => sim<M> => post<M> => done"
   # with M = 1..5
[[dependencies]]
   graph = "prep => init<R> => sim<R,M> => post<R,M> => close<R> => done"
   # with M = a,b,c; and R = 1..3
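For context, a minimal sketch of how the <M> and <R> parameters in the second example might be declared, using Cylc 7 parameter syntax and the values given in the comment:

[cylc]
   [[parameters]]
      M = a, b, c
      R = 1..3
[scheduling]
   [[dependencies]]
      graph = "prep => init<R> => sim<R,M> => post<R,M> => close<R> => done"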

[cylc]
   cycle point format = %Y-%m
[scheduling]
   initial cycle point = 2010-01
   [[dependencies]]
      [[[R1]]]
         graph = "prep => foo"
        

[cylc]
   cycle point format = %Y-%m
[scheduling]
   initial cycle point = 2010-01
   [[dependencies]]
      [[[R1]]]
         graph = "prep => foo"
      [[[P1M]]]
         graph = "foo[-P1M] => foo => bar & baz => qux"
        

[cylc]
   cycle point format = %Y-%m
[scheduling]
   initial cycle point = 2010-01
   [[dependencies]]
      [[[R1]]]
         graph = "prep => foo"
      [[[P1M]]]
         graph = "foo[-P1M] => foo => bar & baz => qux"
      [[[R2/^+P2M/P1M]]]
         graph = "baz & qux[-P2M] => boo"
        

Exascale Challenges

Present capabilities and challenges

Hilary Oliver
NIWA (NZ)

(Successes so far, before exa-scalability...)

This is a technical necessity for surviving into the exascale era!

Immediate: Web GUIs, Python 3

  • Python 2 is near end of life
  • PyGTK GUIs are near obsolete
  • web GUI work starting
    • need new architecture!
Exascale Workflow 2
THEN: scaling; config API; visualization; light sub-suites...
  • Three cycles of a small deterministic regional NWP suite: obs-processing tasks in yellow; the atmospheric model in red, plus DA and other pre- and post-processing tasks. A few tasks ... generate thousands of products from a few large model output files.
  • ... ~45 tasks (3 cycles)
  • ... as a 10-member ensemble, ~450 tasks (3 cycles)
  • ... as a 30-member ensemble, ~1300 tasks (3 cycles)
Cylc currently scales to 10ks of tasks (with caveats) BUT a problem has emerged: incremental forecast-time ensemble verification and product-generation:
(caveats: amount of config per task; amount of runahead; server resources; GUI, esp. graph view)
  • Met Office IMPROVER: 100k jobs per cycle.
Handling this kind of suite better leads to applicability in other domains: massive numbers of small tasks PLUS complex workflow and production capabilities.
1 member, 2 files/fc-hour for 10 fc-hours, 1 task/file = 1 x 2 x 10 x 1 = 20 jobs per cycle ...

30 members, 7 files/fc-hour for 10 fc-days, 5 tasks/file ...
30 x 7 x (24 x 10) x 5 ≈ 250,000 jobs per cycle!
5 tasks per file: i.e. each of the (expanded) pink nodes is a sub-workflow

The problem is the parameterized "nested cycle" over forecast hour. To handle this NOW, we need:

  • NOTE: current HPC infrastructure can't handle this yet either
  • these tasks totally dominate the workflow
  • Cylc can have multiple "top-level" dynamic cycling sequences in one suite, but nested cycles still have to be parameterized (sketched below)
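A hedged sketch of what "parameterized nested cycling" looks like today: the forecast hour becomes a task parameter inside each real cycle (the task names, parameter ranges and PT6H cycling interval are illustrative only; initial cycle point etc. omitted):

[cylc]
   [[parameters]]
      member = 1..30
      fchr = 0..240   # the "nested cycle" over forecast hour, as a parameter
[scheduling]
   [[dependencies]]
      [[[PT6H]]]
         graph = "model<member> => products<member,fchr>"

Every (member, fchr) combination becomes a separate task in every cycle, which is how the job counts above explode.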

BUT actually it's worse than this!

Real ensemble post-processing systems may:
  • not be bunching-friendly: complex dependencies within the bunch

Plans to address these issues:

(now over to Oliver...)

Meeting Exascale Challenges

An outline of some potential pathways for future development

Oliver Sanders
Met Office (UK)

New Cylc GUI

Combine gscan & gcylc

View N Edges To Selected Node

Alternative Views

The Modularity Problem

It's hard to incorporate a module into a workflow

Ideally we would write dependencies to/from the module itself rather than the tasks within it

Workflows could be represented as tasks

foo => baz => module<param> => pub

Python API

Python > Jinja2

Illustrative examples Python could provide Cylc:

bar = cylc.Task('myscript')

cylc.run(
    foo >> bar >> baz
)

Use Python data structures as Cylc parameters:

animal = cylc.Parameter({
    'cat': {'lives': 9, 'memory': 2},
    'dog': {'lives': 1, 'memory': 10}
})

baz = cylc.TaskArray('run-baz',
    args=('--animal', animal),
    env={'N_LIVES': animal['lives']},
    directives={'--mem': animal['memory']}
)

Use Python to write Cylc modules:

import my_component
graph = cylc.graph(
    foo >> bar >> my_component >> baz,
    my_component.pub >> qux
)

Alternative Scheduling Paradigms

Abstract dependency
foo => bar => baz
Data dependency
foo:
  out: a
bar:
  in: a
  out: b
baz:
  in: a, b
  out: c

Scaling With Dependencies

Cylc can currently scale to tens of thousands of tasks and dependencies

But there are limitations, for example:

Many-to-many triggers result in N x M dependencies
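For example, a sketch in current Cylc 7 syntax, where ENS and VERIFY are hypothetical task families: the single family trigger below expands internally into one prerequisite for every (ensemble member, verification task) pair, i.e. N x M dependencies.

[scheduling]
   [[dependencies]]
      graph = "ENS:succeed-all => VERIFY"
[runtime]
   # ENS: family of N ensemble member tasks; VERIFY: family of M verification tasks
   [[ENS]]
   [[VERIFY]]
   [[mem1, mem2, mem3]]
      inherit = ENS
   [[ver1, ver2]]
      inherit = VERIFY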

Cylc should be able to represent this as a single dependency

The scheduling algorithm currently iterates over a "pool" of tasks.

We plan to re-write the scheduler using an event driven approach.

This should make Cylc more efficient and better able to solve problems like this.

Kernel - Shell Architecture

Working towards a leaner Cylc, we plan to separate the codebase into a Kernel - Shell model

Shell: user commands, suite configuration
Kernel: scheduler, job submission

Batching Jobs

Combining multiple jobs to run in a single job submission.

Arbitrary Batching

A lightweight Cylc kernel could be used to execute a workflow within a job submission.

  • The same Cylc scheduling algorithm
  • No need for job submission
  • Different approach to log / output files

Future Challenges

Introduction to Rose

Storage and configuration of complex Cylc workflows

David Matthews
Met Office (UK)

History

Development started in late 2011, around the time we chose Cylc as our workflow tool

Motivation

Tools which have moved / will move into Cylc

Tools which will remain in Rose

rose edit / task-run - Application configuration

rosie - suite storage and discovery

(plus a few smaller, more specialised utilities)

Application configuration: The problem

Tasks may be complex to define, requiring environment variables, input files, namelists, command-line options, and so on.

Rose applications provide a convenient way to encapsulate all of this configuration, storing it all in one place to make it easier to handle and maintain.

In suite.rc:

[runtime]
    [[hello]]
        script = rose task-run

This will run an app config named "hello" (by default)

An app config is a directory containing a file named rose-app.conf
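For illustration, with the standard Rose suite layout the "hello" app lives in the suite's app/ directory, alongside the suite.rc:

suite.rc
app/
   hello/
      rose-app.conf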

Simple INI-style format (similar to the Cylc suite.rc)

Describes the command to run, its environment variables, input files and namelists:

rose-app.conf example

[command]
default=echo "Hello ${WORLD}!"
[env]
WORLD=Earth
[file:input.nml]
source=namelist:latlon
[namelist:latlon]
latitude=52.168
longitude=5.744

The app config directory may also contain other directories, e.g. bin/ (scripts), file/ (static input files), meta/ (metadata), opt/ (optional configs).

Metadata - What's it used for?

Documenting settings

Performing automatic checking (e.g. type checking)

Formatting the rose config-edit GUI

rose-app.conf:

[env]
WORLD=Earth

meta/rose-meta.conf:

[env=WORLD]
description=The name of the world to say hello to
values=Mercury, Venus, Earth, Mars, Jupiter,
      =Saturn, Uranus, Neptune

Example: if we were to change the value of WORLD to Pluto

$ rose macro -V
Value Pluto not in allowed values
['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter',
'Saturn', 'Uranus', 'Neptune']
        

Metadata location

Metadata can be local to the app or centrally installed (e.g. the metadata for a particular model release)

rose-app.conf:

meta=um-atmos/vn11.1

rose macro - usage

Optional configs

Configuration files which can add to or overwrite the default rose-app.conf configuration.
Used for minor variants (a hypothetical example is sketched below).
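For illustration, an optional config overrides only the settings that differ from the default rose-app.conf; the "mars" key and value here are made up:

# opt/rose-app-mars.conf
[env]
WORLD=Mars

The optional config is selected by key at run time (e.g. via Rose's --opt-conf-key option), so only the differences from the default need to be maintained.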

Why Use Rose App Configs?

Encapsulate the required inputs and environment in a simple, human-readable configuration.

Settings can have associated metadata & macros.

Edit using a text editor or with the rose config-edit GUI.

Optional configurations avoid the need to duplicate app configs for minor variants

rosie - suite storage & discovery

Suites are version controlled using Subversion

Each Rosie suite is assigned a unique name.
e.g.: mo-aa001

Each suite has a rose-suite.info file which provides information about the suite. e.g.

title=PS40 high resolution trial
project=global-nwp-trial
description=Copy of u-af326/trunk@23458
owner=davidmatthews
access-list=oliversanders

Particular projects can define their own metadata requirements which are enforced.

rosie info

rosie features

Use from the command line or use the GUI

rosie info

Why Consider Using Rosie?

You want to take advantage of the suite storage, identification and discovery features

And

You're comfortable using Subversion for version control

Rose: Roadmap

Complete migration of Rose functionality into Cylc.

Replace current GTK based GUIs with plugins to Cylc web GUI.

Port to Python 3.