Get from GitHub by:
% git clone https://github.com/xcrypt-job/xcrypt
Computational scientists often perform large-scale simulations in research and development, for example in car body design and drug discovery. For parameter sweeps or optimal parameter searches, such simulations often form Plan-Do-Check-Act (PDCA) cycles, that is, iterations of many sequential or parallel job executions with different parameters.
PDCA cycles should be automated. However, typical computational scientists find existing general-purpose scripting languages, such as Perl or Ruby, hard to use for preparing input files, generating a job script for each job, extracting the necessary parts of output files to analyze results, and managing many asynchronously running jobs. GUI-based workflow tools are an alternative, but some complicated workflows are difficult to describe with them.
Therefore, we are developing a job-level parallel script language named Xcrypt that helps such automation.
The goal of Xcrypt is to give computational scientists, who are typically familiar with C or FORTRAN but not with general-purpose scripting languages such as Perl and Ruby, a simple way to automate various workflows that consist of many program runs and the dependencies among them.
Unlike existing workflow tools, Xcrypt is required to be not only simple but also flexible as a programming language; it should let us implement anything from simple parameter sweeps to complicated search algorithms. We met these requirements by starting with Perl, a general-purpose scripting language, and extending it with features that free programmers from tedious tasks such as writing job scripts for batch systems, generating and analyzing a huge number of input and output files, and managing the states of asynchronously running jobs.
In addition, we provide a mechanism that enables “Perl wizards” to add various helpful “spells” (e.g., smart search algorithms) as modules in a way that end users can use easily.
Thanks to these features, Xcrypt users can run a wide variety of workflows just by writing simple scripts.
We aim to make peta/exa-scale computing easy by combining Xcrypt with lower-layer parallelization implemented using OpenMP, MPI, and/or XcalableMP.
use base qw(limit core);
limit::initialize (30);
%template = (
'id' => 'example',
'RANGE0' => [1..5000],
'exe0' => './a.out',
'arg0_0@' => '"input$_[0]"',
'arg0_1@' => '"output$_[0]"',
'queue' => 'medium',
);
@jobs = prepare (%template);
submit (@jobs);
sync (@jobs);
This is a simple example of an Xcrypt end-user script. It submits 5,000 jobs that each execute the program “a.out” with different command-line arguments, while limiting the number of simultaneously running jobs to 30.
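As in this example, a typical Xcrypt script consists of a declaration of the modules to use (use base), a declarative job definition, and imperative job submission and synchronization.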
Jobs are defined declaratively as a Perl hash whose entries hold the parameter values. Using parameters named RANGEn, we can define a whole sequence of jobs with a single hash. In that case, we can give each job a different parameter value by using a parameter name ending with @ and a string that the Perl interpreter evaluates in an environment where $_[n] is bound to the corresponding value in RANGEn.
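The expansion of parameters such as 'arg0_0@' can be pictured with the following minimal sketch. It is not Xcrypt's actual implementation: expand_template is a hypothetical helper that handles only a single range (RANGE0), and the job naming scheme (id followed by the range value) is an assumption made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Xcrypt): expand a template with one
# range, RANGE0, into one plain hash per job.
sub expand_template {
    my (%template) = @_;
    my @jobs;
    for my $value (@{ $template{RANGE0} }) {
        # Assumed naming scheme: template id plus the range value.
        my %job = (id => "$template{id}_$value");
        for my $key (keys %template) {
            next if $key eq 'id' || $key =~ /^RANGE\d+$/;
            if ($key =~ /^(.*)\@$/) {
                # Key ends with '@': evaluate the string as Perl code,
                # with $_[0] bound to the current RANGE0 value.
                my $name = $1;
                my $code = $template{$key};
                $job{$name} = sub { eval $code }->($value);
                die $@ if $@;
            } else {
                # Ordinary key: copy the value unchanged.
                $job{$key} = $template{$key};
            }
        }
        push @jobs, \%job;
    }
    return @jobs;
}

my @jobs = expand_template(
    'id'      => 'example',
    'RANGE0'  => [1 .. 3],
    'exe0'    => './a.out',
    'arg0_0@' => '"input$_[0]"',
);
print "$_->{id}: $_->{exe0} $_->{arg0_0}\n" for @jobs;
```

Because the value of an '@' parameter is an arbitrary Perl expression, it can compute more than a file name, for example an arithmetic function of the range value.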
Defined jobs are submitted imperatively by the submit() function. Before submitting, we need to call prepare() to make a working directory and copy the necessary files for each job to be submitted. All submitted jobs are executed asynchronously; we can wait for them to finish with sync().
To limit the number of simultaneously running jobs, the user script shown above uses the limit module, which is implemented as shown below.
package limit;
use NEXT;
use Thread::Semaphore;
my $smph;
sub initialize { $smph = Thread::Semaphore->new($_[0]); }
sub before {
my $self = shift;
$smph->down;
$self->NEXT::before();
}
sub after {
my $self = shift;
$self->NEXT::after();
$smph->up;
}
An Xcrypt module is defined as an extension to the class for job objects named core. In the core class and its subclasses, methods named before and after have special meanings; they are invoked asynchronously before a job is submitted and after the job has finished, respectively.
Thanks to this mechanism, a wide variety of functionality can be developed as modules, and end users can use it simply by listing module names for (multiple) class inheritance. For instance, we have also implemented modules for dry execution and for controlling the order in which jobs are submitted through a declarative description of the dependencies among them.
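How before and after hooks from several modules chain together under multiple inheritance can be illustrated with a small, self-contained sketch. The packages demo_core, demo_logger, and demo_job are hypothetical stand-ins, not Xcrypt classes, and the hooks are called directly here rather than by a job runtime; only the NEXT module itself is real.

```perl
use strict;
use warnings;
use NEXT;

my @trace;   # records the order in which the hooks run

# Stand-in for the base job class (hypothetical, not Xcrypt's core).
package demo_core;
sub new    { my ($class, %args) = @_; return bless {%args}, $class; }
sub before { push @trace, "core before"; }
sub start  { push @trace, "run $_[0]{id}"; }
sub after  { push @trace, "core after"; }

# Hypothetical extension module that wraps both hooks.
package demo_logger;
sub before {
    my $self = shift;
    push @trace, "logger before";
    $self->NEXT::before();      # continue with the next class's before()
}
sub after {
    my $self = shift;
    $self->NEXT::after();       # let the other classes finish first
    push @trace, "logger after";
}

# A job class composed by multiple inheritance, in the same spirit as
# writing "use base qw(limit core)" in a user script.
package demo_job;
our @ISA = ('demo_logger', 'demo_core');

package main;
my $job = demo_job->new(id => 'example');
$job->before();
$job->start();
$job->after();
print "$_\n" for @trace;
```

Calling NEXT::before first in before and NEXT::after first in after gives each module a bracket around the modules listed after it, which is the same pattern the limit module uses to acquire the semaphore before a job and release it afterwards.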
When submit() is invoked, the Xcrypt runtime generates a job script for the batch scheduler (e.g., NQS, Torque, LSF, or SGE) based on the information in a job object. Batch schedulers differ from each other in their command-line interfaces, job script specifications, and so on. To support a wide variety of them, Xcrypt provides a mechanism that enables programmers or system administrators to define a new batch scheduler by writing a Perl-based configuration script. The following script shows an example of such a script. Each parameter value may be a string or a function object, which makes configurations both easy to write and flexible enough for the various specifications of batch schedulers.
$jobsched::jobsched_config{"NQS"} = {
qsub_command => "/usr/local/bin/qsub",
qdel_command => "/usr/local/bin/qdel -K",
qstat_command => "/usr/local/bin/qstat",
jobscript_option_queue => '# @$-q ',
jobscript_option_stdout => '# @$-o ',
jobscript_option_stderr => '# @$-e ',
extract_req_id_from_qsub_output => sub {
my (@lines) = @_;
if ($lines[0] =~ /([0-9]+)\.nqs/) { return $1; }
else { return -1; }
},
...
};
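The idea that a configuration value may be either a string or a function object can be sketched as follows. This is an illustration only: resolve_option is a hypothetical helper, the rule that a string value acts as a prefix to which the argument is appended is an assumption, and the function form of jobscript_option_stdout is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a configuration entry into a job script line.
# A code reference is called with the arguments; a plain string is
# treated (by assumption) as a prefix to which the arguments are appended.
sub resolve_option {
    my ($value, @args) = @_;
    return ref($value) eq 'CODE'
        ? $value->(@args)
        : $value . join(' ', @args);
}

my %config = (
    # String form, as in the NQS example above.
    jobscript_option_queue  => '# @$-q ',
    # Function form (hypothetical): derive the file name from the job id.
    jobscript_option_stdout => sub { '# @$-o ' . $_[0] . '.log' },
);

my $queue_line  = resolve_option($config{jobscript_option_queue},  'medium');
my $stdout_line = resolve_option($config{jobscript_option_stdout}, 'example');
print "$queue_line\n$stdout_line\n";
```

The string form covers the common case with almost no Perl knowledge, while the function form leaves room for schedulers whose options need computed values.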
Of course, we can use legacy Unix tools such as grep, sed, and awk to generate input files and extract data from output files for a huge number of jobs. However, for users who are unfamiliar with regular expressions, it is not easy to generate a large number of FORTRAN namelists that differ slightly from each other, or to extract certain elements from an output file that represents a matrix.
Therefore, Xcrypt provides higher-level generation and extraction libraries. We improved usability by specializing them for computational science, for example for modifying a FORTRAN namelist and for extracting data from output files by specifying both row and column numbers.
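To give a feel for what such operations involve, here is a minimal sketch in plain Perl. It does not use or reproduce Xcrypt's libraries: matrix_element and set_namelist_value are hypothetical helpers, and the sample output and namelist text are made up. The sketch assumes whitespace-separated matrix output and simple scalar namelist entries.

```perl
use strict;
use warnings;

# Hypothetical helper: return the element at a 1-based row and column of
# whitespace-separated matrix text, or nothing if out of range.
sub matrix_element {
    my ($text, $row, $col) = @_;
    my @rows = grep { /\S/ } split /\n/, $text;   # skip blank lines
    return if $row < 1 || $row > @rows;
    my @cols = split ' ', $rows[$row - 1];
    return if $col < 1 || $col > @cols;
    return $cols[$col - 1];
}

# Hypothetical helper: replace the value of one scalar variable in
# FORTRAN namelist text (names are matched case-insensitively).
sub set_namelist_value {
    my ($text, $name, $value) = @_;
    $text =~ s/\b(\Q$name\E\s*=\s*)[^,\/\n]+/$1$value/i;
    return $text;
}

# Made-up sample data for illustration.
my $output   = "1.0 2.0 3.0\n4.0 5.0 6.0\n";
my $namelist = "&param\n  nx = 100,\n  dt = 0.01\n/\n";

my $element  = matrix_element($output, 2, 3);          # row 2, column 3
my $modified = set_namelist_value($namelist, 'dt', '0.005');
print "$element\n$modified";
```

Even this small version needs a regular expression with escaping and a character class, which is the kind of detail a specialized library hides from the end user.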