Overall Goals

In general, we want users to be able to run normal, native, and offload jobs without having to manually copy over files or libraries. The individual components are:

  1. Normal Jobs. Get normal (non-MIC) jobs to run normally on MIC-enabled nodes.
  2. Native Jobs. Native jobs (ssh into a MIC and run there) should run.
  3. Offload Jobs. Jobs running on the CPUs should be able to offload work (using pragmas) to the MICs.
  4. File System. Surface some part of the file system in a way that is clear and intuitive to users.
  5. Build Environment. MIC programs can only be compiled on MIC-enabled compute nodes, so building there needs to be as simple as possible.
  6. Job Start/End. MIC jobs take several minutes to start because the MICs are booted as part of the prologue script. This needs streamlining.
  7. Documentation. We need to document the process for compiling and running the various kinds of jobs.

Details on the progress of each goal are below.

Normal Jobs

Goal: Get normal (non-MIC) jobs to run normally on MIC-enabled nodes.

Status: Incomplete. See below for more details.

Known Issues:

  1. MPI Jobs Hang

Other To Do Items:

  • Run a variety of common software packages (Gaussian, Abaqus, etc.) to ensure that they run correctly


Native Jobs

Goal: Get jobs to run directly on MICs.

Status: Complete. Native execution works. Here is a simple example from Jeffers and Reinders' book on MIC programming. The program is run with KMP_AFFINITY=compact, so threads are packed onto as few MIC cores as possible (up to four per core). One thread gets half of a core's peak performance, two threads get near-peak performance for that core, four threads give no further improvement, and eight threads double it. All of this is expected: a MIC core needs at least two threads to reach its peak issue rate, the first four threads share a single core, and eight threads fill two cores.

 ~/mic/jeffers $ hostname
 br001-mic0
 ~/mic/jeffers $ OMP_NUM_THREADS=1 ./helloflops3
 Initializing
 Starting Compute
 Using 1 threads...
 Gflops =     25.600, Secs =      1.531, GFlops per sec =     16.726
 ~/mic/jeffers $ OMP_NUM_THREADS=2 ./helloflops3
 Initializing
 Starting Compute
 Using 2 threads...
 Gflops =     51.200, Secs =      1.531, GFlops per sec =     33.436
 ~/mic/jeffers $ OMP_NUM_THREADS=4 ./helloflops3
 Initializing
 Starting Compute
 Using 4 threads...
 Gflops =    102.400, Secs =      3.072, GFlops per sec =     33.329
 ~/mic/jeffers $ OMP_NUM_THREADS=8 ./helloflops3
 Initializing
 Starting Compute
 Using 8 threads...
 Gflops =    204.800, Secs =      3.076, GFlops per sec =     66.584
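
The scaling pattern can be checked directly from the figures in the transcript. The following is a minimal sketch that recomputes throughput as total work divided by elapsed time and expresses each run relative to the one-thread run. The recomputed rates differ slightly from the printed ones because the printed seconds are rounded; the function names are illustrative only.

```python
# (threads, total GFlop, seconds) copied from the helloflops3 runs above.
RUNS = [
    (1, 25.600, 1.531),
    (2, 51.200, 1.531),
    (4, 102.400, 3.072),
    (8, 204.800, 3.076),
]


def rates(runs):
    """Return {threads: GFlop/s}, computed as total work / elapsed time."""
    return {threads: gflop / secs for threads, gflop, secs in runs}


def speedups(runs):
    """Return {threads: throughput relative to the one-thread run}."""
    by_threads = rates(runs)
    base = by_threads[1]
    return {threads: rate / base for threads, rate in by_threads.items()}


if __name__ == "__main__":
    measured = rates(RUNS)
    for threads, ratio in speedups(RUNS).items():
        print(f"{threads} thread(s): {measured[threads]:6.2f} GFlop/s, {ratio:.2f}x")
```

Two and four threads both come out at roughly twice the one-thread rate, and eight threads at roughly four times, which matches one saturated core versus two.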

Here's a near-peak run on the full MIC:

 ~/mic/jeffers $ OMP_NUM_THREADS=118 KMP_AFFINITY=scatter ./helloflops3
 Initializing
 Starting Compute
 Using 118 threads...
 Gflops =   3020.800, Secs =      1.691, GFlops per sec =   1786.572
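
To put the full-card figure in context, it can be compared per core against the two-thread, single-core result from the earlier runs. This is a rough sketch that assumes KMP_AFFINITY=scatter places the 118 threads evenly at two per core, i.e. on 59 cores; that placement is an assumption and is not stated in the output above.

```python
FULL_CARD_GFLOPS = 1786.572  # 118-thread scatter run above
ONE_CORE_GFLOPS = 33.436     # 2-thread compact run from the earlier transcript
THREADS = 118
THREADS_PER_CORE = 2         # assumption: scatter spreads threads evenly


def per_core_efficiency(total, threads, threads_per_core, one_core):
    """Return (cores used, GFlop/s per core, fraction of the one-core rate)."""
    cores = threads // threads_per_core
    per_core = total / cores
    return cores, per_core, per_core / one_core


if __name__ == "__main__":
    cores, per_core, fraction = per_core_efficiency(
        FULL_CARD_GFLOPS, THREADS, THREADS_PER_CORE, ONE_CORE_GFLOPS
    )
    print(f"{cores} cores, {per_core:.2f} GFlop/s per core, "
          f"{fraction:.1%} of the single-core rate")
```

Under that assumption each core in the full run delivers about nine tenths of what a single saturated core achieved on its own.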

Known Issues:

  1. MIC Libraries (Resolved)

Offload Jobs

Goal: Get jobs on host to successfully offload work to MICs.

Status: Works, but with subpar performance. See Known Issues below for details.

  • On BR, MKL Automatic Offloading works:
 [jkrometi@br001 mic]$ MKL_MIC_ENABLE=1 OFFLOAD_REPORT=2 ./matmul.host -s 8192 -t 64
 ./matmul.host
 [MKL] [MIC --] [AO Function]    DGEMM
 [MKL] [MIC --] [AO DGEMM Workdivision]  0.24 0.38 0.38
 [MKL] [MIC 00] [AO DGEMM CPU Time]      6.302579 seconds
 [MKL] [MIC 00] [AO DGEMM MIC Time]      1.185316 seconds
 [MKL] [MIC 00] [AO DGEMM CPU->MIC Data] 749207552 bytes
 [MKL] [MIC 00] [AO DGEMM MIC->CPU Data] 849346560 bytes
 [MKL] [MIC 01] [AO DGEMM CPU Time]      6.302579 seconds
 [MKL] [MIC 01] [AO DGEMM MIC Time]      1.236653 seconds
 [MKL] [MIC 01] [AO DGEMM CPU->MIC Data] 749207552 bytes
 [MKL] [MIC 01] [AO DGEMM MIC->CPU Data] 849346560 bytes
 (etc etc)
  • Manual offloading (i.e., using pragmas) appears to work. Setting MIC_OMP_NUM_THREADS (along with MIC_ENV_PREFIX, which is set by the MIC module as of 10/16/13) controls the number of threads to be used on the MIC.
 [jkrometi@br007 omp]$ module list
 Currently Loaded Modules:
   1) intel/13.1    2) mvapich2/1.9a2    3) mkl/11    4) mic/1.0
 [jkrometi@br007 omp]$ export MIC_ENV_PREFIX=MIC
 [jkrometi@br007 omp]$ export MIC_OMP_NUM_THREADS=2
 [jkrometi@br007 omp]$ ./omphw.offload
 Hello World from thread = 0
 Number of threads = 2
 Hello World from thread = 1
 [jkrometi@br007 omp]$ export MIC_OMP_NUM_THREADS=4
 [jkrometi@br007 omp]$ ./omphw.offload
 Hello World from thread = 0
 Number of threads = 4
 Hello World from thread = 1
 Hello World from thread = 2
 Hello World from thread = 3
  • On HB, MKL offloading appears to work with some tweaks (see MIC Module under Known Issues)
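
The way MIC_ENV_PREFIX and MIC_OMP_NUM_THREADS interact in the manual offload example can be modelled in a few lines. The sketch below is a simplified model of the Intel offload runtime's forwarding rule as we understand it, not its actual implementation: when the prefix variable is set, only host variables named <prefix>_<NAME> reach the coprocessor, with the prefix stripped; when it is unset, the host environment is copied as-is. The function name coprocessor_env is hypothetical, and card-specific forms of the variables are ignored.

```python
def coprocessor_env(host_env, prefix_var="MIC_ENV_PREFIX"):
    """Simplified model of the environment seen inside an offload region.

    host_env is a dict of host environment variables. This ignores
    card-specific variants and any special-cased variables.
    """
    prefix = host_env.get(prefix_var)
    if not prefix:
        # No prefix configured: the host environment is duplicated.
        return dict(host_env)
    lead = prefix + "_"
    return {
        name[len(lead):]: value
        for name, value in host_env.items()
        if name.startswith(lead) and name != prefix_var
    }


if __name__ == "__main__":
    host = {
        "MIC_ENV_PREFIX": "MIC",
        "MIC_OMP_NUM_THREADS": "4",
        "OMP_NUM_THREADS": "16",  # host-side setting, not forwarded
    }
    print(coprocessor_env(host))
```

In this model the host's own OMP_NUM_THREADS never reaches the MIC, which is why the thread count in the transcript follows MIC_OMP_NUM_THREADS.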

See below for more details.

Known Issues:

  1. Offload Performance
  2. MIC Module

File System

Goal: Surface some part of the file system in a way that is clear and intuitive to users.

Status: Some internal MIC networking issues remain; see Known Issues below.

Known Issues:

  1. Dropped Packets

Updates

10/16/13: Chris and Justin review the settings on Stampede, which surfaces the entire Home directory (as well as the /opt/apps application stack) to its MICs. They decide to do the same on BR.

10/21/13: Chris and Brandon surface Home and /opt/apps to the MICs.

Build Environment

Goal: Ensure that a complete and simple build environment exists on the MIC-enabled nodes.

Status: Unclear. Some issues have been resolved, but more testing and development on the nodes is required to see what other issues remain. We also need to check whether emacs is available on the compute nodes.

Job Start/End

Goal: Streamline the job start/end process. MIC jobs take several minutes to start because the MICs are booted as part of the prologue script. This is especially troublesome for interactive jobs, where the user is waiting for access. The MICs are rebooted because no other method for cleaning up rogue processes has been identified. Newer versions of Torque may offer a solution.

Status: Incomplete. Job start/end has been streamlined, but we still need a way to check for and eliminate rogue processes.
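
As a starting point for discussion, the "check for rogue processes" half of the problem could look something like the sketch below. It is not a tested solution: it only parses a process listing that has already been captured as text, and it assumes a header line followed by whitespace-separated PID, USER, and COMMAND columns. The function name, the column layout, and the sample data are all hypothetical.

```python
def find_rogue_pids(ps_output, allowed_users):
    """Return PIDs of processes whose owner is not in allowed_users.

    ps_output: text with one header line, then 'PID USER COMMAND' rows
    (assumed layout; adjust to whatever the MIC's ps actually prints).
    """
    rogue = []
    for line in ps_output.strip().splitlines()[1:]:
        fields = line.split(None, 2)
        if len(fields) < 2:
            continue  # skip malformed rows
        pid, user = fields[0], fields[1]
        if pid.isdigit() and user not in allowed_users:
            rogue.append(int(pid))
    return rogue


if __name__ == "__main__":
    # Hypothetical listing, for illustration only.
    sample = """PID USER COMMAND
1 root init
742 root sshd
2210 olduser ./a.out
2305 jobuser ./helloflops3
"""
    print(find_rogue_pids(sample, allowed_users={"root", "jobuser"}))
```

An epilogue script would still have to decide how to obtain the listing from each MIC and what to do with the PIDs it finds.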

Updates

10/17-21/13: Chris and Brandon implement a solution to streamline the addition and removal of node access at job start and end.

Documentation

Goal: Document usage of MIC nodes for communication to users.

Status: Drafted. Step-by-step examples for each of the three kinds of MIC usage that we currently support (Native, Offload, MKL Offload) have been drafted here. Remaining steps before completion:

  • All content should be reviewed for typos and mistakes
  • Submission lines in the examples should be edited once final decisions are made about how to surface the MICs to users (i.e., are they in a separate queue, or flagged a la highmem nodes?)
  • Complete submission scripts should be added to the bottom for each example