Turns Engine (PAR Multi-Tasking Option)
This Xilinx Development System option allows you to use multiple machines (nodes) that are networked together for a multi-run PAR job, significantly reducing the total amount of time to completion. You can specify multi-tasking from the UNIX command line.
Turns Engine Overview
Before the Turns Engine was developed for the Xilinx Development System, PAR could only run multiple jobs in a linear way. The total time required to complete PAR was equal to the sum of the times that it took for each of the PAR jobs to run. This is illustrated by the following PAR command.
par -l 5 -n 10 -i 10 -c 1 mydesign.ncd output.dir
This command tells PAR to run 10 place and route passes (-n 10) at effort level 5 (-l 5), with a maximum of 10 router passes (-i 10) and one cost-based cleanup pass (-c 1). It runs each of the 10 jobs consecutively, generating an output NCD file for each job (output.dir/5_5_1.ncd, output.dir/5_5_2.ncd, and so on). If each job takes approximately one hour, the run takes approximately 10 hours.
Suppose, however, that you have five nodes available. The Turns Engine allows you to use all five nodes at the same time, dramatically reducing the time required for all ten jobs. To do this you must first generate a file containing a list of the node names, one per line as in the following example.
NOTE: A pound sign (#) in the example indicates a comment.
# NODE names
jupiter #Fred's node
mars #Harry's node
mercury #Betty's node
neptune #Pam's node
pluto #Mickey's node
Now run the job from the command line as follows.
par -m nodefile_name -l 5 -n 10 -i 10 -c 1 mydesign.ncd output.dir
nodefile_name is the name of the node file you created.
This runs the following jobs on the nodes specified.
jupiter: par -l 5 -i 10 -c 1 mydesign.ncd output.dir/5_5_1.ncd
mars: par -l 5 -i 10 -c 1 mydesign.ncd output.dir/5_5_2.ncd
mercury: par -l 5 -i 10 -c 1 mydesign.ncd output.dir/5_5_3.ncd
neptune: par -l 5 -i 10 -c 1 mydesign.ncd output.dir/5_5_4.ncd
pluto: par -l 5 -i 10 -c 1 mydesign.ncd output.dir/5_5_5.ncd
As the jobs finish, the remaining jobs are started on the nodes until all 10 jobs are complete. Since each job takes approximately one hour, all 10 jobs complete in approximately two hours.
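The arithmetic behind the two-hour figure can be sketched as follows. This is a rough estimate only: it assumes every job takes about the same time and ignores network and startup overhead. The function name estimated_hours is hypothetical and is not part of PAR.

```python
import math


def estimated_hours(num_jobs: int, num_nodes: int, hours_per_job: float) -> float:
    """Estimate wall-clock hours when all jobs take about the same time."""
    if num_jobs < 0 or num_nodes < 1:
        raise ValueError("need num_jobs >= 0 and num_nodes >= 1")
    # Jobs run in "waves": each wave keeps up to num_nodes jobs busy at once.
    waves = math.ceil(num_jobs / num_nodes)
    return waves * hours_per_job


print(estimated_hours(10, 1, 1.0))  # one node, jobs run consecutively: 10.0
print(estimated_hours(10, 5, 1.0))  # five nodes, two waves of five: 2.0
```

If the job count is not a multiple of the node count, the last wave is only partly full, so 11 jobs on five nodes would take about three hours under the same assumptions.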
NOTE: You cannot judge the relative benefits of multiple placements by running the Turns Engine with options that generate multiple placements but do not route any of the placed designs (the -r PAR option specifies no routing). The design score you receive is the same for each placement. To get some indication of the quality of the placed designs, run at least one routing iteration (-i 1) on each placed design.
Turns Engine Input Files
The following are the input files to the Turns Engine.
- NCD File - A mapped design.
- Nodelist file - A user-created ASCII file listing workstation names. A sample nodelist file is shown below.
# This is a comment
# Note: machines are accessed by Turns Engine
# from top to bottom
# Sparc 20 machines running Solaris
kirk
spock
mccoy
krusher
janeway
picard
# Sparc 10 machines running SunOS
michael
jermaine
marlon
tito
jackie
# HPs running HP-UX
william
george
ronald
jimmy
gerald
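To check a nodelist file before starting a run, you could read it the way the samples suggest: one node name per line, with everything after a pound sign treated as a comment. The sketch below is an illustration only. The exact parsing rules used by the Turns Engine are not documented here, and parse_nodelist is a hypothetical helper.

```python
def parse_nodelist(text: str) -> list[str]:
    """Return node names in file order, skipping comments and blank lines."""
    nodes = []
    for line in text.splitlines():
        # Everything after '#' is a comment (assumed from the sample files).
        name = line.split("#", 1)[0].strip()
        if name:
            nodes.append(name)
    return nodes


sample = """\
# NODE names
jupiter #Fred's node
mars #Harry's node

# HPs running HP-UX
william
"""
print(parse_nodelist(sample))
```

The order of the returned list matters because machines are accessed from top to bottom.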
Turns Engine NCD Output File
The naming convention for the NCD file, which may contain placement and routing information in varying degrees of completion, is placer_level_router_level_table.ncd. If any of these elements are not used, they are replaced by an 'x'. For example, for the first design file being run with the options -n 5 -t 16 -rl 4 -pl 2, the NCD output file name would be 2_4_16.ncd. The second file would be named 2_4_17.ncd. For the first design file being run with the options -n 5 -t 16 -r -pl 2, the NCD output file name would be 2_x_16.ncd. The second file would be named 2_x_17.ncd.
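The naming convention can be sketched in a few lines. The helper names are hypothetical, and the sketch assumes that each successive run uses the next cost table number, which matches the first and second file names given above but is not confirmed for later runs.

```python
from typing import Optional


def ncd_output_name(placer_level: Optional[int],
                    router_level: Optional[int],
                    table: Optional[int]) -> str:
    """Build placer_level_router_level_table.ncd, using 'x' for unused elements."""
    parts = ["x" if value is None else str(value)
             for value in (placer_level, router_level, table)]
    return "_".join(parts) + ".ncd"


def ncd_output_names(placer_level: Optional[int],
                     router_level: Optional[int],
                     first_table: int,
                     iterations: int) -> list[str]:
    """List output names, assuming the cost table increases by one per run."""
    return [ncd_output_name(placer_level, router_level, first_table + i)
            for i in range(iterations)]


print(ncd_output_names(2, 4, 16, 2))     # options -n 5 -t 16 -rl 4 -pl 2, first two files
print(ncd_output_names(2, None, 16, 2))  # options -n 5 -t 16 -r -pl 2, first two files
```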
Homogeneous and Heterogeneous Networks
The Turns Engine can run on the following networks.
- Homogeneous networks - All SunOS, all Solaris, or all HP-UX.
- Heterogeneous networks - A mix of SunOS, Solaris, and HP-UX. You must have the Xilinx software and a license for each platform on which you intend to run. See the sample .cshrc file below to set up the environment variables. This is possible because the nodes read their environment variables from the .cshrc file; they do not receive them from the launching node.
Limitations
The following limitations apply to the Turns Engine.
- The Turns Engine can operate only on Xilinx FPGA families. It cannot operate on CPLDs.
- The Turns Engine can only operate on UNIX workstations.
- Each node uses a single license while running. The maximum number of nodes that can operate in parallel is limited to the number of licenses available. You must also have an IMPMAN (Implementation Manager) license to run the Turns Engine.
- Each run targets the same part, and uses the same algorithms and options. Only the starting point, or the cost table entry, is varied.
System Requirements
These are the system requirements for running the Turns Engine.
- rsh must be located through the path variable.
- The executables required on the machines defined in the nodes file are
- /bin/sh
- par (must be located through path variable).
- The Turns Engine logs onto a node and then invokes PAR. The environment variables on the node are read from the node's .cshrc file (or equivalent); they are not passed from the host to the node. Therefore, all the Xilinx environment variables below must be defined in the .cshrc file. If not, the PAR process on the node will not be able to find the software or the licenses.
- XILINX (points at Xilinx directory structure - must be a path accessible to both the machine from which the Turns Engine is run and the node).
- LD_LIBRARY_PATH (supports par path for shared libraries - must be a path accessible to both the machine from which the Turns Engine is run and the node).
- path (contains $XILINX/bin/$PLATFORM, where $PLATFORM is one of the following: sun, sol, hp, or rs6000).
To determine if everything is set up correctly, you can run the rsh command to the nodes to be used. Type the following.
rsh node_name /bin/sh -c par
If you get the usage message back on your screen, everything is set correctly.
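If you have many nodes to check, the same rsh test can be scripted. The following is a minimal sketch, not a Xilinx tool: par_check_command and run_par_check are hypothetical names, and the sketch only collects the output for you to read, because the exact text of the usage message is not given here.

```python
import subprocess


def par_check_command(node: str) -> list[str]:
    """Return the check command described above for one node."""
    return ["rsh", node, "/bin/sh", "-c", "par"]


def run_par_check(node: str, runner=subprocess.run, timeout: int = 30) -> str:
    """Run the check on a node and return everything it printed."""
    result = runner(par_check_command(node),
                    capture_output=True, text=True, timeout=timeout)
    return (result.stdout or "") + (result.stderr or "")


# Show the command without running it; call run_par_check(node) on a real network.
print(" ".join(par_check_command("jupiter")))
```

The runner parameter exists only so that the function can be exercised without a network; in normal use, leave it at its default.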
Turns Engine Environment Variables
The environment variables below are interpreted by the Turns Engine manager.
- PAR_AUTOMNTPT - Specifies the network automount point. The Turns Engine uses network path names to access files. For example, a local path name to a file may be designs/cpu.ncd, but the network path name may be /home/machine_name/ivan/designs/cpu.ncd or /net/machine_name/ivan/designs/cpu.ncd. The PAR_AUTOMNTPT environment variable should be set to the value of the network automount point. The automount points for the examples above are /home and /net. The default value for PAR_AUTOMNTPT is /net.
The line below sets the automount point to /nfs. If the current working directory is /usr/user_name/design_name on node mynode, the command cd /nfs/mynode/usr/user_name/design_name is generated before PAR runs on the machine.
setenv PAR_AUTOMNTPT /nfs
The setting below does not issue a cd command; you are required to enter full paths for all of the input and output file names.
setenv PAR_AUTOMNTPT ""
The setting below tells the system that paths on the local workstation are the same as paths on remote workstations. This can be the case if your network does not use an automounter and all of the mounts are standardized, or if you do use an automounter and all mount points are handled generically.
setenv PAR_AUTOMNTPT "/"
- PAR_AUTOMNTTMPPT - Most networks use the /tmp_mnt temporary mount point. If your network uses a temporary mount point with a different name, like /t_mnt, then you must set the PAR_AUTOMNTTMPPT variable to the temporary mount point name. In the example above you would set PAR_AUTOMNTTMPPT to /t_mnt. The default value for PAR_AUTOMNTTMPPT is /tmp_mnt.
- PAR_M_DEBUG - Causes the Turns Engine to run in debug mode. If the Turns Engine is causing errors that are difficult to correct, you can run PAR in debug mode in the following way.
- Set the PAR_M_DEBUG variable.
setenv PAR_M_DEBUG 1
- Create a node list file containing only a single entry (one node).
This single entry is necessary because if the node list contains multiple entries, the debug information from all of the nodes is intermixed, and troubleshooting is difficult.
- Run PAR with the -m (multi-tasking mode) option.
In debug mode, all of the output from all commands generated by the PAR run is echoed to the screen. There are also additional checks performed in debug mode, and additional information supplied to aid in solving the problem.
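The effect of the three PAR_AUTOMNTPT settings described above can be sketched as a small function. This is an illustration of the documented behavior, not the Turns Engine's own code. The name remote_cd_command is hypothetical, and the handling of "/" (change to the same path on the remote node) is an assumption based on the statement that local and remote paths are identical.

```python
from typing import Optional


def remote_cd_command(automount_point: str, node: str, cwd: str) -> Optional[str]:
    """Return the cd command generated before PAR runs, or None if there is none."""
    if automount_point == "":
        # Empty setting: no cd is issued, so full paths are required.
        return None
    if automount_point == "/":
        # Assumption: local and remote paths match, so the path is used unchanged.
        return f"cd {cwd}"
    # Otherwise the network path is automount point + node name + local path.
    return f"cd {automount_point.rstrip('/')}/{node}{cwd}"


print(remote_cd_command("/nfs", "mynode", "/usr/user_name/design_name"))
print(remote_cd_command("", "mynode", "/usr/user_name/design_name"))
print(remote_cd_command("/", "mynode", "/usr/user_name/design_name"))
```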
Security
If you attempt to run multiple PAR jobs with the -m nodefile_name option, the Turns Engine manager (impman) license must be available so that jobs can be allotted to the designated hosts to perform each individual PAR run. If the impman license is not available, you get an error message.
If PAR is able to lock the impman license, each job running on a node tries to lock a Turns Engine place and route (imppar) license. If it is able to do this, the job is automatically timing-driven and device-independent. You see a message like this on your screen.
Starting job 5_1 on node NODE1
If PAR is unable to lock an imppar license, you do not see a starting job message and PAR reverts to the normal sequence of par, tdpar, and family licensing.
For more information on Xilinx security, see the applicable Install and Release Document.
Starting the Turns Engine From the Command Line
The following is the PAR command line syntax to run the Turns Engine.
par -m nodelist_file -n #_of_iterations -s #_of_iterations_to_save mapped_design.ncd output_directory.dir
-m nodelist_file specifies the nodelist file for the Turns Engine run.
-n #_of_iterations specifies the number of place and route passes.
-s #_of_iterations_to_save saves only the best -s results.
mapped_design.ncd is the input NCD file.
output_directory.dir is the directory where the best results (-s option) are saved. Files include placed and routed NCD, summary timing reports (DLY), pinout files (PAD), and log files (PAR).
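If you start Turns Engine runs from a script, you might assemble the command line as shown below. This is a sketch only: turns_engine_command is a hypothetical helper, the file name nodes.txt is made up for the example, and only the -m, -n, and -s options described above are used.

```python
import shlex


def turns_engine_command(nodelist_file: str, iterations: int, save: int,
                         design: str, output_dir: str) -> list[str]:
    """Build the PAR argument list for a Turns Engine run."""
    if iterations < 1 or save < 1:
        raise ValueError("iterations and save must be at least 1")
    return ["par", "-m", nodelist_file,
            "-n", str(iterations),
            "-s", str(save),
            design, output_dir]


command = turns_engine_command("nodes.txt", 10, 3, "mydesign.ncd", "output.dir")
print(shlex.join(command))
```

Keeping the arguments in a list avoids quoting problems if a path contains spaces.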
Debugging
With the Turns Engine you may receive messages from the login process. The problems are usually related to the network or to environment variables.
- Network Problem - You may not be able to log on to the machines listed in the nodelist file.
- Try to ping the nodes by running the following command.
ping machine_name
You should get a message that the machine is alive. The ping command should also be in your path (UNIX cmd: which ping).
- Try to log on to the nodes using the command rsh machine_name. You should be able to log on to the machine. If you cannot, make sure rsh is in your path (UNIX cmd: which rsh). If rsh is in your path, but you still cannot log on, contact your network administrator.
- Try to launch PAR on a node by entering the following command.
rsh machine_name /bin/sh -c par
This is the same command that the Turns Engine uses to launch PAR. If this command is successful, everything is set up correctly for the machine_name node.
- Environment Problem - Log on to the node with the problem by entering the following UNIX command.
rsh machine_name
Check the $XILINX, $LD_LIBRARY_PATH, and $PATH variables by entering the UNIX command echo $variable_name.
If these variables are not set correctly, check to make sure these variables are defined in your .cshrc file.
NOTE: Some, but not all, errors in reading the .cshrc may prevent the rest of the file from being read. These errors may need to be corrected before the XILINX environment variables in the .cshrc are read.
The error message /bin/sh: par not found indicates that the environment in the .cshrc file is not being correctly read by the node.
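Once you have the values of the variables from a node, the environment check can be sketched as follows. The function name missing_settings is hypothetical, and the sketch assumes the values have already been collected into a dictionary, for example by copying the output of the echo commands. It only tests the three requirements listed under System Requirements.

```python
PLATFORMS = ("sun", "sol", "hp", "rs6000")


def missing_settings(env: dict, platform: str) -> list[str]:
    """Report which required Xilinx settings are absent from a node's environment."""
    if platform not in PLATFORMS:
        raise ValueError(f"platform must be one of {PLATFORMS}")
    problems = []
    xilinx = env.get("XILINX", "")
    if not xilinx:
        problems.append("XILINX is not set")
    if not env.get("LD_LIBRARY_PATH"):
        problems.append("LD_LIBRARY_PATH is not set")
    if xilinx:
        # The path must contain $XILINX/bin/$PLATFORM.
        expected = f"{xilinx.rstrip('/')}/bin/{platform}"
        entries = [entry.rstrip("/") for entry in env.get("PATH", "").split(":")]
        if expected not in entries:
            problems.append(f"PATH does not contain {expected}")
    return problems


node_env = {"XILINX": "/tools/xilinx", "LD_LIBRARY_PATH": "/tools/xilinx/lib",
            "PATH": "/bin:/usr/bin"}
print(missing_settings(node_env, "sol"))
```

The directory /tools/xilinx is a made-up value for the example; an empty list means none of the three problems was found.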
Screen Output
When PAR is running multiple jobs and is not in multi-tasking mode, output from PAR is displayed on the screen as the jobs run. When PAR is running multiple jobs in multi-tasking mode, you only see information regarding the current status of the Turns Engine. For example, when the job described in the Turns Engine Overview section is executed, the following screen output would be generated.
Starting job 5_5_1 on node jupiter
Starting job 5_5_2 on node mars
Starting job 5_5_3 on node mercury
Starting job 5_5_4 on node neptune
Starting job 5_5_5 on node pluto
When one of the jobs finishes, a message similar to the following displays.
Finished job 5_5_3 on node mercury
These messages continue until there are no jobs left to run, at which time Finished appears on your screen.
NOTE: On HP workstations, you cannot interrupt the job with Control-C as described below unless Control-C is set as the escape character. To set the escape character, refer to your HP manual.
You may interrupt the job at any time by pressing Control-C. If you interrupt the program, you see the following on your screen.
CONTRL-C interrupt detected.
Please choose one of the following options:
1. Continue processing and ignore the interrupt.
2. Normal program exit at next check point.
3. Exit program immediately.
4. Add a node for running jobs.
5. Stop using a node.
6. Display current status.
Enter choice - - >
Choices are described below.
- Continue processing and ignore the interrupt - self-explanatory.
- Normal program exit at next check point - allows the Turns Engine to wait for all jobs to finish before terminating. PAR is allowed to generate the master PAR output file (PAR), which describes the overall run results.
When you select option 2, a secondary menu appears as shown below.
How would you like to handle the currently running job?
1. Allow jobs to finish.
2. Halt jobs at next checkpoint.
3. Halt jobs immediately.
Enter choice - - >
- Allow jobs to finish - current jobs finish but no other jobs start if there are any. For example, if you are running 100 jobs (-n 100) and the current jobs running are 5_5_49 and 5_5_50, when these jobs finish, job 5_5_51 is not started.
- Halt jobs at next checkpoint - all current jobs stop at the next checkpoint; no new jobs are started.
- Halt jobs immediately - all current jobs stop immediately; no other jobs start.
- Exit program immediately - all running jobs stop immediately (without waiting for running jobs to terminate) and PAR exits the Turns Engine.
- Add a node for running jobs - allows you to dynamically add a node on which you can run jobs. When you make this selection, you are prompted as follows.
Input the name of the node to be added to the list
After you enter the node name, a job starts immediately on that node and a Starting job message is displayed.
- Stop using a node - allows you to remove a node from the list so that no job runs on that node.
If you select Stop using a node, you must also select from the following options.
Which node do you wish to stop using?
1. jupiter
2. mars
3. mercury
Enter number identifying the node.(<CR> to ignore)
Enter the number identifying the node. If you enter a legal number, you are asked to make a selection from this menu.
Do you wish to
1. Terminate the current job immediately and resubmit.
2. Allow the job to finish.
Enter number identifying choice. (<CR> to ignore)
The options are described below.
- Terminate the current job immediately and resubmit - halts the job immediately and sets it up again to be run on the next available node. The halted node is not used again unless it is enabled by the add function.
- Allow the job to finish - finishes the node's current job, then disables the node from running additional jobs.
NOTE: The list of nodes described above is not necessarily numbered consecutively. Nodes that are disabled are not displayed. For example, if node 2 (mars) is disabled, the following is displayed the next time you select Stop using a node.
Which node do you wish to stop using?
1. jupiter
3. mercury
Enter number identifying the node. (<CR> to ignore)
- Display current status - displays the current status of the Turns Engine. It shows the state of nodes and the respective jobs. Here is a sample of what you would see if you chose this option.
ID  NODE     STATUS         JOB     TIME
1.  jupiter  Job Running    5_5_10  02:30:45
2.  mars     Job Running    5_5_11  02:28:03
3.  mercury  Not Available
4.  neptune  Pending Term   5_5_12  02:20:01
5.  pluto    Job Running    5_5_13  02:20:01
6.  venus    Idle
7.  earth    Job Running    5_5_12  25
Each entry is described below:
- jupiter has been running job 5_5_10 for approximately 2 1/2 hours.
- mars has been running job 5_5_11 for approximately 2 1/2 hours.
- mercury is not available. Either the user deactivated it with the Stop using a node option, or the node does not exist, or it was not running. Nodes are pinged to see if they exist and are running before the Turns Engine attempts to start a job.
- neptune has been halted immediately with job resubmission. The Turns Engine is waiting for the job to terminate. Once this happens, the status changes to Not Available.
- pluto has been running job 5_5_13 for 2 hours 20 minutes.
- venus has finished its current job and is available for another. When you see the Idle message, it usually means that no other jobs are available.
- earth is running job 5_5_12. This job was resubmitted when neptune was dropped. It has been running for 25 seconds. It is unlikely that you will see the same job listed twice (as in the sample above) since the job pending termination usually finishes very quickly.
There is also a status named Job Finishing. This appears if the Turns Engine has been instructed to halt the job at the next checkpoint.
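The TIME column in the sample mixes two forms: hours, minutes, and seconds for long-running jobs, and a bare number of seconds for a job that has just started. If you post-process the status display, a conversion like the one below may help. It is a sketch that assumes those are the only forms (plus minutes and seconds, which do not appear in the sample); elapsed_seconds is a hypothetical name.

```python
def elapsed_seconds(time_field: str) -> int:
    """Convert a TIME value such as '02:30:45' or '25' to seconds."""
    parts = [int(part) for part in time_field.strip().split(":")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unexpected TIME value: {time_field!r}")
    seconds = 0
    for part in parts:
        # Each colon-separated field is worth 60 times the one to its right.
        seconds = seconds * 60 + part
    return seconds


print(elapsed_seconds("02:30:45"))  # jupiter, about 2 1/2 hours: 9045
print(elapsed_seconds("25"))        # earth, 25 seconds: 25
```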