PipelineFX

 

Knowledge Base


 

General

 

Can you give an overview of Qube's architecture from a workflow standpoint?

Yes, here is a sample workflow that showcases Qube's main components:

  1. An artist submits a job from a Client machine (through the QubeGUI, in-application submission, command-line, python, etc).
  2. This creates a package of information (strings, numbers, etc) that is sent to the Supervisor and stored in the MySQL database.
  3. The Supervisor identifies available Workers to process the job.
  4. The Supervisor sends the job package to the Worker.
  5. The Worker service then launches the respective backend (script or executable) that reads the job package and launches the appropriate commandline or executable for the rendering, etc.
  6. The application (like maya) then reads in the scene (stored in a central location) and renders the resulting frames to a central location (like a NAS or other file server). Note that, to minimize network traffic, no files are staged or copied locally to the Workers.
  7. The artist, or anyone else, can view the current status of a job through the QubeGUI, command-line, python, etc.
 

What are the main recommended hardware components used by Qube?

From a hardware standpoint, the main things recommended are:

  1. Server machine to act as the Supervisor
  2. A File Server or NAS to store the scenes, textures, and rendered images in a central location.
  3. Either artist workstations or a dedicated farm of Worker machines to process/render.
 

What do a job's CPUs, a job's subjobs, and the Host Resource "host.processors" mean with respect to Machine Cores?

Qube does not explicitly restrict a job to run on a particular core. It leaves that up to the applications to determine. If this is the case, then what do these terms used in Qube mean and how do they relate to machine cores?

The terms can be misleading. Here is a summary of the terms and their meanings:

  • Job Terms
    • CPUs: This is the number of (render) processes to concurrently run. Those processes can use multiple cores, though that is left up to the processes. It does not have a direct relation to the number of machine cores.
    • subjobs: These are the actual processes being run by the jobs. They are not a dependent or child job, but rather a process parented under a job.
    • reservations host.processors=1: This is the number of process slots (see below) used for each job process. If one sets in the job host.processors=2, then each process for that job will use 2 process slots (though not limited to 2 cores). Reservations can also be used for reserving things like memory or licenses for each process.
  • Worker Terms
    • host.processors: This is the number of subjobs or processes that can run concurrently on a Worker. Think of them as "process slots". It is not directly tied to the number of cores on the Worker, though it is set by default to the number of cores on the system. To have a Worker run only a single render process, but with the capability to use all the cores, set host.processors=1 or lock the Worker to only have 1 unlocked slot.

Putting it together...

The job's CPUs value refers to the number of subjobs (or processes) the Supervisor should dispatch to run the job. Every job has at least 1 subjob/process, and each subjob/process runs in a slot on a host. The number of subjob slots a single subjob takes up is controlled by the host.processors resource reservation. By default a single subjob takes up a single slot (host.processors=1) when submitting a job.

Therefore...

When you submit a job and request a number of "CPUs," you are not actually asking Qube to map processes to processors/CPUs/cores. Rather, the "cpus" value represents the number of "subjobs" to dispatch to various Workers. By default, each subjob takes up one "slot"; the number of slots on a Worker is determined by the "host.processors" resource. That value is set by the worker_cpus configuration variable. If it's set to "0," then host.processors is automatically set to the number of cores on the Worker. Any other value is used directly as host.processors.

That's why every job by default has a reservation of "host.processors=1" It means "look for a worker with 1 slot open, and then fill it with a subjob, and reduce the number of slots on the Worker by 1."
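The slot bookkeeping described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Qube code: the function names are hypothetical, and the real Supervisor considers many other factors (priority, requirements, locks) when dispatching.

```python
def worker_slots(worker_cpus, core_count):
    """Slots (host.processors) a Worker offers; worker_cpus=0 means 'use the core count'."""
    return core_count if worker_cpus == 0 else worker_cpus


def dispatchable_subjobs(job_cpus, slots_per_subjob, free_slots_by_worker):
    """Count how many of a job's subjobs could start given each Worker's free slots."""
    if slots_per_subjob < 1:
        raise ValueError("a subjob must reserve at least one slot")
    started = 0
    for free in free_slots_by_worker:
        # Fill this Worker until it runs out of slots or the job needs no more subjobs.
        while started < job_cpus and free >= slots_per_subjob:
            free -= slots_per_subjob
            started += 1
    return started


# Hypothetical farm: one 8-core Worker with worker_cpus=0, one 8-core Worker with worker_cpus=1.
farm = [worker_slots(0, 8), worker_slots(1, 8)]
print(dispatchable_subjobs(10, 1, farm))  # job with cpus=10, host.processors=1
print(dispatchable_subjobs(10, 2, farm))  # same job, host.processors=2
```

With the default reservation the job gets 9 running subjobs on this farm (8 on the first Worker, 1 on the second). With host.processors=2 it gets only 4, because the single-slot Worker can no longer host any of them.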

When a subjob launches, Qube is executing an instance of an application, which may be a simple command line, or it may be an elaborate interactive session with Maya (Maya Job Type). At this level, we are simply running the application, and depending upon the OS to determine actual CPU utilization.

Now for a practical example...

To have each Worker on your Qube farm render a single frame at a time using all cores (with a multi-threaded renderer like Maya), just reduce the number of available host.processors resource slots on the Worker to 1. This can be done by "Locking" off the number of available process slots to 1 (done through the QubeGUI or commandline).

 

Installation

 

What about CentOS?

CentOS 5 is apparently compatible with our RHEL 4

 

I'm getting an error on a Windows installer about being unable to detect Qube

Run the registry editor on the machine (Start menu, "Run..." then "regedit" and hit enter), and check the following path.

HKEY_LOCAL_MACHINE\SOFTWARE\PipelineFX\qube

There should be BASE_DIR and INSTALL_DIR, which should point to, by default, respectively:

BASE_DIR C:\Program Files\pfx
INSTALL_DIR C:\Program Files\pfx\qube
 

How to Install Qube from the command line

OS X:

Mount the disk image, install the package, then unmount it:

hdiutil attach dmgfile
installer -pkg /Volumes/volume/package.pkg -target /
hdiutil detach /Volumes/volume

Linux:

rpm -ivh rpmfile

Windows:

The msiexec.exe command will perform an MSI installation via the command line.

msiexec -i msifile

The various flags supported by the installer are:

  • INSTALL_WORKER_SERVICE
  • INSTALL_WATCHDOG_SERVICE
  • INSTALL_USER_PATH
  • INSTALL_ADMIN_PATH
  • INSTALL_MAYA_JOB_TYPE
  • INSTALL_MAYA_API

Setting them to 1 will have the same effect as clicking the checkbox in the interactive installer.
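For scripted deployments, the command line can be assembled programmatically. The sketch below assumes these flags are passed the standard MSI way, as PROPERTY=value pairs after the package name. The helper name and the MSI file name are hypothetical.

```python
KNOWN_FLAGS = (
    "INSTALL_WORKER_SERVICE",
    "INSTALL_WATCHDOG_SERVICE",
    "INSTALL_USER_PATH",
    "INSTALL_ADMIN_PATH",
    "INSTALL_MAYA_JOB_TYPE",
    "INSTALL_MAYA_API",
)


def msiexec_args(msi_path, enabled_flags):
    """Build an msiexec argument list with the chosen installer flags set to 1."""
    unknown = set(enabled_flags) - set(KNOWN_FLAGS)
    if unknown:
        raise ValueError("unknown installer flags: " + ", ".join(sorted(unknown)))
    args = ["msiexec", "/i", msi_path]
    # Keep the documented order so the generated command is predictable.
    args += [flag + "=1" for flag in KNOWN_FLAGS if flag in enabled_flags]
    return args


# "qube-worker.msi" is a placeholder file name.
print(" ".join(msiexec_args("qube-worker.msi", ["INSTALL_WORKER_SERVICE", "INSTALL_MAYA_JOB_TYPE"])))
```

On a Windows machine the resulting list could be handed to subprocess.run(args, check=True). Rejecting unknown names up front avoids silently passing a mistyped flag to the installer.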

 

MSI installation with logging

Sometimes you need to see what's going wrong with the MSI installer. You can use the command line msiexec to install with logging output to a file:

msiexec /i mymsifile.msi /Lime logfile.txt

or more verbose

msiexec /i mymsifile.msi /L*vime logfile.txt

where mymsifile.msi is the path to the MSI file.

 

How to have the worker service run as a particular user on Windows

Configure the service to log on as a particular user. This user must be in the local Administrators group, and the following User Rights Assignment policies must be applied to both the Administrators and Network Service groups:

  • replace a process-level token
  • act as part of the operating system
  • adjust memory quota for process
 

My Supervisor install fails before completing

On Windows, I recommend you try backing out of Qube, uninstalling MySQL, and reinstalling Qube. On OS X, you could uninstall MySQL, or you can run the normal installer and make sure to use the Customize option and deselect MySQL before installing the Supervisor.

 

How do I get past an installer stuck on the MySQL database install phase?

If you cannot reset the MySQL password, there is an alternate method: set the database user and password in the configuration file:

Windows

C:\winnt\qb.conf or C:\windows\qb.conf

Linux/OSX

/etc/qb.conf

Add these lines:

database_user = root
database_password = "yourpassword"

You can use any account you wish. However, the user account you choose must be capable of creating and deleting databases.

 

How do I set the Qube database with a different user?

Our installer assumes a new installation of MySQL, which should have only the default users and passwords so that we can add our own access. We leave the root password blank. You can configure the Supervisor to use an alternate user and password by editing its qb.conf and setting the following variables:

database_user
database_password
 

I get this error: libwx_gtk2_aui-2.7.so.1: cannot open shared object file: No such file or directory

Install GTK 2.7. The installers are located on our FTP site.

 

How do I reset the Shutdown Policy?

  1. Go to Administrative Tools > Local Security Policy.
  2. Go to Security Settings > Local Policies > User Rights Assignment.
  3. Double-click on "Shut down the system".
  4. Click Add User or Group....
  5. Enter "Administrators" as object. Click OK. Repeat for Power Users and Users.
  6. May need to confirm with network user name and password.
  7. Click OK at the "Shut down the system Properties" dialog.
  8. Confirm that "Administrators," "Power Users" and "Users" show in the Security Settings for "Shut down the system".
  9. Close Local Security Policy dialog.
 

Newly installed Workers are listed as "down"

This is probably the result of a firewall either on the Worker or the Supervisor. Disabling all firewalls and restarting the Workers should fix the problem. If security issues require the firewalls, open the following ports to TCP/IP and UDP:

  • 50001
  • 50002
  • 50011
 

What ports are needed by Qube to "punch" through the firewall?

See "Newly installed Workers are listed as "down"" above.

 

Jobs fail with "ERROR: unsupported perl version 5.010000"

To get around this issue, install either Perl 5.6 or Perl 5.8.

 

Configuration

 

Where is the qb.conf file?

Linux: /etc/qb.conf

OS X: /etc/qb.conf

Windows XP: WINDOWS\qb.conf

Windows Vista: PROGRAM DATA\Pfx\Qube\qb.conf

 

How to turn off preemption

Set supervisor_preempt_policy = disabled and the supervisor will not preempt jobs either passively or aggressively.

 

What if I want to lock down certain hosts to specific groups that will only run jobs submitted to those groups?

A host group is like an "alias" for a set of machines. You can assign a host to more than one group, but jobs sent to a group will only run on the machines in that group. A cluster is a priority scheme that will allow a job to run anywhere there is an available machine, but it could get preempted by a job that has a cluster specification that matches the machine.

Since your customer wants to divide up the farm strictly so that jobs intended for machines assigned to a project can't run elsewhere (even if hosts are available), you'd use a group.

Here's how I'd set up the qbwrk.conf:

[project1]
worker_groups = "project1"
worker_cluster = "/project1"

[project2]
worker_groups = "project2"
worker_cluster = "/project2"

[project3]
worker_groups = "project3"
worker_cluster = "/project3"

[project4]
worker_groups = "project4"
worker_cluster = "/project4"

[project5]
worker_groups = "project5"
worker_cluster = "/project5"

[xcube1]: project1
[xcube2]: project2
[xcube3]: project3
[xcube4]: project4
[xcube5]: project5
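Because each template repeats the project name three times, it can be less error-prone to generate the file than to type it. The following is a minimal sketch that only emits text in the layout shown above. The function name is hypothetical, and the host and project names are taken from the example.

```python
def qbwrk_conf(assignments):
    """Render qbwrk.conf text from a {hostname: project} mapping."""
    lines = []
    # One template section per project; group and cluster names always match the template name.
    for project in sorted(set(assignments.values())):
        lines += [
            "[%s]" % project,
            'worker_groups = "%s"' % project,
            'worker_cluster = "/%s"' % project,
            "",
        ]
    # Then bind each host to its template.
    for host, project in assignments.items():
        lines.append("[%s]: %s" % (host, project))
    return "\n".join(lines) + "\n"


print(qbwrk_conf({"xcube1": "project1", "xcube2": "project2"}))
```

The generated text would still need to be reviewed and copied to the Supervisor's qbwrk.conf by hand.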
 

Yes, but what if I want each client to only submit to a specific group?

To do that, you would need to go back to the cluster, establish worker restrictions, then set the client cluster to submit jobs to the appropriate cluster.

1. You will need to modify the qbwrk.conf:

[project1]
worker_cluster = "/"
worker_restrictions = "/project1"

[project2]
worker_cluster = "/"
worker_restrictions = "/project2"

[project3]
worker_cluster = "/"
worker_restrictions = "/project3"

[project4]
worker_cluster = "/"
worker_restrictions = "/project4"

[project5]
worker_cluster = "/"
worker_restrictions = "/project5"

[xcube1]: project1
[xcube2]: project2
[xcube3]: project3
[xcube4]: project4
[xcube5]: project5

2. You will need to set the following on each client qb.conf (for example):

client_cluster = "/projectA"

You need to set this on each client, so that the client will by default only submit jobs to the cluster you specify in the qb.conf.

The cluster setting is a hierarchy, so you don't necessarily need to put each host in a cluster. The restriction will limit the host to only run jobs submitted with the appropriate cluster spec, and the client_cluster will limit the cluster a job will be submitted to.

As a caution, if a user submits a job with a different cluster setting, the job will not go out with the default set in the qb.conf, but rather the one specified by the user, so they should not submit the job with a cluster setting.

 

How do I set up a host so that a job will only run on that type of host?

For each Maya Worker create a host property called "host.maya" and set it equal to 1. You can either do this with the Configuration GUI or using a qbwrk.conf:

[hostname01]
worker_properties = host.maya=1

When you submit a job, the requirement must then include:

host.maya=1
 

What are all the numbers that go into the license resource?

Qube! keeps track of 4 numbers for licensing.

  • qb.conf controls 1,
  • the supervisor maintains 1,
  • The tool qbupdateresource controls 2 of them.

The qb.conf value is set like this, for example:

license.maya=50

This value of 50 can only be set in the qb.conf. This is the number of licenses allocated to the farm.

Suppose the qbadmin command outputs the following:

% qbadmin supervisor --resources
license.maya=20/50

The value of 20 is the Supervisor-tracked license resource for the farm. The value of 20 means that 20 licenses are in use from a total allocation of 50.

The final 2 components, you won't see in the output of qbadmin. They represent

  1. "licenses currently in use across the facility" and
  2. total licenses in your facility.

This is because the Supervisor needs to differentiate between licenses it's using and licenses used outside of the farm. Since the Supervisor already knows how many licenses it is using, it can determine how many don't belong to the farm and adjust the available resources accordingly.

For example, if you issued a series of qbupdateresource calls

% qbupdateresource --name license.nuke --total 50 --used 10
% qbadmin supervisor --resources
license.nuke=10/50
% qbupdateresource --name license.nuke --total 40 --used 10
% qbadmin supervisor --resources
license.nuke=20/50

In this example, because the total number of licenses in the whole facility dropped from 50 to 40, the Supervisor raises its in-use number by 10 to compensate, since there are now 10 fewer licenses to work with.
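One way to read the two outputs above is that the in-use figure equals the farm allocation minus the licenses still free in the facility. The sketch below encodes that reading. It is an assumption inferred from this one example, not the Supervisor's documented algorithm, and the function name is hypothetical.

```python
def reported_in_use(farm_allocation, facility_total, facility_used):
    """In-use count under the assumed rule: allocation minus licenses free facility-wide."""
    free_in_facility = max(facility_total - facility_used, 0)
    in_use = farm_allocation - free_in_facility
    # Clamp so the figure never leaves the range 0..allocation.
    return min(max(in_use, 0), farm_allocation)


def resource_line(name, farm_allocation, facility_total, facility_used):
    """Format the result the way the example output shows it."""
    return "%s=%d/%d" % (name, reported_in_use(farm_allocation, facility_total, facility_used), farm_allocation)


print(resource_line("license.nuke", 50, 50, 10))
print(resource_line("license.nuke", 50, 40, 10))
```

Under this assumption the two calls print license.nuke=10/50 and license.nuke=20/50, matching the example.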

 

I need to get rid of the worker and/or the supervisor tabs in the configuration GUI

Remove either the worker or the supervisor.

 

Cross-Platform rendering: Windows to Linux or Mac

If going from Windows using UNC paths to Linux or Mac, one can use symbolic links to map the UNC paths to absolute paths.

See http://support.pipelinefx.com/wiki/index.php/Qube_Knowledge_Base#Set_up_Linux_or_OS_X_to_handle_jobs_with_UNC_paths

 

Cross-Platform rendering: Linux or Mac to Windows

If going from Linux or OSX to Windows, this gets a bit trickier. Having all paths within the scenefile be relative paths is usually essential. If that is done, then one needs to then just make sure that the path that the rendering Worker is using is valid.

For example, if one submits from a Mac to the file:

/Volumes/mynet/myproject/myscene.ma

The Windows PC will likely expect something like:

//mynet/myproject/myscene.ma

You can first try this out manually when submitting a job by modifying the scenefile name that is being submitted.

Automation of this can be done by modifying the submission dialog .py file and adding a postDialog callback to adjust the paths. We are also working on solutions for client-side path translation that may handle this in future versions of Qube.
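The path rewrite itself is simple string handling. Here is a minimal sketch of the Mac-to-Windows conversion from the example above, assuming every network volume is mounted under /Volumes and that the volume name matches the first component of the UNC path. The function name is hypothetical, and hooking it into a submission dialog is not shown.

```python
def mac_to_unc(path, mount_root="/Volumes"):
    """Convert /Volumes/<name>/... into //<name>/... and leave other paths untouched."""
    prefix = mount_root.rstrip("/") + "/"
    if not path.startswith(prefix):
        return path
    return "//" + path[len(prefix):]


print(mac_to_unc("/Volumes/mynet/myproject/myscene.ma"))
```

For the example scene this prints //mynet/myproject/myscene.ma. Paths outside the mount root are returned unchanged so that local or already-translated paths are not mangled.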

 

Administration

 

Temporarily take hosts out of the farm.

  1. Ban the worker using qbadmin worker --remove.
  2. Stop and then disable the qubeworker service on that host.

This stops the worker from showing up again if I make a --clearbanned call.

So to reinstate a worker

  1. Reenable and re-start the qubeworker service
  2. qbadmin worker --clearbanned. This brings back only those workers with an active service.
 

How do you remove a duplicate "down" host?

I think you can remove the offending host by using qbadmin and referring to it by the MAC address:

qbadmin worker --remove 00:30:48:5A:71:5D

If that doesn't work, you'll need to remove it by name or IP:

qbadmin worker --remove LAX-RF-029

If you're lucky, you'll get the down host removed. If both get removed, just use --clearbanned and restart the Worker:

qbadmin worker --clearbanned
 

Supervisor

 

I can't seem to qbping the Supervisor, even though I know it is up

Have you checked your firewall settings?

Unix:

iptables -L
 

Supervisor won't start because it can't open port 50002

Another service that is included on Linux by default, hplip, directly conflicts with the port numbers we use:

  • 50001
  • 50002
  • 50011

If a site is unable to start their supervisor, they may need to disable the hplip service, or they can change their supervisor port number. However if they do this, every single client and worker will also need to reflect these settings (for example):

Supervisor qb.conf

supervisor_port = 10001
supervisor_sub_port = 10002

Worker qb.conf

worker_port = 10011
 

How do I run the Supervisor service as some other user?

  1. In the Services Control Panel, right-click the qubesupervisor service and select Properties from the menu.
  2. Click the Log On tab, then click the radio button to set "This Account" with the proper login and password.
 

How do I backup the Supervisor?

You can use standard backup tools. Here is a list of files that are critical:

  • supervisor_logfile: /var/spool/supelog
  • supervisor_logpath: /var/spool/qube
  • qb_directory: /usr/local/pfx/qube
  • /etc/my.cnf
  • /etc/qb.conf
  • /etc/qbwrk.conf
  • /etc/init.d/supervisor
  • innodb_data_home_dir: /var/lib/mysql
 

Migrating Supervisor to new host

 

     

  1. Before migration, you should let all jobs finish or at least reach a termination state (complete, fail, kill).
  2. Shut down the Supervisor and MySQL.
  3. Install the new Supervisor. (Shut down the new Supervisor and MySQL if they get started.)
  4. Copy over these files:
    • supervisor_logfile: /var/spool/supelog
    • supervisor_logpath: /var/spool/qube
    • /etc/qb.conf
    • /etc/qbwrk.conf
    • /etc/qb.lic
    • /etc/my.cnf
    • innodb_data_home_dir: /var/lib/mysql
  5. Start up the new MySQL and Supervisor.
  6. Update Workers and clients with the new qb_supervisor setting.

 

 

My job package variables are getting truncated

Fix by enlarging the field size of job.data

% mysql -u root qube
mysql> ALTER TABLE job MODIFY data LONGTEXT;
 

Force a status change through the database

% mysql -u root qube
mysql> UPDATE job SET status = 0x140 WHERE id = <myjobid>;

 

 

I am getting 'Invalid agenda item name "1". Skipping slice.' warnings in the QubeGUI. What's going on and how do I fix this?

Cause: Likely you have recently reset your Qube database on the same machine that was previously running a Qube Supervisor. The MySQL database was cleared, but the job log files are still present. The discrepancy between those log files and what is stored in the Supervisor's MySQL database is likely what the QubeGUI is warning about.

Solution: Delete or move the job log files. They can be found at:

  • Windows: C:\Program Files\qube\logs\job
  • Linux: /var/spool/qube/job
 

How do I reset the Supervisor MySQL database?

(Example commands provided for OSX platform)

  1. login to your Supervisor
  2. open a Terminal window
  3. run the following command
sudo /Applications/pfx/qube/utils/upgrade_supervisor -reset

Note: you may need to restart the Supervisor as well

sudo SystemStarter stop supervisor
sudo SystemStarter start supervisor
 

Worker

 

Add a lag between worker job launches

The qb.conf setting you need to use is:

worker_job_start_delay

The field is in seconds.

worker_job_start_delay = 10
 

How do I restart the Worker remotely? (Windows)

Submit a remote job:

qbsub --host hostname "net stop qubeworker && net start qubeworker"

(Alternate) Install sshd from Cygwin

ssh hostname "net stop qubeworker && net start qubeworker"
 

How do I reboot the Workers remotely?

To reboot a Worker:

qbadmin worker --reboot hostname

Reboot all Workers (Windows):

qbsub --flags host_list shutdown -r
 

How do I centralize my worker job logs?

One can place all of the job logs (containing stdout,stderr,etc) directly in a central location. This requires modifying the configuration for both the Supervisor and the Workers.

  • On the Supervisor, open the Configuration GUI. Under Supervisor Settings->Path Settings, set the "Job Log Directory" to a network path.
    • Note: Use UNC paths and forward slashes (/) if on Windows.
    • Note: This can also be manually set directly in the Supervisor's qb.conf file by setting the supervisor_logpath parameter.
  • On each Worker, open the Configuration GUI. Under Worker Settings->Advanced Settings, set the "Job Log Directory" to a network path.
    • Note: Use UNC paths and forward slashes (/) if on Windows.
    • Note: This can also be manually set directly in the Worker's qb.conf file (or on the Supervisor's qbwrk.conf file) by setting the worker_logpath parameter.
 

How do I login to the local "qubeproxy" account on a Worker?

Logging into the "qubeproxy" local user is useful for troubleshooting if you are running in "proxy" mode for that Worker. The "qubeproxy" account is a local machine user account. The username and password for this account are:

  • Username: qubeproxy
  • Password: Pip3lin3P@$$wd

 

 

How do I reset the proxy password?

 

     

  1. Get the encrypted password string by using qblogin:

    qblogin --display --user proxyuser

    where proxyuser is the username for the proxy user. After successfully entering the password, an encrypted version of the password will be output.

  2. Paste the encrypted string into the proxy_password entry in the qb.conf:

    proxy_password = password

    where password is the encrypted string.

 

 

Maya

 

How do I set up Maya to do path translation?

Your Windows clients need to translate the paths into something understandable by the Linux/Mac OS X Workers. To do this, we sometimes recommend the use of the MEL command dirmap. It has the capability to do the translation, and we have support for it in our Job Type. It has some limitations, so it's not for every situation.

To set up the dirmap, you will need to edit each user's userSetup.mel file (or edit one copy and distribute it). In it, add a line to enable dirmapping:

dirmap -en 1;

Then, you add the map such that the first directory is the FROM and the second is the TO mapping:

dirmap -m "<windowsDirectory>" "<linuxDirectory>"

For example:

dirmap -en 1;
dirmap -m "R:Project" "/uniserver/project"

To test if you have it set up correctly:

  1. launch Maya
  2. bring up a Maya shell
  3. Type dirmap -gam

You should then see your mappings as output.

When you submit the job, the mappings should be translated when the job gets submitted. It may take some finagling to get everything working.

 

My Maya job won't launch

Looks like your account isn't set up to include the maya bin directory in the PATH environment. Make sure you set up the MAYA_LOCATION as well.

If your shell is /bin/bash put the following in your $HOME/.profile:

export QBDIR=/Applications/pfx/qube
export ALIAS_LOCATION=/Applications/Alias
export MAYA_LOCATION=$ALIAS_LOCATION/maya7.0/Maya.app/Contents
export PATH=$QBDIR/bin:$QBDIR/sbin:$MAYA_LOCATION/bin:$PATH

On csh/tcsh, put the following into your $HOME/.cshrc or $HOME/.tcshrc:

setenv QBDIR /Applications/pfx/qube
setenv ALIAS_LOCATION /Applications/Alias
setenv MAYA_LOCATION $ALIAS_LOCATION/maya7.0/Maya.app/Contents
setenv PATH $QBDIR/bin:$QBDIR/sbin:$MAYA_LOCATION/bin:$PATH
 

I'd like to use the "waitfor" option in MEL

Unfortunately, the "waitfor" option isn't available in the individual APIs. However, there is an equivalent field in the job which the "waitfor" option takes advantage of. It's called "dependency".

Just add into your dependency field something similar to:

"dependency", "complete-job-123155"

Where complete is the state you are looking for, job is the kind of event, and the number is the job id. Note, you should use "done" rather than "complete" if you don't care if the job has failed, been killed, etc...
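If you build job dictionaries in a script, the dependency string can be assembled from its three parts. The helper below is a hypothetical convenience function, not part of the Qube API. It only reproduces the state-kind-id layout described above.

```python
def dependency(job_id, state="complete", kind="job"):
    """Build a dependency string such as 'complete-job-123155'."""
    return "%s-%s-%d" % (state, kind, int(job_id))


# Wait for job 123155 to complete successfully.
print(dependency(123155))
# Wait for it to finish in any state (failed, killed, ...).
print(dependency(123155, state="done"))
```

The two calls print complete-job-123155 and done-job-123155.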

 

mental ray

 

Mental ray service problem

You need to change the permissions on the file below:

C:\windows\system32\drivers\etc\services

The file contains the port numbers for mi. The problem is that under a proxy account, the proxy user may not have permissions to read that file. You could try one of the following:

  • Elevate the Proxy Account to Administrator

Or

  • Modify the permissions on the service file to give Everyone Read access.
 

3DS Max

 

I want to install 3DS Max in a nonstandard location. How do I inform the Job Type?

Edit the default_3dsmax_locations in the jobtypes/3dsmax/job.conf file

 

In-app submission not showing up with the latest Max jobtype.

The in-application submission is not showing up with the latest Max jobtype when selecting the menu item. Launching the QubeGUI from within Max is not working either. What's going on and how does one fix this?

The new in-application submission for the 3ds Max jobtype calls the QubeGUI executable and provides it the scenefile and other parameters. If the QubeGUI (qube.exe) cannot be found, then no dialog will come up.

If this happens, it is likely a path issue. From the commandline, type "qube.exe". If the GUI does not come up, then it likely cannot be found from within 3ds Max. Add to your System Environment Variables the PATH to where the QubeGUI is located (either C:\Program Files\pfx\qube\bin or C:\Program Files (x86)\pfx\qube\bin). Alternatively one can adjust the menu.ms script that calls the QubeGUI from within Max.

 

qbsub

 

How do I submit a frame render using qbsub?

Bear in mind that when you submit a command via qbsub, the Supervisor dispatches as many "subjobs" as you ask for with the "--cpus" option. Each subjob will execute the command.

That means, if the command is set up to render a range of frames, each subjob will render all those frames, wasting a lot of time and work. If you know how to set up your command to render a single frame, you can use qbsub to instruct the Supervisor to keep a list of frames to render. With the inclusion of a macro term to your command, you can instruct the Worker to request a frame from the Supervisor's list and execute the command on that one frame. Repeat this across all your subjobs, and you're distributing your frames across your farm!

Suppose you have a dumb command that renders frames with a couple of arguments:

Render --start # --end # <scene>

Where # are frame numbers and <scene> is the file.

If you submit the job naively using qbsub:

qbsub --cpus 10 Render --start 1 --end 100 scene

You're going to have each subjob (all 10 of them) render the whole scene from frames 1-100. Not good.

Instead, let's look at rendering a single frame, say 1:

Render --start 1 --end 1 scene

If we submit that naively:

qbsub --cpus 10 Render --start 1 --end 1 scene

We still do the same thing, but only do one frame. What if we could do this, but get each subjob to do different frames? It's pretty straightforward. Just give the Supervisor the list of frames, and change the command to include a placeholder where the frame would go:

qbsub --frames 1-100 --cpus 10 Render --start QB_FRAME_NUMBER --end QB_FRAME_NUMBER scene

Now, when you submit the job, each subjob will call the Supervisor and ask for a frame to render, and substitute for the QB_FRAME_NUMBER placeholder. Easy! Each subjob will render one or more different frames, and will automatically quit when there are no more to render because the Supervisor keeps track.
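To make the substitution concrete, here is a small simulation of the idea in Python. It is not how the Supervisor is implemented: frames are handed out round-robin for simplicity, whereas real subjobs request the next frame whenever they finish one. The function names are hypothetical.

```python
from collections import deque


def parse_frames(spec):
    """Expand a simple 'first-last' range such as '1-100' into a list of ints."""
    first, last = (int(part) for part in spec.split("-"))
    return list(range(first, last + 1))


def simulate(frames, cpus, command):
    """Hand frames out to subjobs one at a time and fill in the placeholder."""
    pending = deque(parse_frames(frames))
    subjobs = {index: [] for index in range(cpus)}
    turn = 0
    while pending:
        frame = pending.popleft()
        subjobs[turn].append(command.replace("QB_FRAME_NUMBER", str(frame)))
        turn = (turn + 1) % cpus
    return subjobs


work = simulate("1-100", 10, "Render --start QB_FRAME_NUMBER --end QB_FRAME_NUMBER scene")
print(work[0][0])
print(sum(len(commands) for commands in work.values()))
```

The first subjob's first command is Render --start 1 --end 1 scene, and across the 10 subjobs all 100 frames are rendered exactly once.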

 

My job is finished but I seem to have pending subjobs

Check to see if you have host_list set as a job flag

 

How to restrict a host to only one kind of job

So when you submit a job, you can do this to keep only one of your job's kind on a host:

qbsub --requirements "not (host.duty.kind has mykind)" --kind mykind  command

The cool thing is you can do it with types as well:

qbsub --requirements "not (host.duty.type has cmdline)" command

There is a reverse syntax if you want to use it:

qbsub --requirements "not (cmdline in host.duty.type)"  command

This tells the queuing system to filter out all hosts which have your kind of job already running on a host.

For the API:

not (job.type in host.duty.type)
 

Using the --type and --data with qbsub to submit a job

Here's a normal command-line qbsub of sleep 1000:

qbsub sleep 1000

This is how you'd do it with the --data and --type:

qbsub --type cmdline --data '(=(cmdline=sleep "1000"))'

I found the data string by running

qbsub --xml --export job.xja sleep 1000

Examining the job.xja file for the <data></data> pair shows:

<data>(=(cmdline=sleep "1000"))</data>

So you should be able to submit an miGen job, check the xja file in the job log directory for the <data> tags and use the contents as a template.

 

Running Jobs

 

What directory will my job run in?

It will run in the same directory as it was submitted in, as long as that directory is valid on the executing Worker.

 

How to limit the number of renders on a host

The easiest thing to do is to submit your jobs with a memory reservation. The reservation will force the Supervisor to look for hosts with the requisite amount of memory before dispatching the job, and then block out (or reserve) the amount requested. This will serve to limit the number of subjobs running on the host to only the number that it can safely handle.

For example, say your hosts have 4 subjob slots and 2GB of memory. If each render process or thread needs 1GB of memory, you will soon overtax the machine because you will have 4 subjobs each asking for 1GB or more.

If you add a resource reservation (in MB) when you submit the job:

host.memory=1000

Then you will only have at most 2 subjobs running on the host, because that's as much memory as it can handle. Memory is a resource, so you should be able to monitor it in the QubeGUI by selecting a Worker and examining the Properties tab, under host resources.
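The limit works out as the smaller of two quotients: free slots divided by slots per subjob, and host memory divided by the memory reservation. A rough sketch of that arithmetic follows, assuming the 2GB host reports 2048 MB. The function name is hypothetical, and the Supervisor's real accounting may differ.

```python
def max_concurrent_subjobs(slots, host_memory_mb, memory_reservation_mb=0, slots_per_subjob=1):
    """Upper bound on subjobs per host given slot and memory reservations."""
    limit = slots // slots_per_subjob
    if memory_reservation_mb:
        # Each subjob blocks out its reservation, so memory can run out before slots do.
        limit = min(limit, host_memory_mb // memory_reservation_mb)
    return limit


print(max_concurrent_subjobs(4, 2048))                              # no memory reservation
print(max_concurrent_subjobs(4, 2048, memory_reservation_mb=1000))  # host.memory=1000
print(max_concurrent_subjobs(4, 2048, slots_per_subjob=4))          # reserve all 4 slots per subjob
```

These print 4, 2 and 1 respectively: without a reservation all four slots fill, with host.memory=1000 only two subjobs fit, and reserving every slot leaves room for one.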

You can also restrict jobs by limiting the number of subjobs per host on a per-job basis. If you have hosts with 4 subjob slots, you can just send each job a resource reservation of:

host.processors=4

However, if you have a mix of hosts with different numbers of subjob slots, then you would need to do something like this:

host.processors=1+

This will reserve a minimum of 1 slot per subjob, up to the maximum number of slots on the host. This won't guarantee a host will have multiple subjobs, so you may need to investigate the other options above.

You could reconfigure each host to have only one subjob slot per host. To do this, you will need to log in to the Worker and use the Configuration tool. Go to Worker Settings, then Advanced settings and set the Worker CPUs to 1.

Create a limited resource on each host. For example, if you're working with Maya render jobs, you can create a Maya Worker resource with a quantity of 1 per Worker. You'll need to use the Configuration tool, select Worker Settings, then Worker Configuration. Add a resource called host.maya with a Total of 1.

When you submit the job add this reservation:

host.maya=1

More information on resources and using the Configuration GUI can be found in the Administration manual.

 

How do I run the same job on every host?

qbsub --flags host_list command
 

The Worker cannot find a file when rendering. How can I troubleshoot this?

Qube requires that the Workers be able to read the scenes and textures on the network. The easiest way to check whether a particular file or directory can be read by a Worker is to run a commandline job.

To verify that a particular file can be read by the Worker:

  1. Launch the QubeGUI
  2. Select the menu item Submit->Commandline Job...
  3. On Windows: Set the "Command" to "dir <path to a scene/texture/directory>" (without the " " quotes or < >)
  4. On Linux or OS X: Set the "Command" to "ls <path to a scene/texture/directory>" (without the " " quotes or < >)
  5. Submit the job
  6. Refresh the GUI and check the "Stdout" Panel to see whether the Worker was able to list that file or directory
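
If a plain dir or ls is not informative enough, the same check can be scripted. The following is a minimal sketch, assuming Python is installed on the Worker and that the script is saved somewhere the Worker can reach; the function name check_paths is made up for this example.

```python
import os
import sys


def check_paths(paths):
    """Return a list of (path, status) tuples describing what this machine can see."""
    results = []
    for path in paths:
        if not os.path.exists(path):
            status = "missing"
        elif not os.access(path, os.R_OK):
            status = "not readable"
        else:
            status = "ok"
        results.append((path, status))
    return results


if __name__ == "__main__":
    # Paths to test are given as command-line arguments.
    for path, status in check_paths(sys.argv[1:]):
        print("%s: %s" % (status, path))
```

Submitted as the "Command" of a commandline job with the scene and texture paths as arguments, it would print one line per path in the "Stdout" Panel.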
 

QubeGUI

 

What image formats are supported by the GUI?

wxImage

This class encapsulates a platform-independent image. An image can be created from data, or using wxBitmap::ConvertToImage. An image can be loaded from a file in a variety of formats, and is extensible to new formats via image format handlers. Functions are available to set and get image bits, so it can be used for basic image manipulation.

Handlers

  • wxBMPHandler For loading and saving, always installed.
  • wxPNGHandler For loading (including alpha support) and saving.
  • wxJPEGHandler For loading and saving.
  • wxGIFHandler Only for loading, due to legal issues.
  • wxPCXHandler For loading and saving.
  • wxPNMHandler For loading and saving.
  • wxTIFFHandler For loading and saving.
  • wxIFFHandler For loading only.
  • wxXPMHandler For loading and saving.
  • wxICOHandler For loading and saving.
  • wxCURHandler For loading and saving.
  • wxANIHandler For loading only.
 

How do I setup submission-side path translation in the QubeGUI?

The QubeGUI 5.4 version uses the standardized SimpleCmd/SimpleSubmit framework for all of the submission dialogs. These submission dialogs are editable and located in the simplecmds/ directory (see File->Open SimpleCmds Directory...). A postDialog callback can be added to convert all path parameters to what the renderfarm machines expect.

Here is an example of modification to the Nuke (cmdline) submission interface that will convert the paths from OSX to Windows UNC paths:

def create():
    # postDialog=postDialog is the added argument that registers the callback
    cmdjob = SimpleCmd('Nuke (cmdline)', hasRange=True, canChunk=True, help='Nuke render jobtype', postDialog=postDialog)
    ...

def postDialog(cmd, jobProps):
    # Get the set of properties that hold paths
    fileProps = set([k for k, v in cmd.options.iteritems() if v.get('type', '') in ['dir', 'file']])
    # For path properties, substitute the string values
    for k, v in jobProps.setdefault('package', {}).iteritems():
        if k in fileProps:
            jobProps['package'][k] = v.replace('/Volumes/myserver/', '//myserver/')
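
When more than one mount point needs translating, the substitution can be driven by a small table. This is a standalone sketch rather than part of the SimpleCmd framework: PATH_MAP and translate_path are names invented here, and unlike the replace() call above it only rewrites a prefix at the start of the path.

```python
# Hypothetical prefix table: OS X mount point -> Windows UNC share.
PATH_MAP = [
    ('/Volumes/myserver/', '//myserver/'),
]


def translate_path(path, path_map=PATH_MAP):
    """Replace the first matching leading prefix; return path unchanged otherwise."""
    for src, dst in path_map:
        if path.startswith(src):
            return dst + path[len(src):]
    return path


print(translate_path('/Volumes/myserver/shots/sh010/comp.nk'))
```

A postDialog callback could call such a helper for each path property instead of hard-coding a single replacement.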
 

Getting GUI to work under Ubuntu

Thanks to Rangi Sutton of Kanuka Studio

 

     

  1. Add the following line to /etc/apt/sources.list (this is for Gutsy Gibbon):
    deb http://apt.wxwidgets.org/wxpython gutsy-wx main
  2. Run the following to add the wxWidgets PGP key (it returns "OK"):
    $ wget -q http://apt.wxwidgets.org/key.asc -O- | sudo apt-key add -
  3. Update the apt-get repository list:
    $ sudo apt-get update
  4. Install Python 2.4:
    $ sudo apt-get install python2.4
  5. Change the python link:
    $ cd /usr/bin ; sudo rm python ; sudo ln -s python2.4 python
  6. Change default-version to 2.4 in this file:
    $ sudo vi /usr/share/python/debian_defaults
  7. Install wxWidgets 2.8:
    $ sudo apt-get install python-wxgtk2.8

     

 

Some website/instructions/info here: http://www.wxwidgets.org/

 

Where is the GUI Preferences file?

The prefs file can be found in the following locations.

Linux

$HOME/qube/qube_guiPreferences.conf

Windows

c:/Documents and Settings/username/qube/qube_guiPreferences.conf

OS X

~/Library/Preferences/qube/qube_guiPreferences.conf
 

Windows

 

I'm getting "file not available" errors on my Windows jobs

Most likely, your drives are not mapped correctly on the Worker. Here are some notes on how to make sure your Workers can properly map drives at execution time:

  1. The Worker will automatically try to map a) all the drives mapped on the submitting machine, and b) any additional maps specified on the submitting machine using the Configuration GUI in the section "Windows Drive Map." If you need to reference a domain account, use the Windows domain specification format (DOMAIN\USER) in the login field.
  2. On the Worker, Qube will not automatically fill in authentication for any of the drives that were mapped on the submitting machine. You will need to set up the authentication on the Worker in advance, for either the submitting user (Worker in user mode) or the proxy user (Worker in proxy mode), making sure to check "map at login."
  3. Drive maps that were specified in the Configuration GUI will be authenticated using the login and password information specified.

Check out the next two articles for more information on drive sharing with Qube.

 

Render errors that say "file not found." when using UNC paths.

 

     

  1. Our system will automatically map Windows drives based upon whether the jobs and Workers have "auto_mount" enabled. We will detect the maps on the client machine, add any maps specified in the client configuration, and send them along with the job to be automatically mapped on the Worker at execution time. If you don't refer to mapped drives, and instead use UNC, this would be irrelevant.
  2. Your servers will need to allow the Qube proxy user ("qubeproxy") full access to the server. In order to authenticate, the proxy account and password should be added to the AD server so that when the qubeproxy attempts to reference the UNC path, it can be authenticated. The password we locally install on each Worker is:
    Pip3lin3P@$$wd

    Of course, the qubeproxy user will need to be added to appropriate groups in order to have read and write permission to the server.
  3. We only reference the PDC for authentication, so if you have a BDC, you may see some difficulties with authentication if some machines are binding to a BDC or other secondary domain controller.
  4. Each job is a description of where to find the scenefile and where to write the output. This description must remain consistent across your farm, or the job will fail. For example, if you reference the scenefile at \\myserver\maya\myscene.ma when you submit the job, it can't be located on the Worker at \\yourserver\maya\myscene.ma and still work. It will fail.
  5. To troubleshoot problems with drive mapping, first submit test jobs that try to reference the directories in question, so that you can verify the jobs are able to access the server properly. For example:
    qbsub --host main1 dir \\192.168.1.200\Live_Jobs
    Once you get the correct directory output, you should be able to submit the render job as well.

 

 

How to troubleshoot problems with drive mapping.

  1. Verify the job has drive maps.
  2. Verify the Worker has auto_mount turned on.
  3. Make sure the drive isn't automounting as part of the profile: go to "Start Menu->My Computer" on the machine in question and pull down Tools->Map Network Drive. There should be a checkbox for "Reconnect at logon." You'll want to unmap the drive, and make sure that option is unchecked whenever you map the drive on that machine.

One can also submit a "test" job to check on the drive maps used on the Worker:

  1. Launch the QubeGUI
  2. Select the menu item Submit->Commandline Job...
  3. Set the "Command" to "net use" (without the " " quotes)
  4. Submit the job
  5. Refresh the GUI and check the "Stdout" Panel for the results of the mapped drives
 

How can a Windows machine be locked/unlocked when users logon/logoff?

You can use Windows' logon/logoff scripts to automatically lock/unlock a machine when users logon/off. Basically, you'd call "qblock <machinename>" in the logon script, and "qbunlock <machinename>" in the logoff script. To set up logon/logoff scripts for local logins, you edit settings in the Windows' "group policy editor":

  1. "Start Menu" -> "Run..."
  2. Type "gpedit.msc" and press Enter to launch the group policy editor.
  3. In gpedit, in the left pane, choose "User Configuration" -> "Windows Settings" -> "Scripts (Logon/Logoff)"
  4. In the right pane, double-click "Logon", then choose "Add"
  5. In the "Script Name", type "C:\Program Files\pfx\qube\bin\qblock", or browse to the file.
  6. In the "Script Parameter", type "%COMPUTERNAME%".
  7. Hit "OK".
  8. Do the same for the "Logoff" script, but substitute "qbunlock" for "qblock".

You also need to make sure that all users have permission to "qblock" a machine. With Qube 4.0, users have this permission by default, but to make sure, check the "qbusers --list" output and look for the line for user "[default]". If it looks like:

---l jcg krmpbuicseyqg-vft      [default]

you're good (the 4th column's "l" means the default users have lock permission).
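
If you need to check this on many machines, the test can be scripted. The sketch below assumes the permission flags are the first whitespace-separated field of the line and that "4th column" means the fourth character of that field, as in the sample above; the function name is invented for this example.

```python
def default_has_lock_permission(qbusers_line):
    """Check the 4th character of the first field for the 'l' (lock) flag."""
    fields = qbusers_line.split()
    if not fields or len(fields[0]) < 4:
        return False
    return fields[0][3] == 'l'


sample = "---l jcg krmpbuicseyqg-vft      [default]"
print(default_has_lock_permission(sample))
```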

 

We're running in proxy mode, but the qblogin GUI pops up. How do we disable it?

You could remove the "auth" from the "Startup" items for users on windows workstations.

 

Why do I get the GUI login window?

In order to operate the Workers in "user" mode, each user will need to register their domain login and password with the Supervisor. That way, the Worker service can authenticate as the submitting user in order to execute the job. The GUI window you see comes up to make it a little easier for the user to perform this step.

 

How do I set up debugging for a supervisor or worker crash on Windows?

Briefly, set up Dr. Watson to get a crash dump. From the Start Menu, run these commands:

Start->Run->drwtsn32 -i
Start->Run->drwtsn32

More information on Dr. Watson can be found at Microsoft:

http://support.microsoft.com/kb/308538

 

How do I look at the last few lines of an output log on Windows?

On Unix, the utility is called "tail." On Windows, however, you will have to find a replacement. Look here for Unix tools for Windows: http://unxutils.sourceforge.net/
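
If Python is already installed on the machine, a few lines of script can stand in for tail. This is a minimal sketch, not a Qube tool; the function name and the default of 10 lines are choices made for this example.

```python
from collections import deque


def tail(path, count=10):
    """Return the last `count` lines of a text file as a list of strings."""
    with open(path, 'r') as handle:
        # A bounded deque keeps only the most recent lines while reading.
        return [line.rstrip('\r\n') for line in deque(handle, maxlen=count)]
```

Calling tail() with the path to a job's output log and printing the returned lines gives the same view as the Unix utility, at the cost of reading the whole file once.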

 

UNC path is an invalid current directory path. UNC paths are not supported. Defaulting to Windows directory.

From Microsoft: You must make a registry entry to be able to use a UNC path as the current directory.

WARNING: Using Registry Editor incorrectly can cause serious, system-wide problems that may require you to reinstall Windows NT to correct them. Microsoft cannot guarantee that any problems resulting from the use of Registry Editor can be solved. Use this tool at your own risk.

Under the registry path:

HKEY_CURRENT_USER\Software\Microsoft\Command Processor

add the value DisableUNCCheck (type REG_DWORD) and set it to 0x1 (hex).

 

Renders submitted through the command line fail or lock up

Due to changes in render software architecture, a mechanism called JobObject, which is used by the Qube! Worker, disrupts the internal code in common renderers such as 3dsmax and AfterEffects. The Worker must be told not to use the JobObject. To do this, specify the disable_windows_job_object flag when submitting your jobs. For example:

qbsub --flags disable_windows_job_object MyRenderer scene.ma

For more information on windows job objects, please refer to the Microsoft Developer Article:

MSDN - Job Objects
http://msdn2.microsoft.com/EN-US/library/ms684161.aspx

 

Linux/OS X

 

Set up Linux or OS X to handle jobs with UNC paths

Let's say you've got a server called "server," and on this server you keep a maya directory with projects in them. Let's call the project "default," and the scenefile "myscene.mb."

So if you want to use UNC, this is what it might look like:

Project: \\server\maya\projects\default

Render: \\server\maya\projects\default\images

Scene: \\server\maya\projects\default\scenes\myscene.mb

This is what you'd need to submit a job. Alas, on the OS X side, it won't make sense.

First, you need to mount the drive using NFS or SMB. I'll leave that as an exercise, but what you should end up with is something like this (the underlying structure is what matters, so you can have the mount be whatever):

Project: /Volumes/maya/projects/default

Render: /Volumes/maya/projects/default/images

Scene: /Volumes/maya/projects/default/scenes/myscene.mb

Now you need to create a symlink so the path to the server will work (you'll need to do this as root or an Admin user on each Worker):

mkdir /server
ln -s /Volumes/maya /server/maya

So if you do an 'ls' of /server/maya, you should see projects.

One small change to how you submit, and you should be good to go:

Project: //server/maya/projects/default

Render: //server/maya/projects/default/images

Scene: //server/maya/projects/default/scenes/myscene.mb
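
The change amounts to flipping the backslashes of the UNC path to forward slashes. If you generate submissions from a script, a one-line helper can do this; the function name below is invented for this example.

```python
def unc_to_posix(path):
    """Flip backslashes to forward slashes, e.g. for UNC paths."""
    return path.replace('\\', '/')


print(unc_to_posix(r'\\server\maya\projects\default\scenes\myscene.mb'))
```

The result resolves on the Worker only because of the /server/maya symlink created earlier.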

 

Drive Mounting: Remote files I access in OS X aren't visible on the Worker

The problem you describe is caused by a difference between the remote file services available to a logged-in user (such as yourself) and those available to the host without anyone logged in. In this particular case, when I refer to "logged in," I mean running a Finder desktop. Remote (and local) file systems accessed via the Finder are all mounted under /Volumes.

Qube runs as a daemon, and so it doesn't access the Finder at all. In general, any file you access remotely from the Finder is going to be inaccessible to any Worker running on the farm unless you take steps to make sure the Worker has those file systems already mounted.

You should consult your OS X administration documentation to learn more about how to mount your file servers either statically or dynamically so they are available to your Workers at render time. You will also want to set similar mounts on your client machines so that the paths to the files you access when you submit the job will be consistent with your Workers. Here's a link: http://www.bombich.com/mactips/automount.html

 

What about drive mounting on Mac OSX 10.5 (Leopard)?

 

NFS

Use the Utilities/Directory Utility application.

 

Samba

Since netinfo is gone, you'll have to manage the automount maps manually. Here is an article on how to create an automount map specially for Samba shares:

http://www.stress-free.co.nz/automounting_samba_shares_in_leopard

Alternatively, you can use the /etc/fstab. Here's an article on how to do that:

http://www.macosxhints.com/article.php?story=20071028194033157

 

How do I get an AFP drive to mount automatically when the job executes?

Normally, the Finder will automatically mount the AFP share if the "mount at login" box is checked. However, since the Worker doesn't launch a Finder, you will have to set the mount in a .login.

 

How do I set the hostname on OS X?

sudo scutil --set HostName name

This technique is referenced in the following TechNote: http://docs.info.apple.com/article.html?artnum=302044

 

Job Types

 

Can't locate JobType.pm

If your job logs contain the following error message:

Can't locate JobType.pm in @INC (@INC contains: ...

download from our FTP site (pub/jobtypes) the JobTypeLib package.

 

How do I set my own shared directory for job types?

Set the worker_template_path for the Worker to point to the directory containing the Job Types. Note that on Windows, you must use UNC and the path separator is a forward slash "/".

 

     

  1. Try flipping the slashes to the other direction and see if that solves your problem:
    worker_template_path = "//qubesupervisor.as.com/jobtypes"
  2. You may not be able to use worker_template_path in the qbwrk.conf. If that is the case, you will need to modify the local qb.conf.
  3. If you ever have problems with the qbwrk.conf, use the command line tool:
    qbconfigfile qbwrk.conf
    It will show you a fully expanded version of your qbwrk.conf that you can check for errors.
  4. Every change to the qbwrk.conf only requires a reconfiguration:

    qbadmin worker --reconfig

    while a change to the qb.conf requires a restart of the Worker.

  5. You can see the current configuration of the Worker:
    qbadmin worker --config host

     

 

 

I'm working on a Job Type, and I want to run a different version of Perl or Python?

User mode: Set the user's PATH environment variable to point to the version of the scripting language you prefer

Proxy mode: Set the proxy user's PATH environment variable to point to the version of the scripting language you prefer

 

Callbacks

 

What is the callback language "qube?"

  • unblock-subjob-self
  • block-subjob-self
  • fail-subjob-self
  • kill-subjob-self
  • migrate-subjob-self
  • preempt-subjob-self
  • interrupt-subjob-self
  • suspend-subjob-self
  • resume-subjob-self
  • mail-subjob-status
  • unblock-self
  • block-self
  • fail-self
  • kill-self
  • migrate-self
  • preempt-self
  • interrupt-self
  • suspend-self
  • resume-self
  • mail-status
  • mail-license-status
  • mail-report-status
 

Where is the output from the executed callback code?

Look in the .cb file for the job.

 

I tried to call a routine in my job submission script from the callback, and it didn't work.

The problem lies with the "code" in your callback. Callback code is literally a string interpreted and executed by the built-in interpreter selected by the "language" field. Since a job you submit is actually a data object submitted to the Supervisor, it doesn't share any code space with the script that submitted it, and consequently you can't reference it.

If your socket script is a little too complicated to pack into a string without some serious debugging and maintenance grief, I'd recommend you save it out as a script and call it externally from your callback. (You can use the os.system() call).
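
As a rough sketch of that approach, the callback can be kept to a short string that only launches the external script. The script path and the trigger name below are assumptions for illustration, and the "triggers" key reflects our understanding of the callback layout rather than anything stated above; only the "language" and "code" fields are taken from the description.

```python
# Hypothetical path to a script on storage visible to the machine that runs the callback.
SCRIPT = '/shared/scripts/notify.py'

# The callback code is plain text; it cannot see names from the submission script.
callback_code = 'import os\nos.system("python %s")' % SCRIPT

callback = {
    'language': 'python',         # interpreter that runs the string
    'triggers': 'done-job-self',  # assumed trigger name; adjust to your needs
    'code': callback_code,
}

# Confirm the string is at least valid Python before submitting it.
compile(callback['code'], '<callback>', 'exec')
```

Keeping the string this small means the real logic lives in a file you can run and debug on its own.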