Knowledge Base
General
Can you give an overview of Qube's architecture from a workflow standpoint?
Yes, here is a sample workflow that showcases Qube's main components:
- An artist submits a job from a Client machine (through the QubeGUI, in-application submission, command line, Python, etc.).
- This creates a package of information (strings, numbers, etc.) that is sent to the Supervisor and stored in the MySQL database.
- The Supervisor identifies available Workers to process the job.
- The Supervisor sends the job package to the Worker.
- The Worker service then launches the respective backend (script or executable) that reads the job package and launches the appropriate commandline or executable for the rendering, etc.
- The application (like Maya) then reads in the scene (stored in a central location) and renders the resulting frames to a central location (like a NAS or other file server). Note that no file staging/copying is done locally on the Workers, to minimize network traffic.
- The artist, or anyone else, can view the current status of a job through the QubeGUI, command line, Python, etc.
What are the main recommended hardware components used by Qube?
From a hardware standpoint, the main things recommended are:
- Server machine to act as the Supervisor
- A File Server or NAS to store the scenes, textures, and rendered images in a central location.
- Either artist workstations or a dedicated farm of Worker machines to process/render.
What do a job's CPUs, a job's subjobs, and the Host Resource "host.processors" mean with respect to machine cores?
Qube does not explicitly restrict a job to run on a particular core. It leaves that up to the applications to determine. If this is the case, then what do these terms used in Qube mean, and how do they relate to machine cores?
The terms can be misleading. Here is a summary of the terms and their meanings:
- Job Terms
- CPUs: This is the number of (render) processes to concurrently run. Those processes can use multiple cores, though that is left up to the processes. It does not have a direct relation to the number of machine cores.
- subjobs: These are the actual processes being run by the jobs. They are not a dependent or child job, but rather a process parented under a job.
- reservations host.processors=1: This is the number of process slots (see below) used for each job process. If one sets in the job host.processors=2, then each process for that job will use 2 process slots (though not limited to 2 cores). Reservations can also be used for reserving things like memory or licenses for each process.
- Worker Terms
- host.processors: This is the number of subjobs or processes that can run concurrently on a Worker. Think of them as "process slots". It is not directly tied to the number of cores on the Worker, though it is set by default to the number of cores on the system. To have a Worker run only a single render process, but with the capability to use all the cores, set host.processors=1 or lock the Worker to only have 1 unlocked slot.
Putting it together...
The job's CPUs value refers to the number of subjobs (or processes) the Supervisor should dispatch to run the job. Every job has at least 1 subjob/process, and each subjob/process runs in a slot on a host. The number of subjob slots a single subjob takes up is controlled by the host.processors resource reservation. By default a single subjob takes up a single slot (host.processors=1) when submitting a job.
Therefore...
When you submit a job and request a number of "CPUs," you are not actually asking Qube to map processes to processors/CPUs/cores. Rather, the "cpus" value represents the number of "subjobs" to dispatch to various Workers. By default, each subjob takes up one "slot"; the number of slots on a Worker is determined by the "host.processors" resource. That value is set by the worker_cpus configuration variable. If it's set to "0," then host.processors is automatically set to the number of cores on the Worker. Any other value is used directly as host.processors.
That's why every job by default has a reservation of "host.processors=1". It means "look for a worker with 1 slot open, and then fill it with a subjob, and reduce the number of slots on the Worker by 1."
When a subjob launches, Qube is executing an instance of an application, which may be a simple command line, or it may be an elaborate interactive session with Maya (Maya Job Type). At this level, we are simply running the application, and depending upon the OS to determine actual CPU utilization.
Now for a practical example...
To have each Worker on your Qube farm render a single frame at a time using all cores (with a multi-threaded renderer like Maya), just reduce the number of available host.processors resource slots on the Worker to 1. This can be done by "Locking" off the number of available process slots to 1 (done through the QubeGUI or command line).
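The slot accounting described above can be sketched in a few lines of Python. This is only an illustrative model, not Qube code: the function names are hypothetical, and the order in which hosts are tried is an arbitrary choice for the example rather than the Supervisor's actual scheduling policy.

```python
def host_processors(worker_cpus, core_count):
    """Slots on a Worker: worker_cpus == 0 means 'use the core count',
    any other value is used as-is (as described in the text above)."""
    return core_count if worker_cpus == 0 else worker_cpus


def dispatch(job_cpus, reservation, free_slots):
    """Place up to job_cpus subjobs; each one consumes `reservation` slots.

    free_slots maps a host name to its open slot count. Hosts are tried in
    name order, which is an assumption made only for this illustration.
    Returns (list of hosts that received a subjob, remaining free slots).
    """
    free = dict(free_slots)
    placed = []
    for _ in range(job_cpus):
        for host in sorted(free):
            if free[host] >= reservation:
                free[host] -= reservation
                placed.append(host)
                break
        else:
            # No Worker has enough open slots; remaining subjobs stay pending.
            break
    return placed, free
```

With two Workers holding 2 and 1 open slots, a job with cpus=3 and host.processors=1 fills all three slots, while the same job with host.processors=2 can only start one subjob.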
Installation
What about CentOS?
CentOS 5 is apparently compatible with our RHEL 4
I'm getting an error in a Windows installer about being unable to detect Qube
Run the registry editor on the machine (Start menu, "Run..." then "regedit" and hit enter), and check the following path.
HKEY_LOCAL_MACHINE\SOFTWARE\PipelineFX\qube
There should be BASE_DIR and INSTALL_DIR, which should point to, by default, respectively:
BASE_DIR C:\Program Files\pfx
INSTALL_DIR C:\Program Files\pfx\qube
How to Install Qube from the command line
OS X:
mount, install, unmount it:
hdiutil attach dmgfile
installer -pkg /Volumes/volume/package.pkg -target /
hdiutil detach /Volumes/volume
Linux:
rpm -ivh rpmfile
Windows:
The msiexec.exe command will perform an MSI installation via the command line.
msiexec -i msifile
The various flags supported by the installer are:
- INSTALL_WORKER_SERVICE
- INSTALL_WATCHDOG_SERVICE
- INSTALL_USER_PATH
- INSTALL_ADMIN_PATH
- INSTALL_MAYA_JOB_TYPE
- INSTALL_MAYA_API
Setting them to 1 will have the same effect as clicking the checkbox in the interactive installer.
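For scripted installs, a small helper can assemble the msiexec command line. This sketch assumes the flags above are accepted as standard MSI public properties in NAME=1 form, which the text implies but does not spell out; build_msiexec_command is a hypothetical helper, not part of Qube.

```python
# Flag names come from the list above. Passing them to msiexec as NAME=1
# properties is an assumption based on how MSI public properties work.
KNOWN_FLAGS = (
    "INSTALL_WORKER_SERVICE",
    "INSTALL_WATCHDOG_SERVICE",
    "INSTALL_USER_PATH",
    "INSTALL_ADMIN_PATH",
    "INSTALL_MAYA_JOB_TYPE",
    "INSTALL_MAYA_API",
)


def build_msiexec_command(msi_path, enabled_flags):
    """Return the argument list for an msiexec install with flags set to 1."""
    unknown = [flag for flag in enabled_flags if flag not in KNOWN_FLAGS]
    if unknown:
        raise ValueError("unknown installer flag(s): %s" % ", ".join(unknown))
    command = ["msiexec", "/i", msi_path]
    command += ["%s=1" % flag for flag in enabled_flags]
    return command
```

The returned list can be handed to subprocess.call on the target machine; rejecting unknown names catches misspelled flags before the installer runs.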
MSI installation with logging
Sometimes you need to see what's going wrong with the MSI installer. You can use the command-line msiexec to install with logging output to a file:
msiexec /i mymsifile.msi /Lime logfile.txt
or more verbose
msiexec /i mymsifile.msi /L*vime logfile.txt
where mymsifile.msi is the path to the MSI file.
How to have the worker service run as a particular user on Windows
Configure the service to log on as a particular user. This user must be in the local Administrators group, and the following User Rights Assignment policies must be applied to both the Administrators and Network Service groups:
- replace a process-level token
- act as part of the operating system
- adjust memory quota for process
My Supervisor install fails before completing
On Windows, I recommend you try backing out of Qube, uninstalling MySQL, and reinstalling Qube. On OS X, you could uninstall MySQL, or you can run the normal installer and make sure to use the Customize option and deselect MySQL before installing the Supervisor.
How do I get past an installer stuck on the MySQL database install phase?
If you cannot reset the MySQL password, an alternate method is to set the database credentials in the configuration file:
Windows
C:\winnt\qb.conf or C:\windows\qb.conf
Linux/OSX
/etc/qb.conf
Add these lines:
database_user = root
database_password = "yourpassword"
You can use any account you wish. However, the user account you choose must be capable of creating and deleting databases.
How do I set the Qube database with a different user?
Our installer assumes a new installation of MySQL with only the default users and passwords, so that it can add its own access. We leave the root password blank. You can configure the Supervisor by editing its qb.conf to set an alternate user and password with the following variables:
database_user
database_password
I get this error: libwx_gtk2_aui-2.7.so.1: cannot open shared object file: No such file or directory
Install GTK 2.7. The installers are located on our FTP site.
How do I reset the Shutdown Policy?
- Go to Administrative Tools > Local Security Policy.
- Go to Security Settings > Local Policies > User Rights Assignment.
- Double-click on "Shut down the system".
- Click Add User or Group....
- Enter "Administrators" as object. Click OK. Repeat for Power Users and Users.
- May need to confirm with network user name and password.
- Click OK at the "Shut down the system Properties" dialog.
- Confirm that "Administrators," "Power Users" and "Users" show in the Security Setting for "Shut down the system".
- Close Local Security Policy dialog.
Newly installed Workers are listed as "down"
This is probably the result of a firewall either on the Worker or the Supervisor. Disabling all firewalls and restarting the Workers should fix the problem. If security issues require the firewalls, open the following ports to TCP/IP and UDP:
- 50001
- 50002
- 50011
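After opening the ports, a quick reachability check from another machine can confirm the firewall change took effect. The sketch below uses only the Python standard library; it tests TCP only (it says nothing about UDP), and a closed result can also mean the Qube service is simply not running on that host. The function names are hypothetical.

```python
import socket

# Port numbers taken from the list above.
QUBE_PORTS = (50001, 50002, 50011)


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host, ports=QUBE_PORTS):
    """Map each port to True/False depending on whether it accepted a connection."""
    return {port: tcp_port_open(host, port) for port in ports}
```

Running check_ports against the Supervisor from a Worker (and the reverse) shows which direction is still blocked.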
What ports are needed by Qube to "punch" through the firewall?
See the entry 'Newly installed Workers are listed as "down"' above.
Jobs fail with "ERROR: unsupported perl version 5.010000"
To get around this issue, install either Perl 5.6 or Perl 5.8.
Configuration
Where is the qb.conf file?
Linux: /etc/qb.conf
OS X: /etc/qb.conf
Windows XP: WINDOWS\qb.conf
Windows Vista: PROGRAM DATA\Pfx\Qube\qb.conf
How to turn off preemption
Set supervisor_preempt_policy = disabled and the supervisor will not preempt jobs either passively or aggressively.
What if I want to lock down certain hosts to specific groups that will only run jobs submitted to those groups?
As we discussed, a host group is like an "alias" for a set of machines. You can assign a host to more than one group, but jobs sent to the group will only run on those machines. A cluster is a priority scheme that will allow a job to run anywhere there is an available machine, but it could get preempted by a job that has a cluster specification that matches the machine.
Since your customer wants to divide up the farm strictly so that jobs intended for machines assigned to a project can't run elsewhere (even if hosts are available), you'd use a group.
Here's how I'd set up the qbwrk.conf:
[project1]
worker_groups = "project1"
worker_cluster = "/project1"
[project2]
worker_groups = "project2"
worker_cluster = "/project2"
[project3]
worker_groups = "project3"
worker_cluster = "/project3"
[project4]
worker_groups = "project4"
worker_cluster = "/project4"
[project5]
worker_groups = "project5"
worker_cluster = "/project5"
[xcube1]: project1
[xcube2]: project2
[xcube3]: project3
[xcube4]: project4
[xcube5]: project5
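Because every project section follows the same pattern, the file can also be generated rather than typed by hand. The following is a minimal sketch that assumes the layout shown above (one template section per project, then one "[host]: template" line per machine); render_qbwrk is a hypothetical helper, not a Qube tool.

```python
def render_qbwrk(projects, hosts):
    """Render qbwrk.conf text in the layout shown above.

    projects: list of project names, each used as a template section.
    hosts: dict mapping a host name to the project template it inherits.
    """
    for host, project in hosts.items():
        if project not in projects:
            raise ValueError("host %s refers to unknown project %s" % (host, project))
    lines = []
    for name in projects:
        lines.append("[%s]" % name)
        lines.append('worker_groups = "%s"' % name)
        lines.append('worker_cluster = "/%s"' % name)
    for host in sorted(hosts):
        lines.append("[%s]: %s" % (host, hosts[host]))
    return "\n".join(lines) + "\n"
```

Generating the sections from one list of names keeps the group and cluster values consistent with each section header.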
Yes, but what if I want each client to only submit to a specific group?
To do that, you would need to go back to the cluster, establish worker restrictions, then set the client cluster to submit jobs to the appropriate cluster.
1. You will need to modify the qbwrk.conf:
[project1]
worker_cluster = "/"
worker_restrictions = "/project1"
[project2]
worker_cluster = "/"
worker_restrictions = "/project2"
[project3]
worker_cluster = "/"
worker_restrictions = "/project3"
[project4]
worker_cluster = "/"
worker_restrictions = "/project4"
[project5]
worker_cluster = "/"
worker_restrictions = "/project5"
[xcube1]: project1
[xcube2]: project2
[xcube3]: project3
[xcube4]: project4
[xcube5]: project5
2. You will need to set the following on each client qb.conf (for example):
client_cluster = "/projectA"
You need to set this on each client, so that the client will by default only submit jobs to the cluster you specify in the qb.conf.
The cluster setting is a hierarchy, so you don't necessarily need to put each host in a cluster. The restriction will limit the host to only run jobs submitted with the appropriate cluster spec, and the client_cluster will limit the cluster a job will be submitted to.
As a caution, if a user submits a job with a different cluster setting, the job will not go out with the default set in the qb.conf, but rather the one specified by the user, so they should not submit the job with a cluster setting.
How do I set up a host so that a job will only run on that type of host?
For each Maya Worker create a host property called "host.maya" and set it equal to 1. You can either do this with the Configuration GUI or using a qbwrk.conf:
[hostname01]
worker_properties = host.maya=1
When you submit a job, the requirement must then include:
host.maya=1
What are all the numbers that go into the license resource?
Qube! keeps track of 4 numbers for licensing.
- qb.conf controls 1,
- the supervisor maintains 1,
- The tool qbupdateresource controls 2 of them.
The qb.conf value is set like this, for example:
license.maya=50
This value of 50 can only be set in the qb.conf. This is the number of licenses allocated to the farm.
Let's say the output from the qbadmin command is:
% qbadmin supervisor --resources
license.maya=20/50
The value of 20 is the Supervisor-tracked license resource for the farm. The value of 20 means that 20 licenses are in use from a total allocation of 50.
The final 2 components, you won't see in the output of qbadmin. They represent
- "licenses currently in use across the facility" and
- total licenses in your facility.
This is because the Supervisor needs to differentiate between licenses it's using and licenses used outside of the farm. Since the Supervisor already knows how many licenses it is using, it can determine how many don't belong to the farm and adjust the available resources accordingly.
For example, if you issued a series of qbupdateresource calls
% qbupdateresource --name license.nuke --total 50 --used 10
% qbadmin supervisor --resource
license.nuke=10/50
% qbupdateresource --name license.nuke --total 40 --used 10
% qbadmin supervisor --resource
license.nuke=20/50
In the example, because the total number of licenses in the whole facility dropped from 50 to 40, the Supervisor increases its in-use number by 10, since there are now 10 fewer licenses to work with.
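One way to read the arithmetic is sketched below. It is only a model that reproduces the two outputs in the example, not the Supervisor's actual implementation: it assumes the --used value counts every license in use in the facility (farm included), and that any gap between the qb.conf allocation and the facility total is shown as in use. All names are hypothetical.

```python
def displayed_usage(allocation, farm_used, facility_total, facility_used):
    """Return the 'used/allocation' string under the assumptions stated above.

    allocation:     value set in qb.conf (e.g. license.nuke=50)
    farm_used:      licenses the Supervisor knows it is using
    facility_total: value given to qbupdateresource --total
    facility_used:  value given to qbupdateresource --used
    """
    # Licenses in use that do not belong to the farm.
    external_used = max(0, facility_used - farm_used)
    # Licenses the farm was allocated but the facility no longer has.
    shortfall = max(0, allocation - facility_total)
    in_use = min(allocation, farm_used + external_used + shortfall)
    return "%d/%d" % (in_use, allocation)


print(displayed_usage(50, 0, 50, 10))
print(displayed_usage(50, 0, 40, 10))
```

With an idle farm, the two calls correspond to the two qbupdateresource invocations in the example.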
I need to get rid of the worker and/or the supervisor tabs in the configuration GUI
Remove either the worker or the supervisor.
Cross-Platform rendering: Windows to Linux or Mac
If going from Windows using UNC paths to Linux or Mac, one can use symbolic links to map the UNC paths to absolute paths.
Cross-Platform rendering: Linux or Mac to Windows
If going from Linux or OS X to Windows, this gets a bit trickier. Having all paths within the scenefile be relative paths is usually essential. If that is done, then one just needs to make sure that the path that the rendering Worker is using is valid.
For example, if one submits from a Mac to the file:
/Volumes/mynet/myproject/myscene.ma
The Windows PC will likely expect something like:
//mynet/myproject/myscene.ma
You can first try this out manually when submitting a job by modifying the scenefile name that is being submitted.
Automation of this can be done by modifying the submission dialog .py file and adding a postDialog callback to adjust the paths. We are also working on solutions for client-side path translation that may handle this in future versions of Qube.
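The path adjustment itself is a simple prefix substitution. The sketch below shows it as a standalone Python function, using the example paths from above; translate_path and the mapping list are hypothetical names, and the mapping would need to match your own mount points.

```python
# (source prefix, destination prefix) pairs; values taken from the example above.
MAC_TO_WINDOWS = [("/Volumes/mynet/", "//mynet/")]


def translate_path(path, mappings):
    """Replace the first matching prefix; return the path unchanged otherwise."""
    for source_prefix, dest_prefix in mappings:
        if path.startswith(source_prefix):
            return dest_prefix + path[len(source_prefix):]
    return path


print(translate_path("/Volumes/mynet/myproject/myscene.ma", MAC_TO_WINDOWS))
```

Matching only on the prefix leaves the rest of the path, and any path that is not on the shared volume, untouched.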
Administration
Temporarily take hosts out of the farm.
- Ban the worker using qbadmin worker --remove.
- Stop and then disable the qubeworker service on that host.
This stops the worker from showing up again if I make a --clearbanned call.
So to reinstate a worker:
- Reenable and re-start the qubeworker service
- qbadmin worker --clearbanned. This brings back only those workers with an active service.
How do you remove a duplicate "down" host?
I think you can remove the offending host by using qbadmin and referring to it by the MAC address:
qbadmin worker --remove 00:30:48:5A:71:5D
If that doesn't work, you'll need to remove it by name or IP:
qbadmin worker --remove LAX-RF-029
If you're lucky, you'll get the down host removed. If both get removed, just use --clearbanned and restart the Worker:
qbadmin worker --clearbanned
Supervisor
I can't seem to qbping the Supervisor, even though I know it is up
Have you checked your firewall settings?
Unix:
iptables -L
Supervisor won't start because it can't open port 50002
Another service that is included on Linux by default (hplip) conflicts directly with the port numbers we use:
- 50001
- 50002
- 50011
If a site is unable to start their supervisor, they may need to disable the hplip service, or they can change their supervisor port number. However if they do this, every single client and worker will also need to reflect these settings (for example):
Supervisor qb.conf
supervisor_port = 10001
supervisor_sub_port = 10002
Worker qb.conf
worker_port = 10011
How do I run the Supervisor service as some other user?
- In the Services Control Panel, right-click the qubesupervisor service and select Properties from the menu.
- Click the Log On tab.
- Click the radio button to set "This Account" with the proper login and password.
How do I backup the Supervisor?
You can use standard backup tools. Here is a list of files that are critical:
- supervisor_logfile: /var/spool/supelog
- supervisor_logpath: /var/spool/qube
- qb_directory: /usr/local/pfx/qube
- /etc/my.cnf
- /etc/qb.conf
- /etc/qbwrk.conf
- /etc/init.d/supervisor
- innodb_data_home_dir: /var/lib/mysql
Migrating Supervisor to new host
- Before migration, you should let all jobs finish or at least reach a termination state (complete, fail, kill).
- Shut down the Supervisor and MySQL.
- Install the new Supervisor. (Shut down the new Supervisor and MySQL if they get started.)
- Copy over these files:
supervisor_logfile: /var/spool/supelog
supervisor_logpath: /var/spool/qube
/etc/qb.conf
/etc/qbwrk.conf
/etc/qb.lic
/etc/my.cnf
innodb_data_home_dir: /var/lib/mysql
- Start up the new MySQL and Supervisor.
- Update Workers and clients with the new qb_supervisor setting.
My job package variables are getting truncated
Fix by enlarging the field size of job.data
% mysql -u root qube
mysql> ALTER TABLE job MODIFY data LONGTEXT;
Force a status change through the database
% mysql -u root qube
mysql> UPDATE job SET status = 0x140 WHERE id = <myjobid>;
I am getting 'Invalid agenda item name "1". Skipping slice.' warnings in the QubeGUI. What's going on and how do I fix this?
Cause: Most likely you have recently reset your Qube database on the same machine that was previously running a Qube Supervisor. The MySQL database was cleared, but the job log files are still present. The discrepancy between those log files and what is stored in the Supervisor's MySQL database is what the QubeGUI is likely warning about.
Solution: Delete or move the job log files. They can be found at:
- Windows: C:\Program Files\qube\logs\job
- Linux: /var/spool/qube/job
How do I reset the Supervisor MySQL database?
(Example commands provided for OSX platform)
- login to your Supervisor
- open a Terminal window
- run the following command
sudo /Applications/pfx/qube/utils/upgrade_supervisor -reset
Note: you may need to restart the Supervisor as well
sudo SystemStarter stop supervisor
sudo SystemStarter start supervisor
Worker
Add a lag between worker job launches
The qb.conf setting you need to use is:
worker_job_start_delay
The field is in seconds.
worker_job_start_delay = 10
How do I restart the Worker remotely? (Windows)
Submit a remote job:
qbsub --host hostname "net stop qubeworker && net start qubeworker"
(Alternate) Install sshd from Cygwin
ssh hostname "net stop qubeworker && net start qubeworker"
How do I reboot the Workers remotely?
To reboot a Worker:
qbadmin worker --reboot hostname
Reboot all Workers (Windows):
qbsub --flags host_list shutdown -r
How do I centralize my worker job logs?
One can place all of the job logs (containing stdout,stderr,etc) directly in a central location. This requires modifying the configuration for both the Supervisor and the Workers.
- On the Supervisor, open the Configuration GUI. Under Supervisor Settings->Path Settings, set the "Job Log Directory" to a network path.
- Note: Use UNC paths and forward slashes (/) if on Windows.
- Note: This can also be manually set directly in the Supervisor's qb.conf file by setting the supervisor_logpath parameter.
- On each Worker, open the Configuration GUI. Under Worker Settings->Advanced Settings, set the "Job Log Directory" to a network path.
- Note: Use UNC paths and forward slashes (/) if on Windows.
- Note: This can also be manually set directly in the Worker's qb.conf file (or on the Supervisor's qbwrk.conf file) by setting the worker_logpath parameter.
How do I login to the local "qubeproxy" account on a Worker?
Logging into the "qubeproxy" local user is useful for troubleshooting if you are running in "proxy" mode for that Worker. The "qubeproxy" account is a local machine user account. The username and password for this account is:
- Username: qubeproxy
- Password: Pip3lin3P@$$wd
How do I reset the proxy password?
- Get the encrypted password string by using qblogin:
qblogin --display --user proxyuser
where proxyuser is the username for the proxy user. After successfully entering the password, an encrypted version of the password will be output.
- Paste the proxy_password entry in the qb.conf:
proxy_password = password
where password is the encrypted string.
Maya
How do I set up Maya to do path translation?
Your Windows clients need to translate the paths into something understandable by the Linux/Mac OS X Workers. To do this, we sometimes recommend the use of the MEL command dirmap. It has the capability to do the translation, and we have support for it in our Job Type. It has some limitations, so it's not for every situation.
In order to set up the dirmap, you will need to edit each user's userSetup.mel file (or edit one copy and distribute it to every user). In it, you add a line to enable dirmapping:
dirmap -en 1;
Then, you add the map such that the first directory is the FROM and the second is the TO mapping:
dirmap -m "<windowsDirectory>" "<linuxDirectory>"
For example:
dirmap -en 1;
dirmap -m "R:Project" "/uniserver/project"
To test if you have it set up correctly:
- launch Maya
- bring up a Maya shell
- Type dirmap -gam
You should then see your mappings as output.
The mappings should be applied when the job gets submitted. It may take some finagling to get everything working.
My Maya job won't launch
Looks like your account isn't set up to include the maya bin directory in the PATH environment. Make sure you set up the MAYA_LOCATION as well.
If your shell is /bin/bash put the following in your $HOME/.profile:
export QBDIR=/Applications/pfx/qube
export ALIAS_LOCATION=/Applications/Alias
export MAYA_LOCATION=$ALIAS_LOCATION/maya7.0/Maya.app/Contents
export PATH=$QBDIR/bin:$QBDIR/sbin:$MAYA_LOCATION/bin:$PATH
On csh/tcsh, put the following into your $HOME/.cshrc or $HOME/.tcshrc:
setenv QBDIR /Applications/pfx/qube
setenv ALIAS_LOCATION /Applications/Alias
setenv MAYA_LOCATION $ALIAS_LOCATION/maya7.0/Maya.app/Contents
setenv PATH $QBDIR/bin:$QBDIR/sbin:$MAYA_LOCATION/bin:$PATH
I'd like to use the "waitfor" option in MEL
Unfortunately, the "waitfor" option isn't available in the individual APIs. However, there is an equivalent field in the job that provides the same behavior. It's called "dependency".
Just add into your dependency field something similar to:
"dependency", "complete-job-123155"
Where complete is the state you are looking for, job is the kind of event, and the number is the job id. Note, you should use "done" rather than "complete" if you don't care if the job has failed, been killed, etc...
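When building job dictionaries in a script, the dependency string can be assembled from its three parts. This is a minimal sketch that assumes only the state-job-id format shown above; job_dependency is a hypothetical helper, and it does not validate the state name because only "complete" and "done" are named here.

```python
def job_dependency(state, job_id):
    """Build a dependency string such as 'complete-job-123155'."""
    job_id = int(job_id)
    if job_id < 0:
        raise ValueError("job id must not be negative")
    return "%s-job-%d" % (state, job_id)


# 'done' triggers regardless of how the job ended; 'complete' only on success.
job = {"name": "comp pass", "dependency": job_dependency("complete", 123155)}
print(job["dependency"])
```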
mental ray
Mental ray service problem
You need to change the permissions on the file below:
C:\windows\system32\drivers\etc\services
The file contains the port numbers for mi. The problem is that under a proxy account, the proxy user may not have permissions to read that file. You could try one of the following:
- Elevate the Proxy Account to Administrator
Or
- Modify the permissions on the services file to give Everyone Read access.
3DS Max
I want to install 3DS Max in a nonstandard location. How do I inform the Job Type?
Edit the default_3dsmax_locations in the jobtypes/3dsmax/job.conf file
In-app submission not showing up with the latest Max jobtype.
The in-application submission is not showing up with the latest Max jobtype when selecting the menu item. Launching the QubeGUI from within Max is not working either. What's going on and how does one fix this?
The new in-application submission for the 3ds Max jobtype calls the QubeGUI executable and provides it the scenefile and other parameters. If the QubeGUI (qube.exe) cannot be found, then no dialog will come up.
If this happens, it is likely a path issue. From the command line, type "qube.exe". If the GUI does not come up, then it likely cannot be found from within 3ds Max. Add to your System Environment Variables the PATH to where the QubeGUI is located (either C:\Program Files\pfx\qube\bin or C:\Program Files (x86)\pfx\qube\bin). Alternatively, one can adjust the menu.ms script that calls the QubeGUI from within Max.
qbsub
How do I submit a frame render using qbsub?
Bear in mind that when you submit a command via qbsub, the Supervisor dispatches as many "subjobs" as you ask for with the "--cpus" option. Each subjob will execute the command.
That means, if the command is set up to render a range of frames, each subjob will render all those frames, wasting a lot of time and work. If you know how to set up your command to render a single frame, you can use qbsub to instruct the Supervisor to keep a list of frames to render. With the inclusion of a macro term to your command, you can instruct the Worker to request a frame from the Supervisor's list and execute the command on that one frame. Repeat this across all your subjobs, and you're distributing your frames across your farm!
Suppose you have a dumb command that renders frames with a couple of arguments:
Render --start # --end # <scene>
Where # are frame numbers and <scene> is the file.
If you submit the job naively using qbsub:
qbsub --cpus 10 Render --start 1 --end 100 scene
You're going to have each subjob (all 10 of them) render the whole scene from frames 1-100. Not good.
Instead, let's look at rendering a single frame, say 1:
Render --start 1 --end 1 scene
If we submit that naively:
qbsub --cpus 10 Render --start 1 --end 1 scene
We still do the same thing, but only do one frame. What if we could do this, but get each subjob to do different frames? It's pretty straightforward. Just give the Supervisor the list of frames, and change the command to include a placeholder where the frame would go:
qbsub --frames 1-100 --cpus 10 Render --start QB_FRAME_NUMBER --end QB_FRAME_NUMBER scene
Now, when you submit the job, each subjob will call the Supervisor and ask for a frame to render, and substitute for the QB_FRAME_NUMBER placeholder. Easy! Each subjob will render one or more different frames, and will automatically quit when there are no more to render because the Supervisor keeps track.
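The frame hand-out can be pictured with a small simulation. This is only a model of the behaviour described above, not Qube code: the function names are hypothetical, and subjobs take frames in strict round-robin order here, whereas real subjobs ask for the next frame whenever they finish one.

```python
from collections import deque


def parse_range(spec):
    """Turn a simple 'start-end' range such as '1-100' into a list of frames."""
    start, end = spec.split("-")
    return list(range(int(start), int(end) + 1))


def simulate(command, frames, cpus):
    """Return the commands each subjob would run, keyed by subjob index."""
    queue = deque(frames)  # the Supervisor's list of frames still to render
    executed = {subjob: [] for subjob in range(cpus)}
    while queue:
        for subjob in range(cpus):
            if not queue:
                break  # nothing left: the remaining subjobs simply finish
            frame = queue.popleft()
            executed[subjob].append(command.replace("QB_FRAME_NUMBER", str(frame)))
    return executed


runs = simulate(
    "Render --start QB_FRAME_NUMBER --end QB_FRAME_NUMBER scene",
    parse_range("1-100"),
    cpus=10,
)
print(runs[0][0])
```

Every frame is handed out exactly once, so no two subjobs repeat the same work.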
My job is finished but I seem to have pending subjobs
Check to see if you have host_list set as a job flag.
How to restrict a host to only one kind of job
So when you submit a job, you can do this to keep only one of your job's kind on a host:
qbsub --requirements "not (host.duty.kind has mykind)" --kind mykind command
The cool thing is you can do it with types as well:
qbsub --requirements "not (host.duty.type has cmdline)" command
There is a reverse syntax if you want to use it:
qbsub --requirements "not (cmdline in host.duty.type)" command
This tells the queuing system to filter out all hosts which have your kind of job already running on a host.
For the API:
not (job.type in host.duty.type)
Using the --type and --data with qbsub to submit a job
Here's a normal command-line qbsub of sleep 1000:
qbsub sleep 1000
This is how you'd do it with the --data and --type:
qbsub --type cmdline --data '(=(cmdline=sleep "1000"))'
I found the data string by running
qbsub --xml --export job.xja sleep 1000
Examining the job.xja file for the <data></data> pair shows:
<data>(=(cmdline=sleep "1000"))</data>
So you should be able to submit an miGen job, check the xja file in the job log directory for the <data> tags and use the contents as a template.
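Pulling the data string out of an exported file can be scripted. The sketch below assumes the .xja file is well-formed XML containing a single <data> element, as the excerpt above suggests; the surrounding <job> element in the usage line is a placeholder for illustration, not the real file layout, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_data(xja_text):
    """Return the text of the first <data> element found in the XML."""
    root = ET.fromstring(xja_text)
    node = root if root.tag == "data" else root.find(".//data")
    if node is None:
        raise ValueError("no <data> element found")
    return node.text or ""


def qbsub_args(job_type, data):
    """Argument list matching the --type/--data form shown above."""
    return ["qbsub", "--type", job_type, "--data", data]


sample = '<job><data>(=(cmdline=sleep "1000"))</data></job>'
print(qbsub_args("cmdline", extract_data(sample)))
```

Passing the arguments as a list avoids having to re-quote the parentheses and double quotes for the shell.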
Running Jobs
What directory will my job run in?
It will run in the same directory as it was submitted in, as long as that directory is valid on the executing Worker.
How to limit the number of renders on a host
The easiest thing to do is to submit your jobs with a memory reservation. The reservation will force the Supervisor to look for hosts with the requisite amount of memory before dispatching the job, and then block out (or reserve) the amount requested. This will serve to limit the number of subjobs running on the host to only the number that it can safely handle.
For example, say your hosts have 4 subjob slots and 2GB of memory. If each render process or thread needs 1GB of memory, you will soon overtax the machine because you will have 4 subjobs each asking for 1GB or more.
If you add a resource reservation (in MB) when you submit the job:
host.memory=1000
Then you will only have at most 2 subjobs running on the host, because that's as much memory as it can handle. Memory is a resource, so you should be able to monitor it in the QubeGUI by selecting a Worker and examining the Properties tab, under host resources.
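The effect of the reservation is simple arithmetic: a host runs as many subjobs as both its slots and its memory allow. A minimal sketch, with hypothetical names, assuming memory is tracked in MB as in the reservation above:

```python
def max_concurrent_subjobs(slots, host_memory_mb, memory_reservation_mb, slots_per_subjob=1):
    """Upper bound on subjobs a host can run at once.

    slots_per_subjob corresponds to the host.processors reservation;
    memory_reservation_mb corresponds to the host.memory reservation.
    """
    by_slots = slots // slots_per_subjob
    if memory_reservation_mb <= 0:
        return by_slots  # no memory reservation: only the slots limit applies
    by_memory = host_memory_mb // memory_reservation_mb
    return min(by_slots, by_memory)


# 4 slots, 2GB of memory, host.memory=1000 reserved per subjob
print(max_concurrent_subjobs(4, 2048, 1000))
```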
You can also restrict jobs by limiting the number of subjobs per host on a per-job basis. If you have hosts with 4 subjob slots, you can just send each job a resource reservation of:
host.processors=4
However, if you have a mix of hosts with different numbers of subjob slots, then you would need to do something like this:
host.processors=1+
This will reserve a minimum of 1 slot per subjob, up to the maximum number of slots on the host. This won't guarantee a host will have multiple subjobs, so you may need to investigate the other options above.
You could reconfigure each host to have only one subjob slot per host. To do this, you will need to log in to the Worker and use the Configuration tool. Go to Worker Settings, then Advanced settings and set the Worker CPUs to 1.
Create a limited resource on each host. For example, if you're working with Maya render jobs, you can create a Maya worker resource with a quantity of 1 per Worker. You'll need to use the Configuration tool, select Worker Settings, then Worker Configuration. Add a Worker resource called host.maya with a Total of 1.
When you submit the job add this reservation:
host.maya=1
More information on resources and using the Configuration GUI can be found in the Administration manual.
How do I run the same job on every host?
qbsub --flags host_list command
The Worker cannot find a file when rendering. How can I troubleshoot this?
Qube requires that the Workers be able to read the scenes and textures on the network. The easiest way to check whether a particular file or directory can be read by a Worker is to run a commandline job.
To verify that a particular file can be read by the Worker:
- Launch the QubeGUI
- Select the menu item Submit->Commandline Job...
- On Windows: Set the "Command" to "dir <path to a scene/texture/directory>" (without the " " quotes or < >)
- On Linux/OS X: Set the "Command" to "ls <path to a scene/texture/directory>" (without the " " quotes or < >)
- Submit the job
- Refresh the GUI and check the "Stdout" Panel for the results if the Worker can see that file
QubeGUI
What image formats are supported by the GUI?
wxImage
This class encapsulates a platform-independent image. An image can be created from data, or using wxBitmap::ConvertToImage. An image can be loaded from a file in a variety of formats, and is extensible to new formats via image format handlers. Functions are available to set and get image bits, so it can be used for basic image manipulation.
Handlers
- wxBMPHandler For loading and saving, always installed.
- wxPNGHandler For loading (including alpha support) and saving.
- wxJPEGHandler For loading and saving.
- wxGIFHandler Only for loading, due to legal issues.
- wxPCXHandler For loading and saving.
- wxPNMHandler For loading and saving.
- wxTIFFHandler For loading and saving.
- wxIFFHandler For loading only.
- wxXPMHandler For loading and saving.
- wxICOHandler For loading and saving.
- wxCURHandler For loading and saving.
- wxANIHandler For loading only.
How do I setup submission-side path translation in the QubeGUI?
The QubeGUI 5.4 version uses the standardized SimpleCmd/SimpleSubmit framework for all of the submission dialogs. These submission dialogs are editable and located in the simplecmds/ directory (see File->Open SimpleCmds Directory...). A postDialog callback can be added to convert all path parameters to what the renderfarm machines expect.
Here is an example of modification to the Nuke (cmdline) submission interface that will convert the paths from OSX to Windows UNC paths:
def create():
    cmdjob = SimpleCmd('Nuke (cmdline)', hasRange=True, canChunk=True, help='Nuke render jobtype', postDialog=postDialog)
    ...

def postDialog(cmd, jobProps):
    # Get a list of properties that use paths
    fileProps = set([k for k,v in cmd.options.iteritems() if v.get('type', '') in ['dir', 'file']])
    # For path properties, substitute the string values
    for k,v in jobProps.setdefault('package', {}).iteritems():
        if k in fileProps:
            jobProps['package'][k] = v.replace('/Volumes/myserver/', '//myserver/')
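If more than one mount point needs translating, the substitution can be driven by a table of prefixes. This is a standalone sketch of that idea, not code from the SimpleCmd framework. The function name, the "script" property, and the mapping list are hypothetical. Unlike the replace() call above, it only rewrites a matching prefix at the start of the value.

```python
def translate_paths(package, file_props, prefix_map):
    """Return a copy of package with mapped prefixes replaced in path properties."""
    translated = dict(package)
    for key, value in package.items():
        # Only touch string values of properties known to hold paths.
        if key not in file_props or not isinstance(value, str):
            continue
        for old_prefix, new_prefix in prefix_map:
            if value.startswith(old_prefix):
                translated[key] = new_prefix + value[len(old_prefix):]
                break
    return translated


package = {"script": "/Volumes/myserver/shots/comp.nk", "range": "1-10"}
result = translate_paths(package, {"script"}, [("/Volumes/myserver/", "//myserver/")])
print(result["script"])  # //myserver/shots/comp.nk
```

Inside a postDialog callback, the same loop would be applied to jobProps['package'] using the fileProps set computed from cmd.options.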
Getting GUI to work under Ubuntu
Thanks to Rangi Sutton of Kanuka Studio
- Add to /etc/apt/sources.list (this is for Gutsy Gibbon):
deb http://apt.wxwidgets.org/wxpython gutsy-wx main
- Run the following to add the wxWidgets PGP key:
$ wget -q http://apt.wxwidgets.org/key.asc -O- | sudo apt-key add -
(returns)
OK
- Update the apt-get repo:
$ sudo apt-get update
- Install Python 2.4:
$ sudo apt-get install python2.4
- And change the python link:
$ cd /usr/bin ; sudo rm python ; sudo ln -s python2.4 python
- Change default-version in this file to 2.4:
$ sudo vi /usr/share/python/debian_defaults
- Install wxWidgets 2.8:
$ sudo apt-get install python-wxgtk2.8
More information is available at http://www.wxwidgets.org/
Where is the GUI Preferences file?
The prefs file can be found in the following locations.
Linux
$HOME/qube/qube_guiPreferences.conf
Windows
c:/Documents and Settings/username/qube/qube_guiPreferences.conf
OS X
~/Library/Preferences/qube/qube_guiPreferences.conf
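For scripts that need to back up or inspect the preferences file, the three locations can be wrapped in a small helper. This is a sketch only. The function name is hypothetical, and it assumes the Windows location listed above is the user's home directory.

```python
def gui_prefs_path(system, home):
    """Return the QubeGUI preferences path for a platform name and home directory.

    system is a value such as "Linux", "Windows" or "Darwin" (OS X).
    """
    filename = "qube_guiPreferences.conf"
    if system == "Darwin":
        parts = [home, "Library", "Preferences", "qube", filename]
    else:
        # Linux and Windows both keep the file in a qube directory under home.
        parts = [home, "qube", filename]
    return "/".join(part.rstrip("/") for part in parts)


print(gui_prefs_path("Linux", "/home/artist"))
```

On a live machine the arguments could come from platform.system() and os.path.expanduser("~").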
Windows
I'm getting "file not available" errors on my Windows jobs
Most likely, your drives are not mapped correctly on the Worker. Here are some notes on how to make sure your Workers can properly map drives at execution time:
- The Worker will automatically try to map a) all the drives mapped on the submitting machine, and b) any additional maps specified on the submitting machine using the Configuration GUI in the section "Windows Drive Map." If you need to reference a domain account, use the Windows domain specification format (DOMAIN\USER) in the login field.
- On the Worker, Qube will not automatically supply authentication for any of the drives that were mapped on the submitting machine. You will need to set up authentication on the Worker in advance, for either the submitting user (Worker in user mode) or the proxy user (Worker in proxy mode), making sure to check "map at login."
- Drive maps that were specified in the Configuration GUI will be authenticated using the login and password information specified.
Check out the next two articles for more information on drive sharing with Qube.
Render errors that say "file not found." when using UNC paths.
- Our system will automatically map Windows drives based upon whether the jobs and Workers have "auto_mount" enabled. We will detect the maps on the client machine, add any maps specified in the client configuration, and send them along with the job to be automatically mapped on the Worker at execution time. If you don't refer to mapped drives, and instead use UNC, this would be irrelevant.
- Your servers will need to allow the Qube proxy user ("qubeproxy") full access to the server. In order to authenticate, the proxy account and password should be added to the AD server so that when the qubeproxy attempts to reference the UNC path, it can be authenticated. The password we locally install on each Worker is:
Pip3lin3P@$$wd
Of course, the qubeproxy user will need to be added to the appropriate groups in order to have read and write permission to the server.
- We only reference the PDC for authentication, so if you have a BDC, you may see authentication difficulties if some machines are binding to a BDC or other secondary domain controller.
- Each job is a description of where to find the scenefile and where to write the output. This description must remain consistent across your farm, or the job will fail. For example, if you reference the scenefile at \\myserver\maya\myscene.ma when you submit the job, it can't be located on the Worker at \\yourserver\maya\myscene.ma and still work. It will fail.
- To troubleshoot problems with drive mapping, first submit test jobs that reference the directories in question, so that you can verify the jobs are able to access the server properly. For example: qbsub --host main1 dir \\192.168.1.200\Live_Jobs
Once you get the correct directory output, you should be able to submit the render job as well.
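When the same test has to be sent to many Workers, the command line can be assembled in a script. The following is a minimal sketch under stated assumptions: the function name is hypothetical, and only the qbsub --host form shown in this article is used. It also rejects paths that are not UNC, since a mapped drive letter would test something different.

```python
def build_dir_test(host, unc_path):
    """Build a qbsub command line that lists unc_path on the given Worker host."""
    if not (unc_path.startswith("\\\\") or unc_path.startswith("//")):
        raise ValueError("expected a UNC path such as \\\\server\\share: %r" % unc_path)
    return ["qbsub", "--host", host, "dir", unc_path]


command = build_dir_test("main1", r"\\192.168.1.200\Live_Jobs")
print(" ".join(command))
```

The resulting list can be passed to subprocess.call() on a machine where qbsub is installed, once per host to be tested.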
How to troubleshoot problems with drive mapping.
- Verify the job has drive maps.
- Verify the Worker has auto_mount turned on.
- Make sure the drive isn't automounting as part of the profile: go to "Start Menu->My Computer" on the machine in question and pull down Tools->Map Network Drive. There should be a checkbox for "Reconnect at logon." You'll want to unmap the drive, and make sure that option is unchecked whenever you map the drive on that machine.
One can also submit a "test" job to check on the drive maps used on the Worker:
- Launch the QubeGUI
- Select the menu item Submit->Commandline Job...
- Set the "Command" to "net use" (without the " " quotes)
- Submit the job
- Refresh the GUI and check the "Stdout" Panel for the results of the mapped drives
How can a Windows machine be locked/unlocked when users logon/logoff?
You can use Windows' logon/logoff scripts to automatically lock/unlock a machine when users logon/off. Basically, you'd call "qblock <machinename>" in the logon script, and "qbunlock <machinename>" in the logoff script. To set up logon/logoff scripts for local logins, you edit settings in the Windows' "group policy editor":
- "Start Menu" -> "Run..."
- Type "gpedit.msc" and press Enter to launch the group policy editor.
- In gpedit, in the left pane, choose "User Configuration" -> "Windows Settings" -> "Scripts (Logon/Logoff)"
- On the right pane, double-click on the "Logon", choose "Add"
- In the "Script Name", type "C:\Program Files\pfx\qube\bin\qblock", or browse to the file.
- In the "Script Parameter", type "%COMPUTERNAME%".
- Hit "OK".
- Do the same for the "Logoff" script, but substitute "qbunlock" for "qblock".
You also need to make sure that all users have permission to "qblock" a machine. With Qube 4.0, users have this permission by default, but to make sure, check the "qbusers --list" output and look for the line for user "[default]". If it looks like:
---l jcg krmpbuicseyqg-vft [default]
you're good (the 4th column's "l" means the default users have lock permission).
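The check can be automated when auditing several sites. The sketch below is an illustration only. It assumes the output line has the layout of the single example above, with the lock flag as the fourth character of the first field, and the function name is hypothetical.

```python
def has_lock_permission(line):
    """Return True if the first field of a qbusers line has 'l' as its fourth character."""
    fields = line.split()
    if not fields:
        return False
    first = fields[0]
    return len(first) >= 4 and first[3] == "l"


print(has_lock_permission("---l jcg krmpbuicseyqg-vft [default]"))
```

A wrapper would feed it the "[default]" line taken from the "qbusers --list" output.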
We're running in proxy mode, but the qblogin GUI pops up. How do we disable it?
You could remove the "auth" entry from the "Startup" items for users on Windows workstations.
Why do I get the GUI login window?
In order for you to operate the Workers in "user" mode, each user will need to register their domain login and password with the Supervisor. That way, the Worker service can authenticate as the submitting user in order to execute the job. The GUI window you see comes up in order to make it a little easier for the user to perform this step.
How do I set up debugging for a supervisor or worker crash on Windows?
Briefly, set up Dr. Watson to get a crash dump. From the Start Menu, run these commands:
Start->Run->drwtsn32 -i
Start->Run->drwtsn32
More information on Dr. Watson can be found at Microsoft:
http://support.microsoft.com/kb/308538
How do I look at the last few lines of an output log on Windows?
On Unix, the utility is called "tail." Windows does not include it, so you will have to find a replacement. Look here for Unix tools for Windows: http://unxutils.sourceforge.net/
UNC path is an invalid current directory path. UNC paths are not supported. Defaulting to Windows directory.
From Microsoft: You must make a registry entry to be able to use a UNC path as the current directory.
WARNING: Using Registry Editor incorrectly can cause serious, system-wide problems that may require you to reinstall Windows NT to correct them. Microsoft cannot guarantee that any problems resulting from the use of Registry Editor can be solved. Use this tool at your own risk.
Under the registry path:
HKEY_CURRENT_USER\Software\Microsoft\Command Processor
add the value DisableUNCCheck REG_DWORD and set the value to 0x1 (Hex).
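To apply the same setting on many machines, the value can be written to a .reg file and imported rather than typed into Registry Editor by hand. This is a sketch, not a procedure from Microsoft or from Qube. The function name is hypothetical, and the same warning about registry changes applies.

```python
def unc_check_reg_file():
    """Return the text of a .reg file that sets DisableUNCCheck to 1 for the current user."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        r"[HKEY_CURRENT_USER\Software\Microsoft\Command Processor]",
        '"DisableUNCCheck"=dword:00000001',
        "",
    ]
    # .reg files conventionally use Windows line endings.
    return "\r\n".join(lines)


print(unc_check_reg_file())
```

Because the key lives under HKEY_CURRENT_USER, the file has to be imported by the account that runs the jobs.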
Renders submitted through the command line fail or lock up
Due to changes in render software architecture, a mechanism called JobObject, which is used by the Qube! Worker, disrupts the internal code in common renderers such as 3dsmax and AfterEffects. The Worker must be told not to use the JobObject. To do this, specify the disable_windows_job_object flag when submitting your jobs.
Example: qbsub --flags disable_windows_job_object MyRenderer scene.ma
For more information on windows job objects, please refer to the Microsoft Developer Article:
MSDN - Job Objects
http://msdn2.microsoft.com/EN-US/library/ms684161.aspx
Linux/OS X
Set up Linux or OS X to handle jobs with UNC paths
Let's say you've got a server called "server," and on this server you keep a maya directory with projects in them. Let's call the project "default," and the scenefile "myscene.mb."
So if you want to use UNC, this is what it might look like:
Project: \\server\maya\projects\default
Render: \\server\maya\projects\default\images
Scene: \\server\maya\projects\default\scenes\myscene.mb
This is what you'd need to submit a job. Alas, on the OS X side, it won't make sense.
First, you need to mount the drive using NFS or SMB. I'll leave that as an exercise, but what you should end up with is something like this (the underlying structure is what matters, so you can have the mount be whatever):
Project: /Volumes/maya/projects/default
Render: /Volumes/maya/projects/default/images
Scene: /Volumes/maya/projects/default/scenes/myscene.mb
Now you need to create a symlink so the path to the server will work (you'll need to do this as root or an Admin user on each Worker):
mkdir /server
ln -s /Volumes/maya /server/maya
So if you do an 'ls' of /server/maya, you should see projects.
One small change to how you submit, and you should be good to go:
Project: //server/maya/projects/default
Render: //server/maya/projects/default/images
Scene: //server/maya/projects/default/scenes/myscene.mb
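The slash flip and the symlink location can both be derived from the original UNC path. The following is a small sketch of that conversion. The function names are hypothetical, and it assumes the share is laid out as in the example above.

```python
def unc_to_posix(path):
    """Convert a Windows UNC path to the forward-slash form used at submission."""
    return path.replace("\\", "/")


def symlink_location(path):
    """Return the /server/share location where the symlink must exist on the Worker."""
    parts = [part for part in unc_to_posix(path).split("/") if part]
    return "/" + "/".join(parts[:2])


scene = r"\\server\maya\projects\default\scenes\myscene.mb"
print(unc_to_posix(scene))      # //server/maya/projects/default/scenes/myscene.mb
print(symlink_location(scene))  # /server/maya
```

The second value is the path created by the mkdir and ln -s commands shown earlier.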
Drive Mounting: Remote files I access in OS X aren't visible on the Worker
The problem you describe is caused by a difference between the remote file services available to a logged in user (such as yourself) and those available to the host without anyone logged in. In this particular case, when I refer to "logged in," I mean running a Finder desktop. Remote (and local) file systems accessed via the Finder are all mounted under /Volumes.
Qube runs as a daemon, and so it doesn't access the Finder at all. In general, any file you access remotely from the Finder is going to be inaccessible to any Worker running on the farm unless you take steps to make sure the Worker has those file systems already mounted.
You should consult your OS X administration documentation to learn more about how to mount your file servers either statically or dynamically so they are available to your Workers at render time. You will also want to set similar mounts on your client machines so that the paths to the files you access when you submit the job will be consistent with your Workers. Here's a link: http://www.bombich.com/mactips/automount.html
What about drive mounting on Mac OSX 10.5 (Leopard)?
NFS
Use the Utilities/Directory Utility application.
Samba
Since netinfo is gone, you'll have to manage the automount maps manually. Here is an article on how to create an automount map specially for Samba shares:
http://www.stress-free.co.nz/automounting_samba_shares_in_leopard
Alternatively, you can use the /etc/fstab. Here's an article on how to do that:
http://www.macosxhints.com/article.php?story=20071028194033157
How do I get an AFP drive to mount automatically when the job executes?
Normally, the Finder will automatically mount the AFP share if the "mount at login" box is checked. However, since the Worker doesn't launch a Finder, you will have to set the mount in a .login.
How do I set the hostname on OS X?
sudo scutil --set HostName name
This technique is referenced in the following TechNote: http://docs.info.apple.com/article.html?artnum=302044
Job Types
Can't locate JobType.pm
If your job logs contain the following error message:
Can't locate JobType.pm in @INC (@INC contains: ...
download the JobTypeLib package from our FTP site (pub/jobtypes).
How do I set my own shared directory for job types?
Set the worker_template_path for the Worker to point to the directory containing the Job Types. Note that on Windows, you must use UNC, and the path separator is a forward slash "/".
worker_template_path = "//qubesupervisor.as.com/jobtypes"
- Try flipping the slashes to the other direction and see if that solves your problem.
- You may not be able to use worker_template_path in the qbwrk.conf. If that is the case, you will need to modify the local qb.conf.
- If you ever have problems with the qbwrk.conf, use the command line tool:
qbconfigfile qbwrk.conf
- It will show you a fully expanded version of your qbwrk.conf that you can check for errors.
- You can see the current configuration of the Worker:
qbadmin worker --config host
Every change to the qbwrk.conf only requires a reconfiguration:
qbadmin worker --reconfig
while a change to the qb.conf requires a restart of the Worker.
I'm working on a Job Type, and I want to run a different version of Perl or Python?
User mode: Set the user's PATH environment variable to point to the version of the scripting language you prefer.
Proxy mode: Set the proxy user's PATH environment variable to point to the version of the scripting language you prefer.
Callbacks
What is the callback language "qube?"
- unblock-subjob-self
- block-subjob-self
- fail-subjob-self
- kill-subjob-self
- migrate-subjob-self
- preempt-subjob-self
- interrupt-subjob-self
- suspend-subjob-self
- resume-subjob-self
- mail-subjob-status
- unblock-self
- block-self
- fail-self
- kill-self
- migrate-self
- preempt-self
- interrupt-self
- suspend-self
- resume-self
- mail-status
- mail-license-status
- mail-report-status
Where is the output from the executed callback code?
Look in the .cb file for the job.
I tried to call a routine in my job submission script from the callback, and it didn't work.
The problem lies with the "code" in your callback. Callback code is literally a string interpreted and executed by the built-in interpreter selected by the "language" field. Since a job you submit is actually a data object submitted to the Supervisor, it doesn't share any code space with the script that submitted it, and consequently you can't reference it.
If your socket script is a little too complicated to pack into a string without some serious debugging and maintenance grief, I'd recommend you save it out as a script and call it externally from your callback. (You can use the os.system() call).
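The difference can be reproduced outside Qube. In the sketch below, exec() with an empty namespace stands in for the callback interpreter. That stand-in is an assumption made for illustration, not how Qube runs callbacks, and all names are hypothetical. The first callback string fails because helper() exists only in the submitting script. The second succeeds because it calls a saved script externally (subprocess.call is used here in place of os.system).

```python
import os
import subprocess
import sys
import tempfile


def helper():
    """A routine that lives only in the submitting script."""
    return "hello from the submission script"


def run_callback(code):
    """Execute a callback string in a fresh namespace (a stand-in interpreter)."""
    namespace = {}
    exec(code, namespace)
    return namespace


def demo():
    # 1) The string carries no code with it, so helper() is unknown.
    try:
        run_callback("result = helper()")
        shared = True
    except NameError:
        shared = False

    # 2) Saving the logic as a script and calling it externally works.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as fh:
        fh.write("import sys\nsys.exit(0)\n")
        script = fh.name
    try:
        code = "import subprocess\nstatus = subprocess.call(%r)" % ([sys.executable, script],)
        status = run_callback(code)["status"]
    finally:
        os.remove(script)
    return shared, status


print(demo())
```

In a real job the external script would live on a share the Supervisor can reach, and the callback string would contain only the call to it.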










