How Qube! Works
Qube! is a job queuing and dispatch system from PipelineFX. While it is designed to be task-agnostic, its primary use in the market is managing render farms in the VFX and CGI industry.
Qube! is composed of three primary components:
- the supervisor: this is the dispatch manager, and is always a dedicated machine
- clients: these submit jobs to the farm (“producers” in queueing parlance)
- workers: these are the execution hosts in the farm (“consumers” in queueing parlance)
Qube’s licensing paradigm is simple: the supervisor is both the license server and the sole consumer of those licenses, and the installation of any Qube component on a machine does not require a license.
Any number of worker licenses are installed on the supervisor and keyed to the supervisor’s MAC address, and a single license is consumed for every worker that is running one or more jobs. Internally, the supervisor merely counts how many workers are running jobs (regardless of whether that worker is running single or multiple jobs), and decrements the license count by this number. In the absence of metered license usage, when the number of workers running jobs equals the number of licenses, the supervisor will not start new jobs on any more machines.
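The counting rule described above can be sketched in a few lines of Python. This is only an illustration of the rule, not Qube's internal code; the function names and the dictionary of running-job counts per worker are hypothetical.

```python
def licenses_in_use(running_jobs_per_worker):
    """Count workers running at least one job; each one consumes one license."""
    return sum(1 for count in running_jobs_per_worker.values() if count > 0)


def can_start_on(worker, running_jobs_per_worker, installed_licenses):
    """Return True if a job could start on this worker without metered licensing."""
    # A worker that is already running a job already holds its license,
    # so additional jobs on it do not consume another one.
    if running_jobs_per_worker.get(worker, 0) > 0:
        return True
    # Otherwise a free license is needed for this additional machine.
    return licenses_in_use(running_jobs_per_worker) < installed_licenses


# Hypothetical farm: worker name -> number of jobs currently running on it
farm = {"render01": 3, "render02": 1, "render03": 0}
print(licenses_in_use(farm))              # 2
print(can_start_on("render03", farm, 2))  # False: both licenses are in use
print(can_start_on("render01", farm, 2))  # True: render01 already holds a license
```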
Metered licensing allows the supervisor to “oversubscribe” the installed licenses: it continues to dispatch jobs to additional workers regardless of the number of installed (or “prepaid”) licenses, so the number of workers running jobs may exceed the prepaid count. License usage is tracked at per-minute precision and reported in 15-minute batches to a cloud-based service at metered.pipelinefx.com. For every 1-minute interval, any usage in excess of the number of prepaid licenses is stored in units of worker-minutes.
The metered license billing for any time frame can be computed by summing the worker-minutes for that interval and multiplying by the per-minute rate of $0.00166/minute, which works out to approximately $0.10/hour, billed in 1-minute increments. Customers are billed in arrears, and customers who remain delinquent past 90 days have their metering authorization revoked by PipelineFX at the license portal (via a simple checkbox), which in turn de-authorizes the supervisor at the next reporting cycle.
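As a worked example of this arithmetic, the sketch below computes the metered worker-minutes and the resulting charge from a list of per-minute samples. The function names and the sample data are hypothetical; only the rate comes from the text above.

```python
RATE_PER_WORKER_MINUTE = 0.00166  # USD per worker-minute, as stated above


def metered_worker_minutes(busy_workers_per_minute, prepaid_licenses):
    """Sum the per-minute usage that exceeds the prepaid license count."""
    return sum(max(0, busy - prepaid_licenses) for busy in busy_workers_per_minute)


def metered_charge(busy_workers_per_minute, prepaid_licenses):
    """Charge in USD for the sampled interval."""
    minutes = metered_worker_minutes(busy_workers_per_minute, prepaid_licenses)
    return minutes * RATE_PER_WORKER_MINUTE


# Hypothetical hour: 15 workers busy every minute, 10 prepaid licenses.
samples = [15] * 60
print(metered_worker_minutes(samples, 10))     # 300 worker-minutes
print(f"${metered_charge(samples, 10):.3f}")   # $0.498
```

Five excess workers for one hour cost 300 x $0.00166 = $0.498, which is where the figure of roughly $0.10 per worker-hour comes from.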
The Qube! Supervisor
The supervisor should always be a dedicated and highly available host. The one exception is that it is sometimes convenient to co-locate lightweight services on it, such as license servers, which should also be highly available and whose absence would cause at least some of the jobs running on the farm to fail.
The supervisor is continually saving the state of the jobs and workers to a local PostgreSQL database, so it may be relied upon to pick up where it left off should the service or supervisor host need to be rebooted while the farm is processing jobs.
The supervisor doesn’t need access to the authored content or applications referenced by the jobs (the scene files, textures, etc.); its sole function is to match jobs to workers.
The supervisor inspects any new job’s requirements and compares those to the list of workers to discover which workers may be viable dispatch candidates for that job. This is an ongoing process as workers come on- and off-line, jobs have their requirements modified post-submission, external resources such as third-party licenses become exhausted or available, or jobs complete and free up worker resources.
Network Connectivity Requirements
The supervisor must be able to hold two-way conversations over TCP/IP with the workers and clients (with either end initiating the conversation), and it also needs UDP connectivity with the workers.
Hostname lookup on the supervisor for the worker hostnames is preferred.
The Qube! Clients
Clients submit jobs to the supervisor, which in turn dispatches those jobs to any workers it deems suitable candidates for that job. Jobs are typically submitted from inside third-party applications such as Maya, Cinema4D, and Adobe After Effects.
For more complex workflows, Qube has a Python API which can be used to construct and submit jobs. Jobs are represented by Python dictionaries, and lists of these dictionaries are passed to a qb.submit() call.
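A minimal sketch of this pattern follows. Only qb.submit() and the use of dictionaries come from the text above; the helper function is hypothetical, and the dictionary keys shown (name, prototype, cpus, package, cmdline) and the "id" field read from the result are assumptions that should be checked against the Qube Python API documentation for your version.

```python
def build_cmdline_job(name, command, cpus=1):
    """Build one job dictionary for a simple command-line job (hypothetical helper)."""
    return {
        "name": name,
        "prototype": "cmdline",           # assumed job type identifier
        "cpus": cpus,                     # assumed key: number of job instances
        "package": {"cmdline": command},  # assumed key: the command to run
    }


def submit_jobs(jobs):
    """Submit a list of job dictionaries if the qb module is importable."""
    try:
        import qb  # Qube's Python API; only present where Qube is installed
    except ImportError:
        print("qb module not available; jobs were built but not submitted")
        return []
    # qb.submit() takes a list of job dictionaries.
    return qb.submit(jobs)


jobs = [build_cmdline_job("hostname test", "hostname")]

if __name__ == "__main__":
    for submitted in submit_jobs(jobs):
        print(submitted["id"])  # assumed: each returned job carries an id
```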
Clients access the jobs’ authored content as local or network-mounted file systems, and the jobs contain references to these paths. In the event that cross-platform rendering is necessary, Qube provides the capability for cross-platform path translation maps applied per the worker’s operating system, or customized for individual workers. In either case, these maps can be defined centrally on the supervisor or on the remote worker.
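To illustrate the idea of a path translation map, here is a small sketch that rewrites a client path prefix into the equivalent path on a worker. It does not reproduce Qube's configuration syntax; the function, the map contents, and the paths are all hypothetical.

```python
def translate_path(path, translation_map):
    """Rewrite a client path using the most specific matching prefix in the map."""
    # Try longer prefixes first so the most specific mapping wins.
    for src in sorted(translation_map, key=len, reverse=True):
        # Match whole path components only, so "Z:/projects" does not
        # accidentally match "Z:/projects2".
        if path == src or path.startswith(src + "/"):
            return translation_map[src] + path[len(src):]
    return path  # no mapping applies; leave the path unchanged


# Hypothetical map for Linux workers rendering jobs submitted from Windows clients
linux_worker_map = {
    "Z:/projects": "/mnt/projects",
    "Z:/projects/assets": "/mnt/assets",
}

print(translate_path("Z:/projects/show/shot010.ma", linux_worker_map))
# /mnt/projects/show/shot010.ma
print(translate_path("Z:/projects/assets/tex/wood.exr", linux_worker_map))
# /mnt/assets/tex/wood.exr
```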
The Qube! Workers
Workers accept (or possibly reject) jobs dispatched to them by the supervisor. A client may also be a full- or part-time worker.
Workers do not have to be pre-defined on the supervisor; the worker service attempts to auto-register with the supervisor when the service starts, and the supervisor can be configured to accept any host attempting to register, any host from a particular IP address range and/or naming convention, or only specifically named hosts.
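The three acceptance modes can be pictured as a simple policy check. The sketch below is illustrative only and does not reflect how the supervisor is actually configured; the policy dictionary, its keys, and the host names are hypothetical, and it treats the address-range and naming-convention rules as alternatives.

```python
import fnmatch
import ipaddress


def accepts_registration(hostname, ip, policy):
    """Decide whether a registering worker would be accepted under a policy."""
    if policy.get("accept_any"):
        return True
    # Specifically named hosts
    if hostname in policy.get("named_hosts", ()):
        return True
    # Hosts from a particular IP address range
    address = ipaddress.ip_address(ip)
    if any(address in ipaddress.ip_network(net) for net in policy.get("networks", ())):
        return True
    # Hosts matching a naming convention (shell-style wildcard patterns)
    return any(
        fnmatch.fnmatchcase(hostname, pattern)
        for pattern in policy.get("name_patterns", ())
    )


# Hypothetical policy
policy = {
    "named_hosts": ["ws-anna"],
    "networks": ["10.1.0.0/16"],
    "name_patterns": ["render*"],
}

print(accepts_registration("render07", "192.168.5.20", policy))  # True (name pattern)
print(accepts_registration("node12", "10.1.2.3", policy))        # True (IP range)
print(accepts_registration("laptop3", "192.168.5.21", policy))   # False
```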
Qube! does not have an intrinsic file transfer agent (FTA). The workers must have access to the same file system as the clients, or in the case of an FTA external to Qube, the workers should have a path translation map defined such that they can map between the client’s pathing and the file system accessible to the worker.
Avoiding Worker Oversubscription
One of the primary criteria for the selection of viable dispatch candidates for any one job is the concept of job slots. Workers will have one or more job slots defined, and the default is a job slot for every core installed on the worker host.
A worker’s slot value is usually represented in the user interfaces as used / total, so an empty 24-core worker will show 0/24, and a completely full worker will show 24/24.
Jobs are submitted with a reservation value, the simplest reservation case being the number of job slots that a single copy of that job will reserve or consume as it runs on a worker. The normal practice is to reserve a job slot for every core that the job is expected to utilize when running on a worker: a task which can spawn 8 threads should reserve 8 job slots, and a task which is expected to utilize all cores on a host should reserve all slots on a worker. There is a simple reservation syntax for dynamically reserving all cores regardless of the core/slot count of the worker.
The supervisor will compare a job’s slot reservation against a worker’s available slot count (which is the difference of total slots and currently reserved), and only workers with sufficient available slots will be considered as viable dispatch candidates.
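The slot comparison can be sketched as follows. This is a simplified model of the rule just described, not the supervisor's actual matching code; the Worker class, its field names, and the host names are hypothetical, and a real dispatch decision involves other criteria as well.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    total_slots: int
    used_slots: int = 0

    @property
    def available_slots(self):
        # Available = total slots minus currently reserved slots (never negative)
        return max(0, self.total_slots - self.used_slots)

    def display(self):
        # The "used / total" form shown in the user interfaces
        return f"{self.used_slots}/{self.total_slots}"


def dispatch_candidates(workers, reserved_slots):
    """Return the workers with enough free slots for one copy of the job."""
    return [w for w in workers if w.available_slots >= reserved_slots]


farm = [
    Worker("render01", 24, 0),    # empty:  0/24
    Worker("render02", 24, 16),   # partly used: 16/24
    Worker("render03", 24, 24),   # full:  24/24
]

print([w.name for w in dispatch_candidates(farm, 8)])   # ['render01', 'render02']
print([w.name for w in dispatch_candidates(farm, 16)])  # ['render01']
```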
The total slot count for a worker can be quickly modified so that the worker will no longer accept new jobs or work. Typically the total slot count is set to 0, and if the worker is currently running a job, its used slot count will remain unchanged until the job completes. In this case, a locked worker completing a job accepted prior to being locked may look like 16/0. A locked worker that is not running any jobs will look like 0/0.
It’s optional when locking a worker to immediately purge any jobs running on it, or allow them to finish. If allowed to finish, the worker will not accept any new jobs while it remains locked.
Client machines doing double-duty as workers are usually locked during the day (and purged when locked in the morning to kick any remaining jobs off so that the user may have full access to their workstation), and then unlocked once the user is done at the end of the day.
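The locking behavior described in this section can be modeled in a few lines. This is a sketch of the slot bookkeeping only, under the assumption that locking simply sets the total slot count to 0; the functions and the worker dictionary are hypothetical and are not part of any Qube API.

```python
def slot_display(worker):
    """Format slot usage as used/total."""
    return f"{worker['used']}/{worker['total']}"


def lock_worker(worker, purge=False):
    """Lock a worker by setting its total slots to 0, optionally purging jobs."""
    # Remember the original total once, so that repeated locks do not lose it.
    worker.setdefault("unlocked_total", worker["total"])
    worker["total"] = 0
    if purge:
        worker["used"] = 0  # running jobs are kicked off immediately
    return worker


def unlock_worker(worker):
    """Restore the slot count the worker had before it was locked."""
    worker["total"] = worker.pop("unlocked_total")
    return worker


def accepts_new_job(worker, reserved_slots):
    """A locked worker has no available slots, so it accepts nothing new."""
    return worker["total"] - worker["used"] >= reserved_slots


workstation = {"name": "ws01", "used": 16, "total": 24}

lock_worker(workstation)               # lock, but let the running job finish
print(slot_display(workstation))       # 16/0
print(accepts_new_job(workstation, 1)) # False

lock_worker(workstation, purge=True)   # morning: lock and purge
print(slot_display(workstation))       # 0/0

unlock_worker(workstation)             # evening: available to the farm again
print(slot_display(workstation))       # 0/24
```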
Qube! System Metrics
PipelineFX offers our customers a way to collect, store, and view various system performance metrics relating to the supervisor host and Qube in general.
The online performance metrics are a feature introduced in Qube 6.10.
- An FAQ of System Metrics / Online Performance Charts is available at https://www.pipelinefx.com/system-metrics/
- An overview of System Metrics in our Online Docs is available at http://docs.pipelinefx.com/
The gist of it is that you install the qube-dra and qube-system-metrics packages on your supervisor (if your supervisor can access the internet), and it should begin reporting data within the next 10 minutes.
If your supervisor cannot access the internet, you can install the DRA on another host, either on the same network or in a DMZ; the only requirement is that your supervisor can reach port 5001 on this DRA host, and the DRA host must be able to reach port 443 on https://metrics.pipelinefx.com
You then log in to the Qube online portal at https://metered.pipelinefx.