Render Farm Management for Digital Media Pipelines




CEO dishes insight into render farm management


The PipelineFX CEO dishes insight on render farm management, pipeline efficiency and people’s wishful thinking about cloud-based production support.

Nothing brings studio tech mavens more enjoyment than showing off their render farms. They beam ear-to-ear grins as they walk you through entire floors filled with domino-like racks of servers, all humming in harmony within exalted, almost mystical refrigerated airspace. Nothing brings studio tech mavens more moments of absolute terror than when those glistening, gazillion-dollar render farms are brought to their knees by a production short on time and long on visuals dictated by the latest jump-cut-loving music video director castoff with a taste for mirrored surfaces and pouting on set. If you listen closely, you can hear Scotty and Kirk in the distance: “I’m givin’ her all she’s got, Captain!” “Well, all she’s got isn’t good enough! We need more power!”

Enter Richard Lewis. Richard is CEO of PipelineFX, makers of Qube! render farm management software and providers of a robust set of consulting, design and educational services being used all over the world. Steeped in the technology from his years working with Square USA on the production of Final Fantasy: The Spirits Within, Richard understands first-hand how technology, in the hands of artists, can lead to breathtaking work plagued by tragic business practices.

I recently had a chance to sit with Richard and talk about the rise of the global production pipeline, effective methods of render farm management and this newfangled thing (which is actually a new skin on a really old thing) called the “cloud.”

Dan Sarto: Talk to any visual effects supervisor or CTO and invariably the discussion turns to pipeline and production efficiency. Pipeline, Pipeline, Pipeline! Why the recent emphasis?

Richard Lewis: Pipeline efficiency is a key thing. There’s a lot to consider with pipelines. Brute force means buy more rendering machines. Just keep buying rendering machines and keep doing everything the way you’re doing it. When the market collapsed a few years ago, everyone cut their R&D departments. The expensive programmers that wrote all the custom software, many [of those jobs] were eliminated. Now, even though the economy has recovered, no one has rehired all of their R&D department [staff]. In fact, there are some incredibly small R&D departments compared to what we used to see. Square had 40 people; DreamWorks, when I visited them in 2003, I think had 100 R&D people.

But I always say, don’t get tricked into writing something you can buy. If you can buy it there is not a chance you can write it for less, no way. This is a problem across the industry. I have an architectural degree; I grew up in art school. Artists just want to “do it.” It’s a creative thing. It doesn’t make any business sense. Unfortunately we’ve lost The Asylum, we’ve lost Cafe FX, these were artist-driven companies. They did great work.

In this business, you need to know what things cost, you need to do things efficiently especially relative to the competition. They can do things intelligently from a business standpoint and they can do things at a low cost.

DS: So where does render farm management play into this equation?

RL: Render farm management is automation software and it’s completely an efficiency play. We’ll ask companies, “What’s your CPU utilization average per month right now?” No idea, not a clue. They’ll respond, “I don’t know, but our rendering is just slow, it’s taking too long. We need more machines.”

Qube!'s new artist view GUI.


DS: We need more something.

RL: Yeah, probably you don’t. You need to first know what your utilization is, the mix of applications, your average jobs per host, and what your average render time is, those kinds of things. Our software has a database and we collect all that. With a lot of built-in charting, you always know those things and then you can make intelligent, informed decisions.
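As a rough illustration of the metrics listed above, here is a minimal Python sketch that derives utilization, application mix, average jobs per host and average render time from a handful of job records. The record layout, host names and numbers are invented for the example; they are not Qube!'s database schema or reporting API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class JobRecord:
    host: str          # hypothetical render host name
    app: str           # application that rendered the job
    core_hours: float  # CPU core-hours the job consumed
    wall_hours: float  # wall-clock render time in hours


def farm_report(jobs, host_cores, period_hours):
    """Summarise farm usage for one reporting period.

    host_cores maps every host in the farm (busy or idle) to its core count.
    """
    capacity = sum(host_cores.values()) * period_hours  # core-hours available
    used = sum(j.core_hours for j in jobs)
    return {
        "utilization": used / capacity,
        "app_mix": dict(Counter(j.app for j in jobs)),
        "avg_jobs_per_host": len(jobs) / len(host_cores),
        "avg_render_hours": sum(j.wall_hours for j in jobs) / len(jobs),
    }


# Made-up sample: two 8-core hosts observed for 24 hours.
hosts = {"r01": 8, "r02": 8}
jobs = [
    JobRecord("r01", "maya", 48.0, 6.0),
    JobRecord("r01", "nuke", 16.0, 2.0),
    JobRecord("r02", "maya", 32.0, 4.0),
]
report = farm_report(jobs, hosts, period_hours=24)
print(report)
```

In this invented sample the farm used 96 of 384 available core-hours, which is the kind of figure that would argue against buying more machines.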

Do I need more software licenses? Was that why my render pending load was bound? I didn’t have enough RenderMan licenses? And you can say, “Look, our render pipeline is bound by render licenses not by available hosts, or I have plenty of render licenses, but not enough available hosts. Maybe we should use our desktops at night before we go buy a bunch more.” It’s just business metrics. So business intelligence is really kind of the next phase. I remember when Troy [Brooks], my co-founder, left Square, went to work for Electronic Arts and got involved in game pipelines. You know [back then], game pipelines were just sort of one year. Just build a team, build a game, ship it for Christmas, tear it down, start over.

There was no saving of assets. There was no sharing, no information, no one shared resources. He used to say that if render pipeline efficiency was kind of a five, game pipelines were about a minus three. It was the Wild West. But back then, the game revenue was so crazy, it only mattered to get that thing out the door on time. We don’t care about efficiency if it’s pretty profitable. But over time industries mature and you get competitors and eventually it does matter to do it efficiently. Visual effects still, honestly, we’re still really not there. Our selling point is you should really do this as efficiently as you can. But it’s not on the top of everyone’s list. Payroll and artist compensation, as always, that’s the cost of a visual effects business.

But consider this. [As an artist] First I’m going to think about something, then do something, drawing or making something on a computer in 2D or 3D.

Then you render it, so you can see what you’ve done. Then you review it with your boss or peers or someone, then you think about it again and then you redo it again and then you re-render it. Well that cycle, that creative iterative loop, is unique to computer graphics. If you are sculpting something, you whack it and you just step back and review it. There it is. But on a computer, you’ve got to process these frames.

So there’s a whole process to it that goes in your production tracking system schedule. There is a daily so the right people can review it with you and sign off on it, and then you can go back, think about it, and review it. But over time that rendering piece keeps getting longer because you’ve added characters. Now there is skin on them, now there's cloth, now the background is in, now our lighting team has done their work and that just keeps getting longer.

You’re compressing the amount of time to review, re-think and redo. So even if you keep the same amount of work and you are okay with all the infrastructure you’ve got, by putting in a more efficient render pipeline, you’re going to decrease the render piece of that iterative loop and you will increase the review, re-think and redo time. So the work quality goes up.

Scientists and astronomers at NASA’s Conceptual Image Lab (CIL) were using a mishmash of off-the-shelf software to create project visuals, including Maya, Mental Ray, Lightwave and Cinema 4D.

Images from NASA’s Conceptual Image Lab


DS: It just provides that much more time to qualitatively assess and redo the work rather than waiting for it to render...

RL: That’s right… And you will accomplish more on iterative loops. So if I can revise it 36 times instead of 24 it’s going to be a lot better. Or studios and customers can take more work so they can do the same 24 iterative cycles, but now they can do an entire other job. That means a lot more revenue for those same people.
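The 24-versus-36 comparison can be worked through with simple arithmetic. The sketch below assumes a fixed 72-hour schedule and a cycle made of a render step plus everything else (think, do, review); only the 24 and 36 come from the interview, the hour values are invented to make the numbers land there.

```python
def iterations(schedule_hours, render_hours, other_hours):
    """Whole think-do-render-review cycles that fit in a fixed schedule."""
    return int(schedule_hours // (render_hours + other_hours))


# Hypothetical shot: 72 hours on the schedule, 1.5 hours of artist time per cycle.
before = iterations(72, render_hours=1.5, other_hours=1.5)  # 3-hour cycle
after = iterations(72, render_hours=0.5, other_hours=1.5)   # 2-hour cycle

print(before, after)
```

Under these assumptions, cutting the render step from 1.5 hours to 0.5 hours shortens each cycle by a third, so the same schedule holds half again as many revisions without anyone working longer.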

So there is never a case where you would not want to decrease the render piece of that iterative loop. As much money as you can make, you cannot have a fast enough render pipeline. You never will. We will never have the fast enough desktop. No laptop we carry will ever be fast enough. It’s not possible. We’re always waiting on it.

DS: It would make sense then that anybody responsible for managing such a pipeline would immediately go to the solutions that would translate into those types of savings. Why don’t they?

RL: There are some challenges. Most of our customers have been through multiple render management systems. It kind of goes like this in the small studio. They will start up with something they know, doesn’t matter what it does, they know it and it’s one piece they don’t have to figure out because there is a world to figure out with the graphics. A lot of studios are taking on bigger projects than they have ever done before and that’s really the problem. They would just like to reduce risk by using the devil they know. We’ve heard that a lot. “We’d love to have something new and better, but we already know all the problems with this other thing and we just sleep better at night.” But you run out of that eventually though, because eventually you’re not going to be able to afford just more servers and more RenderMan licenses to throw brute force at it. You know, you can keep putting buckets under the holes in the roof, but eventually the whole floor is covered in buckets and you can’t walk anywhere. So you really do need a new roof.

A lot of visual effects studios are starting to have CTOs, they’re starting to have IT directors. It’s more than just working with a reseller, they’re actually drawing visuals and architecting their infrastructure. A lot of them have to do these multiple site infrastructures. How do we get our dailies moved to the Vancouver studio so that the producer can watch it there, but, we did it and rendered it down here?

With these distributed pipelines, it’s the whole workflow they have to think about. Our software, because it’s automation and it has a database and has triggers and things, is a piece of that glue [that holds the distributed pipeline together]. We have customers that have created pretty nifty dailies systems that when the rendering is done, the files get scheduled to be transferred, so overnight they are prioritized and transferred. If they are using Shotgun production tracking, Shotgun is notified that it’s transferred. It’s scheduled in the dailies program and then they watch it the next day, sign off and it’s this whole loop.
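The dailies loop described above is essentially a chain of triggers, each step firing the next. Here is a minimal Python sketch of that pattern; the event names, handler functions and the tiny `EventBus` class are all hypothetical and stand in for whatever trigger mechanism and tracking API a studio actually uses.

```python
from collections import defaultdict


class EventBus:
    """Tiny publish/subscribe helper (illustrative only)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event, handler):
        self._handlers[event].append(handler)

    def emit(self, event, payload):
        for handler in self._handlers[event]:
            handler(payload)


log = []
bus = EventBus()


def schedule_transfer(job):
    log.append(f"queue transfer of {job['shot']} to {job['site']}")
    # A real system would emit this only after the overnight copy finishes.
    bus.emit("transfer_done", job)


def notify_tracking(job):
    log.append(f"mark {job['shot']} as transferred in production tracking")
    bus.emit("tracking_updated", job)


def schedule_dailies(job):
    log.append(f"add {job['shot']} to tomorrow's dailies")


bus.on("render_done", schedule_transfer)
bus.on("transfer_done", notify_tracking)
bus.on("tracking_updated", schedule_dailies)

# Hypothetical shot finishing its render at the home studio.
bus.emit("render_done", {"shot": "sh010", "site": "vancouver"})
print("\n".join(log))
```

Keeping each step as its own handler means a studio can swap the transfer tool or the tracking system without touching the rest of the chain.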

So while that kind of automation really helps a lot, it’s still very custom. Those are the things you can’t buy, the glue between all these applications. I always tell people don’t fire your programmers. Write all the stuff you can’t buy, stuff that automates your business. That’s your secret sauce and your difference. But, we are still a long way from our industry all owning commercially supported solutions for most of what they do. 80% of pipelines still are not commercially supported solutions. And that’s always more expensive.

Qube! integrated with all the lab’s software, replacing their open sourced render management code and upping their output and production efficiency.

Images from NASA’s Conceptual Image Lab


DS: That seems to be a huge and expensive burden. What do you see next for management of these large globally distributed pipelines?

RL: Well, if they’re running some kind of proprietary render management system, there are few studios that want to try to support that in a remote location or a foreign country. There is no documentation on their internal software, the smart guy that wrote it isn’t moving to Mumbai or wherever, he’s somewhere in the wrong time zone, and there’s no deep team of support engineers on that one custom application. It’s usually one guy, two guys. So it’s very difficult to spin up another satellite studio and run a whole bunch of proprietary software. There are a lot of challenges, and we [PipelineFX] have an opportunity there, even if a studio has a lot of proprietary software, to win the satellite studios.

But beyond that, the big challenge is the studio itself has to distribute the pipeline across every application they use and that’s sort of far beyond us. But if they can do that, and if they can move all their textures and all their scene files and ship them around the world, then they can certainly use our software. We have a way of running multiple supervisors and you can choose the supervisor you’re submitting to. People have even written software that profiles jobs, and if a job is low on data and takes a long time to render, like a Houdini distributed simulation, that’s worth sending far away to use remote resources. But in general, shipping your rendering away is kind of a hairy thing. It normally doesn’t make a lot of sense. Few people do it effectively. There’s the cost and bandwidth and transfer time to wait for the return of the frames. Even if the frames only take, let’s say, five minutes to render, it’s still an HD stereo set of frames coming back, it’s a lot of data, and does it really make sense to send it out? And that segues into the cloud discussion.
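The job-profiling idea above comes down to comparing the remote round trip (upload, render, download) against the time the job would take at home. The Python sketch below shows one way such a check might look; the function names, the decision rule and all the figures are assumptions for illustration, not a description of any customer's tool.

```python
def transfer_seconds(gigabytes, link_mbps):
    """Time to move data over a link, using decimal units (1 GB = 8000 megabits)."""
    return gigabytes * 8000 / link_mbps


def worth_sending(upload_gb, download_gb, link_mbps, local_seconds, remote_seconds):
    """True if the remote round trip beats waiting for the job locally."""
    remote_total = (
        transfer_seconds(upload_gb, link_mbps)
        + remote_seconds
        + transfer_seconds(download_gb, link_mbps)
    )
    return remote_total < local_seconds


# Hypothetical long simulation: little data, hours of compute.
sim = worth_sending(upload_gb=0.5, download_gb=1.0, link_mbps=100,
                    local_seconds=8 * 3600, remote_seconds=2 * 3600)

# Hypothetical frame render: heavy scene and frames, only minutes of compute.
frames = worth_sending(upload_gb=50.0, download_gb=20.0, link_mbps=100,
                       local_seconds=3600, remote_seconds=300)

print(sim, frames)
```

With these made-up numbers the simulation spends two minutes on the wire and is worth sending, while the frame render would spend over ninety minutes in transfer to save less than an hour of rendering.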

DS: Please! What is the deal with “the cloud?” How can you possibly push all this sophisticated and gear-intensive processing up to a faceless “cloud” of computing resources? It’s supposed to eliminate the need to have dedicated resources and it’s more cost-effective and you…

RL: It’s not that I’m not a fan of the cloud, it’s just I’ve been through all this already. If you remember in the dot com era we had storage area networks. I sat in a Computer Associates conference, watching the CEOs of ten venture capital funds, and they went out and bought up all the data systems, all the EMC storage, stuck them in data centers all over. A storage cloud - they didn’t use the word cloud, but that’s what it was – an internet based storage cloud. You’re going to plug an Ethernet cable in the wall and get data storage, tier one, backed up, ultra-reliable high speed, primary storage…

DS: They built bunkers all around the world to house this stuff.

RL: [It would show up] On your utility bill, just like your phone bill and electric bill. They all went out of business and it never happened. So that’s what the cloud is to rendering. It’s not going to happen. I know NVIDIA is doing stuff and Autodesk is doing stuff, but that’s not what it’s [the cloud is] for. The internet is not for that either. The internet is not for us to have high speed data storage for our hospital medical records for the ER [to access] this moment. That’ll never happen. Hospitals buy tier one data storage in their data center, guaranteed delivery when they need it. That’s the only way that can operate. You’re not going to get it through the internet. The internet doesn’t deliver us 10 GB to the home…

DS: As it is, ISPs are beginning to put governors on the amount of data that people can get with all the video being watched online, let alone real high speed applications…

RL: What percentage is Netflix of the internet [traffic]? The internet is an amazing thing, but it isn’t going to be the edge condition. Digital media rendering is an edge business case in the world of business. I use Google docs extensively and to me, that’s the cloud. I have no idea where that spreadsheet is. No clue. It’s in Google somewhere. But, it’s always there, every time I open it. But it's small data, very low security requirement. I have a password and it requires no specialized software.

There are three problems with the cloud for digital media content creation. Number one is licensing. Recently that has been addressed a little bit. I understand Pixar is allowing services tied to Weta [Digital] out of Australia to rent RenderMan licenses on demand, but they’re toeing the water with that. I know NVIDIA has a GPU thing going with Autodesk for still renders for 3ds Max but again, it’s kind of a specialized little thing. Licensing is number one. You cannot just on demand rent all the software you need to do a visual effects project, just for three days. It’s just not available. How do I get that, if no one is offering it? So until all that changes, and I don’t expect it ever to, licensing is number one.

Assuming you bought all of your own licenses that you need and you’re going to float them to the cloud, number two is bandwidth. And maybe you have unbelievable bandwidth to your cloud storage wherever that is, but I don’t know anyone that does. I know a lot of people that have very high speed bandwidth to private clouds, which is a whole different thing. That’s just on a network. It’s in a data center, it’s in another city, you own it, it’s yours. IBM has been running that on demand forever. [For years] Companies have been putting up remote data centers for disaster recovery and whatever over private high speed links. That’s not the Cloud. “Cloud” means I send it to the internet and I don’t know where it is, and I don’t know who has it.

And then the third part is security. In our business, stuff gets leaked all the time. “Where is our stuff? I don’t know.” So you have to convince them [the studios] that it [your cloud storage] is secure. You have to provide the kind of bandwidth you really do need. There are special things like Sohonet, and companies that are addressing that as a more private [solution]. But, again, it’s not [accessing] the cloud from my living room…

DS: It’s more like your own private network.

RL: Right. And after saying all that, we have a customer that uses Qube! in the cloud, and it's Xtranormal. Xtranormal does those funny little animations. Writers use it and write dialog, and little guys talk to each other and it’s free, it’s on the internet, you pay for extra characters and things. But you see, it is user generated content, no security. It is a free renderer and it is really small bandwidth. And they use a combination of servers in their office for all the heavy lifting and the Amazon Cloud for audio encoding and for preview renders.

But, what I learned in working with them is, and I have had studios tell me this, “Oh, we are just going to fire up 500 nodes in the Amazon cloud to do this render.” I say, “Oh really? They’ve got 500 nodes waiting for you whenever it is you decide to be ready?” They would have to have a million machines, so that 500 are always available for you. How could they? Then, in working with them I found out you are likely to get 25 machines. After you have used those 25 for a really long period of time and paid them a lot of money, maybe they will give you 50, maybe. And even then, the Amazon cloud nodes are not designed for digital media rendering. You can’t send a 50 GB Harry Potter scene to an Amazon cloud node, and you would never want to. We don’t want to think in visual media that we’re an edge case, but we are. We are a bunch of artists who like computers. And that’s not an average situation.

DS: So where do you think folks like Autodesk and NVIDIA are headed with this? What is their angle, their strategic vision?

RL: I’m just going to go out on a limb and predict they’re not going to make any money doing this. NVIDIA is not going to make any money; Autodesk is not going to make any money. Eventually from a business standpoint there are better places to put their money to make money. But the good thing is it does enable a little bit more use of the technology.

We’ve even thought maybe schools just use our software and they allot, as a grant, a certain number of hours of rendering to every student. Beyond that maybe the student pays on a credit card themselves for additional hours of rendering. That’s very sort of cloud-ish but it is on capacity that the school owns. Maybe someday it could be on capacity outside the school but the easiest would be to do that around their home.

But that’s a different kind of licensing model. It’s a service to the student. If they really want to do great work for their portfolio and spend a lot of time on it they can pay a little more. Maybe more rendering is a service. Put more of the burden on the student to eventually rent the applications they’re using rather than ask the school to provide it.

That would take a lot of coordination of vendors and things like that. But, that’s kind of an answer of how we could start to help more people do more rendering without a huge capital expense. So Autodesk and NVIDIA are letting people taste GPU rendering with a cloud offering. That’s a single artist or a single architect that wants to do a still rendering. But if you’re using 3ds Max and you have 100 artists and you’re doing animation, can you really do that work and guarantee to your customers it will be done on time by using a shared unscheduled cloud resource? I mean, I just don’t know how those business models work. I used to rent equipment in Hawaii in the day when SGI equipment was really expensive. RFX, one of my resellers, did a lot of rentals in LA but that is a very hairy business. It’s a similar business. I’m going to carry expensive computing capacity, waiting for you to need it, and then when you want to use it, you want it really cheap.

LMU’s lab was filled with isolated desktops that limited students to rendering on single workstations. Long render times tied up computers and prevented students from accessing their project work.


DS: For a very short period of time…

RL: For a short period of time. Then, whatever I bought is depreciating like mad.

DS: Yeah. Sounds like a good plan to me!

RL: Right, and it didn’t work out on enterprise storage. You know, your disk drives are doubling every whatever - Moore’s Law and all that. So those companies that invested millions of dollars in data center storage, two years later you can buy all that for 5% of the cost. So it’s just not a business model. A better business model is to outsource complete management of your facilities, the way General Motors does with EDS...

DS: I was going to say the EDS model.

RL: Yeah, then you’re bringing in experts who provide value, and you don’t have to train people and recruit people and hire people. That’s really an HR kind of a problem. And so to that extent that’s where our training and our consulting comes in. We’re not your render wrangler, but we’ll get a render wrangler up to speed, trained, and support them so they can be effective.


DS: Where do you see render and pipeline management headed in the next 2-5 years?

RL: I honestly think we are at the very beginning of computer graphics and render pipeline [management growth curve]. This whole “pipeline, pipeline, pipeline” you’re hearing is because everyone is now seriously thinking about it. Another barometer is when I started with AutoCAD; I started an AutoCAD user group. You go to the book store, there is not a book on AutoCAD, the command reference was all you got and it just said what the commands did. No idea how to do anything. There were no classes at high school or community college, we were on our own to talk to each other about how to use it. Well, go and look for all the books on Amazon on render pipelines.

DS: There's zero.

RL: Yeah, we have a draft of something, but you know, there aren’t any. There are few books on pipeline, period! It’s just the Wild West. It’s still the very beginning. We just got computers in the ’80s for the first time with a floppy disk! This is all just brand new.

Almost all the computers in the world are not managed in terms of rendering. They’re not optimally managed, they’re not tied together, they’re not shared. In universities, we have some very smart clustering, which is priority based. So if I am an After Effects user on a Mac, but I’m not at school today, that NUKE user in the next room can actually render on my machine and nobody even knows it. It’s just because no one is logged into this machine yet and it’s a part of a cluster. Now if I walk in and send a job, I’m going to push that job out automatically and start my job just because of the way the management is set up. Just that base level of sharing resources in a school? Hardly anyone is doing that. Universities have hundreds and hundreds, if not thousands, of computers times the numbers of CPU cores doing nothing most of the time. No one is touching them. Now if a student got access to anywhere near that kind of capacity, their experience in that creative iterative loop of computer graphics can be multiplied ten to 100 times.
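The priority-based sharing described above can be sketched in a few lines of Python: an idle workstation accepts any job, and a higher-priority job from the machine's owner pushes the guest job back to the queue. The class names, job names and priority values are hypothetical, and this is only a toy model of the behaviour, not how Qube! implements clustering.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    name: str
    priority: int  # higher number wins the machine


class Workstation:
    """One shared desktop that runs a single job at a time."""

    def __init__(self, owner: str):
        self.owner = owner
        self.running: Optional[Job] = None
        self.waiting: List[Job] = []

    def submit(self, job: Job) -> None:
        if self.running is None:
            self.running = job                 # idle machine: just start
        elif job.priority > self.running.priority:
            self.waiting.append(self.running)  # preempt the lower-priority job
            self.running = job
        else:
            self.waiting.append(job)           # wait for the machine to free up


# The After Effects user is away, so a NUKE render borrows the Mac.
mac = Workstation(owner="ae_user")
mac.submit(Job("nuke_render", priority=1))

# The owner walks in and submits; their job takes the machine back.
mac.submit(Job("ae_comp", priority=10))

print(mac.running, mac.waiting)
```

A real scheduler would also checkpoint or restart the displaced render on another host; the sketch only shows the ordering decision.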

So I don’t think there is going to be some revolutionary thing in pipelines. It’s not like the cloud is going to come in and solve everybody’s problem, we’re going to do animated features on an expense account, dialed up over the internet or anything like that. What we are seeing with our partners like Shotgun, TACTIC, RV from Tweak, we are seeing small software companies like ours that start up to address the infrastructure of digital media.

The Foundry, Autodesk and Adobe, they all make applications, but they don’t make management systems between the applications. So we’ve made those and we are all talking to each other, we’re using Qt and Python and modern languages, making things portable, with plug-ins, all talking to each other. Now we are starting to see studios adopt commercial infrastructure and applications [technology] and start to have some IT discipline.

Customers are demanding it now. I was in Germany. Marvel uses some studios there and they did a whole IT audit, and they wanted to know the brand name of their [the German studios’] firewalls. Lots of these studios were like, “I use a Linux firewall that’s a .exe file,” or “there is no brand.” Marvel said, “That’s not good enough, we need a brand.” So a lot of that discipline is starting to be required by the customer. So for us, we think we have a good ten years of just helping people get to a baseline of efficiency within render pipelines. There are some leading studios trying to do these big distributed kinds of things, but I think 95% of the digital media market in the world needs to try to get their act together at home...

You know, it’s a niche business and we find it interesting. We like rendering in computer graphics, we like the work, and the companies. We do a lot of post-production and broadcast, fast turnaround. We have The Daily Show and Comedy Central, and The Dr. Phil Show and people like that. They just need a reliable, fast, easy to use system that works with every software they might need. And there are thousands and thousands of those. So it’s an enormous market, it’s very fragmented, people use stuff that’s free, they render on their desktop, they don’t share resources yet. It sounds very basic and it’s not really glamorous. But that’s the blocking and tackling of getting into the computer graphics market, which is just getting started.
