Friday, August 6, 2010

EDA Workshop Debate Erupts Over Parallel Programming

The Electronic Design Processes (EDP) workshop may be a small, technical, IEEE-sponsored event, but that didn't stop a lively debate over the feasibility of parallel processing from erupting last week. The debate comes as EDA vendors, including Cadence, are working hard to port their software to multicore processor platforms. The session opened with a somewhat skeptical presentation by Patrick Groeneveld of Magma Design Automation, followed by a really, really skeptical one from Patrick Madden of the State University of New York (SUNY) Binghamton. These were followed by what I'd call a balanced, hands-on presentation from Tom Spyrou of Cadence. (He has been working to parallelize the Encounter Digital Implementation system, and I interviewed him last year about the challenges that entails.)
If there was a silent presence in the room, it was Gene Amdahl, whose "Amdahl's Law" reminds us that the gains you get from parallelizing code are sharply limited by any sections that cannot be parallelized. As Patrick Groeneveld pointed out, if you can parallelize 50 percent of the code, the theoretical maximum speedup is only 2X, and the reality is probably closer to 1.8X. Since all multi-tool EDA flows include some non-parallelizable code, no one should expect a 7X speedup on 8 processors.
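For readers who want the arithmetic behind those numbers, Amdahl's Law puts the ideal speedup on N processors at

    speedup(N) = 1 / ((1 - P) + P/N)

where P is the fraction of the run time that can be parallelized. With P = 0.5, the limit as N grows without bound is 1 / (1 - 0.5) = 2X, and on 8 processors the formula already drops to 1 / (0.5 + 0.5/8), roughly 1.78X, before any parallelization overhead is even counted.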
Tom Spyrou (right) answers a question from Patrick Groeneveld (orange shirt) at the EDP Workshop April 8.
Patrick Groeneveld: The "Last Resort"
"Parallelism is the last thing I look at if I have a problem in my flow," said Patrick, who serves as Magma's chief technologist. "I will try everything else first because I know parallelism has some problems." Sometimes, he noted, parallelizing code can actually cause a slowdown because of the "hidden costs and extra overhead" of making code parallel.
Patrick identified six problems he's experienced in trying to parallelize EDA software:
  • Contention arbitration. Two processors are running and both want to use a shared resource. They write to memory at the same time, and - crash! This is typically fixed by putting locks in the code, which leads to the next problem.
  • Easy to lock up performance. Locking every read "trashes your multi-threaded performance." (A short sketch of this locking pattern appears after the list.)
  • Finding intelligent partitions. You want to chop up the chip and send each piece to a different processor, which works for some problems, but "partitioning is evil" for synthesis and optimization.
  • Repeatability is the silent killer. If you want repeatability (getting the same result from the same run every time), you will give up some parallel efficiency. (The second sketch after the list shows one reason why.)
  • Load distribution. It is very hard to estimate workloads in advance, yet you need those estimates to schedule CPU resources.
  • The sheer complication of things. Not many programmers are capable of writing parallel code. Bottom line: "parallelism works but gains are limited. Don't be disappointed."
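To make the contention and locking bullets concrete, here is a minimal C++ sketch of my own (not from Groeneveld's slides); the SharedNetlistStats structure and the delay numbers are invented for illustration. Without the mutex, the two threads race on the shared total; with the mutex taken on every update, the result is correct but the threads spend most of their time waiting on each other, which is how coarse locking "trashes" multi-threaded performance.

#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared structure standing in for, say, a timing graph.
struct SharedNetlistStats {
    std::mutex lock;          // protects totalDelay
    double totalDelay = 0.0;
};

// Without the mutex, concurrent updates to totalDelay are a data race
// (the "contention arbitration" problem). With the mutex on every update,
// the updates are safe but effectively serialized.
void accumulateDelays(SharedNetlistStats& stats, const std::vector<double>& delays) {
    for (double d : delays) {
        std::lock_guard<std::mutex> guard(stats.lock);  // taken per update: correct but slow
        stats.totalDelay += d;
    }
}

int main() {
    SharedNetlistStats stats;
    std::vector<double> a(1000000, 0.001), b(1000000, 0.002);
    std::thread t1(accumulateDelays, std::ref(stats), std::cref(a));
    std::thread t2(accumulateDelays, std::ref(stats), std::cref(b));
    t1.join();
    t2.join();
    // A common remedy is per-thread partial sums merged once at the end,
    // which is exactly the kind of restructuring legacy code rarely allows.
    return 0;
}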
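The repeatability bullet often comes down to floating-point arithmetic not being associative: if the order in which partial results are combined depends on which thread finishes first, the answer can change from run to run. A tiny stand-alone illustration (again mine, not from the talk):

#include <cstdio>

// Floating-point addition is not associative, so a parallel reduction that
// combines partial sums in whatever order threads happen to finish can
// return different results on different runs.
int main() {
    double a = 1e20, b = -1e20, c = 1.0;
    std::printf("%.17g\n", (a + b) + c);  // prints 1: one combination order
    std::printf("%.17g\n", a + (b + c));  // prints 0: another order, different answer
    return 0;
}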
Patrick Madden: Advocates are "Delusional"
Anyone left a little depressed by Patrick Groeneveld's presentation got no relief from Patrick Madden, whose first words were that Groeneveld was "way too optimistic" about parallel programming. I was not surprised. In a panel (with the real Gene Amdahl) that I wrote about a couple of years ago, Madden said that multicore programming is "the end of the world as far as I'm concerned."
Madden, however, is working on a problem that should be important to parallel programmers - speeding up the "hard serial bottlenecks" that cannot be parallelized, and thus give rise to the limitations of Amdahl's Law. And where do we find such bottlenecks? In papers and books by parallel programming experts, he suggested. He told of one Design Automation Conference (DAC) paper that presented a 100-fold speedup with a parallelized algorithm. It later turned out that a time-worn serial algorithm actually did a faster job of solving the same problem, with no special hardware required.
"I would think that when I pick up a technical paper from a major university, things should be correct," Madden said. "Instead there are monumental flaws presenting a new technology that's orders of magnitude slower than the existing solution." With a slide including a photo of Bernie Madoff, he suggested that some of the published parallel processing research is fraudulent.
Audience members reacted, saying that Madden was "too skeptical," focusing on a small amount of bad research, and not giving new programming methods a chance to catch up. "If I do video, a lot of stuff is embarrassingly parallel and that's how it's done," one said. Madden agreed, but went on to suggest that some parallel programming advocates are "delusional" and are seeing what they wish to see after taking grant money to do parallel computing research.
Tom Spyrou: Yes, We Can Parallelize Legacy Code
It may have been a hard act to follow, but Tom Spyrou spoke next, in a calm presentation that showed how legacy EDA code can be parallelized within the context of Amdahl's Law. Spyrou focused on the "fine-grained" problems that are most difficult to parallelize.
"For multicore, with 4 to 8 cores, there are shortcuts you can take to get legacy code to run in parallel," he said. "When it comes to manycore, 128 cores and all that, I think it will have to be written from the ground up. I think a lot of fine-grained problems aren't going to scale, just like [Madden] said."
Still, some clever things can be done today. One is to create a "parallel reducer server" for parasitic reduction. The idea is to take a net and reduce it to a math transfer function, using parallel calls to the server. There are no threads or thread safety issues, and the approach provides data locality, which will be important for manycore.
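Spyrou didn't walk through code, so what follows is only my own minimal sketch of the general pattern as I understood it: a handful of separate reducer processes, each owning a disjoint slice of the nets and reporting its reduced models back over a pipe, so there are no threads and no shared mutable state. The TransferFn struct and the trivial reduceNet() are placeholders; a real reducer would perform model-order reduction on each net's RC parasitics.

#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for a reduced parasitic model: just two numbers here.
struct TransferFn { double pole; double residue; };

// Hypothetical reduction routine; a real one would reduce the net's RC network.
TransferFn reduceNet(int netId) {
    return TransferFn{ -1.0 / (netId + 1), 1.0 };
}

int main() {
    const int kNumNets = 8;
    const int kNumWorkers = 4;

    // One pipe per worker for results; each worker owns a disjoint slice of
    // nets, so there is no shared mutable state and no thread-safety issue.
    std::vector<pid_t> workers;
    std::vector<int> resultFds;

    for (int w = 0; w < kNumWorkers; ++w) {
        int fds[2];
        if (pipe(fds) != 0) return 1;
        pid_t pid = fork();
        if (pid == 0) {                       // worker ("reducer server") process
            close(fds[0]);
            for (int net = w; net < kNumNets; net += kNumWorkers) {
                TransferFn tf = reduceNet(net);
                write(fds[1], &net, sizeof net);
                write(fds[1], &tf, sizeof tf);
            }
            close(fds[1]);
            _exit(0);
        }
        close(fds[1]);                        // parent keeps only the read end
        workers.push_back(pid);
        resultFds.push_back(fds[0]);
    }

    // Parent collects reduced models as each worker finishes its slice.
    for (int w = 0; w < kNumWorkers; ++w) {
        int net;
        TransferFn tf;
        while (read(resultFds[w], &net, sizeof net) == (ssize_t)sizeof net &&
               read(resultFds[w], &tf, sizeof tf) == (ssize_t)sizeof tf) {
            std::printf("net %d -> pole %g, residue %g\n", net, tf.pole, tf.residue);
        }
        close(resultFds[w]);
        waitpid(workers[w], nullptr, 0);
    }
    return 0;
}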
Another example is parallel noise analysis. It is hard to make legacy code thread-safe, but you can use a copy-on-write fork() construct as a short-term solution. The forked child shares memory with its parent for as long as that memory is only read; the moment a shared page is modified, it is automatically duplicated so the child gets its own private copy.
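Here is a bare-bones sketch of that idea (my own illustration, not Cadence code): the parent builds a large, read-mostly data structure once, then forks workers that read it without any locking. Nothing is physically copied unless a page is written.

#include <sys/wait.h>
#include <unistd.h>
#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
    // Stand-in for a big legacy design database, built once by the parent.
    std::vector<double> designDatabase(10 * 1000 * 1000, 1.0);

    const int kNumWorkers = 4;
    for (int w = 0; w < kNumWorkers; ++w) {
        if (fork() == 0) {
            // Child: reads its share of the copy-on-write data with no locks.
            // If it wrote to designDatabase, the touched pages would be
            // duplicated privately and never seen by the parent or siblings.
            double sum = 0.0;
            for (std::size_t i = w; i < designDatabase.size(); i += kNumWorkers)
                sum += designDatabase[i];
            std::printf("worker %d partial sum: %g\n", w, sum);
            std::fflush(stdout);
            _exit(0);
        }
    }
    for (int w = 0; w < kNumWorkers; ++w)
        wait(nullptr);   // parent waits for all workers to finish
    return 0;
}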
While some easy-to-parallelize applications are more scalable, Spyrou said that the overall Encounter netlist-to-GDSII flow is now running about 2X faster on 4 cores, thanks to the work that's been done to parallelize legacy code.
My Perspective
I think there's some truth in each of these presentations. Legacy serial code is really hard to parallelize, and not everything should be parallelized. University research may well focus too much on parallel programming and not enough on serial bottlenecks. But let's not throw up our hands and say it can't be done. Let's get to work and do what we can to speed up EDA software for multicore platforms, just as Spyrou has been doing since 2006 at Cadence.

Richard Goering
