  • Locked thread
priznat
Jul 7, 2009

Let's get drunk and kiss each other all night.

boovax posted:

Thanks, I'll check this out. I noticed that the block memory generator only offers RAM with an AXI interface; ROM is only available with 'native', so I guess I'd have to write a shim (or does Xilinx make a generic AXI4-Lite one?) to make it accessible over AXI from the PS.

Curious if Xilinx has improved the IP block creator with 2013.3, I think you can create AXI4-Lite peripherals with that, in theory.

I looked at using it back with 2013.1 but it was not worth the time/effort in messing with at that time.

Schmerm
Sep 1, 2000
College Slice
Could you simply infer a ROM from VHDL/Verilog? Both Xilinx and Altera support this for synthesis, and you can use the $readmemh Verilog system function to initialize the ROM contents from a file.

Altera: http://www.altera.com/literature/hb/qts/qts_qii51007.pdf p38
Xilinx XST (probably works in Vivado too): http://www.xilinx.com/support/documentation/sw_manuals/xilinx13_1/xst.pdf p192
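
For the VHDL side of that suggestion, here is a minimal sketch of a ROM written so that synthesis can infer it. The entity name, the 16x8 size and the contents are made up for illustration, and whether a given tool maps something this small to block RAM or to LUTs depends on the tool and its settings. $readmemh is Verilog-only, so this version simply uses a constant array.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 16-word x 8-bit ROM; name, size and contents are placeholders.
entity rom16x8 is
  port (
    clk  : in  std_logic;
    addr : in  std_logic_vector(3 downto 0);
    data : out std_logic_vector(7 downto 0)
  );
end entity rom16x8;

architecture rtl of rom16x8 is
  type rom_t is array (0 to 15) of std_logic_vector(7 downto 0);
  -- Contents are fixed at elaboration and never written, so this can map to ROM.
  constant ROM : rom_t := (
    x"00", x"11", x"22", x"33", x"44", x"55", x"66", x"77",
    x"88", x"99", x"AA", x"BB", x"CC", x"DD", x"EE", x"FF"
  );
begin
  -- Registered (synchronous) read, the form block memories generally expect.
  process (clk)
  begin
    if rising_edge(clk) then
      data <= ROM(to_integer(unsigned(addr)));
    end if;
  end process;
end architecture rtl;
```

An AXI4-Lite wrapper would still be needed to reach this from the PS; the sketch only covers the memory itself.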

Delta-Wye
Sep 29, 2005
Is there a fairly clean way of implementing a #ifdef-sort of thing in VHDL and/or Xilinx tools? I have some device-specific code within a larger project that I would like to decouple from the hardware. Primitives are being used for a pretty good reason, so having the ability to dynamically select code blocks based on family would be great. It appears as if a generate statement would do what I want, but I don't want to pass a new generic (family) all over the project. Is there a built in macro or something?

I think in code would be better, but I'm sure I can select .vhd files within the makefile as well (as it has all the part information to pass to the tools), just seems clunky to maintain multiple repetitive vhd files. Thoughts?
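
One way to get an #ifdef-like switch without threading a generic through every level is a project-wide package constant tested by if-generate statements. The sketch below assumes hypothetical names (target_pkg, TARGET_FAMILY, clk_buf_wrap), and the plain assignments stand in for the family-specific primitive instantiations.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical project-wide package: edit this one file per target (or have
-- the makefile generate it) instead of passing a generic down the hierarchy.
package target_pkg is
  type family_t is (SPARTAN6, VIRTEX7);
  constant TARGET_FAMILY : family_t := SPARTAN6;
end package target_pkg;

library ieee;
use ieee.std_logic_1164.all;
use work.target_pkg.all;

entity clk_buf_wrap is
  port (
    i : in  std_logic;
    o : out std_logic
  );
end entity clk_buf_wrap;

architecture rtl of clk_buf_wrap is
begin
  gen_s6 : if TARGET_FAMILY = SPARTAN6 generate
    -- a Spartan-6 specific primitive instantiation would go here
    o <= i;
  end generate gen_s6;

  gen_v7 : if TARGET_FAMILY = VIRTEX7 generate
    -- a 7-series specific primitive instantiation would go here
    o <= i;
  end generate gen_v7;
end architecture rtl;
```

Only the branch whose condition is true gets elaborated, so the unused family's code never reaches the mapper. The cost is that the package has to be kept in sync with the part selected in the build scripts.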

priznat
Jul 7, 2009

Let's get drunk and kiss each other all night.
VHDL configurations perhaps? I haven't used them in ages but they are handy if you are targeting different base devices.

Delta-Wye
Sep 29, 2005

priznat posted:

VHDL configurations perhaps? I haven't used them in ages but they are handy if you are targeting different base devices.

Thanks! I'll check it out. It looks like the parts share a unimacro - any reason I can't just instantiate that instead? I thought I would give it a try and I'm having a bit of trouble and I don't want to waste a bunch of time going down that road if it's a dead end :ohdear:

priznat
Jul 7, 2009

Let's get drunk and kiss each other all night.
I am fairly certain unimacros are consistent between families, however they often have a generic where you set the family (spartan6, virtex7 etc) so just check for those and use a config for that.

Usually the differences are just additional features or ports and you could set up a generic minimum one that is good across a few families.

movax
Aug 30, 2008

Posted on the Xilinx forums about this as well, and I'm afraid of the answer, but for Zynq designs, can you export_hardware to SDK without being in a project/block-design flow?

I hope so :( Really want to avoid block diagram designer if I can (or anything that is primarily GUI driven because I need this to be 100% scriptable / adaptable to our internal build-system).

priznat
Jul 7, 2009

Let's get drunk and kiss each other all night.
I couldn't find a way myself with 2013.2, I just did a block with the Zynq in ip integrator, set up the mio/emio and punched the needed axi ports out. Exported that to the sdk, and further hw changes didn't affect it for the most part.

Sucky solution I know, there has got to be some scriptable way to do it.

I miss doing Zynq stuff already :(

minidracula
Dec 22, 2007

boo woo boo
Does anyone have reasonably current information on ModelSim pricing these days? I don't want to go through the whole song and dance with contacting a Mentor rep if I can avoid it. Also, are they still making a specific Xilinx Edition? I know that since December 2010, the agreement with Xilinx to ship ModelSim XE and let Xilinx generate licenses for it expired, but they still make an Altera Edition that Altera still packages, so I'm curious if they still have vendor-specific versions, or if buying from Mentor just gets you one (probably more costly) unlimited product.

movax
Aug 30, 2008

I don't have access to the quote anymore, but QuestaSim was insanely well priced compared to our previous Cadence Incisive license. Didn't need any vendor-specific licensing, just picked the mixed-language support. Both Xilinx and Altera support it for generation of simulation libraries.

minidracula
Dec 22, 2007

boo woo boo
Anyone in this thread working with (or has worked with, in the past) Achronix parts?

ante
Apr 9, 2005

SUNSHINE AND RAINBOWS
Are there any good books on VHDL or HDL theory and practices in general?


I'm looking for a primer on pipelining, but good reference seems pretty meager on the internet. The only readily searchable result is by a dude who stylises himself "vhdlguru" and has a completely broken example.

blorpy
Jan 5, 2005

Think of HDL as just being the syntax. It's really easy to pick it up.

What you seem to be asking is about semantics. In particular, what are you pipelining? What is it running on? FPGA? These kinds of things are a lot harder to learn, and if you know them, the language is just a thin layer on top.

Go read P&H and H&P :)

ante
Apr 9, 2005

SUNSHINE AND RAINBOWS
I'm implementing some cryptographic algorithms on an FPGA and I want to run numbers through it as fast as possible. I'm working my way through a textbook that I can't remember the name of right now. What are P&H and H&P?

The syntax is easy, that's no problem. I want to know actual best practices for exactly the above scenario. Most of the internet focuses on processes, and doesn't really get into the nitty gritty of how they work.


I'm having a hard time

An example:

Say I implement MD5 (or whatever) in a process. My testbench gives it an input, and the output pops out instantly in the simulation.

code:
...
port (
	  Din : in STD_LOGIC_VECTOR (31 downto 0);
	  Dout : out STD_LOGIC_VECTOR (31 downto 0)  -- no semicolon after the last port
         );

...
-- Purely combinational: Dout follows Din with no clock involved
-- (unsigned arithmetic needs ieee.numeric_std)
process (Din)
	type BUFF is array (0 to 4) of unsigned (31 downto 0);
	variable W: BUFF;
begin
	W(0) := unsigned(Din);
	for t in 1 to 4 loop
		W(t) := W(t - 1) + 5;
	end loop;
	Dout <= std_logic_vector(W(4));
end process;
Naturally, that's not how it actually works in an FPGA. Say it takes 20ns, but I know that if I put CLK in the sensitivity list, I can do it in 4 clocks of 5ns, still have the first hash done in the same time, but have three more in the pipeline. I don't know how to do that, though, it seems like I should have to write out three processes with CLK in the sensitivity lists to make them all concurrent.

As I said, I've got a textbook and I'm pretty much reading it front to back, but it's slow going and pipelines are a few hundred pages away.

priznat
Jul 7, 2009

Let's get drunk and kiss each other all night.
Is the textbook "Contemporary Logic Design" by Randy H Katz? Because that's a good'un.

It takes some getting used to the switch over to concurrent processes, and using clocking etc.

If you're writing code that you want to be synthesizable down the road it's usually best to stay away from variables in VHDL. They're fine for some applications, especially in testbenches, but usually you'll want to use signals. The difference between the two is that signals take their new value when the process suspends (at the end of the process, if there are no wait statements), whereas variables change right away. Variables do synthesize, though not as latches: in a clocked process a variable that is assigned before it is read becomes plain wiring, and one that is read before it is assigned becomes a flop, which is easy to get wrong.

Something like this would be how to use signals and clock inputs:
code:
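-- in the architecture's declarative part (signals can't be declared inside a process):
type BUFF is array (0 to 4) of unsigned (31 downto 0); -- needs ieee.numeric_std
signal W: BUFF;

-- in the architecture body:
process(clk, resetb) is
begin
   if resetb = '0' then -- asynchronous, active-low reset
      for t in 0 to 4 loop
          W(t) <= (others => '0'); -- initialize the values to zero
      end loop;
   elsif rising_edge(clk) then -- every rising edge of the clock, this stuff happens
      W(0) <= unsigned(Din);
      for t in 1 to 4 loop
          W(t) <= W(t-1) + 5; -- each stage reads the previous stage's old value
      end loop;
   end if;
end process;

Dout <= std_logic_vector(W(4)); -- concurrent assignment, outside the process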

(this code probably has some syntax errors, I'm just going off the top of my head)

Another way you could do it is create a component that performs the calculation and have the flop (clock stage) inside that and then stamp it out using a generate statement. It's good to think of components in HDL as discrete boxes that have their inputs and outputs and that you can put down a whole lot of them in order to run things in parallel. Break components down into simple, manageable parts and it can make things a whole lot easier.
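
A rough sketch of that component-plus-generate approach, reusing the add-5 example from the question. The entity names (add5_stage, add5_pipe) and the STAGES generic are invented for illustration, and there is no reset, to keep it short.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- One pipeline stage: a little logic (add 5) followed by a flop.
entity add5_stage is
  port (
    clk  : in  std_logic;
    din  : in  unsigned(31 downto 0);
    dout : out unsigned(31 downto 0)
  );
end entity add5_stage;

architecture rtl of add5_stage is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      dout <= din + 5;
    end if;
  end process;
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity add5_pipe is
  generic (STAGES : positive := 4);
  port (
    clk  : in  std_logic;
    Din  : in  std_logic_vector(31 downto 0);
    Dout : out std_logic_vector(31 downto 0)
  );
end entity add5_pipe;

architecture rtl of add5_pipe is
  type buff is array (0 to STAGES) of unsigned(31 downto 0);
  signal w : buff;
begin
  w(0) <= unsigned(Din);

  -- Stamp out one registered stage per step of the original loop.
  gen_stage : for t in 1 to STAGES generate
    u_stage : entity work.add5_stage
      port map (clk => clk, din => w(t - 1), dout => w(t));
  end generate gen_stage;

  Dout <= std_logic_vector(w(STAGES));
end architecture rtl;
```

With STAGES = 4 a result appears four clocks after its input, and a new input can be accepted on every clock, so several values are in flight at once.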

JawnV6
Jul 4, 2004

So hot ...

ante posted:

What are P&H and H&P?

Patterson & Hennessy and Hennessy & Patterson. Building a computer from the sand up.

Basic pipelining is covered in P&H, but it's specific to a MIPS-like 5 stage microprocessor. You have a single-cycle one and have to add all the logic to put buffers in between the stages and ferry information back and forth to avoid hazards (e.g. the computed result of a previous instruction not being available as an argument for the next).

Pipelining in the general sense is putting buckets (flops) between each set of active logic to store intermediate results and your tooling should figure out the fastest the clock can run.

spoof
Jul 8, 2004
Just a little out of scope of this discussion, but this is actually something that HLS is good for and incidentally what we used when we wrote our AES code. You can easily trade off resources for latency or Fmax with a few compiler directives. You wouldn't want to write your whole project using HLS but it can be great for individual modules in the design.

edit: HLS = High Level Synthesis, basically (somewhat limited) C-to-HDL.

spoof fucked around with this message at 16:13 on May 16, 2014

karasu
Jan 3, 2008
Maybe some of you are interested in the project I've chosen for my bachelors thesis.

I've written a 3D lookup table for video color processing in a Cyclone V FPGA. A 3D LUT is commonly used for color grading and display calibration and such. It consists of a 3D lattice of support points that are equally spread over the domain of possible input RGB triplets. The module accepts an RGB triplet, looks up the appropriate value in the 3D LUT, and outputs that value. Looks like this, but denser:



Now having a complete LUT that stores values for all possible 24+ bit input RGB values is hardly practical since it makes the computation, storage and high throughput access difficult. This is why only a subset of input values is stored, such as a 33 by 33 by 33 cube, which can be fit inside the embedded SRAM of an FPGA. All input values that lie between these points are calculated by interpolating between the eight nearest points.

My goal for the module was to achieve as high a throughput as I could, since the module uses a lot of limited resources such as memory and multipliers. A 33 cubed LUT of RGB24 values means about 120 kbyte of embedded RAM, and you have to do a whopping 42 multiplications for the trilinear interpolation per pixel. There are more efficient interpolation algorithms but I discovered those too late.

So I have the memory and DSP blocks running at 300 MHz, which is pretty close to the limit of what the Cyclone V can do and requires the fastest speed grade. The other parts of the module run at 150 MHz, which means that each DSP block processes two streams in each 150 MHz period. The memory blocks process four streams at once, since I use both ports at 300 MHz. I also spread the LUT data over many smaller memories, since that way I can access all data words for the trilinear interpolation in a single clock cycle. That results in 176 Gbit/s of memory bandwidth and 25 GMAC/s, and the module can process 600 million pixels per second, enough for 4K video at 60 fps.

Fitting this in the FPGA was pretty difficult. Quartus doesn't manage to fit this efficiently when you just give it the VHDL, which results in an fmax of about 220 MHz. I tried various optimization settings and a seed sweep but that didn't improve things by much, since hundreds of paths fail, especially at the clock domain crossing between the 300 MHz and 150 MHz domains. Just look at this ugly mess:



Then I tried a hand optimized floor plan with LogicLock regions, since that was a feature I have been waiting to try out in a project for quite a while. I wrote the module with many hierarchical components, so it was easy to place certain parts to specific areas. For example, Quartus wasn't using the dedicated input/output registers of DSP and memory blocks consistently, which would always fail timing, so I assigned a tiny region on these components to fix that. I also fit the module in a quadrant of the FPGA that has a regional clock, which probably has less skew than a global one. This improved things and I have it running at 300 MHz. This is the final fit:



Even has 100 ps of slack. The usage of ALMs and LABs is pretty high and some parts of the design are ridiculously dense, like in front of a DSP block.

By the way, I love the way Quartus allows you to visualize a fit. I don't know how the tools are in Xilinx land, but the ability to see a failing path by just right clicking in TimeQuest is pretty convenient.



Now I wish my company was doing a project where you actually need that kind of throughput. They usually only need 80 million pixels per second. :eng99:

karasu fucked around with this message at 06:28 on Aug 23, 2014

JawnV6
Jul 4, 2004

So hot ...
Very cool project and writeup! Especially at the BS level.

Whenever we had significant P&R issues it was always kicking it over to FAE's. I have a few distinct memories of the partitioner making some poor choices that left P&R impossible to do.

karasu posted:

Now I wish my company was doing a project where you actually need that kind of throughput. They usually only need 80 million pixels per second. :eng99:
That's the unfortunate thing about FPGA's. It's such an odd niche. I have one for hobbyist grade projects where a micro doesn't have enough flexibility. Even then some of it is "sensor glue" that I could gang some arduinos together and write code on a real PC. A few steps above that is as a DSP replacement for video applications, which it sounds like your company is doing.

Then there's big-box emulation for ASIC developers who need more cycles for HW verification or early SW development. That's where FPGA's are truly indispensable. I used to do a lot of work on those, developing transactors and such. I miss that work.

ante
Apr 9, 2005

SUNSHINE AND RAINBOWS
That is really rad. Could you go into more detail about how those memory block pictures work?

movax
Aug 30, 2008

Very slick design! Looks like a Cyclone V SX, are you using the ARM cores for anything?

karasu
Jan 3, 2008

ante posted:

That is really rad. Could you go into more detail about how those memory block pictures work?

Sure, are you referring to the floor plan? Those pictures are from the chip planner in Quartus II which allows you to examine the whole FPGA and check which resources your design is using. The blue shaded blocks are logic array blocks (LABs), which implement most of the common logic you have in your design such as registers, adders and stuff. The red shaded blocks are the DSP blocks, Altera's hardware multipliers. They changed them quite a bit compared to the Cyclone IV, you can configure them in different ways. Each DSP block can do one 27x27, two 18x18 or three 9x9 multiplications in one clock cycle. Pretty handy for my design is a configuration where the DSP block does two 18x18 multiplications and adds the results together. That's more or less a linear interpolation by itself so that's how the blocks in my design are used.

The yellow/brownish shaded resources are memory blocks, with each block implementing up to 10 Kbits of memory that you can access with two ports simultaneously. These ports can have different clocks and both can read and write to the memory. You can chain these together to create larger memories. The other blocks are components such as PLLs, pin logic, DRAM controllers and so on. A darker shading means that a block is being used in the design, with the intensity increasing with the utilization of that block.

What I did with the floor plan was forcing the fitter to use a certain region of the FPGA for specific components. I did a lot of experimenting with how many LABs the incoming and outgoing clock domain crossings need (quite a few, many duplicate registers involved) and then forced a LogicLock region on that next to the DSP blocks and memories. The DSP blocks need a little more space than the memory since the bit widths at the input are wider, and I do some rounding at the output.
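
To make the "two multiplies plus an add is a linear interpolation" remark concrete, here is a minimal sketch of one registered lerp. It is not the design described above: the entity name, the 12-bit sample width and the 8-bit fraction are assumptions chosen so that every operand fits comfortably within an 18x18 multiplier.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical widths: 12-bit samples, 8-bit fraction f in the range 0..255.
entity lerp is
  port (
    clk : in  std_logic;
    a   : in  unsigned(11 downto 0);  -- value at the lower lattice point
    b   : in  unsigned(11 downto 0);  -- value at the upper lattice point
    f   : in  unsigned(7 downto 0);   -- fractional position between them
    y   : out unsigned(11 downto 0)
  );
end entity lerp;

architecture rtl of lerp is
begin
  process (clk)
    variable wa  : unsigned(8 downto 0);   -- weight of a: 256 - f
    variable wb  : unsigned(8 downto 0);   -- weight of b: f
    variable acc : unsigned(20 downto 0);  -- a*wa + b*wb, at most 4095*256
  begin
    if rising_edge(clk) then
      wb  := resize(f, 9);
      wa  := to_unsigned(256, 9) - wb;
      acc := a * wa + b * wb;   -- two multiplies and one add
      y   <= acc(19 downto 8);  -- divide by 256, truncating
    end if;
  end process;
end architecture rtl;
```

Trilinear interpolation chains seven such lerps per colour channel (four along one axis, then two, then one), so three channels come to 21 lerps or 42 multiplies per pixel, which lines up with the figure quoted earlier in the thread. At 600 million pixels per second that is roughly 25 billion multiplies per second.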


movax posted:

Very slick design! Looks like a Cyclone V SX, are you using the ARM cores for anything?

Not yet, but to me that is one of the most exciting areas I'd like to work on in the future. What I hope to accomplish is to have the ARM calculate the 3D LUTs, depending on some FPGA based image analysis modules such as histograms. I also want to be able to reconfigure the FPGA module with a new LUT data set in a very short time, like during vertical blanking, so you don't even notice the change. It's also absolutely necessary to do the reconfiguration at that time since during active picture there is no way to access the memory due to both ports being used. Sending 120 kbyte of LUT data in 60 microseconds from the ARM cores to the module at that specific time is an interesting task. I would probably realize this by adapting the interface of the module to support AXI-Streaming or something equivalent. When a video packet arrives, the module works as usual, but with a packet having a configuration identifier it switches over from the usual address generators to a simple internal counter that sweeps through the address range sequentially and writes each incoming data word to the LUT memory. That's a fairly simple change. The synchronization, and DMA transfer from the ARM cores, is something that would require quite a lot of time for me I imagine.

Unfortunately I am a bit lacking in domain knowledge and I don't know if there is an application that would benefit from this kind of video processing setup. It's also way out of the niche that my company specializes in, so I am not sure how far I can develop this idea all by myself.

minidracula
Dec 22, 2007

boo woo boo
Anyone have any favorite exemplar product specifications or data sheets for FPGA IP they care to point me at? I'm looking for examples to imitate and learn from as I go through the existing documentation for one of our IP products and work on cleaning up the formatting, presentation, and content.

As an aside: every time I find myself tackling a technical communications (i.e. technical publishing/technical documentation) project, I beg for better tools. This feels like the only consistent thread throughout all of the various times I've done this. I've used FrameMaker, TeX (LaTeX, ConTeXt, LyX), Lout, XML (DocBook, DITA), Word (god forbid), random bodged-together HTML BS, plain ASCII text, InDesign/InCopy, Scribus, and to some degree, everything sucks and is found wanting (or I haven't reached whatever particular state of tech comm serenity). I'd really like to stick with TeX, or an XML toolchain that actually worked and wasn't broken, clunky, and found severely wanting compared to its marketing hype, but the main problems there (with both of those options, but more so with TeX) is the pain incurred on others who have no easy way to provide edits or feedback.

I doubt anyone has this, but with the preceding paragraph in mind, if anyone does have some sort of templates they use for this sort of thing that they can share with me, I'd love to take a look at those.

Hyvok
Mar 30, 2010

minidracula posted:

Anyone have any favorite exemplar product specifications or data sheets for FPGA IP they care to point me at? I'm looking for examples to imitate and learn from as I go through the existing documentation for one of our IP products and work on cleaning up the formatting, presentation, and content.

As an aside: every time I find myself tackling a technical communications (i.e. technical publishing/technical documentation) project, I beg for better tools. This feels like the only consistent thread throughout all of the various times I've done this. I've used FrameMaker, TeX (LaTeX, ConTeXt, LyX), Lout, XML (DocBook, DITA), Word (god forbid), random bodged-together HTML BS, plain ASCII text, InDesign/InCopy, Scribus, and to some degree, everything sucks and is found wanting (or I haven't reached whatever particular state of tech comm serenity). I'd really like to stick with TeX, or an XML toolchain that actually worked and wasn't broken, clunky, and found severely wanting compared to its marketing hype, but the main problems there (with both of those options, but more so with TeX) is the pain incurred on others who have no easy way to provide edits or feedback.

I doubt anyone has this, but with the preceding paragraph in mind, if anyone does have some sort of templates they use for this sort of thing that they can share with me, I'd love to take a look at those.

Starting from the obvious (you didn't mention it): have you tried Doxygen (http://www.stack.nl/~dimitri/doxygen/)?

On-topic: I've been playing around in modelsim making a PLL (for an induction heater controller, a project of mine): http://www.dgkelectronics.com/storage/electronics/induction_heater/digi/filter5.png

That is a model with an XOR phase-frequency detector, currently implementing something akin to a "type 2" PFD from a regular 4046 PLL. The problem with this XOR PFD is that there is a frequency dependent phase-shift (because the phase-difference is converted to a value with a timer) and I'd need to divide something by the period length to get rid of it and yeah divide is no good. With some other kind of PFD you then again lose the (easy) way to add deliberate phase-shift, other than just phase-shifting the feedback or the output signal. Oh well...

movax
Aug 30, 2008

I'm writing some JTAG logic utilizing the UJTAG functionality of Microsemi devices (basically a way to implement your own custom JTAG instructions). I have a case where I want a particular DR to trigger a one-time pulse event. The UJTAG interface provides you signals that indicate CaptureDR, ShiftDR and UpdateDR.

The last one is confusing me a bit: during UpdateDR, I latch the value from the DR into a 'shadow' register that holds the final value that was shifted in. I want this to be the trigger for the counter, which is fine. The counter runs off a completely different clock than the JTAG clock (of course). At the end, I want the counter to disable itself, and the easiest way is to simply write a 0 back to the trigger register. However, that trigger register is clocked by the UpdateDR signal.

So...I can synthesize if I feed the UpdateDR process a clock that is the OR of the UpdateDR signal + the internal clock, but I feel like that could be a little weird. With them OR'd, the register can be cleared, no problem, and if UpdateDR occurs asynchronously to the internal clock, that register can be updated. Still feel like I'm missing something though.
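
An alternative to OR-ing the two clocks is a toggle synchronizer. This is a standard clock-domain-crossing pattern, not what the poster ended up doing. The UpdateDR side only flips a bit, and the internal clock domain turns each flip into a one-cycle pulse that can start the counter, which then stops itself in its own domain. All names below are hypothetical, and the signal initial values assume the device honours power-up initialisation. Parts that don't would need an explicit reset.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical names; update_dr is treated as the source-domain clock and
-- sys_clk as the unrelated internal clock.
entity updr_pulse_sync is
  port (
    update_dr : in  std_logic;
    sys_clk   : in  std_logic;
    pulse     : out std_logic   -- one sys_clk cycle per UpdateDR event
  );
end entity updr_pulse_sync;

architecture rtl of updr_pulse_sync is
  signal toggle : std_logic := '0';
  signal sync   : std_logic_vector(2 downto 0) := (others => '0');
begin
  -- Source domain: flip a bit on every UpdateDR rising edge.
  process (update_dr)
  begin
    if rising_edge(update_dr) then
      toggle <= not toggle;
    end if;
  end process;

  -- Destination domain: two flops to synchronise, a third to compare against.
  process (sys_clk)
  begin
    if rising_edge(sys_clk) then
      sync <= sync(1 downto 0) & toggle;
    end if;
  end process;

  -- Any change in the synchronised toggle marks exactly one event.
  pulse <= sync(2) xor sync(1);
end architecture rtl;
```

No register is clocked from two sources this way. The toggle could also be made conditional on the shadowed DR value, so that only a particular value fires the pulse. The scheme does assume UpdateDR events are spaced at least a few sys_clk cycles apart, otherwise two flips can cancel out before they are seen.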

Star War Sex Parrot
Oct 2, 2003

movax posted:

Still feel like I'm missing something though.
Female companionship.

minidracula
Dec 22, 2007

boo woo boo

movax posted:

I'm writing some JTAG logic utilizing the UJTAG functionality of Microsemi devices
Why on earth are you using Microsemi anything?

movax posted:

Still feel like I'm missing something though.

Star War Sex Parrot posted:

Female companionship.
Harsh! :iceburn:

movax
Aug 30, 2008

Star War Sex Parrot posted:

Female companionship.

:drat: Sadly true.

minidracula posted:

Why on earth are you using Microsemi anything?


Harsh! :iceburn:

The IGLOO Nano is really low power, and has a rather SEU-resistant (supposedly immune) fabric thanks to being flash-based. I used it on a high-frequency DC/DC converter board with no real issues, and planning on using it a lot more going forward as a general utility device.

I did end up working around my problem though and I've got a FPGA that can now help program / in-circuit debug MSP430s, which have the weirdest goddamned JTAG implementation I've ever seen (read: non-compliant). What I did was have a custom JTAG instruction that does nothing in the CaptureDR or UpdateDR phases, but during the ShiftDR phase, ties the TDI line directly to an output pin, meaning a shift of '101' at any frequency faster than 20kHz will give me my desired pulse-low.

I might have to revisit it though if the current interface proves too slow and I need to move more logic into the FPGA; I guess I'll cross that when I get to it. Would be a similar process where I need a flag triggered by the JTAG clock to gate a process run by a different clock -- in the shower earlier, I was thinking I need to get a DFF with a CLR input to get a flag that is clocked and set in one domain, but reset from another.

I've got some SDLC master IP to write in a bit as well -- spree of FPGA stuff before I dive back into hardware-hardware land. At least that will be for a FPGA with 6-LUTs...

movax fucked around with this message at 07:13 on Sep 26, 2014

priznat
Jul 7, 2009

Let's get drunk and kiss each other all night.
A guy in the FPGA emulation group of my department just left and I briefly thought about moving there from my system design + test job but then I remembered how emulation is a loving nonstop pain in the rear end of wrestling with horrible vendor tools and getting poo poo from the higher ups because the asic group's unoptimized RTL won't run faster than 50MHz on an FPGA no matter what you do.

That said I kind of miss it, sometimes.

movax
Aug 30, 2008

priznat posted:

A guy in the FPGA emulation group of my department just left and I briefly thought about moving there from my system design + test job but then I remembered how emulation is a loving nonstop pain in the rear end of wrestling with horrible vendor tools and getting poo poo from the higher ups because the asic group's unoptimized RTL won't run faster than 50MHz on an FPGA no matter what you do.

That said I kind of miss it, sometimes.

That sounds like it could be terrifying -- the tools would be horrid. Cool concept though (academically pretty neat concept) but imagine working with all of those vendor tools of varying degrees of stability...bleh.

JawnV6
Jul 4, 2004

So hot ...
Back when I was doing big box emulation we had a failure where the router made some poor decisions about a giant buffer that it kept trying to split onto 2 FPGA's then couldn't handle the IO between the two. Was the sort of thing you'd kick off a compile in the morning, optimistically hope that it failed before leaving that evening, kick the failure over to the vendors and hope that the next build could be attempted the next morning instead of burning 2 days.

priznat
Jul 7, 2009

Let's get drunk and kiss each other all night.

movax posted:

That sounds like it could be terrifying -- the tools would be horrid. Cool concept though (academically pretty neat concept) but imagine working with all of those vendor tools of varying degrees of stability...bleh.

It's become fairly standard over the last few years to emulate some or all of asic devices in FPGAs (within reason, limited to the PCIe x1 and clocks turned way down). But yeah it can be a huge headache. Previous company the emulation platform was 9 virtex 5s linked via serdes. Then the monster virtex 7 came out and it would have been able to fit everything in that one device.

Now the company I'm at are emulating using a board with 6 virtex 7 100s :haw: I think the emu guy told me build times are about 30 hours on the grid server they got, so they kick off a dozen or so with varied parameters and pray one meets timing :)

Delta-Wye
Sep 29, 2005
I've seen the multi-fpga based asic simulation rigs but I was never clear on how the inter-fpga communication was done. Are there tools to split the asic between FPGAs, or is that done by hand by the testing engineers?

From a ten thousand foot view, I wasn't quite sure how you would go about verifying a design after cutting it up and slowing it down. Even before the last couple of posts, I figured there was something significant being changed because clearly something leaving an ic will be much slower than a signal that stays on-die.

JawnV6
Jul 4, 2004

So hot ...

Delta-Wye posted:

I've seen the multi-fpga based asic simulation rigs but I was never clear on how the inter-fpga communication was done. Are there tools to split the asic between FPGAs, or is that done by hand by the testing engineers?

From a ten thousand foot view, I wasn't quite sure how you would go about verifying a design after cutting it up and slowing it down. Even before the last couple of posts, I figured there was something significant being changed because clearly something leaving an ic will be much slower than a signal that stays on-die.

"Partitioning" is the word you want to search for. There are tools to do it, though that's as much as I can say without thinking more about it.

In general, the problem of feeding the simulated ASIC realistic inputs is harder than keeping chip to chip communications in sync. Most big boxes offer facilities that stop the design if the testbench SW isn't ready to spoof another transaction. We had a compilation mode that would assume you had to stop it every cycle to wait for software, building in that assumption let it go a lot faster than relying on the HW portion of the transactor to report it.

All my knowledge here is at least 2~3 years out of date.

priznat
Jul 7, 2009

Let's get drunk and kiss each other all night.
A lot of the stuff is block based and they communicate via protocols/busses like AXI. Packing an AXI interface across a serdes (rocketio or whatever Xilinx calls them now) is part of the Xilinx included IP, I think.

In the days before standardized busses like AXI it was a lot more of a pain and the ASIC designers would have to include an interface for the serdes (serializer/deserializer, sorry I hate it when folks use terms without defining them too). Now if the design is partitioned decently it isn't a huge effort to plunk it down, in theory.

The actual pain comes from the massively long iteration time coupled with the crappy tools and the inability to make timing on a set of critical signals.

Oh and how the timing will get WORSE after you add constraints on it following another 24 hour build time!

ante
Apr 9, 2005

SUNSHINE AND RAINBOWS
Had a job interview at Sierra Wireless last year. They've got a multimillion dollar ASIC simulation rig the size of a small car. It was pretty cool, they have departments all over the world that book it on a schedule so that it has no downtime.


They said a few million cells, I think?

priznat
Jul 7, 2009

Let's get drunk and kiss each other all night.
A Palladium system? Those are pretty impressive, there is one at my work too but I have nothing to do with it so far.

Lately it has been all supercaps and low power modes and noisy ADCs..

ante
Apr 9, 2005

SUNSHINE AND RAINBOWS
Yeah, Palladium rings a bell. It was connected to a bunch of PCI-E cards on open computers sitting on a desk... With webcams pointed at them to monitor the status LEDs.

movax
Aug 30, 2008

That's pretty slick. I get the occasional marketing blast from Cadence advertising their Proteus platform (I think it's Cadence, might be Mentor). 6-8 Virtex 7000s IIRC for some absurd number of ASIC gates. Haven't gotten a chance to do any real ASIC work thus far, but I'm hoping to start biting into it next summer as part of masters', and perhaps corporately after that.

priznat
Jul 7, 2009

Let's get drunk and kiss each other all night.
I think it is the Proteus system the emu guys at work use. It used to be you'd have to build your own emulation platform but now the CAD tool guys have figured out it is a good place to be to make their own with high density connectors so you can build your addon boards for whatever interfaces you want to stick onto it.

The local Xilinx rep is annoyed because he went from selling $15k FPGAs a pop to just whatever spartans people slap on their products now that most folks just get these off the shelf systems for emulation. Since they buy from Cadence he's cut out of the action.
