Posted

A long time ago, a colleague described the challenge of getting the widest data eye possible and some of the subtle things that he had to consider. He described what happens when a differential pair turns a corner: one of the two traces always has to take a longer path. Here’s an example, below:

The problem here is not just that the outer trace is slightly longer than the inner one. What happens to the edge transitions as they propagate down the transmission line? Yuriy Shlepnev of Simberian Inc just posted a video which shows an animated example of exactly that.

If the transition on one trace gets too far ahead of the edge on the other trace, you will get cross-coupling from one to the other which will degrade both the rising and falling edges of the pair. You can correct the edge phase mismatch by adding a small jog-out in the trace with the shorter path close to the source (as shown below), but that causes an impedance discontinuity (and a small reflection). This may or may not be a problem, depending on the frequency that you are transmitting down the diff-pair, but it is worth being aware of.

Author
Categories

Posted

In August 2018, I presented a set of slides which should help engineers get to grips with the question: “What is RISC-V?”. They cover the basics of the RISC-V family naming and give an overview of how it’s implemented in hardware. They’re still fairly self-explanatory, so I’m putting them online without much more of an introduction. Get stuck in and feel free to contact me if you have comments.

RiscV_Starter.pdf [1.14 MB]

Posted

Google offers a very interesting service which lets you plot a graph of how the world has searched for a particular term. Even better, you can plot a few terms against each other and see how they compare. It’s called Google Trends.

Some search terms are just not going to be useful. For example, I can’t compare the popularity of “Xilinx Vivado” against “Lattice Diamond” because the results get massively skewed by gemstone searches. Some queries work moderately well though. In 2017, I tried it out with terms like “Vivado” vs. “Quartus” and “VHDL” vs. “Verilog”. A sample is shown below.

This plot covers the 5-year period from 2012 to 2017, with “Xilinx” in blue and “Altera” in red. What’s interesting is that you can clearly see the December holiday season. You can also see the spike of interest in “Altera” when Intel acquired them, and that people are still searching for “Altera” over a year after the acquisition.

Take these results with scepticism and ask yourself why people search for a particular term. Maybe it’s because the tool is more popular (good), maybe it’s because it’s harder to use (bad), or maybe it’s something else entirely. I don’t pretend to have the answers.

I recently updated the presentation with results from March 2019. Has anything changed in 2 years? The full presentation is here:

FPGA_Trends_2019.pdf [1.00 MB]

Posted

This presentation is aimed at beginners, so experienced engineers can ignore this post and move on. I am posting it because VHDL and Verilog are not like normal programming languages; instead of having a nice procession from one statement to another, everything happens at once (sort of).

This presentation shows how to read a simple design and describes how to synthesise the logic in your head. It is a key technique for getting the most out of some of the topics that I will be posting about later on.
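As a small illustration of the "everything happens at once" point, here is a minimal sketch. The entity and signal names (concurrent_demo, a_and_b and so on) are made up for this example and are not taken from the presentation.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity; the names are illustrative only.
entity concurrent_demo is
  port(
    i_a : in  std_logic;
    i_b : in  std_logic;
    i_c : in  std_logic;
    o_y : out std_logic
    );
end concurrent_demo;

architecture concurrent_demo_rtl of concurrent_demo is

  signal a_and_b : std_logic;

begin  -- concurrent_demo_rtl

  -- This statement reads a_and_b before the line that drives it.
  -- That is fine: both assignments describe gates that exist at the
  -- same time, so swapping these two lines gives identical hardware.
  o_y     <= a_and_b or i_c;
  a_and_b <= i_a and i_b;

end concurrent_demo_rtl;
```

Reading it "in your head", you should end up with an AND gate feeding an OR gate, whichever order the two lines are written in.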

How To Read VHDL.pdf [0.96 MB]

Posted

You don't need to dig into the FPGA vendor's library of primitives in order to get a RAM. The tools are often smart enough to figure out what you want and will infer a RAM if you write your code the correct way.

The code below is a basic example of how to write VHDL code which will be interpreted by Vivado, Quartus, Diamond, etc. as a RAM.

-------------------------------------------------------------------------------
--
-- Copyright (c) 2019 CadHut
-- All rights reserved.
--
-- Redistribution and use in source and binary forms is permitted.
--
-------------------------------------------------------------------------------
-- Project Name  : CadHut Training
-- Author(s)     : Iain Waugh
-- File Name     : ram_dp_sync.vhd
--
-- Infer a basic synchronous dual-port RAM
--
-------------------------------------------------------------------------------

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity ram_dp_sync is
  generic (
    G_DATA_WIDTH : integer := 36;       -- Input / Output data width
    G_LOG2_DEPTH : integer := 9         -- log2( Memory Depth )
    );
  port(
    clk : in std_logic;

    i_addr_a : in std_logic_vector(G_LOG2_DEPTH - 1 downto 0);
    i_wr_en  : in std_logic;
    i_data_a : in std_logic_vector(G_DATA_WIDTH - 1 downto 0);

    i_addr_b : in  std_logic_vector(G_LOG2_DEPTH - 1 downto 0);
    o_data_b : out std_logic_vector(G_DATA_WIDTH - 1 downto 0)
    );
end ram_dp_sync;

architecture ram_dp_sync_rtl of ram_dp_sync is

  type t_ram is array (natural range <>) of std_logic_vector(G_DATA_WIDTH-1 downto 0);
  signal ram : t_ram(0 to 2**G_LOG2_DEPTH - 1);

begin  -- ram_dp_sync_rtl

  u_ram : process (clk)
  begin
    if(rising_edge(clk)) then
      o_data_b <= ram(to_integer(unsigned(i_addr_b)));

      if(i_wr_en = '1') then
        ram(to_integer(unsigned(i_addr_a))) <= i_data_a;
      end if;
    end if;
  end process u_ram;

end ram_dp_sync_rtl;

What's going on here? The RAM itself is a 2D array, which is G_DATA_WIDTH bits wide and 2^G_LOG2_DEPTH words deep. The memory is held in a signal called ram.

Whenever you want to write to it, you put the data value on i_data_a, the address on i_addr_a and you drive the i_wr_en write-enable strobe high for 1 clock cycle.

Reading is much easier; you put the address on the i_addr_b address pins and the data appears at the o_data_b output after the next rising clock edge.
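To see the write and read sequence in simulation, a minimal self-checking testbench for the basic version might look something like the sketch below. The testbench name, the 8-bit by 16-word sizing, the 10 ns clock and the test value are all assumptions chosen for illustration, and it assumes the RAM has been compiled into the work library.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical testbench; the names and sizes are illustrative only.
entity ram_dp_sync_tb is
end ram_dp_sync_tb;

architecture sim of ram_dp_sync_tb is

  constant C_DATA_WIDTH : integer := 8;
  constant C_LOG2_DEPTH : integer := 4;
  constant C_CLK_PERIOD : time    := 10 ns;

  signal clk  : std_logic := '0';
  signal done : boolean   := false;

  signal i_addr_a : std_logic_vector(C_LOG2_DEPTH - 1 downto 0) := (others => '0');
  signal i_wr_en  : std_logic                                   := '0';
  signal i_data_a : std_logic_vector(C_DATA_WIDTH - 1 downto 0) := (others => '0');
  signal i_addr_b : std_logic_vector(C_LOG2_DEPTH - 1 downto 0) := (others => '0');
  signal o_data_b : std_logic_vector(C_DATA_WIDTH - 1 downto 0);

begin  -- sim

  -- Free-running clock that stops once the stimulus has finished
  clk <= not clk after C_CLK_PERIOD / 2 when not done else '0';

  u_dut : entity work.ram_dp_sync
    generic map (
      G_DATA_WIDTH => C_DATA_WIDTH,
      G_LOG2_DEPTH => C_LOG2_DEPTH
      )
    port map (
      clk      => clk,
      i_addr_a => i_addr_a,
      i_wr_en  => i_wr_en,
      i_data_a => i_data_a,
      i_addr_b => i_addr_b,
      o_data_b => o_data_b
      );

  stimulus : process
  begin
    -- Write 0xA5 to address 3, holding the strobe high across one rising edge
    wait until rising_edge(clk);
    i_addr_a <= std_logic_vector(to_unsigned(3, C_LOG2_DEPTH));
    i_data_a <= x"A5";
    i_wr_en  <= '1';
    wait until rising_edge(clk);        -- the RAM stores the data on this edge
    i_wr_en  <= '0';

    -- Read address 3 back
    i_addr_b <= std_logic_vector(to_unsigned(3, C_LOG2_DEPTH));
    wait until rising_edge(clk);        -- the RAM samples the read address here
    wait until falling_edge(clk);       -- check mid-cycle, after the output updates
    assert o_data_b = x"A5"
      report "Read-back mismatch at address 3" severity error;

    done <= true;
    wait;
  end process stimulus;

end sim;
```

If a read and a write hit the same address on the same clock edge, this code returns the old contents in simulation, because ram is a signal and is only updated after the process has run.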

That's not too bad, but it could be better. The code above does not let you decide whether you want to use distributed RAM (made from LUTs) or dedicated FPGA memory resources. It also gives you no control over whether the RAM's output is registered or not, so you probably need to add extra flip-flops yourself if you want to run at a decent clock speed (above 200 MHz, say).

A more practical version of the code is shown below.

-------------------------------------------------------------------------------
--
-- Copyright (c) 2019 CadHut
-- All rights reserved.
--
-- Redistribution and use in source and binary forms is permitted.
--
-------------------------------------------------------------------------------
-- Project Name  : CadHut Training
-- Author(s)     : Iain Waugh
-- File Name     : ram_dp_sync.vhd
--
-- Infer a basic synchronous dual-port RAM with an optional registered output
-- and a choice of how the RAM is implemented
--
-------------------------------------------------------------------------------

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity ram_dp_sync is
  generic (
    G_DATA_WIDTH : integer := 36;       -- Input / Output data width
    G_LOG2_DEPTH : integer := 9;        -- log2( Memory Depth )

    G_REGISTER_OUT : boolean := true;

    -- RAM styles:
    -- Xilinx: "block" or "distributed"
    -- Intel/Altera: "logic", "M512", "M4K", "M9K", "M20K", "M144K", "MLAB", or "M-RAM"
    -- Lattice: "registers", "distributed" or "block_ram"
    G_RAM_STYLE : string := "block"
    );
  port(
    clk : in std_logic;

    i_addr_a : in std_logic_vector(G_LOG2_DEPTH - 1 downto 0);
    i_wr_en  : in std_logic;
    i_data_a : in std_logic_vector(G_DATA_WIDTH - 1 downto 0);

    i_addr_b : in  std_logic_vector(G_LOG2_DEPTH - 1 downto 0);
    o_data_b : out std_logic_vector(G_DATA_WIDTH - 1 downto 0)
    );
end ram_dp_sync;

architecture ram_dp_sync_rtl of ram_dp_sync is

  type t_ram is array (natural range <>) of std_logic_vector(G_DATA_WIDTH-1 downto 0);
  signal ram : t_ram(0 to 2**G_LOG2_DEPTH - 1);

  -- Xilinx
  attribute RAM_STYLE        : string;
  attribute RAM_STYLE of ram : signal is G_RAM_STYLE;

  -- Intel/Altera, Lattice
  attribute SYN_RAMSTYLE        : string;
  attribute SYN_RAMSTYLE of ram : signal is G_RAM_STYLE;

  signal data_b : std_logic_vector(G_DATA_WIDTH - 1 downto 0);

begin  -- ram_dp_sync_rtl

  u_ram : process (clk)
  begin
    if(rising_edge(clk)) then
      data_b <= ram(to_integer(unsigned(i_addr_b)));

      if(i_wr_en = '1') then
        ram(to_integer(unsigned(i_addr_a))) <= i_data_a;
      end if;
    end if;
  end process u_ram;

  -- Either register the outputs or pass them straight through.
  -- Logic runs faster when registered, but there's a 1-cycle penalty.
  out_not_registered : if G_REGISTER_OUT = false generate
    o_data_b <= data_b;
  end generate out_not_registered;

  out_registered : if G_REGISTER_OUT = true generate
    u_reg_out : process (clk)
    begin
      if(rising_edge(clk)) then
        o_data_b <= data_b;
      end if;
    end process u_reg_out;
  end generate out_registered;

end ram_dp_sync_rtl;

The main enhancement here is that we apply a special attribute to the ram signal. The attribute's name changes between vendors because these attributes are not standardised, but there's no harm in specifying both types.

If you want your design to run at a decent speed, it's almost certain that you will want to turn on the output registers by setting G_REGISTER_OUT to true.
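As a rough sketch of how the generics might be set when you instantiate the RAM, the wrapper below asks for a small 16-bit by 64-word memory built from LUTs on a Xilinx part, with the output register turned on. The wrapper name small_lut_ram and the sizes are assumptions made up for this example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper; the name and sizes are illustrative only.
entity small_lut_ram is
  port(
    clk : in std_logic;

    i_addr_a : in std_logic_vector(5 downto 0);
    i_wr_en  : in std_logic;
    i_data_a : in std_logic_vector(15 downto 0);

    i_addr_b : in  std_logic_vector(5 downto 0);
    o_data_b : out std_logic_vector(15 downto 0)
    );
end small_lut_ram;

architecture small_lut_ram_rtl of small_lut_ram is
begin  -- small_lut_ram_rtl

  u_ram : entity work.ram_dp_sync
    generic map (
      G_DATA_WIDTH   => 16,
      G_LOG2_DEPTH   => 6,              -- 2**6 = 64 words
      G_REGISTER_OUT => true,           -- adds one extra cycle of read latency
      G_RAM_STYLE    => "distributed"   -- Xilinx naming; see the list in the generics
      )
    port map (
      clk      => clk,
      i_addr_a => i_addr_a,
      i_wr_en  => i_wr_en,
      i_data_a => i_data_a,
      i_addr_b => i_addr_b,
      o_data_b => o_data_b
      );

end small_lut_ram_rtl;
```

With G_REGISTER_OUT set to true, the read data reaches o_data_b two rising clock edges after the address is presented on i_addr_b, rather than one. It is worth checking the synthesis report to confirm that the tool actually honoured the requested RAM style.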
