Inferring a RAM in VHDL

You don't need to dig into your FPGA vendor's library of primitives to get a RAM. Synthesis tools are usually smart enough to work out what you want, and they will infer a RAM if you write your code in a style they recognise.

The code below is a basic example of VHDL that Vivado, Quartus, Diamond and similar synthesis tools will interpret as a RAM.

-------------------------------------------------------------------------------
--
-- Copyright (c) 2019 CadHut
-- All rights reserved.
--
-- Redistribution and use in source and binary forms is permitted.
--
-------------------------------------------------------------------------------
-- Project Name  : CadHut Training
-- Author(s)     : Iain Waugh
-- File Name     : ram_dp_sync.vhd
--
-- Infer a basic synchronous dual-port RAM
--
-------------------------------------------------------------------------------

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity ram_dp_sync is
  generic (
    G_DATA_WIDTH : integer := 36;       -- Input / Output data width
    G_LOG2_DEPTH : integer := 9         -- log2( Memory Depth )
    );
  port(
    clk : in std_logic;

    i_addr_a : in std_logic_vector(G_LOG2_DEPTH - 1 downto 0);
    i_wr_en  : in std_logic;
    i_data_a : in std_logic_vector(G_DATA_WIDTH - 1 downto 0);

    i_addr_b : in  std_logic_vector(G_LOG2_DEPTH - 1 downto 0);
    o_data_b : out std_logic_vector(G_DATA_WIDTH - 1 downto 0)
    );
end ram_dp_sync;

architecture ram_dp_sync_rtl of ram_dp_sync is

  type t_ram is array (natural range <>) of std_logic_vector(G_DATA_WIDTH-1 downto 0);
  signal ram : t_ram(0 to 2**G_LOG2_DEPTH - 1);

begin  -- ram_dp_sync_rtl

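  u_ram : process (clk)
  begin
    if(rising_edge(clk)) then
      -- Port B: synchronous read. o_data_b updates on the clock edge after
      -- i_addr_b is presented.
      o_data_b <= ram(to_integer(unsigned(i_addr_b)));

      -- Port A: synchronous write, enabled by i_wr_en
      if(i_wr_en = '1') then
        ram(to_integer(unsigned(i_addr_a))) <= i_data_a;
      end if;
    end if;
  end process u_ram;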

end ram_dp_sync_rtl;
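
What's going on here? The RAM itself is an array of std_logic_vector words, each G_DATA_WIDTH bits wide, and it is 2**G_LOG2_DEPTH words deep. The memory is held in a signal called ram. Strictly speaking this is a simple dual-port RAM: port A only writes, port B only reads, and both are clocked by clk.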
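As a quick check on the size, this is what the default generics (G_DATA_WIDTH = 36, G_LOG2_DEPTH = 9) work out to. For comparison, 512 x 36 is one of the simple-dual-port configurations of an 18Kb block RAM in Xilinx 7-series devices, so the defaults should fit in a single such block, assuming the tool maps it that way.

```latex
\begin{aligned}
\text{depth}    &= 2^{\texttt{G\_LOG2\_DEPTH}} = 2^{9} = 512 \ \text{words} \\
\text{capacity} &= \text{depth} \times \texttt{G\_DATA\_WIDTH} = 512 \times 36 = 18\,432 \ \text{bits}
\end{aligned}
```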

To write to it, put the data on i_data_a and the address on i_addr_a, and drive the i_wr_en write enable high for one clock cycle. The write takes place on the rising edge of clk. If you hold i_wr_en high for longer, a write happens on every rising edge.

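Reading is even simpler, because there is no read enable: put the address on i_addr_b and the data appears on o_data_b after the next rising edge of clk, i.e. one clock cycle later. If you read and write the same address on the same clock edge, this code returns the old contents (read-first behaviour), because the read sees the value ram held before that edge.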
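To make the write and read timing concrete, here is a minimal self-checking testbench sketch. It assumes the first, unregistered version of ram_dp_sync has been compiled into the work library; the testbench name tb_ram_dp_sync, the 16 x 8 size and the data values are arbitrary choices for illustration. It writes x"A5" to address 3, then overwrites that address with x"3C" while reading it, and checks that the read returns the old value first and the new value one clock later.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity tb_ram_dp_sync is
end entity tb_ram_dp_sync;

architecture sim of tb_ram_dp_sync is
  constant C_CLK_PERIOD : time := 10 ns;

  -- Small RAM for simulation: 16 words x 8 bits
  signal clk      : std_logic := '0';
  signal i_addr_a : std_logic_vector(3 downto 0) := (others => '0');
  signal i_wr_en  : std_logic := '0';
  signal i_data_a : std_logic_vector(7 downto 0) := (others => '0');
  signal i_addr_b : std_logic_vector(3 downto 0) := (others => '0');
  signal o_data_b : std_logic_vector(7 downto 0);
  signal sim_done : boolean := false;
begin

  -- Free-running clock that stops once the stimulus is finished
  clk <= not clk after C_CLK_PERIOD / 2 when not sim_done;

  u_dut : entity work.ram_dp_sync
    generic map (
      G_DATA_WIDTH => 8,
      G_LOG2_DEPTH => 4)
    port map (
      clk      => clk,
      i_addr_a => i_addr_a,
      i_wr_en  => i_wr_en,
      i_data_a => i_data_a,
      i_addr_b => i_addr_b,
      o_data_b => o_data_b);

  stim : process
  begin
    -- Write x"A5" to address 3: data on i_data_a, address on i_addr_a,
    -- i_wr_en high for one clock cycle. Inputs change on falling edges.
    wait until falling_edge(clk);
    i_addr_a <= std_logic_vector(to_unsigned(3, 4));
    i_data_a <= x"A5";
    i_wr_en  <= '1';
    wait until falling_edge(clk);  -- the write happened on the rising edge in between

    -- Overwrite address 3 with x"3C" while reading address 3 in the same cycle
    i_data_a <= x"3C";
    i_addr_b <= std_logic_vector(to_unsigned(3, 4));
    wait until rising_edge(clk);
    wait for 1 ns;  -- step past the delta cycle in which o_data_b is updated
    assert o_data_b = x"A5"
      report "read-during-write should return the old data" severity error;

    -- Stop writing; one clock later the read returns the new data
    i_wr_en <= '0';
    wait until rising_edge(clk);
    wait for 1 ns;
    assert o_data_b = x"3C"
      report "expected the newly written data" severity error;

    report "tb_ram_dp_sync finished";
    sim_done <= true;
    wait;
  end process stim;

end architecture sim;
```

Inputs are changed on falling edges so they never race the rising edge the RAM samples on, and the short wait after each rising edge lets the checks run after o_data_b has actually been updated.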

That's not too bad, but it could be better. The code above doesn't let you choose between distributed RAM (built from LUTs) and the FPGA's dedicated block memory. Nor does it give you any control over whether the RAM's output is registered, so if you want to run at a decent clock speed (>200 MHz, say), you will probably have to add extra flip-flops yourself.

A more practical version of the code is shown below.

-------------------------------------------------------------------------------
--
-- Copyright (c) 2019 CadHut
-- All rights reserved.
--
-- Redistribution and use in source and binary forms is permitted.
--
-------------------------------------------------------------------------------
-- Project Name  : CadHut Training
-- Author(s)     : Iain Waugh
-- File Name     : ram_dp_sync.vhd
--
-- Infer a basic synchronous dual-port RAM with an optional registered output
-- and a choice of how the RAM is implemented
--
-------------------------------------------------------------------------------

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity ram_dp_sync is
  generic (
    G_DATA_WIDTH : integer := 36;       -- Input / Output data width
    G_LOG2_DEPTH : integer := 9;        -- log2( Memory Depth )

    G_REGISTER_OUT : boolean := true;   -- Extra output register: +1 cycle of read latency

    -- RAM styles:
    -- Xilinx: "block" or "distributed"
    -- Intel/Altera: "logic", "M512", "M4K", "M9K", "M20K", "M144K", "MLAB", or "M-RAM"
    -- Lattice: "registers", "distributed" or "block_ram"
    G_RAM_STYLE : string := "block"
    );
  port(
    clk : in std_logic;

    i_addr_a : in std_logic_vector(G_LOG2_DEPTH - 1 downto 0);
    i_wr_en  : in std_logic;
    i_data_a : in std_logic_vector(G_DATA_WIDTH - 1 downto 0);

    i_addr_b : in  std_logic_vector(G_LOG2_DEPTH - 1 downto 0);
    o_data_b : out std_logic_vector(G_DATA_WIDTH - 1 downto 0)
    );
end ram_dp_sync;

architecture ram_dp_sync_rtl of ram_dp_sync is

  type t_ram is array (natural range <>) of std_logic_vector(G_DATA_WIDTH-1 downto 0);
  signal ram : t_ram(0 to 2**G_LOG2_DEPTH - 1);

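  -- Xilinx (Vivado)
  attribute RAM_STYLE        : string;
  attribute RAM_STYLE of ram : signal is G_RAM_STYLE;

  -- Intel/Altera (Quartus's own attribute name)
  attribute RAMSTYLE        : string;
  attribute RAMSTYLE of ram : signal is G_RAM_STYLE;

  -- Intel/Altera, Lattice (Synplify-style attribute name)
  attribute SYN_RAMSTYLE        : string;
  attribute SYN_RAMSTYLE of ram : signal is G_RAM_STYLE;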

  signal data_b : std_logic_vector(G_DATA_WIDTH - 1 downto 0);

begin  -- ram_dp_sync_rtl

  u_ram : process (clk)
  begin
    if(rising_edge(clk)) then
      data_b <= ram(to_integer(unsigned(i_addr_b)));

      if(i_wr_en = '1') then
        ram(to_integer(unsigned(i_addr_a))) <= i_data_a;
      end if;
    end if;
  end process u_ram;

  -- Either register the outputs or pass them straight through.
  -- Logic runs faster when registered, but there's a 1-cycle penalty.
  out_not_registered : if G_REGISTER_OUT = false generate
    o_data_b <= data_b;
  end generate out_not_registered;

  out_registered : if G_REGISTER_OUT = true generate
    u_reg_out : process (clk)
    begin
      if(rising_edge(clk)) then
        o_data_b <= data_b;
      end if;
    end process u_reg_out;
  end generate out_registered;

end ram_dp_sync_rtl;
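
The main enhancement here is a synthesis attribute on the ram signal that tells the tool which kind of memory to build. Attribute names aren't standardised between vendors, but there's no harm in specifying several of them: each tool acts on the names it knows and typically ignores the rest, perhaps with a warning. The value is another matter. The same G_RAM_STYLE string is passed to every attribute, so it has to be one that your tool accepts; the default, "block", is a Xilinx value, whereas Lattice expects "block_ram". The other enhancement is the G_REGISTER_OUT generic, which optionally adds a register on the read data.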
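For example, a shallow buffer is usually better off in LUT RAM than in a whole block RAM. The sketch below instantiates the second version of ram_dp_sync with G_RAM_STYLE set to "distributed". The wrapper entity small_buffer and its port names are hypothetical, and it assumes ram_dp_sync has been compiled into the work library.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper: a 16 x 8 buffer that should live in LUT RAM
entity small_buffer is
  port (
    clk     : in  std_logic;
    wr_en   : in  std_logic;
    wr_addr : in  std_logic_vector(3 downto 0);
    wr_data : in  std_logic_vector(7 downto 0);
    rd_addr : in  std_logic_vector(3 downto 0);
    rd_data : out std_logic_vector(7 downto 0)
    );
end entity small_buffer;

architecture rtl of small_buffer is
begin

  u_ram : entity work.ram_dp_sync
    generic map (
      G_DATA_WIDTH   => 8,
      G_LOG2_DEPTH   => 4,
      G_REGISTER_OUT => true,
      -- "distributed" is a valid value for both Vivado (RAM_STYLE) and
      -- Lattice (SYN_RAMSTYLE); on an Intel part you would pass "MLAB" instead
      G_RAM_STYLE    => "distributed"
      )
    port map (
      clk      => clk,
      i_addr_a => wr_addr,
      i_wr_en  => wr_en,
      i_data_a => wr_data,
      i_addr_b => rd_addr,
      o_data_b => rd_data
      );

end architecture rtl;
```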

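If you want your design to run at a decent speed, you'll almost certainly want the output register, so leave G_REGISTER_OUT set to true (its default). The cost is one extra clock cycle of read latency: with the register enabled, data appears on o_data_b two rising edges after the address is presented on i_addr_b.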
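A second testbench sketch shows that extra cycle. It assumes the second version of ram_dp_sync (the one with G_REGISTER_OUT) is the one compiled into work; the name tb_ram_dp_sync_reg and the test values are hypothetical. Reads of two addresses are issued on back-to-back cycles, showing that the register adds latency but no gap: a new word still comes out on every clock.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity tb_ram_dp_sync_reg is
end entity tb_ram_dp_sync_reg;

architecture sim of tb_ram_dp_sync_reg is
  constant C_CLK_PERIOD : time := 10 ns;

  signal clk      : std_logic := '0';
  signal i_addr_a : std_logic_vector(3 downto 0) := (others => '0');
  signal i_wr_en  : std_logic := '0';
  signal i_data_a : std_logic_vector(7 downto 0) := (others => '0');
  signal i_addr_b : std_logic_vector(3 downto 0) := (others => '0');
  signal o_data_b : std_logic_vector(7 downto 0);
  signal sim_done : boolean := false;
begin

  clk <= not clk after C_CLK_PERIOD / 2 when not sim_done;

  u_dut : entity work.ram_dp_sync
    generic map (
      G_DATA_WIDTH   => 8,
      G_LOG2_DEPTH   => 4,
      G_REGISTER_OUT => true,
      G_RAM_STYLE    => "block")
    port map (
      clk      => clk,
      i_addr_a => i_addr_a,
      i_wr_en  => i_wr_en,
      i_data_a => i_data_a,
      i_addr_b => i_addr_b,
      o_data_b => o_data_b);

  stim : process
  begin
    -- Write x"5A" to address 7 and x"C3" to address 8 on consecutive edges
    wait until falling_edge(clk);
    i_wr_en  <= '1';
    i_addr_a <= std_logic_vector(to_unsigned(7, 4));
    i_data_a <= x"5A";
    wait until falling_edge(clk);
    i_addr_a <= std_logic_vector(to_unsigned(8, 4));
    i_data_a <= x"C3";
    wait until falling_edge(clk);
    i_wr_en  <= '0';

    -- Read address 7, then address 8 on the very next cycle
    i_addr_b <= std_logic_vector(to_unsigned(7, 4));
    wait until rising_edge(clk);  -- edge 1: ram(7) loaded into the internal data_b
    wait for 1 ns;
    assert o_data_b /= x"5A"
      report "data arrived after one cycle; is G_REGISTER_OUT on?" severity error;
    i_addr_b <= std_logic_vector(to_unsigned(8, 4));

    wait until rising_edge(clk);  -- edge 2: data_b copied to o_data_b
    wait for 1 ns;
    assert o_data_b = x"5A"
      report "expected x5A two cycles after address 7" severity error;

    wait until rising_edge(clk);  -- edge 3: the read of address 8 follows straight on
    wait for 1 ns;
    assert o_data_b = x"C3"
      report "expected xC3 one cycle after x5A" severity error;

    report "tb_ram_dp_sync_reg finished";
    sim_done <= true;
    wait;
  end process stim;

end architecture sim;
```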
