FPGA-Based Implementation of CNN using High Level Synthesis (HLS)

This repository contains the source files and scripts for implementing a Convolutional Neural Network on FPGA using Vitis High-Level Synthesis (HLS). The design leverages HLS to convert high-level C/C++ descriptions into optimized hardware implementations for FPGAs.

Features

  • High-performance CNN inference on FPGA.
  • Implemented using Vitis HLS for efficient synthesis and optimization.
  • Modular design for easy modification and integration.

Getting Started

Prerequisites

Make sure you have the following tools installed on your system:

  • Vitis HLS (for synthesis, simulation, and IP export of the hardware design).
  • Python 3 with TensorFlow/Keras and Jupyter (for the training and data-generation notebooks in SW/).

Clone the Repository

Clone this repository to your local machine:

git clone https://github.com/rezaAdinepour/CNN-on-FPGA.git
cd CNN-on-FPGA

Project Structure

The repository is organized as follows:

cnn-fpga-vitis-hls/
├── HW/                     # Hardware implementation files
│   └── src/                
│       ├── Data/           
│       ├── cnn.cpp         
│       └── cnn.h           
├── SW/                     # Software implementation files
│   ├── Data/
│   ├── Header/
│   ├── Model/ 
│   ├── gen_data.ipynb      
│   ├── train.ipynb         
│   └── utils.py            
└── Makefile

Creating the Project in Vitis HLS

  1. Open Vitis HLS.
  2. Create a new project:
    • Project Name: Choose a name for your project.
    • Project Directory: Set the directory where you want the project files to be stored.
  3. Add the source files:
    • Copy the contents of the HW/src/ directory into the src/ directory of your project.
    • Add the source files to your Vitis HLS project.
  4. Set the top function:
    • Specify the top-level function for synthesis.
  5. Configure synthesis settings:
    • Optimize for latency, throughput, or area based on your requirements, e.g. with HLS pragmas as sketched below.
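
As a generic illustration (these directives are not taken from this repository), optimization in Vitis HLS is usually driven by pragmas placed in the source code:

void example_kernel(float in[16], float out[16])
{
  // Partition the input array so a pipelined loop can read several
  // elements per cycle instead of contending for one BRAM port.
#pragma HLS ARRAY_PARTITION variable=in complete dim=1

  example_loop:
  for (int i = 0; i < 16; ++i)
  {
    // II=1: start a new iteration every clock cycle (throughput over area).
#pragma HLS PIPELINE II=1
    out[i] = 2.0f * in[i];
  }
}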

Building the Project

After setting up the project:

  1. Run C Simulation to verify the functionality of the design.
  2. Perform Synthesis to generate the RTL design.
  3. Use C/RTL Co-Simulation to validate the synthesized RTL against the C model.
  4. Export the IP core or generate the bitstream for deployment on FPGA.

Additional Notes

  • Modify the HW/src/ files to customize the CNN hardware implementation.
  • Use the provided Python scripts in the SW/ folder for training the model and generating data.
  • You can find the trained model in the SW/Model/ folder.
  • Refer to the Vitis HLS documentation for advanced optimization techniques.

Implementation

Training Phase

In the first step, the target network is defined and trained in software on the MNIST dataset, so that its trained weights can be reused in the hardware phase. For this purpose, a network with the following architecture is defined:

[Figure: network architecture]

The model is defined as below in SW/utils.py:

# Keras imports (assumed; pad, input_size, conv_filter_num, conv_kernel_size,
# and pool_size are module-level constants defined elsewhere in utils.py).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import ZeroPadding2D, Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.optimizers import Adam

def define_model() -> Sequential:
    # Define model: pad -> conv (ReLU) -> max-pool -> flatten -> dense (softmax).
    model = Sequential()
    model.add(ZeroPadding2D(padding=pad, input_shape=(input_size[0], input_size[1], 1), name="padding_layer"))
    model.add(Conv2D(conv_filter_num, conv_kernel_size, activation="relu", padding="valid", kernel_initializer="he_uniform", input_shape=(30, 30, 1), name="convolution_layer"))
    model.add(MaxPooling2D(pool_size, name="max_pooling_layer"))
    model.add(Flatten(name="flatten_layer"))
    model.add(Dense(10, activation="softmax", name="dense_layer"))
    # Compile model.
    model.compile(optimizer=Adam(), loss="categorical_crossentropy", metrics=["accuracy"])
    # Return model.
    return model

The model summary after compilation:

[Figure: model summary]

After defining the network structure and training it on the MNIST dataset for 5 epochs, the following results were obtained:

[Figure: training loss and accuracy curves]

  1. Network Accuracy:
    • Accuracy on the training data: 0.9817
    • Accuracy on the test data: 0.9801
  2. Inference Time:
    • 46.8072 milliseconds for 100 test samples.

Finally, all the weights and parameters required for the hardware phase were stored in the following three files:

  • conv_weights.h: Contains the weights of the convolutional layer.
  • dense_weights.h: Contains the weights of the fully connected layer.
  • definitions.h: Contains the constants (image, filter, and layer dimensions) used in the hardware phase.
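
The exact contents of these headers depend on the training run, but a minimal sketch of the layout they plausibly follow is shown below. The array names match the HLS code later in this README, while the dimension values are assumptions (28x28 MNIST input, padding of 1, 3x3 kernels, 2x2 pooling, 4 filters) and should be checked against the generated files:

/* definitions.h (sketch) -- constants used by the HLS code. The values are
   assumptions consistent with the code shown in this README. */
#define IMG_ROWS      28
#define IMG_COLS      28
#define PAD_IMG_ROWS  30
#define PAD_IMG_COLS  30
#define KERNEL_SIZE   3
#define CONV_IMG_ROWS 28   /* PAD_IMG_ROWS - KERNEL_SIZE + 1 */
#define CONV_IMG_COLS 28
#define POOL_IMG_ROWS 14   /* CONV_IMG_ROWS / 2 */
#define POOL_IMG_COLS 14
#define FILTERS       4
#define FLAT_SIZE     (POOL_IMG_ROWS * POOL_IMG_COLS * FILTERS)
#define DENSE_SIZE    10
#define DIGITS        10

/* conv_weights.h / dense_weights.h (sketch) -- weight tables exported from
   Keras. dense_weights is indexed [flattened input][output neuron], matching
   dense_weights[index][d] in the dense() code below. */
static const float conv_weights[FILTERS][KERNEL_SIZE][KERNEL_SIZE] = { /* ... */ };
static const float dense_weights[FLAT_SIZE][DENSE_SIZE] = { /* ... */ };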

Additionally, all the MNIST test data, including the images and their corresponding labels, were saved in two files:

  • in.dat: Contains the input data.
  • out.dat: Contains the corresponding labels.

These two files are generated by SW/gen_data.ipynb.

Inference Phase

In this phase, the convolutional layer, the fully connected layer, and the other supporting modules are implemented in HLS C++, preserving the structure defined in the software phase. The code for the fully connected layer is provided below:

void dense(hls::stream<float> & flat_to_dense_stream, int filter, hls::stream<float> & dense_to_softmax_stream)
{
  float flat_value;
  float dense_array[DENSE_SIZE] = { 0 };

  // Each instance handles one filter's slice of the flattened feature map.
  dense_for_flat:
  for (int i = 0; i < FLAT_SIZE / FILTERS; ++i)
  {
    flat_value = flat_to_dense_stream.read();

    // Accumulate this input's contribution to every output neuron.
    for (int d = 0; d < DENSE_SIZE; ++d)
    {
      int index = filter * (FLAT_SIZE / FILTERS) + i;
      dense_array[d] += dense_weights[index][d] * flat_value;
    }
  }

  // Stream the partial sums to the softmax stage.
  for (int j = 0; j < DENSE_SIZE; ++j)
  {
    dense_to_softmax_stream.write(dense_array[j]);
  }
}
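
The dense_to_softmax_stream feeds a softmax stage that is not reproduced in this README. A minimal sketch of such a stage, assuming one stream per filter whose partial sums are combined before normalization (bias terms omitted for brevity), might look like:

#include <hls_math.h>

// Sketch only: combines the per-filter partial sums produced by dense()
// and applies softmax. The interface is an assumption inferred from the
// code shown in this README.
void softmax(hls::stream<float> dense_to_softmax_streams[FILTERS], float prediction[DIGITS])
{
  float scores[DENSE_SIZE] = { 0 };
  float sum = 0.0f;

  // Each dense() instance wrote DENSE_SIZE partial sums; accumulate them.
  softmax_acc:
  for (int f = 0; f < FILTERS; ++f)
    for (int d = 0; d < DENSE_SIZE; ++d)
      scores[d] += dense_to_softmax_streams[f].read();

  // Exponentiate and normalize to probabilities.
  softmax_exp:
  for (int d = 0; d < DENSE_SIZE; ++d)
  {
    scores[d] = hls::expf(scores[d]);
    sum += scores[d];
  }
  softmax_norm:
  for (int d = 0; d < DIGITS; ++d)
    prediction[d] = scores[d] / sum;
}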

In this design, the convolutional layer is instantiated as four parallel single-filter convolutions, one per filter:

void convolutional_layer(
    float pad_img0 [PAD_IMG_ROWS][PAD_IMG_COLS],
    float pad_img1 [PAD_IMG_ROWS][PAD_IMG_COLS],
    float pad_img2 [PAD_IMG_ROWS][PAD_IMG_COLS],
    float pad_img3 [PAD_IMG_ROWS][PAD_IMG_COLS],
    hls::stream<float> conv_to_pool_streams [FILTERS] )
{
  // One convolution instance per filter; each reads its own copy of the
  // padded image so the four instances can run in parallel.
  convolution(pad_img0, 0, conv_to_pool_streams[0]);
  convolution(pad_img1, 1, conv_to_pool_streams[1]);
  convolution(pad_img2, 2, conv_to_pool_streams[2]);
  convolution(pad_img3, 3, conv_to_pool_streams[3]);
}
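
The per-filter convolution function itself is not reproduced in this README. A minimal sketch of what a single-filter "valid" convolution with ReLU might look like, assuming the 3x3 kernel (KERNEL_SIZE) and conv_weights table sketched earlier, is:

// Sketch only: one filter of a "valid" 2D convolution with ReLU,
// streaming its output to the pooling stage.
void convolution(float pad_img[PAD_IMG_ROWS][PAD_IMG_COLS], int filter, hls::stream<float> & conv_to_pool_stream)
{
  conv_for_rows:
  for (int r = 0; r < PAD_IMG_ROWS - KERNEL_SIZE + 1; ++r)
  {
    conv_for_cols:
    for (int c = 0; c < PAD_IMG_COLS - KERNEL_SIZE + 1; ++c)
    {
      float acc = 0.0f;
      // Multiply-accumulate over the kernel window.
      for (int i = 0; i < KERNEL_SIZE; ++i)
        for (int j = 0; j < KERNEL_SIZE; ++j)
          acc += pad_img[r + i][c + j] * conv_weights[filter][i][j];
      // ReLU activation (bias omitted for brevity).
      conv_to_pool_stream.write(acc > 0.0f ? acc : 0.0f);
    }
  }
}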

The convolution outputs pass through max pooling to produce the feature maps, which are then flattened for the fully connected layer. The flattening stage is defined as follows:

void flattening(hls::stream<float> & pool_to_flat_stream, hls::stream<float> & flat_to_dense_stream)
{
  // Forward one filter's pooled feature map, element by element, to the dense stage.
  flat_for_rows:
  for(int r = 0; r < POOL_IMG_ROWS; ++r)
  {
    flat_for_cols:
    for(int c = 0; c < POOL_IMG_COLS; ++c)
    {
      flat_to_dense_stream.write(pool_to_flat_stream.read());
    }
  }
}
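
Between the convolution and flattening stages sits a max-pooling stage that is also not shown in this README. A minimal sketch, assuming a 2x2 window with stride 2 over the convolution output, could be:

// Sketch only: 2x2 max pooling between convolution and flattening.
void max_pooling(hls::stream<float> & conv_to_pool_stream, hls::stream<float> & pool_to_flat_stream)
{
  float rows[2][CONV_IMG_COLS];

  pool_for_rows:
  for (int r = 0; r < CONV_IMG_ROWS; r += 2)
  {
    // Buffer two consecutive convolution rows from the stream.
    for (int k = 0; k < 2; ++k)
      for (int c = 0; c < CONV_IMG_COLS; ++c)
        rows[k][c] = conv_to_pool_stream.read();

    // Emit the maximum of each 2x2 window.
    pool_for_cols:
    for (int c = 0; c < CONV_IMG_COLS; c += 2)
    {
      float m0 = rows[0][c] > rows[0][c + 1] ? rows[0][c] : rows[0][c + 1];
      float m1 = rows[1][c] > rows[1][c + 1] ? rows[1][c] : rows[1][c + 1];
      pool_to_flat_stream.write(m0 > m1 ? m0 : m1);
    }
  }
}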

The top-level function, cnn, ties all stages together. Its definition is as follows:

void cnn(float img_in[IMG_ROWS][IMG_COLS], float prediction[DIGITS])
{
  /******** Pre-processing data. ********/

  float pad_img0 [PAD_IMG_ROWS][PAD_IMG_COLS] = { 0 };
  normalization_and_padding(img_in, pad_img0);

  #if 0
    #ifndef __SYNTHESIS__
      printf("Padded image.\n");
      print_pad_img(pad_img0);
    #endif
  #endif

  /* Allow parallelism by cloning the padded image. */
  float pad_img1 [PAD_IMG_ROWS][PAD_IMG_COLS];
  float pad_img2 [PAD_IMG_ROWS][PAD_IMG_COLS];
  float pad_img3 [PAD_IMG_ROWS][PAD_IMG_COLS];

  clone_for_rows:
  for(int i = 0; i < PAD_IMG_ROWS; ++i)
    clone_for_cols:
    for(int j = 0; j < PAD_IMG_COLS; ++j)
    {
      pad_img1[i][j] = pad_img0[i][j];
      pad_img2[i][j] = pad_img0[i][j];
      pad_img3[i][j] = pad_img0[i][j];
    }

  /* Parallel executions start here. */
  dataflow_section(pad_img0, pad_img1, pad_img2, pad_img3, prediction);
}
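
The dataflow_section function called at the end of cnn() is where task-level parallelism is enabled: the stages communicate through streams, so the DATAFLOW pragma lets them execute as an overlapping pipeline. A sketch of its likely shape, using the stage functions shown and sketched above (the exact wiring is an assumption), is:

// Sketch only: the parallel section wired as a task-level pipeline.
void dataflow_section(
    float pad_img0 [PAD_IMG_ROWS][PAD_IMG_COLS],
    float pad_img1 [PAD_IMG_ROWS][PAD_IMG_COLS],
    float pad_img2 [PAD_IMG_ROWS][PAD_IMG_COLS],
    float pad_img3 [PAD_IMG_ROWS][PAD_IMG_COLS],
    float prediction[DIGITS] )
{
#pragma HLS DATAFLOW

  hls::stream<float> conv_to_pool_streams [FILTERS];
  hls::stream<float> pool_to_flat_streams [FILTERS];
  hls::stream<float> flat_to_dense_streams [FILTERS];
  hls::stream<float> dense_to_softmax_streams [FILTERS];

  /* Each stage reads and writes streams, so all stages run concurrently. */
  convolutional_layer(pad_img0, pad_img1, pad_img2, pad_img3, conv_to_pool_streams);

  /* One pooling/flattening/dense chain per filter, written out explicitly
     as in convolutional_layer above. */
  max_pooling(conv_to_pool_streams[0], pool_to_flat_streams[0]);
  max_pooling(conv_to_pool_streams[1], pool_to_flat_streams[1]);
  max_pooling(conv_to_pool_streams[2], pool_to_flat_streams[2]);
  max_pooling(conv_to_pool_streams[3], pool_to_flat_streams[3]);

  flattening(pool_to_flat_streams[0], flat_to_dense_streams[0]);
  flattening(pool_to_flat_streams[1], flat_to_dense_streams[1]);
  flattening(pool_to_flat_streams[2], flat_to_dense_streams[2]);
  flattening(pool_to_flat_streams[3], flat_to_dense_streams[3]);

  dense(flat_to_dense_streams[0], 0, dense_to_softmax_streams[0]);
  dense(flat_to_dense_streams[1], 1, dense_to_softmax_streams[1]);
  dense(flat_to_dense_streams[2], 2, dense_to_softmax_streams[2]);
  dense(flat_to_dense_streams[3], 3, dense_to_softmax_streams[3]);

  softmax(dense_to_softmax_streams, prediction);
}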

Synthesis Results

Next, the CNN model is synthesized. The synthesis reports are shown below:

[Figures: synthesis reports]

Testbench

Finally, a testbench was written for the design. It reads the first 100 images from the in.dat file generated in the previous phase, feeds them to the network, and reports the recognition accuracy and the total processing time.
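
A skeleton of such a testbench (the file and function names come from this README; the data format and parsing details are assumptions) could look like:

#include <cstdio>
#include "cnn.h"

#define NUM_TESTS 100

int main()
{
  FILE *fin  = fopen("in.dat",  "r");
  FILE *fout = fopen("out.dat", "r");
  if (!fin || !fout) return 1;

  int correct = 0;
  for (int t = 0; t < NUM_TESTS; ++t)
  {
    float img[IMG_ROWS][IMG_COLS];
    float prediction[DIGITS];
    int label;

    // Read one image and its label (format assumed: whitespace-separated).
    for (int r = 0; r < IMG_ROWS; ++r)
      for (int c = 0; c < IMG_COLS; ++c)
        fscanf(fin, "%f", &img[r][c]);
    fscanf(fout, "%d", &label);

    // Run the design under test and take the argmax of its output.
    cnn(img, prediction);
    int best = 0;
    for (int d = 1; d < DIGITS; ++d)
      if (prediction[d] > prediction[best]) best = d;
    if (best == label) ++correct;
  }

  printf("Accuracy: %d / %d\n", correct, NUM_TESTS);
  fclose(fin);
  fclose(fout);
  // C/RTL co-simulation treats a non-zero return as a test failure.
  return (correct == NUM_TESTS) ? 0 : 1;
}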

For example, the model recognizes the first 100 images of the MNIST test set with 100% accuracy.

[Figure: testbench output for 100 images]

  • Inference phase (hardware) performance:
    • 100 images processed in 4.45 milliseconds.
  • Software phase performance:
    • The same 100 images processed in 46.80 milliseconds, i.e. the hardware is roughly 10.5x faster.
  • Impact of increasing the image count:
    • As the number of images increases, the measured accuracy drops slightly.
    • Example: 500 images processed with 99% accuracy.

[Figure: testbench output for 500 images]

In this case, the network misclassified the following 5 images:

[Figure: the five misclassified samples. (a) Wrong prediction for digit 0; (b) wrong prediction for digit 2; (c) wrong prediction for digit 2; (d) wrong prediction for digit 5; (e) wrong prediction for image 5.]

The output scores the network produced for each misclassified image are given below:

| Digit | Prediction for (a) | Prediction for (b) | Prediction for (c) | Prediction for (d) | Prediction for (e) |
|-------|--------------------|--------------------|--------------------|--------------------|--------------------|
| 0     | 0.000014           | 0.000006           | 0.268470           | 0.000006           | 0.000000           |
| 1     | 0.000002           | 0.000024           | 0.006770           | 0.005681           | 0.000000           |
| 2     | 0.000000           | 0.078408           | 0.000006           | 0.000050           | 0.137538           |
| 3     | 0.806685           | 0.048098           | 0.000007           | 0.955743           | 0.003166           |
| 4     | 0.001617           | 0.000000           | 0.025127           | 0.000271           | 0.000000           |
| 5     | 0.037086           | 0.000003           | 0.002049           | 0.037680           | 0.000000           |
| 6     | 0.000001           | 0.000000           | 0.002049           | 0.000232           | 0.000000           |
| 7     | 0.000960           | 0.596224           | 0.006413           | 0.000000           | 0.000023           |
| 8     | 0.003560           | 0.271232           | 0.627097           | 0.000259           | 0.859272           |
| 9     | 0.150075           | 0.006006           | 0.047987           | 0.000077           | 0.000001           |


Contribution

Feel free to contribute to this project by submitting issues or pull requests. Any improvements or additional features are welcome!

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

For any questions or further information, contact [email protected].
