GNΩSIS: An extensible High-Level Synthesis Dataset

This repository provides part of the source code used for the generation and analysis of the GNΩSIS dataset.

Its primary focus is to demonstrate and support the extraction of source code feature vectors from the src_info.json files generated by the HLSAnalysisTools repository.

Additionally, it includes functionality to parse .sqlite files produced by the GenHLSOptimizer, which contain applied HLS directives along with corresponding metrics such as latency, BRAM, DSP, FF, LUT usage, and synthesis time.

Together with the aforementioned tools and repositories, this setup enables users to recreate, analyze, or extend the GNΩSIS dataset and framework effectively.

It also includes a detailed schema of the dataset and utility scripts for visualizing key metrics for individual applications.

Repository Structure

  • data/
    • ApplicationAPLMapping/ Provides the mapping between directive labels and their corresponding action points for each application.
    • ApplicationDatabases/ Includes the corresponding database files for each evaluated FPGA device and target clock frequency.
    • ApplicationDataset/ Provides the source code for each application, accompanied by the src_info.json and kernel_info.txt metadata files.
    • CSVS/ Includes the extracted data from the corresponding SQLite databases.
    • ApplicationBaselineInformation.csv Provides, for each application, the QoR metrics obtained from HLS reports generated without any applied directives or Vitis optimizations, across the evaluated FPGAs and target clock frequencies.
    • ApplicationInformation.csv Offers a quick reference to the top-level function file for each application, including the function name and file extension.
    • SourceCodeFeatureVectors.csv Provides the source code feature vector representation for each application.
  • modules/ Includes the core components of the repository, structured as modular classes.
  • featureVectorGeneration.py Python script that generates the SourceCodeFeatureVectors.csv file containing the extracted source code features.
  • dataAggregator.py Python script that creates the files in the CSVS directory and generates the GNΩSIS.csv.
  • appAnalyzer.py Python script that extracts key insights for a specific application from the dataset.
  • requirements.txt Lists required Python packages to run the repository.
  • .gitignore Specifies intentionally untracked files to ignore.
  • LICENSE.md The license for this project.
  • README.md Project overview, documentation, and usage instructions.

Getting Started

These instructions will get you a copy of the project on your local machine.

Prerequisites

This project was tested on Ubuntu 22.04 LTS (GNU/Linux 5.4.0-187-generic x86_64) with Python 3.10.

In addition, the Python libraries listed in requirements.txt are needed. They can be installed with the following command:

python3 -m pip install -r requirements.txt

Run

After installing the software listed in the Prerequisites section, clone this repository to your local machine.

Source Code Feature Vector Generation

python3.10 featureVectorGeneration.py --MODE GNWSIS

Output:

Started source code feature vector generation...

...
The maximum number of array dimensions in the dataset is: 2
The maximum number of arrays in a single application is: 22
The maximum number of loop structures in an application in our dataset is 26
The maximum nesting level in a loop structure in an application in our dataset is 5
The maximum number of loops in each level of our dataset applications are [26, 16, 7, 5, 2]

Finished source code feature vector generation...

Please note that the generated SourceCodeFeatureVectors.csv file contains source code feature vectors for a broader set of applications than those officially included in the GNΩSIS dataset. To ensure consistency with the dataset presented in the manuscript, we recommend using the SourceCodeFeatureVectors.csv located in the data/ directory.

Convert SQLite Data to CSV and Build the Final Dataset

python3.10 dataAggregator.py

Output:

Started database read...

Finished database read...

Aggregating CSV files and computing metrics...

GNΩSIS dataset generated successfully at GNΩSIS.csv

The exact CSV files referenced in the GNΩSIS manuscript are also available in the data/ directory of this repository.
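The SQLite-to-CSV step can be illustrated with a minimal sketch that uses only the Python standard library. It is not the code of dataAggregator.py: the function name `sqlite_to_csv` is hypothetical, and because the table layout of the GenHLSOptimizer databases is not described here, the sketch discovers table names from `sqlite_master` instead of assuming them.

```python
import csv
import sqlite3
from pathlib import Path


def sqlite_to_csv(db_path, out_dir):
    """Dump every user table of one SQLite file to <out_dir>/<table>.csv.

    Hypothetical helper for illustration; dataAggregator.py may work differently.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    con = sqlite3.connect(str(db_path))
    try:
        # sqlite_master lists the tables, so no table name has to be guessed.
        tables = [row[0] for row in con.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
        for table in tables:
            quoted = '"' + table.replace('"', '""') + '"'
            cursor = con.execute(f"SELECT * FROM {quoted}")
            header = [col[0] for col in cursor.description]
            target = out_dir / f"{table}.csv"
            with open(target, "w", newline="") as handle:
                writer = csv.writer(handle)
                writer.writerow(header)   # column names first
                writer.writerows(cursor)  # then one CSV row per database row
            written.append(target)
    finally:
        con.close()
    return written
```

Calling `sqlite_to_csv("some.sqlite", "csv_out")` returns the list of CSV paths it wrote, one per table found in the database.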

Dataset Schema

The GNΩSIS dataset is organized as a CSV file, where each row corresponds to a distinct hardware design configuration for a specific application, targeting a particular FPGA and clock frequency. It includes both configuration parameters and associated performance and resource utilization metrics.

Configuration Parameters

These columns define the application context and the design parameters:

  • Application_Name: The name of the application being analyzed.
  • Version: Identifier for a specific version or configuration of the application.
  • Device: The target FPGA device (e.g., xczu7ev-ffvc1156-2-e, xcu200-fsgd2104-2-e).
  • Clock_Period_nsec: The clock period for the design, in nanoseconds.

Applied Directives

These fields indicate which design directives have been applied to specific action points within the kernel:

  • Array_1 to Array_22: Represent directives applied to array-related action points (e.g., complete_1).
  • OuterLoop_1 to OuterLoop_26 and InnerLoop_1_1 to InnerLoop_4_2: Capture loop-specific directives such as pipeline_1 or unroll_2.
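The directive column names can be enumerated with a short sketch. It assumes that inner-loop columns follow an `InnerLoop_<nesting level>_<index>` pattern and that the per-level column counts equal the maxima printed by featureVectorGeneration.py ([26, 16, 7, 5, 2]); both are inferences from the column ranges above, and the column order in the actual CSV may differ. The function name `directive_columns` is hypothetical.

```python
# Per-level loop maxima as printed by featureVectorGeneration.py:
# index 0 = outermost loops, indices 1-4 = increasingly nested loops.
MAX_LOOPS_PER_LEVEL = [26, 16, 7, 5, 2]
MAX_ARRAYS = 22


def directive_columns(max_arrays=MAX_ARRAYS, loops_per_level=MAX_LOOPS_PER_LEVEL):
    """Build the assumed list of directive column names."""
    columns = [f"Array_{i}" for i in range(1, max_arrays + 1)]
    columns += [f"OuterLoop_{i}" for i in range(1, loops_per_level[0] + 1)]
    # Assumed naming: InnerLoop_<level>_<index>, level 1 being the first nested level.
    for level, count in enumerate(loops_per_level[1:], start=1):
        columns += [f"InnerLoop_{level}_{i}" for i in range(1, count + 1)]
    return columns


columns = directive_columns()
print(len(columns))  # 78
print(columns[-1])   # InnerLoop_4_2
```

Under these assumptions there are 78 directive columns. Together with the 4 configuration columns and the 11 QoR columns listed in this schema, that gives 93 columns, which matches the dataset shape printed by appAnalyzer.py.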

QoR Metrics

  • Latency_msec: Kernel execution latency, measured in milliseconds.
  • Synthesis_Time_sec: Total time taken to synthesize the design, in seconds.
  • BRAM_Utilization_percentage, DSP_Utilization_percentage, FF_Utilization_percentage, LUT_Utilization_percentage: Resource usage reported as a percentage of the total available on the target FPGA device.
  • Speedup: Performance improvement factor compared to a baseline implementation.
  • BRAMs, DSPs, FFs, LUTs: Calculated absolute resource usage based on utilization percentage and the FPGA's total capacity.
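The two derived quantities above can be sketched as follows. Everything here is an assumption made for illustration: the function names are hypothetical, the capacity figures for xczu7ev should be verified against the device datasheet, rounding down is only one possible convention, and Speedup is assumed to be the baseline latency divided by the design latency.

```python
import math

# ASSUMED capacities for xczu7ev (BRAM counted as 18K blocks); verify before use.
ASSUMED_XCZU7EV_CAPACITY = {"BRAM": 624, "DSP": 1728, "FF": 460800, "LUT": 230400}


def absolute_usage(utilization_pct, capacity):
    """Convert a utilization percentage into a resource count.

    Rounding down is an assumption; the repository may round differently.
    """
    return math.floor(utilization_pct * capacity / 100)


def speedup(baseline_latency_msec, latency_msec):
    """Assumed definition: how many times faster the design is than the baseline."""
    return baseline_latency_msec / latency_msec


print(absolute_usage(1, ASSUMED_XCZU7EV_CAPACITY["FF"]))  # 4608
print(speedup(10.0, 2.5))                                 # 4.0
```

The baseline latency would come from ApplicationBaselineInformation.csv for the matching application, device, and clock period.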

This schema supports exploration of the design-performance trade-offs across multiple configurations and devices. The GNΩSIS dataset is also available on Hugging Face Datasets, from which it can be downloaded and used directly.

Visualize Key Insights for an Application from the GNΩSIS Dataset

python3.10 appAnalyzer.py --APPLICATION_NAME <ApplicationName>

Example: Analyze the RodiniaHLS KNN Tiling Application

python3.10 appAnalyzer.py --APPLICATION_NAME rodinia-knn-1-tiling

Output:

Loaded GNΩSIS dataset:

(227819, 93)
                Application_Name         Version                Device  Clock_Period_nsec     Array_1     Array_2  ... LUT_Utilization_percentage Speedup BRAMs DSPs FFs LUTs
0  rodinia_lc_gicov_0_baseline_0  workload_0.cpp  xczu7ev-ffvc1156-2-e               10.0  cyclic_2_2   block_8_2  ...                        101     NaN   NaN  NaN NaN  NaN
1  rodinia_lc_gicov_0_baseline_0  workload_1.cpp  xczu7ev-ffvc1156-2-e               10.0   block_4_2  cyclic_4_2  ...                        101     NaN   NaN  NaN NaN  NaN
2  rodinia_lc_gicov_0_baseline_0  workload_2.cpp  xczu7ev-ffvc1156-2-e               10.0   block_2_2   block_8_2  ...                        101     NaN   NaN  NaN NaN  NaN
3  rodinia_lc_gicov_0_baseline_0  workload_3.cpp  xczu7ev-ffvc1156-2-e               10.0  complete_2  complete_2  ...                        101     NaN   NaN  NaN NaN  NaN
4  rodinia_lc_gicov_0_baseline_0  workload_4.cpp  xczu7ev-ffvc1156-2-e               10.0  cyclic_4_2   block_2_2  ...                        101     NaN   NaN  NaN NaN  NaN

[5 rows x 93 columns]

Filtered dataset for application: rodinia-knn-1-tiling

(4506, 93)
            Application_Name         Version                Device  Clock_Period_nsec      Array_1  ...   Speedup BRAMs  DSPs      FFs     LUTs
169560  rodinia-knn-1-tiling  workload_0.cpp  xczu7ev-ffvc1156-2-e               10.0  cyclic_16_1  ...  1.938975  18.0  17.0   4608.0   4608.0
169561  rodinia-knn-1-tiling  workload_1.cpp  xczu7ev-ffvc1156-2-e               10.0    block_8_1  ...  8.105651  31.0  17.0   4608.0   6912.0
169562  rodinia-knn-1-tiling  workload_2.cpp  xczu7ev-ffvc1156-2-e               10.0  block_128_1  ...  4.495155   6.0  17.0  73728.0  16128.0
169563  rodinia-knn-1-tiling  workload_3.cpp  xczu7ev-ffvc1156-2-e               10.0    block_2_1  ...  2.081211   6.0  17.0   9216.0  16128.0
169564  rodinia-knn-1-tiling  workload_4.cpp  xczu7ev-ffvc1156-2-e               10.0    block_8_1  ...  1.759654  18.0  17.0  36864.0  16128.0

[5 rows x 93 columns]

Analyzing Application: rodinia-knn-1-tiling


Finished Application Analysis

The output of this step is stored in the output/rodinia-knn-1-tiling directory, which contains synthesizability and feasibility statistics. Designs that are both synthesizable and feasible are kept, and the Pareto frontier is generated for each FPGA device and target clock frequency. These results are also visualized.
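The filtering and Pareto step can be illustrated with a minimal sketch in plain Python. It is not the implementation in appAnalyzer.py: the helper names are hypothetical, feasibility is assumed to mean that no resource exceeds 100% utilization, the synthesizability check is omitted, and the sample designs are made up for the example.

```python
RESOURCES = ("BRAM", "DSP", "FF", "LUT")


def is_feasible(design):
    """Assumed rule: a design fits the device if no resource exceeds 100 %."""
    return all(design[f"{r}_Utilization_percentage"] <= 100 for r in RESOURCES)


def pareto_frontier(designs, objectives):
    """Return the designs not dominated by any other (all objectives minimized)."""
    def dominates(a, b):
        # a dominates b: no worse on every objective, strictly better on one.
        return (all(a[k] <= b[k] for k in objectives)
                and any(a[k] < b[k] for k in objectives))
    return [d for d in designs if not any(dominates(o, d) for o in designs)]


def make(version, latency, lut):
    """Build a made-up design row; the other resources are fixed at 10 %."""
    row = {"Version": version, "Latency_msec": latency}
    row.update({f"{r}_Utilization_percentage": 10 for r in RESOURCES})
    row["LUT_Utilization_percentage"] = lut
    return row


designs = [make("a", 1.0, 50), make("b", 2.0, 20),
           make("c", 2.5, 30), make("d", 0.5, 101)]
feasible = [d for d in designs if is_feasible(d)]
front = pareto_frontier(feasible, ("Latency_msec", "LUT_Utilization_percentage"))
print([d["Version"] for d in front])  # ['a', 'b']
```

Design d is dropped because it needs 101% of the LUTs, and design c is dominated by b, which is both faster and smaller. In practice the same filter would be applied separately to each device and clock period.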

Publication

If you find our project useful, please consider citing our paper:
