M2 Coursework – MPhil in Data Intensive Science

This repository contains the submission for the M2 Major Module Coursework of the MPhil in Data Intensive Science programme at the University of Cambridge. It includes all relevant code and the final report.

All code is implemented in Python and organised for clarity and reproducibility. A LaTeX-formatted report is also included.

Python Version

The environment uses Python 3.12, and all required packages are compatible with it. Please ensure your Conda installation is up to date to avoid version-resolution issues.
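
A quick, purely illustrative check that the right interpreter is active (not part of the coursework code):

import sys

# the project targets Python 3.12; fail early if a different interpreter is in use
assert sys.version_info[:2] == (3, 12), f"Expected Python 3.12, got {sys.version.split()[0]}"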

Repository Structure

├── LICENSE
├── README.md
├── requirements.txt
├── environment.yml
├── pyproject.toml
├── 2a_preprocessing.ipynb
├── 2b_flops_test.ipynb
├── 3a_grid_search.ipynb
├── 3b_context_length.ipynb
├── 3c_final_model.ipynb
├── model_comparison.ipynb

├── ctx_len_experiment/
│   ├── best_lm_head_bias_lr0.0001_rank8_ctx{128,512,768}.pt
│   ├── best_lora_state_lr0.0001_rank8_ctx{128,512,768}.pt
│   ├── best_result_lr0.0001_rank8_ctx{128,512,768}.json
│   └── val_inference_metrics_lr0.0001_rank8_ctx{128,512,768}.json

├── data/
│   └── lotka_volterra_data.h5

├── docs/
│   ├── conf.py
│   ├── index.rst
│   ├── make.bat
│   ├── Makefile
│   ├── source/
│   └── _build/ 
│       └── html/  
│           ├── index.html  
│           └── ...  

├── figures/
│   ├── grad_norm_{ctx_len,final,grid_search}.png
│   ├── lora.png
│   ├── lora_a_mean_norm_{ctx_len,final,grid_search}.png
│   ├── lora_b_mean_norm_{ctx_len,final,grid_search}.png
│   ├── model_comparison.png
│   ├── Qwen2_{Decoder_Layer,Model}.png
│   ├── train_loss_{ctx_len,final,grid_search}.png
│   └── val_loss_{ctx_len,final,grid_search}.png

├── final_model/
│   ├── best_lm_head_bias_lr0.0001_rank8_ctx768.pt
│   ├── best_lora_state_lr0.0001_rank8_ctx768.pt
│   ├── best_result_lr0.0001_rank8_ctx768.json
│   └── val_inference_metrics_lr0.0001_rank8_ctx768.json

├── lr_rank_experiment/
│   ├── best_lm_head_bias_lr{1e-05,5e-05,0.0001}_rank{2,4,8}.pt
│   ├── best_lora_state_lr{1e-05,5e-05,0.0001}_rank{2,4,8}.pt
│   ├── best_result_lr{1e-05,5e-05,0.0001}_rank{2,4,8}.json
│   └── val_inference_metrics_lr{1e-05,5e-05,0.0001}_rank{2,4,8}.json

├── report/
│   └── main.pdf

├── src/
│   ├── flops_counter.py
│   ├── lora_utils.py
│   ├── preprocessor.py
│   ├── qwen.py
│   └── __init__.py

├── tables/
│   ├── validation_mae_{ctx_length,grid_search}.csv
│   └── validation_mse_{ctx_length,grid_search}.csv

After setting up the environment using the provided environment.yml or requirements.txt, you can run the .ipynb notebooks directly. However, please note that you’ll need to configure your own Weights & Biases account - this includes running wandb login with your API key before using logging features.
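
A minimal sketch of the wandb setup from Python follows; the project and run names are placeholders, not names used by the notebooks:

import wandb

wandb.login()  # prompts for your API key, or reads the WANDB_API_KEY environment variable
run = wandb.init(project="m2-coursework", name="example-run")  # placeholder names
run.log({"example_metric": 0.0})  # illustrative logging call
run.finish()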

The figures/ folder contains all the plots generated throughout the experiments. The folders lr_rank_experiment/, ctx_len_experiment/, and final_model/ store the best model outputs for each run, including the LoRA state, the LM head bias, and the corresponding validation inference metrics. The .pt files can be loaded back into the base model using load_lora_weights and lm_head.bias.data.copy_.
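
A minimal sketch of restoring the final checkpoint is given below. It assumes load_lora_weights lives in src/lora_utils.py, that the base model is Qwen/Qwen2.5-0.5B-Instruct, and that the project adds a trainable bias to lm_head (the stock model has none); exact names and signatures may differ.

import torch
from transformers import AutoModelForCausalLM
from src.lora_utils import load_lora_weights  # assumed location of the helper mentioned above

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")  # assumed base checkpoint

lora_state = torch.load("final_model/best_lora_state_lr0.0001_rank8_ctx768.pt", map_location="cpu")
lm_head_bias = torch.load("final_model/best_lm_head_bias_lr0.0001_rank8_ctx768.pt", map_location="cpu")

load_lora_weights(model, lora_state)  # restore the LoRA adapter weights

# the stock lm_head has no bias, so create one before copying in the trained values
if model.lm_head.bias is None:
    model.lm_head.bias = torch.nn.Parameter(torch.zeros(model.lm_head.out_features))
model.lm_head.bias.data.copy_(lm_head_bias)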

The src/ directory includes utility code used across the different stages of the project - data preprocessing, FLOPs counting, and model wrappers - serving as a central toolkit.

The tables/ directory contains the MSE and MAE values produced by the evaluate_inference() function under the different hyperparameter configurations.
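
For example, the grid-search MSE table can be inspected with pandas (file name taken from the repository tree above):

import pandas as pd

mse_table = pd.read_csv("tables/validation_mse_grid_search.csv")
print(mse_table)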

Clone the Repository

To download this repository, run:

git clone https://gitlab.developers.cam.ac.uk/phy/data-intensive-science-mphil/assessments/m2_coursework/yz929.git

Environment Setup

You can install the dependencies using either Conda or pip.

Using Conda

conda env create -f environment.yml
conda activate m2_env

To deactivate:

conda deactivate

Using pip

pip install -r requirements.txt

Note: PyTorch is installed via pip because its conda packages are no longer officially supported.

  • If you have an NVIDIA GPU, pip will try to install the appropriate CUDA version automatically.
  • If you want to ensure compatibility or manually choose a version, visit: https://pytorch.org/get-started/locally/

To install a build for a specific CUDA version (e.g. CUDA 11.8), run:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

If you are using CPU only:

pip install torch torchvision torchaudio
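
After installation, a quick way to check which build was picked up:

import torch

print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # True only if a CUDA build and a compatible GPU were found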

Code

After setting up the environment and your wandb API key, you can run the notebooks; update the directory paths and wandb parameters to match your own setup first.

All required data files are included in the data/ directory - no external download is necessary.

Documentation

Sphinx-based automatic documentation for the project's toolkit (the modules in src/) is located at:

docs/_build/html/index.html

You can open this file in a browser to explore the full API reference and module descriptions.

Report

The report is written in LaTeX and located inside the report/ directory. The compiled version can be found as main.pdf.

Use of Auto-Generation Tools

ChatGPT was used to assist in proofreading and formatting the LaTeX report. This included improvements to grammar, clarity, and LaTeX formatting (equations, tables).

All suggestions were critically reviewed and selectively integrated to maintain academic integrity.

License

This repository is licensed under the MIT License. For more details, see the LICENSE file.
