@philinhphan
No description provided.

philinhphan and others added 2 commits June 3, 2025 23:57
This commit completes the setup of the "scaling_laws.ipynb" notebook
as per the issue requirements.

Key changes include:
- Created a dummy "gutenberg_poetry.txt" and its processed
  version in "data/gutenberg_poetry/" for initial experimentation.
- Added a notebook cell to calculate and display non-embedding
  parameters (N) for predefined GPT models, following Chinchilla
  guidelines (N = Total - WTE - WPE, accounting for tied lm_head).
- Added notebook cells to derive the formula for training steps (S)
  and to calculate S and total tokens (D) for various model sizes
  and compute budgets.
- Replaced the original Task 3 placeholder with a comprehensive
  markdown guide detailing how to:
    - Perform model training using `train.py`.
    - Record final validation losses.
    - Plot L vs. N.
    - Extract N_opt and D_opt.
    - Fit scaling laws (N_opt vs. C, D_opt vs. C) to derive
      parameters N0, a, D0, and b.
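
As a rough illustration of the preparatory calculations listed above, the
sketch below computes N from a total parameter count and derives D and S
from a compute budget. It is not the notebook's code: the function names
are hypothetical, the model is assumed to be GPT-2-style with `lm_head`
tied to WTE (so WTE is counted once in the total), and the common
approximation C ≈ 6·N·D is assumed for relating compute to tokens.

```python
import math


def non_embedding_params(total_params, vocab_size, block_size, n_embd):
    """N = Total - WTE - WPE, assuming lm_head shares weights with WTE."""
    wte = vocab_size * n_embd   # token embedding table
    wpe = block_size * n_embd   # learned position embedding table
    return total_params - wte - wpe


def tokens_and_steps(compute_flops, n_params, tokens_per_step):
    """Derive D and S from a compute budget C, assuming C ~= 6 * N * D.

    tokens_per_step is batch size * sequence length (times any gradient
    accumulation), i.e. the number of tokens consumed per optimizer step.
    """
    d = compute_flops / (6 * n_params)   # total training tokens D
    s = math.ceil(d / tokens_per_step)   # training steps S, rounded up
    return d, s
```

For example, `tokens_and_steps(C, N, batch_size * block_size)` would give
the token count and step count to use for one (model size, compute
budget) pair.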

The notebook is now structured for you to replace the dummy
dataset, run the preparatory calculations, and follow the guide
to perform the full scaling law analysis.
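
For the final fitting step, one possible approach is a least-squares line
in log-log space, since N_opt = N0 * C^a becomes
log N_opt = log N0 + a * log C. The sketch below assumes NumPy is
available; `fit_power_law` is a hypothetical helper, not part of the
notebook, and the same call would apply to D_opt vs. C to obtain D0 and b.

```python
import numpy as np


def fit_power_law(c, y):
    """Fit y = y0 * c**k by linear regression on (log c, log y).

    Returns (y0, k). All inputs must be strictly positive.
    """
    log_c = np.log(np.asarray(c, dtype=float))
    log_y = np.log(np.asarray(y, dtype=float))
    # polyfit with degree 1 returns [slope, intercept]
    slope, intercept = np.polyfit(log_c, log_y, 1)
    return float(np.exp(intercept)), float(slope)
```

Usage would look like `N0, a = fit_power_law(C_values, N_opt_values)`,
where both arrays come from the measured optima at each compute budget.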