@albertvucinovic commented Jan 26, 2025

Without RoPE:
[screenshot: 20250124_23h05m01s_grim]

With RoPE:
[screenshot: 20250126_12h15m25s_grim]

The model can still be used without RoPE, and everything should work as before. RoPE replaces the wpe matrix only if you add the use_rope flag to the config file. rope_base is also configurable.
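
For readers unfamiliar with RoPE, here is a minimal sketch of the rotation it applies to query and key vectors. It is plain Python for illustration only, not the PyTorch code from this PR. The names use_rope and rope_base come from the description above; the value 10000.0, the helper names rope_angles, apply_rope, and dot, and the adjacent-pair layout are assumptions (some implementations pair the two halves of the vector instead).

```python
import math

# Flag names are from the PR description; the values are assumptions.
use_rope = True
rope_base = 10000.0  # assumed value, a common choice for the RoPE base


def rope_angles(seq_len, head_dim, base=rope_base):
    """Return angles[t][i] = t * base**(-2*i/head_dim) for position t, pair i."""
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even")
    inv_freq = [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]
    return [[t * f for f in inv_freq] for t in range(seq_len)]


def apply_rope(x, angles):
    """Rotate each adjacent pair (x[t][2i], x[t][2i+1]) by angles[t][i]."""
    out = []
    for vec, row in zip(x, angles):
        rotated = []
        for i, theta in enumerate(row):
            a, b = vec[2 * i], vec[2 * i + 1]
            c, s = math.cos(theta), math.sin(theta)
            rotated.extend([a * c - b * s, a * s + b * c])
        out.append(rotated)
    return out


def dot(u, v):
    """Plain dot product, standing in for one attention score."""
    return sum(a * b for a, b in zip(u, v))
```

Because every pair is rotated by an angle proportional to the position, the dot product between a rotated query and a rotated key depends only on their distance. That is why no separate position-embedding matrix is needed when RoPE is enabled.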

Tested on an RTX 4090, so the reported mfu differs from A100 runs (the mfu calculation assumes A100 peak FLOPS).
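
To compare mfu numbers across GPUs, the reported value can be rescaled to the actual hardware peak. The sketch below assumes the estimate divides achieved FLOPS by 312e12 (the A100 bfloat16 peak, which is what nanoGPT's estimate uses as far as I know). The helper name rescale_mfu is hypothetical, and the peak for your own GPU has to come from the vendor's spec sheet.

```python
# Peak FLOPS assumed by the mfu estimate (A100 bfloat16); an assumption here.
A100_BF16_PEAK_FLOPS = 312e12


def rescale_mfu(reported_mfu, actual_peak_flops,
                assumed_peak_flops=A100_BF16_PEAK_FLOPS):
    """Convert an mfu computed against one peak to an mfu against another."""
    if actual_peak_flops <= 0 or assumed_peak_flops <= 0:
        raise ValueError("peak FLOPS values must be positive")
    # Recover the achieved FLOPS, then divide by the real hardware peak.
    achieved_flops = reported_mfu * assumed_peak_flops
    return achieved_flops / actual_peak_flops
```

For example, with a made-up peak of 156e12 (half the assumed A100 figure), a reported mfu of 0.20 corresponds to 0.40 on that hardware.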

@albertvucinovic (Author)

I did not check whether continuing training from a checkpoint still works.

@nitinvetcha

Hi,
Could you please share the weights obtained by training nanoGPT with RoPE, if possible?
That would be very helpful.

klei22 added a commit to klei22/nanoGPT that referenced this pull request Sep 14, 2025
…and_languages_to_tokenizer_analysis_scripts

Add additional languages and scripts for analysis