77 changes: 30 additions & 47 deletions notebooks/text_models/labs/text_generation_using_transformers.ipynb
@@ -37,24 +37,7 @@
"source": [
"## Setup\n",
"\n",
"In order to run this notebook, you will need `keras_nlp`. KerasNLP is a natural language processing library that works natively with TensorFlow, JAX, or PyTorch. Keras NLP offers transformer layers that are extremely helpful to build the generative model in this notebook.\n",
"\n",
"Uncomment the cell below if you don't have keras_nlp already installed. You may need to restart the kernel once it has been installed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "-QxR27aY5Ghe",
"outputId": "a31a6553-979c-47ec-aadc-4d829f99d80b"
},
"outputs": [],
"source": [
"#!pip install keras-nlp"
"In order to run this notebook, you will need `keras_hub`. Keras Hub is a pretrained modeling library that aims to be simple, flexible, and fast. It works natively with TensorFlow, JAX, or PyTorch. Keras NLP offers transformer layers that are extremely helpful to build the generative model in this notebook."
]
},
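Because the explicit install cell was removed, the quick check below (a minimal sketch; it assumes the package is published on PyPI as `keras-hub` and that you restart the kernel after installing) can confirm the environment is ready before running the rest of the notebook:

```python
# Minimal environment check. If an import fails, install the missing package
# and restart the kernel, e.g.:  pip install keras-hub
import keras
import keras_hub  # pretrained modeling library; runs on TensorFlow, JAX, or PyTorch
import tensorflow as tf

print("keras:", keras.__version__)
print("tensorflow:", tf.__version__)
```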
{
@@ -82,9 +65,9 @@
},
"outputs": [],
"source": [
"import keras_nlp\n",
"import tensorflow as tf\n",
"from tensorflow import keras"
"import keras\n",
"import keras_hub\n",
"import tensorflow as tf"
]
},
{
@@ -178,7 +161,7 @@
" origin=\"https://storage.googleapis.com/asl-public/text/data/simplebooks.zip\",\n",
" extract=True,\n",
")\n",
"data_dir = os.path.expanduser(\"~/.keras/datasets/simplebooks/\")\n",
"data_dir = os.path.expanduser(\"~/.keras/datasets/simplebooks.zip/simplebooks/\")\n",
"\n",
"# Load simplebooks-92 train set and filter out short lines using MIN_TRAINING_SEQ_LEN\n",
"raw_train_ds = (\n",
@@ -202,7 +185,7 @@
"source": [
"## Train the Tokenizer\n",
"\n",
"We train the tokenizer using Keras NLP's [compute_word_piece_vocabulary](https://keras.io/api/keras_nlp/tokenizers/compute_word_piece_vocabulary/) from the training dataset for a vocabulary size of `VOCAB_SIZE`, which is a tuned hyperparameter. We want to limit the vocabulary as much as possible, since it has a large effect on the number of model parameters. We also don't want to include *too few* words, or there would be too many out-of-vocabulary (OOV) sub-words. In addition, three tokens are reserved in the vocabulary:\n",
"We train the tokenizer using Keras NLP's [compute_word_piece_vocabulary](https://keras.io/api/keras_hub/tokenizers/compute_word_piece_vocabulary/) from the training dataset for a vocabulary size of `VOCAB_SIZE`, which is a tuned hyperparameter. We want to limit the vocabulary as much as possible, since it has a large effect on the number of model parameters. We also don't want to include *too few* words, or there would be too many out-of-vocabulary (OOV) sub-words. In addition, three tokens are reserved in the vocabulary:\n",
"\n",
"- `\"[PAD]\"` for padding sequences to `SEQ_LEN`. This token has index 0 in both `reserved_tokens` and `vocab`, since `WordPieceTokenizer` (and other layers) consider `0`/`vocab[0]` as the default padding.\n",
"- `\"[UNK]\"` for OOV sub-words, which should match the default `oov_token=\"[UNK]\"` in\n",
@@ -222,7 +205,7 @@
"source": [
"# Train tokenizer vocabulary\n",
"print(\"Training the word piece tokenizer. This will take 5-10 mins...\")\n",
"vocab = keras_nlp.tokenizers.compute_word_piece_vocabulary(\n",
"vocab = keras_hub.tokenizers.compute_word_piece_vocabulary(\n",
" raw_train_ds,\n",
" vocabulary_size=VOCAB_SIZE,\n",
" lowercase=True,\n",
@@ -239,7 +222,7 @@
"source": [
"## Load Tokenizer\n",
"\n",
"We use the vocabulary data to initialize [keras_nlp.tokenizers.WordPieceTokenizer](https://keras.io/api/keras_nlp/tokenizers/word_piece_tokenizer/). WordPieceTokenizer is an efficient implementation of the WordPiece algorithm used by BERT and other models. It will strip, lower-case and do other irreversible preprocessing operations. Given a vocabulary and input sentence, the WordPiece tokenizer will convert the sentence into an array of IDs and pad the sentence to the `SEQ_LEN` defined. For example, \n",
"We use the vocabulary data to initialize [keras_hub.tokenizers.WordPieceTokenizer](https://keras.io/api/keras_hub/tokenizers/word_piece_tokenizer/). WordPieceTokenizer is an efficient implementation of the WordPiece algorithm used by BERT and other models. It will strip, lower-case and do other irreversible preprocessing operations. Given a vocabulary and input sentence, the WordPiece tokenizer will convert the sentence into an array of IDs and pad the sentence to the `SEQ_LEN` defined. For example, \n",
"\n",
"```\n",
"vocab = [\"[UNK]\", \"the\", \"qu\", \"##ick\", \"br\", \"##own\", \"fox\", \".\"]\n",
@@ -257,7 +240,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"**Exercise:** Define the arguments (vocabulary, sequence_length, lowercase) for `keras_nlp.tokenizer.WordPieceTokenizer`. Refer to the [documentation here](https://keras.io/api/keras_nlp/tokenizers/word_piece_tokenizer/)."
"**Exercise:** Define the arguments (vocabulary, sequence_length, lowercase) for `keras_hub.tokenizer.WordPieceTokenizer`. Refer to the [documentation here](https://keras.io/api/keras_hub/tokenizers/word_piece_tokenizer/)."
]
},
{
@@ -268,7 +251,7 @@
},
"outputs": [],
"source": [
"tokenizer = keras_nlp.tokenizers.WordPieceTokenizer(\n",
"tokenizer = keras_hub.tokenizers.WordPieceTokenizer(\n",
" # TODO: Fill out the arguments\n",
")"
]
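To make the toy example above concrete, here is a small self-contained sketch (the toy `vocab` and sentence come from the markdown above; `sequence_length=10` is an arbitrary value chosen for illustration) of how `WordPieceTokenizer` maps a sentence to padded token IDs:

```python
import keras_hub
import tensorflow as tf

# Toy vocabulary from the markdown example; index 0 is the "[UNK]" OOV token.
toy_vocab = ["[UNK]", "the", "qu", "##ick", "br", "##own", "fox", "."]

toy_tokenizer = keras_hub.tokenizers.WordPieceTokenizer(
    vocabulary=toy_vocab,
    sequence_length=10,  # arbitrary short length, just for illustration
    lowercase=True,
)

# Prints the sub-word IDs for the sentence, padded with 0s up to sequence_length.
print(toy_tokenizer(tf.constant(["The quick brown fox."])))
```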
@@ -293,7 +276,7 @@
"outputs": [],
"source": [
"# packer adds a start token\n",
"start_packer = keras_nlp.layers.StartEndPacker(\n",
"start_packer = keras_hub.layers.StartEndPacker(\n",
" sequence_length=SEQ_LEN,\n",
" start_value=tokenizer.token_to_id(\"[BOS]\"),\n",
")\n",
@@ -325,20 +308,20 @@
"\n",
"We create our scaled-down transformer-decoder-based generative text model model with the following layers:\n",
"\n",
"- One `keras_nlp.layers.TokenAndPositionEmbedding` layer, which combines the embedding for the token and its position. This is diffrent from a traditional embedding layer because it creates trainable positional embedding instead of the fixed sinusoidal embedding.\n",
"- Multiple `keras_nlp.layers.TransformerDecoder` layers created using a loop. \n",
"- One `keras_hub.layers.TokenAndPositionEmbedding` layer, which combines the embedding for the token and its position. This is diffrent from a traditional embedding layer because it creates trainable positional embedding instead of the fixed sinusoidal embedding.\n",
"- Multiple `keras_hub.layers.TransformerDecoder` layers created using a loop. \n",
"- One final dense linear layer.\n",
"\n",
"**Note:** You can take a look at the [source code](https://github.com/keras-team/keras-nlp/blob/v0.6.1/keras_nlp/layers/modeling/transformer_decoder.py#L31) of this layer to see the different components that go into this layer."
"**Note:** You can take a look at the [source code](https://github.com/keras-team/keras-hub/blob/v0.23.0/keras_hub/src/layers/modeling/token_and_position_embedding.py#L11) of this layer to see the different components that go into this layer."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Exercise:** Write the model code using the following layers:\n",
"- Embedding Layer: Use [keras_nlp.layers.TokenAndPositionEmbedding](https://keras.io/api/keras_nlp/modeling_layers/token_and_position_embedding/) with arguments, vocabulary_size, sequence_length, embedding_dim and mask_zero.\n",
"- Transformer Layer: Use [keras_nlp.layers.TransformerDecoder](https://keras.io/api/keras_nlp/modeling_layers/transformer_decoder/) with arguments, num_heads and intermediate_dim. Play around with the number of transformer layers you want to have using the variable `NUM_LAYERS`."
"- Embedding Layer: Use [keras_hub.layers.TokenAndPositionEmbedding](https://keras.io/api/keras_hub/modeling_layers/token_and_position_embedding/) with arguments, vocabulary_size, sequence_length, embedding_dim and mask_zero.\n",
"- Transformer Layer: Use [keras_hub.layers.TransformerDecoder](https://keras.io/api/keras_hub/modeling_layers/transformer_decoder/) with arguments, num_heads and intermediate_dim. Play around with the number of transformer layers you want to have using the variable `NUM_LAYERS`."
]
},
{
@@ -349,7 +332,7 @@
},
"outputs": [],
"source": [
"inputs = keras.layers.Input(shape=(None,), dtype=tf.int32)\n",
"inputs = keras.layers.Input(shape=(None,), dtype=\"int32\")\n",
"# Embedding layer\n",
"embedding_layer = #TODO: Write the code for the embedding layer\n",
"x = embedding_layer(inputs)\n",
@@ -362,8 +345,8 @@
"model = keras.Model(inputs=inputs, outputs=outputs)\n",
"\n",
"# set up the loss metric\n",
"loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)\n",
"perplexity = keras_nlp.metrics.Perplexity(from_logits=True, mask_token_id=0)\n",
"loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)\n",
"perplexity = keras_hub.metrics.Perplexity(from_logits=True, mask_token_id=0)\n",
"\n",
"# compile the model\n",
"model.compile(optimizer=\"adam\", loss=loss_fn, metrics=[perplexity])"
@@ -461,9 +444,9 @@
"id": "WWu0fC2H48kz"
},
"source": [
"We will use the `keras_nlp.samplers` module for inference, which requires a callback function wrapping the model we just trained. This wrapper calls the model and returns the logit predictions for the current token we are generating.\n",
"We will use the `keras_hub.samplers` module for inference, which requires a callback function wrapping the model we just trained. This wrapper calls the model and returns the logit predictions for the current token we are generating.\n",
"\n",
"**Note:** There are two pieces of more advanced functionality available when defining your callback. The first is the ability to take in a `cache` of states computed in previous generation steps, which can be used to speed up generation. The second is the ability to output the final dense \"hidden state\" of each generated token. This is used by `keras_nlp.samplers.ContrastiveSampler`, which avoids repetition by penalizing repeated hidden states. Both are optional, and we will ignore them for now."
"**Note:** There are two pieces of more advanced functionality available when defining your callback. The first is the ability to take in a `cache` of states computed in previous generation steps, which can be used to speed up generation. The second is the ability to output the final dense \"hidden state\" of each generated token. This is used by `keras_hub.samplers.ContrastiveSampler`, which avoids repetition by penalizing repeated hidden states. Both are optional, and we will ignore them for now."
]
},
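The callback contract the samplers expect looks roughly like the sketch below (it assumes `model` is the trained transformer from above; the logits for the position being generated are returned, and the optional cache and hidden-state slots are passed through unused):

```python
def next(prompt, cache, index):
    # Return the logits for the token at `index`, as the samplers expect.
    logits = model(prompt)[:, index - 1, :]
    # Ignore the optional hidden-state and cache outputs for now.
    hidden_states = None
    return logits, hidden_states, cache
```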
{
@@ -511,7 +494,7 @@
},
"outputs": [],
"source": [
"sampler = keras_nlp.samplers.GreedySampler()\n",
"sampler = keras_hub.samplers.GreedySampler()\n",
"output_tokens = sampler(\n",
" next=next,\n",
" prompt=prompt_tokens,\n",
@@ -554,7 +537,7 @@
},
"outputs": [],
"source": [
"sampler = keras_nlp.samplers.BeamSampler(num_beams=10)\n",
"sampler = keras_hub.samplers.BeamSampler(num_beams=10)\n",
"output_tokens = sampler(\n",
" next=next,\n",
" prompt=prompt_tokens,\n",
@@ -594,7 +577,7 @@
},
"outputs": [],
"source": [
"sampler = keras_nlp.samplers.RandomSampler()\n",
"sampler = keras_hub.samplers.RandomSampler()\n",
"output_tokens = sampler(\n",
" next=next,\n",
" prompt=prompt_tokens,\n",
@@ -638,7 +621,7 @@
},
"outputs": [],
"source": [
"sampler = keras_nlp.samplers.TopKSampler(k=10)\n",
"sampler = keras_hub.samplers.TopKSampler(k=10)\n",
"output_tokens = sampler(\n",
" next=next,\n",
" prompt=prompt_tokens,\n",
@@ -678,7 +661,7 @@
},
"outputs": [],
"source": [
"sampler = keras_nlp.samplers.TopPSampler(p=0.5)\n",
"sampler = keras_hub.samplers.TopPSampler(p=0.5)\n",
"output_tokens = sampler(\n",
" next=next,\n",
" prompt=prompt_tokens,\n",
@@ -711,7 +694,7 @@
" \"\"\"A callback to generate text from a trained model using top-k.\"\"\"\n",
"\n",
" def __init__(self, k):\n",
" self.sampler = keras_nlp.samplers.TopKSampler(k)\n",
" self.sampler = keras_hub.samplers.TopKSampler(k)\n",
"\n",
" def on_epoch_end(self, epoch, logs=None):\n",
" output_tokens = self.sampler(\n",
@@ -759,9 +742,9 @@
},
"environment": {
"kernel": "conda-base-py",
"name": "workbench-notebooks.m121",
"name": "workbench-notebooks.m134",
"type": "gcloud",
"uri": "us-docker.pkg.dev/deeplearning-platform-release/gcr.io/workbench-notebooks:m121"
"uri": "us-docker.pkg.dev/deeplearning-platform-release/gcr.io/workbench-notebooks:m134"
},
"kernelspec": {
"display_name": "Python 3 (ipykernel) (Local)",
@@ -778,7 +761,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
"version": "3.10.18"
}
},
"nbformat": 4,