77 changes: 30 additions & 47 deletions notebooks/text_models/labs/text_generation_using_transformers.ipynb
@@ -37,24 +37,7 @@
"source": [
"## Setup\n",
"\n",
"In order to run this notebook, you will need `keras_nlp`. KerasNLP is a natural language processing library that works natively with TensorFlow, JAX, or PyTorch. Keras NLP offers transformer layers that are extremely helpful to build the generative model in this notebook.\n",
"\n",
"Uncomment the cell below if you don't have keras_nlp already installed. You may need to restart the kernel once it has been installed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "-QxR27aY5Ghe",
"outputId": "a31a6553-979c-47ec-aadc-4d829f99d80b"
},
"outputs": [],
"source": [
"#!pip install keras-nlp"
"In order to run this notebook, you will need `keras_hub`. Keras Hub is a pretrained modeling library that aims to be simple, flexible, and fast. It works natively with TensorFlow, JAX, or PyTorch. Keras NLP offers transformer layers that are extremely helpful to build the generative model in this notebook."
]
},
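Because the explicit install cell was removed, the quick check below (a minimal sketch; it assumes the package is published on PyPI as `keras-hub` and that you restart the kernel after installing) can confirm the environment is ready before running the rest of the notebook:

```python
# Minimal environment check. If an import fails, install the missing package
# and restart the kernel, e.g.:  pip install keras-hub
import keras
import keras_hub  # pretrained modeling library; runs on TensorFlow, JAX, or PyTorch
import tensorflow as tf

print("keras:", keras.__version__)
print("tensorflow:", tf.__version__)
```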
{
@@ -82,9 +65,9 @@
},
"outputs": [],
"source": [
"import keras_nlp\n",
"import tensorflow as tf\n",
"from tensorflow import keras"
"import keras\n",
"import keras_hub\n",
"import tensorflow as tf"
]
},
{
@@ -178,7 +161,7 @@
" origin=\"https://storage.googleapis.com/asl-public/text/data/simplebooks.zip\",\n",
" extract=True,\n",
")\n",
"data_dir = os.path.expanduser(\"~/.keras/datasets/simplebooks/\")\n",
"data_dir = os.path.expanduser(\"~/.keras/datasets/simplebooks.zip/simplebooks/\")\n",
"\n",
"# Load simplebooks-92 train set and filter out short lines using MIN_TRAINING_SEQ_LEN\n",
"raw_train_ds = (\n",
@@ -202,7 +185,7 @@
"source": [
"## Train the Tokenizer\n",
"\n",
"We train the tokenizer using Keras NLP's [compute_word_piece_vocabulary](https://keras.io/api/keras_nlp/tokenizers/compute_word_piece_vocabulary/) from the training dataset for a vocabulary size of `VOCAB_SIZE`, which is a tuned hyperparameter. We want to limit the vocabulary as much as possible, since it has a large effect on the number of model parameters. We also don't want to include *too few* words, or there would be too many out-of-vocabulary (OOV) sub-words. In addition, three tokens are reserved in the vocabulary:\n",
"We train the tokenizer using Keras NLP's [compute_word_piece_vocabulary](https://keras.io/api/keras_hub/tokenizers/compute_word_piece_vocabulary/) from the training dataset for a vocabulary size of `VOCAB_SIZE`, which is a tuned hyperparameter. We want to limit the vocabulary as much as possible, since it has a large effect on the number of model parameters. We also don't want to include *too few* words, or there would be too many out-of-vocabulary (OOV) sub-words. In addition, three tokens are reserved in the vocabulary:\n",
"\n",
"- `\"[PAD]\"` for padding sequences to `SEQ_LEN`. This token has index 0 in both `reserved_tokens` and `vocab`, since `WordPieceTokenizer` (and other layers) consider `0`/`vocab[0]` as the default padding.\n",
"- `\"[UNK]\"` for OOV sub-words, which should match the default `oov_token=\"[UNK]\"` in\n",
@@ -222,7 +205,7 @@
"source": [
"# Train tokenizer vocabulary\n",
"print(\"Training the word piece tokenizer. This will take 5-10 mins...\")\n",
"vocab = keras_nlp.tokenizers.compute_word_piece_vocabulary(\n",
"vocab = keras_hub.tokenizers.compute_word_piece_vocabulary(\n",
" raw_train_ds,\n",
" vocabulary_size=VOCAB_SIZE,\n",
" lowercase=True,\n",
@@ -239,7 +222,7 @@
"source": [
"## Load Tokenizer\n",
"\n",
"We use the vocabulary data to initialize [keras_nlp.tokenizers.WordPieceTokenizer](https://keras.io/api/keras_nlp/tokenizers/word_piece_tokenizer/). WordPieceTokenizer is an efficient implementation of the WordPiece algorithm used by BERT and other models. It will strip, lower-case and do other irreversible preprocessing operations. Given a vocabulary and input sentence, the WordPiece tokenizer will convert the sentence into an array of IDs and pad the sentence to the `SEQ_LEN` defined. For example, \n",
"We use the vocabulary data to initialize [keras_hub.tokenizers.WordPieceTokenizer](https://keras.io/api/keras_hub/tokenizers/word_piece_tokenizer/). WordPieceTokenizer is an efficient implementation of the WordPiece algorithm used by BERT and other models. It will strip, lower-case and do other irreversible preprocessing operations. Given a vocabulary and input sentence, the WordPiece tokenizer will convert the sentence into an array of IDs and pad the sentence to the `SEQ_LEN` defined. For example, \n",
"\n",
"```\n",
"vocab = [\"[UNK]\", \"the\", \"qu\", \"##ick\", \"br\", \"##own\", \"fox\", \".\"]\n",
@@ -257,7 +240,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"**Exercise:** Define the arguments (vocabulary, sequence_length, lowercase) for `keras_nlp.tokenizer.WordPieceTokenizer`. Refer to the [documentation here](https://keras.io/api/keras_nlp/tokenizers/word_piece_tokenizer/)."
"**Exercise:** Define the arguments (vocabulary, sequence_length, lowercase) for `keras_hub.tokenizer.WordPieceTokenizer`. Refer to the [documentation here](https://keras.io/api/keras_hub/tokenizers/word_piece_tokenizer/)."
]
},
{
@@ -268,7 +251,7 @@
},
"outputs": [],
"source": [
"tokenizer = keras_nlp.tokenizers.WordPieceTokenizer(\n",
"tokenizer = keras_hub.tokenizers.WordPieceTokenizer(\n",
" # TODO: Fill out the arguments\n",
")"
]
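To make the toy example above concrete, here is a small self-contained sketch (the toy `vocab` and sentence come from the markdown above; `sequence_length=10` is an arbitrary value chosen for illustration) of how `WordPieceTokenizer` maps a sentence to padded token IDs:

```python
import keras_hub
import tensorflow as tf

# Toy vocabulary from the markdown example; index 0 is the "[UNK]" OOV token.
toy_vocab = ["[UNK]", "the", "qu", "##ick", "br", "##own", "fox", "."]

toy_tokenizer = keras_hub.tokenizers.WordPieceTokenizer(
    vocabulary=toy_vocab,
    sequence_length=10,  # arbitrary short length, just for illustration
    lowercase=True,
)

# Prints the sub-word IDs for the sentence, padded with 0s up to sequence_length.
print(toy_tokenizer(tf.constant(["The quick brown fox."])))
```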
@@ -293,7 +276,7 @@
"outputs": [],
"source": [
"# packer adds a start token\n",
"start_packer = keras_nlp.layers.StartEndPacker(\n",
"start_packer = keras_hub.layers.StartEndPacker(\n",
" sequence_length=SEQ_LEN,\n",
" start_value=tokenizer.token_to_id(\"[BOS]\"),\n",
")\n",
@@ -325,20 +308,20 @@
"\n",
"We create our scaled-down transformer-decoder-based generative text model model with the following layers:\n",
"\n",
"- One `keras_nlp.layers.TokenAndPositionEmbedding` layer, which combines the embedding for the token and its position. This is diffrent from a traditional embedding layer because it creates trainable positional embedding instead of the fixed sinusoidal embedding.\n",
"- Multiple `keras_nlp.layers.TransformerDecoder` layers created using a loop. \n",
"- One `keras_hub.layers.TokenAndPositionEmbedding` layer, which combines the embedding for the token and its position. This is diffrent from a traditional embedding layer because it creates trainable positional embedding instead of the fixed sinusoidal embedding.\n",
"- Multiple `keras_hub.layers.TransformerDecoder` layers created using a loop. \n",
"- One final dense linear layer.\n",
"\n",
"**Note:** You can take a look at the [source code](https://github.com/keras-team/keras-nlp/blob/v0.6.1/keras_nlp/layers/modeling/transformer_decoder.py#L31) of this layer to see the different components that go into this layer."
"**Note:** You can take a look at the [source code](https://github.com/keras-team/keras-hub/blob/v0.23.0/keras_hub/src/layers/modeling/token_and_position_embedding.py#L11) of this layer to see the different components that go into this layer."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Exercise:** Write the model code using the following layers:\n",
"- Embedding Layer: Use [keras_nlp.layers.TokenAndPositionEmbedding](https://keras.io/api/keras_nlp/modeling_layers/token_and_position_embedding/) with arguments, vocabulary_size, sequence_length, embedding_dim and mask_zero.\n",
"- Transformer Layer: Use [keras_nlp.layers.TransformerDecoder](https://keras.io/api/keras_nlp/modeling_layers/transformer_decoder/) with arguments, num_heads and intermediate_dim. Play around with the number of transformer layers you want to have using the variable `NUM_LAYERS`."
"- Embedding Layer: Use [keras_hub.layers.TokenAndPositionEmbedding](https://keras.io/api/keras_hub/modeling_layers/token_and_position_embedding/) with arguments, vocabulary_size, sequence_length, embedding_dim and mask_zero.\n",
"- Transformer Layer: Use [keras_hub.layers.TransformerDecoder](https://keras.io/api/keras_hub/modeling_layers/transformer_decoder/) with arguments, num_heads and intermediate_dim. Play around with the number of transformer layers you want to have using the variable `NUM_LAYERS`."
]
},
{
@@ -349,7 +332,7 @@
},
"outputs": [],
"source": [
"inputs = keras.layers.Input(shape=(None,), dtype=tf.int32)\n",
"inputs = keras.layers.Input(shape=(None,), dtype=\"int32\")\n",
"# Embedding layer\n",
"embedding_layer = #TODO: Write the code for the embedding layer\n",
"x = embedding_layer(inputs)\n",
@@ -362,8 +345,8 @@
"model = keras.Model(inputs=inputs, outputs=outputs)\n",
"\n",
"# set up the loss metric\n",
"loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)\n",
"perplexity = keras_nlp.metrics.Perplexity(from_logits=True, mask_token_id=0)\n",
"loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)\n",
"perplexity = keras_hub.metrics.Perplexity(from_logits=True, mask_token_id=0)\n",
"\n",
"# compile the model\n",
"model.compile(optimizer=\"adam\", loss=loss_fn, metrics=[perplexity])"
@@ -461,9 +444,9 @@
"id": "WWu0fC2H48kz"
},
"source": [
"We will use the `keras_nlp.samplers` module for inference, which requires a callback function wrapping the model we just trained. This wrapper calls the model and returns the logit predictions for the current token we are generating.\n",
"We will use the `keras_hub.samplers` module for inference, which requires a callback function wrapping the model we just trained. This wrapper calls the model and returns the logit predictions for the current token we are generating.\n",
"\n",
"**Note:** There are two pieces of more advanced functionality available when defining your callback. The first is the ability to take in a `cache` of states computed in previous generation steps, which can be used to speed up generation. The second is the ability to output the final dense \"hidden state\" of each generated token. This is used by `keras_nlp.samplers.ContrastiveSampler`, which avoids repetition by penalizing repeated hidden states. Both are optional, and we will ignore them for now."
"**Note:** There are two pieces of more advanced functionality available when defining your callback. The first is the ability to take in a `cache` of states computed in previous generation steps, which can be used to speed up generation. The second is the ability to output the final dense \"hidden state\" of each generated token. This is used by `keras_hub.samplers.ContrastiveSampler`, which avoids repetition by penalizing repeated hidden states. Both are optional, and we will ignore them for now."
]
},
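The callback contract the samplers expect looks roughly like the sketch below (it assumes `model` is the trained transformer from above; the logits for the position being generated are returned, and the optional cache and hidden-state slots are passed through unused):

```python
def next(prompt, cache, index):
    # Return the logits for the token at `index`, as the samplers expect.
    logits = model(prompt)[:, index - 1, :]
    # Ignore the optional hidden-state and cache outputs for now.
    hidden_states = None
    return logits, hidden_states, cache
```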
{
@@ -511,7 +494,7 @@
},
"outputs": [],
"source": [
"sampler = keras_nlp.samplers.GreedySampler()\n",
"sampler = keras_hub.samplers.GreedySampler()\n",
"output_tokens = sampler(\n",
" next=next,\n",
" prompt=prompt_tokens,\n",
@@ -554,7 +537,7 @@
},
"outputs": [],
"source": [
"sampler = keras_nlp.samplers.BeamSampler(num_beams=10)\n",
"sampler = keras_hub.samplers.BeamSampler(num_beams=10)\n",
"output_tokens = sampler(\n",
" next=next,\n",
" prompt=prompt_tokens,\n",
@@ -594,7 +577,7 @@
},
"outputs": [],
"source": [
"sampler = keras_nlp.samplers.RandomSampler()\n",
"sampler = keras_hub.samplers.RandomSampler()\n",
"output_tokens = sampler(\n",
" next=next,\n",
" prompt=prompt_tokens,\n",
@@ -638,7 +621,7 @@
},
"outputs": [],
"source": [
"sampler = keras_nlp.samplers.TopKSampler(k=10)\n",
"sampler = keras_hub.samplers.TopKSampler(k=10)\n",
"output_tokens = sampler(\n",
" next=next,\n",
" prompt=prompt_tokens,\n",
@@ -678,7 +661,7 @@
},
"outputs": [],
"source": [
"sampler = keras_nlp.samplers.TopPSampler(p=0.5)\n",
"sampler = keras_hub.samplers.TopPSampler(p=0.5)\n",
"output_tokens = sampler(\n",
" next=next,\n",
" prompt=prompt_tokens,\n",
@@ -711,7 +694,7 @@
" \"\"\"A callback to generate text from a trained model using top-k.\"\"\"\n",
"\n",
" def __init__(self, k):\n",
" self.sampler = keras_nlp.samplers.TopKSampler(k)\n",
" self.sampler = keras_hub.samplers.TopKSampler(k)\n",
"\n",
" def on_epoch_end(self, epoch, logs=None):\n",
" output_tokens = self.sampler(\n",
@@ -759,9 +742,9 @@
},
"environment": {
"kernel": "conda-base-py",
"name": "workbench-notebooks.m121",
"name": "workbench-notebooks.m134",
"type": "gcloud",
"uri": "us-docker.pkg.dev/deeplearning-platform-release/gcr.io/workbench-notebooks:m121"
"uri": "us-docker.pkg.dev/deeplearning-platform-release/gcr.io/workbench-notebooks:m134"
},
"kernelspec": {
"display_name": "Python 3 (ipykernel) (Local)",
@@ -778,7 +761,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
"version": "3.10.18"
}
},
"nbformat": 4,