Inspired by How to Build an Agent
Wow, what an amazing write-up by Thorsten Ball! I got really inspired to try it out but I could not: turns out, one needs credits for Claude that I wasn't willing to pay for. But I do have Google Gemini, so I asked myself: "How difficult could it be to borrow the code and twist it to work with the Google API?" Well, it took longer than I expected, but I'm still giving myself credit for doing it in less than a day after not having written any Go for about 10 years and having zero familiarity with the Google Gemini API libraries, including the Go version.
What are we making here: a command-line tool that connects to Gemini and allows it to edit code on our (the client) side.
You gotta have Go on your system, and that's easy to achieve: follow the Go installation instructions for your platform. If you have an older version (as I did), just uninstall it by following the Uninstalling Go guide, then install the newest version.
Next, head to Google's AI Studio and get an API key. Click on the Create API Key button in the top right corner, then click in the Search Google Cloud projects field of the popup (I didn't have any existing Cloud Projects) and select Gemini API. Copy and save the key (you won't be able to view it again), then make sure it's in your command line environment. I use z shell on macOS; you can follow these instructions if you're completely new to this. Once the key is exported it will be picked up from the environment, so you don't have to (and you do not want to) hardcode it into the program. I used `GOOGLE_API_KEY` for the environment variable name and it works.
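Reading the key from the environment can be sketched with the standard library alone. This is a minimal illustration, not the repo's actual code: the helper name `apiKeyFromEnv` is made up for this example, and only the variable name `GOOGLE_API_KEY` comes from the text above.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// apiKeyFromEnv (hypothetical helper) returns the value of the named
// environment variable, or an error if it is unset or empty.
func apiKeyFromEnv(name string) (string, error) {
	key, ok := os.LookupEnv(name)
	if !ok || key == "" {
		return "", errors.New(name + " is not set; export it in your shell first")
	}
	return key, nil
}

func main() {
	key, err := apiKeyFromEnv("GOOGLE_API_KEY")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Never print the key itself; its length is enough to confirm it was found.
	fmt.Printf("found API key (%d characters)\n", len(key))
}
```

Failing early with a clear message beats a cryptic authentication error from the API later on.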
Clone the repo at github.com/giik/genai-agent-1. To build and run the code use the following command:

    go build ./... && ./agent

Again, this code uses the Google GenAI API from `google.golang.org/genai` so we just need to import it. Similar to Anthropic, the GenAI API provides for multi-turn conversations with tool/function calling.
The idea is very elegant and there is an excellent, easily understandable description (picture included) in the Gemini API docs. In short, the configuration our client sets up at the start of the session (alongside the system prompt) includes the definitions of the available functions/tools. The model can decide to answer with a text response or with a "function call" to one of the tools we've made available. Then it is our responsibility to run the tool on the client side and forward any output back to the model, which sends back its final answer (or asks for another tool call). Very ingenious!
To tie it all together, we need a few pieces:
- write the tools we need, such as a `ReadFile` function,
- let Gemini know of the available tools using the `Tools` field of the `GenerateContentConfig`; this includes the tool name and description, and (most importantly) its schema, such as inputs and output,
- respond to Gemini's requests to run tools by calling the corresponding functions with the Gemini-provided parameters, then forwarding outputs to Gemini.
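The three pieces above can be sketched without the GenAI library at all. Everything here is a stand-in for illustration: `ToolSpec`, `dispatch`, the `path` argument name, and the `output`/`error` keys are assumptions made for this example, not the `genai` types or the repo's code.

```go
package main

import (
	"fmt"
	"os"
)

// ToolSpec is a hypothetical stand-in for a tool declaration: the name,
// description, and required inputs the model is told about, plus the Go
// function that runs on the client.
type ToolSpec struct {
	Name        string
	Description string
	Required    []string // names of required arguments
	Run         func(args map[string]any) (string, error)
}

// readFile is the client-side implementation of the read_file tool.
func readFile(args map[string]any) (string, error) {
	path, ok := args["path"].(string)
	if !ok {
		return "", fmt.Errorf("argument %q must be a string", "path")
	}
	data, err := os.ReadFile(path)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

// tools is the registry of everything the model is allowed to call.
var tools = map[string]ToolSpec{
	"read_file": {
		Name:        "read_file",
		Description: "Read the contents of a file at a relative path.",
		Required:    []string{"path"},
		Run:         readFile,
	},
}

// dispatch runs the tool the model asked for and returns a response map
// with either an "output" or an "error" entry to forward back to the model.
func dispatch(name string, args map[string]any) map[string]any {
	tool, ok := tools[name]
	if !ok {
		return map[string]any{"error": "unknown tool: " + name}
	}
	for _, req := range tool.Required {
		if _, ok := args[req]; !ok {
			return map[string]any{"error": "missing argument: " + req}
		}
	}
	out, err := tool.Run(args)
	if err != nil {
		return map[string]any{"error": err.Error()}
	}
	return map[string]any{"output": out}
}

func main() {
	// Pretend the model replied with two function calls.
	fmt.Println(dispatch("read_file", map[string]any{"path": "go.mod"}))
	fmt.Println(dispatch("delete_file", map[string]any{"path": "go.mod"}))
}
```

Returning errors as data, rather than aborting, lets the model see what went wrong and try again with different arguments.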
Also, note that on every turn we send the model either a single text part or one or more responses to its function call requests. The server is stateless, so each request must carry the context the model needs (typically, all turns in the conversation so far), but this is handled by the chat session in the GenAI library, not by our client code. I am guessing there is some point where the context becomes too large and gets truncated, but this is just a wild guess: I don't really know and haven't yet inspected the GenAI Go code to see whether it actually does that.
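To make the "stateless server" point concrete, here is a toy sketch of a client-side session that resends the whole history on every turn. It only illustrates the idea and is not how the GenAI library is implemented; `Turn`, `Session`, and `statelessModel` are names invented for this example.

```go
package main

import "fmt"

// Turn is one entry in the conversation: who spoke and what was said.
type Turn struct {
	Role string // "user" or "model"
	Text string
}

// statelessModel stands in for the remote model: it remembers nothing
// between calls and sees only the history passed in.
func statelessModel(history []Turn) string {
	return fmt.Sprintf("I can see %d turn(s) so far", len(history))
}

// Session keeps the history on the client and resends all of it each turn.
type Session struct {
	history []Turn
}

// Send records the user's message, calls the model with the full history,
// and records the model's reply.
func (s *Session) Send(text string) string {
	s.history = append(s.history, Turn{Role: "user", Text: text})
	reply := statelessModel(s.history)
	s.history = append(s.history, Turn{Role: "model", Text: reply})
	return reply
}

func main() {
	var s Session
	fmt.Println(s.Send("Hello, I am Bob.")) // the model sees 1 turn
	fmt.Println(s.Send("What is my name?")) // the model sees 3 turns
}
```

The history grows by two entries per exchange, which is why the size of the context eventually becomes a concern.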
I added functionality to save the conversation to a markdown file because the model's responses are in markdown and that makes it easy to incorporate them. Log files are named like `session-20250518-1150-a1bd959eaa.md` so they'll sort nicely if you need to find a particular one, or the most recent one, quickly. The log files are written to the location specified in the `CODE_AGENT_LOGDIR` environment variable (or to `/tmp` if the variable is not set); I personally set it to `$HOME/Developer/agentlogs`. Logging can be turned off completely using the `log_session` command line flag, like

    go build ./... && ./agent -log_session=false

When you fire it up, things should look like this:

    go build ./... && ./agent
    client created ✔
    chat with 'gemini-2.0-flash' created ✔
    logging conversation to '/Users/****/Developer/agentlog/session-20250521-2301-d1e8345801.md'
    [user]: Hello, I am Bob.
    [model][text 24 bytes][part 1 of 1] Hi Bob, how can I help?
    [user]: What tools are available to you?
    [model][text 82 bytes][part 1 of 1] I have access to the following tools: `read_file`, `edit_file`, and `list_files`.
    [user]:

I recorded a session (see the included file `session-20250520-2142-d2e9e2fbfb.md`) in which I asked the model to create a fizzbuzz Swift program, then edit it to change how far it's going (the program is included). This worked great. Similarly, it was able to create a functional SwiftUI view (included) of a timer/clock. Finally, because I couldn't help myself, I did ask it to read its code (all Go files in the current directory) and write its own markdown documentation. I've included it without changes; it's not at all bad.
The last time I wrote a relatively complex server project in Go (or any code in Go) was about 2017. It was a nice break from the daily C++ at work and whatever other hobbies at home. It came back to me in a jiffy but I ask the more experienced Gophers out there to forgive me for any non-idiomatic use or otherwise unorthodox style.
Enjoy!
This document provides an in-depth analysis of the code agent's architecture and functionality, based on the provided Go source code.
- `main.go`: The main entry point. It initializes the Google Gemini API client, configures the chat session, and starts the agent's main loop.
  - It reads the `GOOGLE_API_KEY` environment variable for authentication.
  - It defines command-line flags, specifically `log_session` to enable/disable logging.
  - It initializes the Gemini API client using `genai.NewClient`.
  - It registers available tools (`readFileTool`, `editFileTool`, `listFilesTool`) with the Gemini model.
  - It sets up a system instruction to guide the LLM. Critically, this instruction tells the model how to behave ("Answer concisely. Ask clarifying questions, if necessary.").
  - It creates a new chat session using `client.Chats.Create`.
  - It creates an `Agent` instance with the Gemini client, model name, a function to get user input (`GetUserMessage`), and a log file (if logging is enabled).
  - Finally, it calls `agent.Run` to start the agent's main loop.
- `agent.go`: Contains the `Agent` struct and the main control loop (`Run` method).
  - The `Agent` struct holds the Gemini client, model name, a function to obtain user input (`GetPrompt`), and a log file.
  - The `Run` method implements a state machine to manage the conversation flow. The states are `UserInput`, `SendMessage`, `ProcessResponse`, and `Error`.
  - UserInput: Prompts the user for input using the `GetPrompt` function.
  - SendMessage: Sends the user's input to the Gemini API using `chat.SendMessage`.
  - ProcessResponse: Handles the response from the Gemini API.
    - It iterates through the response parts, handling both text and function calls.
    - For function calls, it extracts the function name and arguments and calls the appropriate function using the `FunctionCall` function (defined in `function_call.go`).
    - It appends the results of the function call (either the output or an error message) to the conversation.
    - It logs the conversation to the log file using the `AppendToLog` function.
  - The `DescribeMapEntries` function converts a `map[string]any` into a slice of `EntryInfo` structs, making it easier to describe the contents of the map in the logs. It uses reflection to provide detailed descriptions of different data types.
- `function_call.go`: Handles the execution of function calls requested by the LLM.
  - The `FunctionCall` function takes a `genai.FunctionCall` struct as input.
  - It validates the function name and arguments against a list of expected tools and arguments.
  - It uses a `switch` statement to call the appropriate function based on the function name (`read_file`, `edit_file`, or `list_files`).
  - It constructs a `genai.FunctionResponse` struct containing the result of the function call (either the output or an error message).
  - It uses `greenf` and `redf` to indicate success or failure of a function call.
- `edit_file.go`: Implements the `edit_file` tool.
  - The `EditFile` function takes the file path, the old string, and the new string as input.
  - It reads the content of the file using the `ReadFile` function.
  - It replaces all occurrences of the old string with the new string using `strings.Replace`.
  - It writes the modified content back to the file using `os.WriteFile`.
  - It includes error handling for cases where the file does not exist or the old string is not found.
  - The `CreateNewFile` function creates a new file with the specified content, including creating any necessary parent directories.
- `read_file.go`: Implements the `read_file` tool.
  - The `ReadFile` function takes the file path as input.
  - It reads the content of the file using `os.ReadFile` and returns it as a string.
- `list_files.go`: Implements the `list_files` tool.
  - The `ListFiles` function takes the path as input.
  - It uses `filepath.WalkDir` to recursively traverse the directory specified by the path.
  - It creates a list of all files and directories within the specified path.
  - It marshals the list of files into a JSON string using `json.Marshal`.
- `logging.go`: Provides logging functionality.
  - The `LogFileOrNil` function creates a log file in the directory specified by the `CODE_AGENT_LOGDIR` environment variable (or `/tmp` if the environment variable is not set). The filename includes a timestamp and a random string.
  - The `AppendToLog` function appends a message to the log file, including the role of the message sender (user or model) and the timestamp.
- The user provides input to the agent via the command line.
- The `main.go` file reads the user's input and sends it to the Gemini API.
- The Gemini API processes the input and generates a response.
- If the response contains a function call, the `agent.go` file calls the appropriate function using the `function_call.go` file.
- The function call interacts with the file system (using `edit_file.go`, `read_file.go`, or `list_files.go`).
- The result of the function call is sent back to the Gemini API.
- The Gemini API generates a final response.
- The `main.go` file displays the response to the user.
- The entire conversation is logged to a file using `logging.go`.
The agent implements several security measures:
- Tool Validation: `function_call.go` validates that the requested tool exists and that the arguments are correct.
- Sandboxing (Implicit): The provided code does not explicitly implement sandboxing, and the tools run on the client side with the user's own permissions. The only mediation is that every interaction with external resources (the file system) goes through function calls defined and validated by the agent.
- Limited File System Access: The agent only provides access to a limited set of file system operations (read, edit, list). Direct shell execution or arbitrary code execution is not supported.

Potential improvements:
- Explicit Sandboxing: Implementing an explicit sandboxing mechanism (e.g., using containers or virtual machines) would further enhance security.
- Access Control: Implementing more fine-grained access control policies would allow for more secure management of file system resources.
- Rate Limiting: Implementing rate limiting would prevent the agent from being overwhelmed with requests.
- Input Sanitization: Implementing more robust input sanitization would prevent injection attacks.
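As one possible take on access control, the file tools could refuse any path that resolves outside a chosen root directory. The sketch below is an assumption about how that might be done, not something the agent currently does. The helper name `confine` is hypothetical, the example paths assume a Unix-like system, and the check does not resolve symlinks.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// confine (hypothetical helper) resolves a tool-supplied path against root
// and rejects absolute paths and "../" escapes. Symlinks are not resolved.
func confine(root, requested string) (string, error) {
	if filepath.IsAbs(requested) {
		return "", fmt.Errorf("absolute paths are not allowed: %s", requested)
	}
	full := filepath.Join(root, requested) // Join also cleans the path
	rel, err := filepath.Rel(root, full)
	if err != nil {
		return "", err
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("path escapes %s: %s", root, requested)
	}
	return full, nil
}

func main() {
	for _, p := range []string{"src/main.go", "../etc/passwd", "/etc/passwd"} {
		full, err := confine("/work", p)
		fmt.Println(p, "->", full, err)
	}
}
```

Each tool would call such a check on its path argument before touching the file system, so a mistaken or manipulated model request cannot reach files outside the project.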
This detailed analysis provides a comprehensive understanding of the code agent's architecture, functionality, and security considerations. It highlights the key components, data flow, and potential areas for improvement.