Google GenAI Client
The GoogleGenAIClient provides a simple interface to interact with Google's Gemini AI models. It supports both synchronous and asynchronous requests, multi-turn conversations, and advanced features like system instructions and thinking budget.
Installation
pip install maticlib
Quick Start
from maticlib.llm.google_genai import GoogleGenAIClient
# Initialize client
client = GoogleGenAIClient(api_key="YOUR_GOOGLE_API_KEY")
# Make a request
response = client.complete("Hello! Tell me about Python")
print(response.content)
Class: GoogleGenAIClient
Constructor Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | "gemini-2.5-flash" | The Gemini model to use |
| system_instruct | str \| SystemMessage \| None | None | System instruction for the model |
| api_key | str \| None | None | Google API key (or use GOOGLE_API_KEY env var) |
| thinking_budget | int | 0 | Token budget for extended reasoning |
| verbose | bool | True | Enable detailed logging |
| return_raw | bool | False | Return raw JSON instead of Pydantic model |
Available Models
- gemini-2.5-flash - Latest fast model (recommended)
- gemini-2.0-flash-exp - Experimental flash model
- gemini-pro - Pro model for complex tasks
- gemini-1.5-pro - Previous generation pro model
Methods
complete()
Make a synchronous completion request.
def complete(input: Union[str, List]) -> Union[GeminiResponse, Dict[str, Any]]
Parameters:
- input (str | List) - Text prompt or list of messages
Returns: GeminiResponse Pydantic model or dict (if return_raw=True)
Example:
response = client.complete("Explain quantum computing")
print(response.content)
print(f"Tokens used: {response.total_tokens}")
async_complete()
Make an asynchronous completion request.
async def async_complete(input: Union[str, List]) -> Union[GeminiResponse, Dict[str, Any]]
Example:
import asyncio
async def main():
response = await client.async_complete("Tell me a joke")
print(response.content)
asyncio.run(main())
get_text_response()
Helper method to extract text content from response.
def get_text_response(response: Union[GeminiResponse, Dict]) -> str
Example:
response = client.complete("Hello!")
text = client.get_text_response(response)
print(text)
Response Model
GeminiResponse
Pydantic model returned by default (when return_raw=False)
Attributes:
- content (str) - Extracted text response
- content_parts (List[ContentPart]) - Multimodal content parts
- finish_reason (str) - Completion status
- prompt_tokens (int) - Input token count
- completion_tokens (int) - Output token count
- total_tokens (int) - Total tokens used
- image_tokens (int) - Image tokens (if multimodal)
- audio_tokens (int) - Audio tokens (if multimodal)
- video_tokens (int) - Video tokens (if multimodal)
- thinking_tokens (int) - Tokens used for thinking
- response_id (str) - Unique response identifier
- model_version (str) - Model used for generation
- raw_response (dict) - Original API response
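As a small illustration of the token-accounting attributes above, a helper like the following can turn any response into a one-line usage summary. Note that summarize_usage is a hypothetical helper written for this example, not part of maticlib; it only assumes the GeminiResponse fields documented above.

```python
def summarize_usage(response):
    # Format the documented token-count attributes into a single line.
    # Works with any object exposing these GeminiResponse fields.
    return (f"{response.model_version}: {response.prompt_tokens} prompt + "
            f"{response.completion_tokens} completion = "
            f"{response.total_tokens} total tokens")
```

This pairs well with the best practice of monitoring token usage: log the summary for each call during development.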
Usage Examples
System Instructions
from maticlib.llm.google_genai import GoogleGenAIClient
from maticlib.messages import SystemMessage
# Using string
client = GoogleGenAIClient(
system_instruct="You are a helpful Python tutor",
api_key="YOUR_KEY"
)
# Using SystemMessage
client = GoogleGenAIClient(
system_instruct=SystemMessage("You are a helpful Python tutor"),
api_key="YOUR_KEY"
)
response = client.complete("What are list comprehensions?")
print(response.content)
Multi-turn Conversations
from maticlib.messages import HumanMessage, AIMessage
conversation = [
HumanMessage("Hello! I'm learning Python."),
AIMessage("Great! What would you like to know?"),
HumanMessage("What are decorators?")
]
response = client.complete(conversation)
print(response.content)
Using Dictionaries
messages = [
{"role": "user", "content": "What is AI?"},
{"role": "assistant", "content": "AI stands for..."},
{"role": "user", "content": "Tell me more"}
]
response = client.complete(messages)
print(response.content)
Thinking Budget (Extended Reasoning)
client = GoogleGenAIClient(
model="gemini-2.0-flash-exp",
thinking_budget=1000, # Allow up to 1000 thinking tokens
api_key="YOUR_KEY"
)
response = client.complete("Solve this complex math problem: ...")
print(f"Thinking tokens used: {response.thinking_tokens}")
Raw Response Mode
client = GoogleGenAIClient(
return_raw=True,
api_key="YOUR_KEY"
)
response = client.complete("Hello!")
print(type(response))  # &lt;class 'dict'&gt;
print(response['candidates'][0]['content'])
Error Handling
import httpx

try:
client = GoogleGenAIClient(api_key="YOUR_KEY")
response = client.complete("Your prompt")
print(response.content)
except ValueError as e:
print(f"Configuration error: {e}")
except httpx.HTTPStatusError as e:
print(f"API error: {e.response.status_code}")
print(f"Details: {e.response.text}")
except Exception as e:
print(f"Unexpected error: {e}")
Environment Variables
# Set API key
export GOOGLE_API_KEY="your-api-key"
# Then use client without passing key
from maticlib.llm.google_genai import GoogleGenAIClient
client = GoogleGenAIClient() # Automatically uses GOOGLE_API_KEY
Best Practices
- Use environment variables for API keys in production
- Enable verbose mode during development for debugging
- Use async methods for concurrent requests
- Monitor token usage to control costs
- Implement retry logic for production systems
- Use system instructions to set consistent behavior
- Cache responses when appropriate to reduce API calls
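To illustrate the async bullet above, here is a minimal sketch of fanning out several prompts concurrently. The complete_many helper is hypothetical (not part of maticlib); it assumes only the async_complete signature documented earlier.

```python
import asyncio

async def complete_many(client, prompts):
    # Issue one async_complete call per prompt, then await them all at once.
    # asyncio.gather returns results in the same order as the input prompts.
    tasks = [client.async_complete(p) for p in prompts]
    return await asyncio.gather(*tasks)
```

With a GoogleGenAIClient instance, `asyncio.run(complete_many(client, prompts))` returns one response per prompt, in order, while the HTTP requests overlap.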
Rate Limits
Google Gemini API has rate limits that vary by model and tier. Implement exponential backoff and retry logic:
import time

import httpx
def complete_with_retry(client, prompt, max_retries=3):
for attempt in range(max_retries):
try:
return client.complete(prompt)
except httpx.HTTPStatusError as e:
if e.response.status_code == 429: # Rate limit
if attempt == max_retries - 1:
raise
wait_time = 2 ** attempt
print(f"Rate limited. Waiting {wait_time}s...")
time.sleep(wait_time)
else:
raise