Google GenAI Client
The GoogleGenAIClient provides a simple interface to interact with Google's Gemini AI models. It supports both synchronous and asynchronous requests, multi-turn conversations, and advanced features like system instructions and thinking budget.
Installation
pip install maticlib
Quick Start
from maticlib.llm.google_genai import GoogleGenAIClient
# Initialize client
client = GoogleGenAIClient(api_key="YOUR_GOOGLE_API_KEY")
# Make a request
response = client.complete("Hello! Tell me about Python")
print(response.content)
Class: GoogleGenAIClient
Constructor Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | "gemini-2.5-flash" | The Gemini model to use |
| system_instruct | str \| SystemMessage \| None | None | System instruction for the model |
| api_key | str \| None | None | Google API key (or use GOOGLE_API_KEY env var) |
| thinking_budget | int | 0 | Token budget for extended reasoning |
| verbose | bool | True | Enable detailed logging |
| return_raw | bool | False | Return raw JSON instead of Pydantic model |
Available Models
- gemini-2.5-flash - Latest fast model (recommended)
- gemini-2.0-flash-exp - Experimental flash model
- gemini-pro - Pro model for complex tasks
- gemini-1.5-pro - Previous generation pro model
Methods
complete()
Make a synchronous completion request.
def complete(input: Union[str, List]) -> Union[GeminiResponse, Dict[str, Any]]
Parameters:
- input (str | List) - Text prompt or list of messages
Returns: GeminiResponse Pydantic model or dict (if return_raw=True)
Example:
response = client.complete("Explain quantum computing")
print(response.content)
print(f"Tokens used: {response.total_tokens}")
async_complete()
Make an asynchronous completion request.
async def async_complete(input: Union[str, List]) -> Union[GeminiResponse, Dict[str, Any]]
Example:
import asyncio
async def main():
response = await client.async_complete("Tell me a joke")
print(response.content)
asyncio.run(main())
get_text_response()
Helper method to extract text content from response.
def get_text_response(response: Union[GeminiResponse, Dict]) -> str
Example:
response = client.complete("Hello!")
text = client.get_text_response(response)
print(text)
Response Model
GeminiResponse
Pydantic model returned by default (when return_raw=False)
Attributes:
- content (str) - Extracted text response
- content_parts (List[ContentPart]) - Multimodal content parts
- finish_reason (str) - Completion status
- prompt_tokens (int) - Input token count
- completion_tokens (int) - Output token count
- total_tokens (int) - Total tokens used
- image_tokens (int) - Image tokens (if multimodal)
- audio_tokens (int) - Audio tokens (if multimodal)
- video_tokens (int) - Video tokens (if multimodal)
- thinking_tokens (int) - Tokens used for thinking
- response_id (str) - Unique response identifier
- model_version (str) - Model used for generation
- raw_response (dict) - Original API response
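As a small illustration of the token-accounting attributes above, a helper like the following can turn any response into a one-line usage summary. Note that summarize_usage is a hypothetical helper written for this example, not part of maticlib; it only assumes the GeminiResponse fields documented above.

```python
def summarize_usage(response):
    # Format the documented token-count attributes into a single line.
    # Works with any object exposing these GeminiResponse fields.
    return (f"{response.model_version}: {response.prompt_tokens} prompt + "
            f"{response.completion_tokens} completion = "
            f"{response.total_tokens} total tokens")
```

This pairs well with the best practice of monitoring token usage: log the summary for each call during development.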
Usage Examples
System Instructions
from maticlib.llm.google_genai import GoogleGenAIClient
from maticlib.messages import SystemMessage
# Using string
client = GoogleGenAIClient(
system_instruct="You are a helpful Python tutor",
api_key="YOUR_KEY"
)
# Using SystemMessage
client = GoogleGenAIClient(
system_instruct=SystemMessage("You are a helpful Python tutor"),
api_key="YOUR_KEY"
)
response = client.complete("What are list comprehensions?")
print(response.content)
Multi-turn Conversations
from maticlib.messages import HumanMessage, AIMessage
conversation = [
HumanMessage("Hello! I'm learning Python."),
AIMessage("Great! What would you like to know?"),
HumanMessage("What are decorators?")
]
response = client.complete(conversation)
print(response.content)
Using Dictionaries
messages = [
{"role": "user", "content": "What is AI?"},
{"role": "assistant", "content": "AI stands for..."},
{"role": "user", "content": "Tell me more"}
]
response = client.complete(messages)
print(response.content)
Thinking Budget (Extended Reasoning)
client = GoogleGenAIClient(
model="gemini-2.0-flash-exp",
thinking_budget=1000, # Allow up to 1000 thinking tokens
api_key="YOUR_KEY"
)
response = client.complete("Solve this complex math problem: ...")
print(f"Thinking tokens used: {response.thinking_tokens}")
Raw Response Mode
client = GoogleGenAIClient(
return_raw=True,
api_key="YOUR_KEY"
)
response = client.complete("Hello!")
print(type(response))  # &lt;class 'dict'&gt;
print(response['candidates'][0]['content'])
Error Handling
import httpx

try:
client = GoogleGenAIClient(api_key="YOUR_KEY")
response = client.complete("Your prompt")
print(response.content)
except ValueError as e:
print(f"Configuration error: {e}")
except httpx.HTTPStatusError as e:
print(f"API error: {e.response.status_code}")
print(f"Details: {e.response.text}")
except Exception as e:
print(f"Unexpected error: {e}")
Environment Variables
# Set API key
export GOOGLE_API_KEY="your-api-key"
# Then use client without passing key
from maticlib.llm.google_genai import GoogleGenAIClient
client = GoogleGenAIClient() # Automatically uses GOOGLE_API_KEY
Best Practices
- Use environment variables for API keys in production
- Enable verbose mode during development for debugging
- Use async methods for concurrent requests
- Monitor token usage to control costs
- Implement retry logic for production systems
- Use system instructions to set consistent behavior
- Cache responses when appropriate to reduce API calls
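To illustrate the async bullet above, here is a minimal sketch of fanning out several prompts concurrently. The complete_many helper is hypothetical (not part of maticlib); it assumes only the async_complete signature documented earlier.

```python
import asyncio

async def complete_many(client, prompts):
    # Issue one async_complete call per prompt, then await them all at once.
    # asyncio.gather returns results in the same order as the input prompts.
    tasks = [client.async_complete(p) for p in prompts]
    return await asyncio.gather(*tasks)
```

With a GoogleGenAIClient instance, `asyncio.run(complete_many(client, prompts))` returns one response per prompt, in order, while the HTTP requests overlap.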
Rate Limits
Google Gemini API has rate limits that vary by model and tier. Implement exponential backoff and retry logic:
import time

import httpx
def complete_with_retry(client, prompt, max_retries=3):
for attempt in range(max_retries):
try:
return client.complete(prompt)
except httpx.HTTPStatusError as e:
if e.response.status_code == 429: # Rate limit
if attempt == max_retries - 1:
raise
wait_time = 2 ** attempt
print(f"Rate limited. Waiting {wait_time}s...")
time.sleep(wait_time)
else:
raise