What We’re Building
The reference implementation is a small Python project with a few clear parts:| Part | What it does |
|---|---|
| CLI | Accepts a research topic, model, providers, depth settings, output path, and artifact directory |
| Venice client | Calls chat completions, streaming chat completions, and POST /augment/scrape |
| Search layer | Searches DuckDuckGo by default, with optional arXiv paper discovery |
| Data models | Tracks source URLs, canonical URLs, chunks, evidence, notes, errors, and reports |
| Research agent | Plans searches, reads sources, extracts evidence, analyzes gaps, generates follow-up queries, and writes the final report |
| Artifact writer | Stores auditable JSONL records for queries, research gaps, results, fetches, chunks, source notes, report drafts, errors, and reports |
- Ask Venice to generate diverse search queries for the topic.
- Search the web with one or more providers.
- Deduplicate URLs before reading them.
- Use Venice’s scrape endpoint to turn each public source page into Markdown.
- Split long pages into chunks.
- Ask Venice to extract evidence from each chunk.
- Ask Venice to turn chunk evidence into source notes.
- Identify research gaps and source-balance issues before generating follow-up queries.
- Ask Venice to synthesize the final report with footnote-style citations.
Setting Up the Project
The reference project uses Python 3.13 anduv, but the same code works with a normal virtual environment too.
Create a new project:
pip, create a virtual environment and install the same packages:
.env file for local development:
VENICE_MODEL so you can change the model without editing code. The reference implementation currently defaults to openai-gpt-55, but you can swap it for another chat model available to your Venice account.
Creating the Data Models
Before writing the agent logic, we’ll define the objects that move through the pipeline. These models keep the rest of the code easier to reason about because every source carries provenance: where it came from, which query found it, when it was retrieved, and how it was chunked. Createresearch_agent/models.py:
canonical_url, content_hash, and chunks.
canonical_url lets the agent avoid reading the same source repeatedly when search results differ only by tracking parameters or fragments. content_hash helps catch duplicate pages even when they live at different URLs. chunks lets us summarize long pages in smaller pieces instead of losing useful evidence to context limits.
Add the helper functions below the dataclasses:
Building the Venice Client
Next, we’ll create a small Venice client. You could use the OpenAI Python SDK for chat completions because Venice is OpenAI-compatible, but the reference implementation useshttpx directly so the same client can call Venice’s POST /augment/scrape endpoint.
Create research_agent/venice.py:
from_env() helper keeps secrets out of your source code. It also makes local development convenient because python-dotenv can load VENICE_API_KEY and VENICE_MODEL from .env.
Now add chat completions:
_post_chat_stream() helper that reads server-sent events from streaming chat completions. You can start without streaming, then add it once the rest of the research flow works.
Adding Search Providers
The search layer has two jobs: find source URLs and fetch those URLs through the Venice scraper. The reference implementation uses DuckDuckGo’s HTML endpoint for general web search and arXiv’s Atom API for papers. Createresearch_agent/web.py:
WebSearch class coordinates providers and fetches pages:
Writing Local Artifacts
For research workflows, auditability matters. If the final report says something surprising, you should be able to inspect which source led to it. Createresearch_agent/artifacts.py:
Building the Research Agent
Now that we have Venice, search, models, and artifacts, we can build the actual agent. Createresearch_agent/agent.py:
models.py if you have not added them yet:
ResearchAgent:
run() method coordinates the research passes:
seen_* sets are what keep the agent from wasting time on duplicate sources. URL dedupe catches repeated links. Content hash dedupe catches mirrors, syndicated posts, and pages that redirect to the same final content.
Planning Initial and Follow-up Searches
The first model call turns the topic into search queries:_gap_follow_up_queries(), which asks Venice to return both gap records and queries:
--artifacts is enabled, these records are written to research_gaps.jsonl. That gives you a useful audit trail for why the agent searched for a particular second-pass query.
The parser should be forgiving. If the model returns malformed JSON, the agent falls back to the original topic:
Reading and Summarizing Sources
Now we collect source notes. The agent searches each query, fetches each result through Venice scrape, chunks the Markdown, and summarizes the useful evidence.Writing the Final Report
Once the agent has source notes, it can write the report. Start with a single-pass report writer:Adding the CLI
Now we need a command-line entry point. Createmain.py:
| Option | What it controls |
|---|---|
--iterations | Number of research passes |
--queries | Search queries generated per pass |
--results | Results read per provider for each query |
--providers | Search providers, such as duckduckgo or duckduckgo,arxiv |
--max-sources | Maximum usable sources to collect |
--chunk-chars | Approximate chunk size before source evidence extraction |
--max-chunks-per-source | Number of chunks summarized per source |
--report-style | Final report depth: brief, standard, or deep |
--artifacts | Directory for JSONL audit records |
--output | Path for the final Markdown report |
Running the Agent
Run a quick research pass:brief for a concise source-backed briefing, standard for a fuller survey, and deep for the staged outline/section/editor workflow.
Save auditable artifacts:
source_notes.jsonl shows the summarized source evidence, research_gaps.jsonl shows why follow-up searches were generated, and errors.jsonl shows pages that failed during search, scraping, or summarization.
Privacy and Reliability Notes
A research agent touches several systems, so it helps to be precise about what goes where:| Layer | What sees the data |
|---|---|
| Local CLI | Topic, configuration, source notes, artifacts, and final reports stay on your machine |
| Search provider | Search queries are sent to the provider you choose, such as DuckDuckGo or arXiv |
| Venice scrape | Public source URLs are sent to Venice’s scrape endpoint |
| Venice chat completions | Prompts, source chunks, source notes, and report-generation instructions are sent to Venice |
| Output files | Markdown reports and JSONL artifacts are written locally |
POST /augment/search endpoint instead of querying DuckDuckGo directly. The reference implementation uses lightweight public providers so the demo stays easy to run and understand.
For reliability, keep these defaults conservative:
- Use retries for Venice calls and web requests.
- Add a small
--request-delayif you are reading many pages from the same host. - Cap
--max-sourcesso broad topics do not run indefinitely. - Save
--artifactsfor important reports so you can audit the final output. - Treat the report as a briefing, not ground truth. Follow citations back to the original source when accuracy matters.
Testing the Pieces
You do not need live web requests or Venice calls to test most of the system. The reference repo uses fake Venice and fake web classes to test the research loop, dedupe behavior, artifacts, and report prompts. A useful first test is URL canonicalization:Benchmarking
Many AI providers now have their own deep research workflows, so the reference repo includes a simple benchmark against Perplexity’s Deep Research tool. Both agents were asked to write a report on AI agent framework architecture, then the generated reports were checked into the GitHub repo. This is not meant to be a formal benchmark. It is a practical way to inspect report structure, source coverage, citation quality, and whether the agent over-focuses on one source cluster. That is also why the updated implementation tracksresearch_gaps.jsonl and source balance before follow-up searches.
Extending This Example
Once the baseline agent works, here are practical ways to improve it:- Add a Venice search provider using
POST /augment/search. - Store reports and artifacts in a small SQLite database instead of JSONL files.
- Add source allowlists or blocklists for trusted research domains.
- Add PDF support by combining Venice scrape with document parsing for sources that do not expose clean HTML.
- Add an evaluation set of topics and expected source types so you can compare research quality after prompt changes.
- Add a review step that asks Venice to find unsupported claims in the final report before saving it.