Dwain Barnes

Building an AI News Researcher Using Pydantic Agents

Updated: Dec 7, 2024



If you’ve ever tried to keep up with all the new developments in AI security, you know it can feel like drinking from a firehose. Every week brings fresh announcements, research breakthroughs, and policy updates. But what if you could build your own AI-powered assistant to scan the web and find the latest stories for you?

In this tutorial, I’ll show you how to create a simple “AI News Researcher”: an intelligent tool that uses Python, pydantic_ai agents, Tavily for web searches, and OpenAI’s language models. We’ll combine these components into a Streamlit app with a friendly user interface, so you can just type a query and get quick insights.

By the end of this, you’ll have a working mini-app you can tweak for your own interests, whether that’s staying current on AI security trends, following healthcare AI breakthroughs, or monitoring climate tech updates.


 

Why Pydantic Agents?


Pydantic has become a go-to library for data validation and type enforcement in Python. It helps ensure that the data we feed into (and get out of) our models is clean and consistent. The pydantic_ai library builds on that idea, letting you define the structure of inputs and outputs for AI-driven tasks. Instead of just hoping the AI returns a nice, structured answer, we can make sure it does by specifying exact fields and data types.

In short: the AI will return well-formed responses (like article titles, main content, and bullet-point summaries) every time, with no guesswork required.
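
To make that concrete, here’s a tiny standalone sketch. The ArticleSummary model is purely illustrative and isn’t part of the app we build below:

from pydantic import BaseModel, Field

class ArticleSummary(BaseModel):
    title: str = Field(description="Headline of the article")
    bullets: list[str] = Field(description="Key takeaways as short bullet points")

# Data that matches the schema passes; anything malformed raises a validation
# error instead of silently flowing downstream.
summary = ArticleSummary(title="AI security roundup", bullets=["New jailbreak research", "Policy updates"])
print(summary.model_dump())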


 

Tools and Technologies

  • pydantic_ai: Helps us define “agents” that produce predictable responses.

  • tavily-python: A client for the Tavily search API, which will help our agent find fresh content.

  • openai: Provides access to powerful language models like the GPT series.

  • nest_asyncio: A small utility that allows us to run asynchronous code inside the Streamlit app seamlessly.

  • streamlit: Our UI layer. With just a few lines, we’ll get a polished web interface to interact with our agent.


 

Prerequisites

  1. Python 3.11+: We’re using Python 3.11 in the code snippet, so let’s stick to that for compatibility.

  2. API Keys:

    • OpenAI API key: OpenAI platform

    • Tavily API key: Sign up or refer to Tavily’s docs for one.

  3. Conda or Virtualenv (optional): For cleanliness, I recommend using a virtual environment.


 

Let’s Get Started

Step 1: Create and Activate a Virtual Environment

If you’re using Conda, you can do:

conda create -n myresearchenv python=3.11 -y
conda activate myresearchenv

If you prefer venv, that works too. The key point is to isolate this project in its own environment.
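
For example, with Python’s built-in venv module:

python -m venv myresearchenv
myresearchenv\Scripts\activate

(On macOS/Linux, activate with source myresearchenv/bin/activate instead.)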






Step 2: Install Dependencies

Run:

pip install pydantic-ai tavily-python openai nest_asyncio devtools httpx streamlit

This will grab all the necessary packages in one go.


Step 3: Set Your API Keys

Replace your_openai_key_here and your_tavily_key_here with the actual keys:

set OPENAI_API_KEY=your_openai_key_here
set TAVILY_API_KEY=your_tavily_key_here

(On Windows, set works; on macOS/Linux, use export.)
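
As a quick sanity check, you can confirm Python sees both keys in the same shell session:

python -c "import os; print(bool(os.environ.get('OPENAI_API_KEY')), bool(os.environ.get('TAVILY_API_KEY')))"

If it prints True True, you’re good to go.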

Step 4: Review the Code

Below is the entire code snippet. I’ll place it in a file called agent_research.py:

import os
import asyncio
import datetime
from typing import Any
from dataclasses import dataclass

import nest_asyncio
nest_asyncio.apply()

import streamlit as st
from pydantic_ai import Agent, RunContext
from pydantic import BaseModel, Field
from tavily import AsyncTavilyClient
from devtools import debug

OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", None)
TAVILY_API_KEY = os.environ.get("TAVILY_API_KEY", None)

if not OPENAI_API_KEY:
    raise ValueError("Please set OPENAI_API_KEY environment variable.")
if not TAVILY_API_KEY:
    raise ValueError("Please set TAVILY_API_KEY environment variable.")

tavily_client = AsyncTavilyClient(api_key=TAVILY_API_KEY)

@dataclass
class SearchDataclass:
    max_results: int
    todays_date: str

class ResearchResult(BaseModel):
    research_title: str = Field(description='Markdown heading describing the article topic, prefixed with #')
    research_main: str = Field(description='A main section that provides a detailed news article')
    research_bullets: str = Field(description='A set of bullet points summarizing key points')

# The agent: typed dependencies in, a validated ResearchResult out.
search_agent = Agent(
    'openai:gpt-4o-mini',
    deps_type=SearchDataclass,
    result_type=ResearchResult,
    system_prompt='You are a helpful research assistant and an expert in research. '
                  'When given a query, you will identify strong keywords to do 3-5 searches '
                  'using the provided search tool, then combine the results into a detailed response.'
)

@search_agent.system_prompt
async def add_current_date(ctx: RunContext[SearchDataclass]) -> str:
    todays_date = ctx.deps.todays_date
    system_prompt = (
        f"You're a helpful research assistant and an expert in research. "
        f"When given a question, write strong keywords to do 3-5 searches in total "
        f"(each with a query_number) and then combine the results. "
        f"If you need today's date it is {todays_date}. "
        f"Focus on providing accurate and current information."
    )
    return system_prompt

@search_agent.tool
async def get_search(search_data: RunContext[SearchDataclass], query: str, query_number: int) -> dict[str, Any]:
    """Perform a search using the Tavily client."""
    max_results = search_data.deps.max_results
    results = await tavily_client.get_search_context(query=query, max_results=max_results)
    return results

async def do_search(query: str, max_results: int):
    current_date = datetime.date.today()
    date_string = current_date.strftime("%Y-%m-%d")
    deps = SearchDataclass(max_results=max_results, todays_date=date_string)
    result = await search_agent.run(query, deps=deps)
    return result.data

st.set_page_config(page_title="AI News Researcher", layout="centered")

st.title("AI Security News Researcher")
st.write("Stay updated on the latest trends and developments in AI Security.")
st.sidebar.title("Search Parameters")
query = st.sidebar.text_input("Enter your query:", value="latest AI security news")
max_results = st.sidebar.slider("Number of search results:", min_value=3, max_value=10, value=5)
st.write("Use the sidebar to adjust search parameters.")

if st.button("Get Latest AI Security News"):
    with st.spinner("Researching, please wait..."):
        result_data = asyncio.run(do_search(query, max_results))

    st.markdown(result_data.research_title)
    st.markdown(f"<div style='line-height:1.6;'>{result_data.research_main}</div>", unsafe_allow_html=True)
    st.markdown("### Key Takeaways")
    st.markdown(result_data.research_bullets)

Here’s what’s happening in the code:

  • We define SearchDataclass and ResearchResult to structure the data going in and coming out.

  • We create an Agent that uses OpenAI’s model and pydantic_ai to ensure structured responses.

  • The agent can call the get_search tool to query Tavily’s search API and incorporate real-time results.

  • do_search runs the entire agent operation, returning the final ResearchResult (see the standalone usage sketch after this list).

  • The Streamlit UI is straightforward: a sidebar for input, a button to kick off the research, and a display area for results.
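
If you’d like to try the agent without the UI, here’s a minimal sketch of calling do_search from a plain script. It assumes the code above is saved as agent_research.py; importing it also executes the Streamlit calls, which is harmless outside Streamlit but prints a few warnings:

import asyncio
from agent_research import do_search

# Run one research query directly and print the structured result.
result = asyncio.run(do_search("latest AI security news", max_results=5))
print(result.research_title)
print(result.research_bullets)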

Step 5: Run the App

Finally, let’s run our Streamlit app:

streamlit run agent_research.py

Open the provided URL in your browser (often http://localhost:8501) and you’ll see your “AI News Researcher” interface. Give it a try: enter a query like “Find the latest AI security news” and hit the button.






You’ll see a loading spinner as the AI goes off, does some queries, gathers results, and presents them. Moments later, you get a nicely formatted article plus bullet points summarising the key takeaways.





 

Wrapping Up

You’ve just built a simple but powerful AI tool. It’s modular and adaptable, so you can tweak the prompt to fit different research domains, experiment with different search APIs, or refine the UI design. With pydantic_ai, you’ve got a strong guardrail ensuring that your AI always returns nicely formatted, predictable output.
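
For example, pointing the same pipeline at a different domain is mostly a prompt change. Here’s a minimal sketch that reuses SearchDataclass, ResearchResult, and tavily_client from agent_research.py; the healthcare wording is just illustrative:

from typing import Any
from pydantic_ai import Agent, RunContext
from agent_research import SearchDataclass, ResearchResult, tavily_client

healthcare_agent = Agent(
    'openai:gpt-4o-mini',
    deps_type=SearchDataclass,
    result_type=ResearchResult,
    system_prompt='You are a helpful research assistant specialising in healthcare AI. '
                  'When given a query, identify strong keywords to do 3-5 searches using '
                  'the provided search tool, then combine the results into a detailed response.'
)

@healthcare_agent.tool
async def get_search(ctx: RunContext[SearchDataclass], query: str) -> Any:
    """Search Tavily for fresh healthcare AI content."""
    return await tavily_client.get_search_context(query=query, max_results=ctx.deps.max_results)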

As AI continues to evolve, tools like this can help us filter noise and focus on what matters most. Instead of scrolling through countless pages of search results or sifting through social media chatter, just ask your personal research assistant for a summary—and enjoy the curated, current insights it provides.


Check out the code on my GitHub.


Happy researching!
