3 Key Considerations: Tools, Costs, and Contexts
3.1 Learning Objectives
By the end of this section, you will be able to:
- Distinguish between web interfaces and API approaches and understand when each is appropriate
- Compare open-source versus frontier model options and their trade-offs for academic research
- Evaluate the three major frontier model providers (OpenAI, Anthropic, Google) for your needs
- Understand key technical concepts like context windows and their practical implications
- Make informed decisions about tool selection based on your research requirements and technical comfort level
3.2 Why This Matters for Your Research
Before diving into specific tools, you need to understand the landscape of options available to you. Making the right choice about which tools to use can mean the difference between a frustrating experience that wastes your time and a transformative workflow that enhances your research capacity. This section will help you navigate the key decisions and understand why we’re focusing on Google Gemini for this workshop.
3.3 Two Ways to Use LLMs: Web Interfaces vs. APIs
The first major decision is how you want to interact with LLMs. There are two primary approaches:
3.3.1 Web Interfaces (What We’ll Focus On)
What it is: Using LLMs through a browser interface like ChatGPT, Claude, or Gemini. You type questions, upload documents, and get responses in real time.
Benefits:
- No coding required
- Immediate access
- Perfect for exploratory research
- Good for one-off tasks
- Built-in features like document upload and citation
Limitations:
- Manual process for each query
- Time-consuming for repetitive tasks
- Harder to maintain consistency across large projects
- Limited ability to process hundreds of documents systematically
3.3.2 APIs (Application Programming Interfaces)
What it is: Using code to send requests to LLM services programmatically. Instead of typing in a web interface, you write scripts that automatically send queries and process responses (a short sketch of what this looks like appears after the lists below).
Benefits:
- Can process thousands of documents automatically
- Consistent methodology across large datasets
- Reproducible workflows
- Cost-effective for large-scale projects
- Can integrate with existing data analysis pipelines
Limitations:
- Requires coding skills (Python, R, etc.)
- More complex setup and debugging
- Need to handle rate limits and error management
- Steeper learning curve
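To make the trade-off concrete, here is a minimal sketch of an API workflow in Python using Google’s google-generativeai package. The model name, the documents, and the classification prompt are hypothetical placeholders, and package interfaces change quickly, so treat this as an illustration rather than a recipe.

```python
# A minimal sketch of programmatic LLM access in Python.
# Assumes: pip install google-generativeai, plus an API key from Google AI Studio.
# Model names and package interfaces change; check the current documentation.
import time

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # never hard-code keys in real projects
model = genai.GenerativeModel("gemini-1.5-pro")  # substitute the current model name

# Hypothetical documents; in practice you would read these from files.
documents = [
    "Loan agreement for a 400 MW hydropower plant...",
    "Concessional credit line for agricultural equipment...",
]

for doc in documents:
    prompt = f"Classify this lending project as infrastructure, agriculture, or other:\n\n{doc}"
    try:
        response = model.generate_content(prompt)
        print(response.text)
    except Exception as exc:  # real code needs finer-grained rate-limit handling
        print(f"Request failed: {exc}")
    time.sleep(1)  # crude pacing to stay under rate limits
```

The same loop runs unchanged whether you have two documents or twenty thousand; that scalability, combined with proper rate-limit and error handling, is what the API approach buys you.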
3.3.3 Our Workshop Focus
Because this workshop assumes little previous LLM experience and no coding background, we’ll focus primarily on web interfaces—tools you can start using immediately. However, in our final section, we’ll discuss how we used APIs to classify 18,000 Chinese lending projects, showing you what becomes possible when you’re ready to scale up.
3.4 Open-Source vs. Frontier Models
3.4.1 Open-Source Models
What they are: AI models whose code and weights are publicly available. Examples include Meta’s Llama, Mistral’s models, and the many community models hosted on Hugging Face.
Benefits:
- Privacy: You can run them on your own servers
- Reproducibility: Exact model versions remain available
- Cost: Can be free if you have computing resources
- Customization: Can fine-tune for specific tasks
Limitations:
- Capability gap: Generally less capable than frontier models
- Technical complexity: Require significant technical skills to deploy
- Infrastructure costs: Need expensive cloud computing for larger models
- Inconsistent quality: Wide variation in performance
In our Chinese lending classification project, we tested Meta’s Llama 3.3 alongside frontier models, and it performed markedly worse. While open-source models are improving rapidly, they’re not yet competitive with frontier models for complex research tasks.
3.4.2 Frontier Models
What they are: The most advanced models from major AI companies: OpenAI (ChatGPT), Anthropic (Claude), and Google (Gemini).
Benefits:
- Superior performance: Best available capabilities for most tasks
- Ease of use: Polished interfaces and user experience
- Regular updates: Continuous improvements and new features
- Reliability: More consistent and predictable outputs
Limitations:
- Cost: Subscription fees for full access
- Privacy concerns: Your data goes to third-party companies
- Less control: Can’t customize or guarantee model availability
- Black box: No visibility into exactly how they work
For most academic researchers starting with LLMs, frontier models are the better choice. They’re simply more capable and easier to use, allowing you to focus on your research rather than wrestling with technical infrastructure.
3.5 The Three Frontier Model Providers
All three major providers offer both free and paid tiers. I strongly recommend paying for at least one service—paid tiers provide better data privacy, higher usage limits, and faster access to new models.
3.5.1 OpenAI (ChatGPT)
- Strengths: Deep Research tool, strong reasoning models (o3 Pro)
- Best for: Complex problem-solving, comprehensive research synthesis
3.5.2 Anthropic (Claude)
- Strengths: Excellent for coding and writing tasks
- Best for: R/Python programming assistance, high-quality text generation
3.5.3 Google (Gemini)
- Strengths: Largest context window, good citations, NotebookLM integration
- Best for: Working with large documents, academic research workflows
3.6 Why We’re Focusing on Google Gemini
While all three providers have their strengths, Google Gemini offers several advantages particularly relevant for academic research:
3.6.1 Massive Context Window
A context window is how much text an AI can “remember” and work with at one time. Think of it like the AI’s working memory. Current context windows:
- Gemini 2.5 Pro: 1 million tokens (roughly 750,000 words)
- OpenAI GPT-4o: ~128,000 tokens (roughly 96,000 words)
- Anthropic Claude: ~200,000 tokens (roughly 150,000 words)
In practical terms: a typical journal article runs around 10,000 words (roughly 13,000 tokens), so Gemini’s window can hold on the order of 75 papers at once, versus about 15 for a 200,000-token window. This is transformative for literature reviews and cross-document analysis.
This enormous context window means you can:
- Upload multiple research papers simultaneously
- Work with entire book chapters or reports
- Maintain context across long conversations
- Analyze patterns across large document collections
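If you’re wondering whether your own corpus fits, the arithmetic is easy to sketch. The following Python snippet uses the rough 0.75-words-per-token heuristic and the window sizes quoted above; the sixty-paper corpus is a hypothetical example.

```python
# Back-of-the-envelope check: will a document collection fit in a context window?
# Uses the rough heuristic from this section: 1 token ~ 0.75 English words.
WINDOW_SIZES = {
    "Gemini 2.5 Pro": 1_000_000,  # tokens
    "Claude": 200_000,
}

def estimated_tokens(word_count: int) -> int:
    """Approximate token count from a word count (tokens ~ words / 0.75)."""
    return round(word_count / 0.75)

# Hypothetical corpus: sixty papers of ~10,000 words each.
total = sum(estimated_tokens(10_000) for _ in range(60))

for name, window in WINDOW_SIZES.items():
    verdict = "fits" if total <= window else "does not fit"
    print(f"~{total:,} tokens {verdict} in {name} ({window:,}-token window)")
# ~800,000 tokens fits in Gemini 2.5 Pro but not in a 200,000-token window.
```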
3.6.2 Built-in Citation Features
When you upload documents to Gemini, it automatically cites the specific portions where it finds information. This is invaluable for academic workflows where you need to trace claims back to source materials.
3.6.3 NotebookLM Integration
NotebookLM allows you to upload up to 300 documents and ask questions across the entire corpus. It provides exact text passages from your PDFs, making it excellent for exploratory analysis. In our ODI research, we used NotebookLM to analyze a decade of annual reports from Chinese policy banks—something that would have taken weeks manually.
3.6.4 Strong Performance on Benchmarks
LLM benchmarks are standardized tests that measure model performance across different tasks. Popular benchmarks include:
- MMLU: Measures knowledge across academic subjects
- HumanEval: Tests coding capabilities
- HellaSwag: Evaluates common-sense reasoning
You can track current performance at Vellum’s LLM Leaderboard.
Important caveats:
- Benchmarks don’t always capture what’s useful for your specific research
- Goodhart’s Law applies: “When a measure becomes a target, it ceases to be a good measure.” Companies now optimize specifically for benchmarks, which may not reflect real-world performance.
Gemini 2.5 Pro performs competitively on major benchmarks, though remember that benchmark performance doesn’t always translate to usefulness for your specific research needs.
3.7 The Reality of Provider Competition
Despite our focus on Gemini for this workshop, I personally pay for premium access to all three major providers. Here’s why:
Models update frequently: What’s best today may not be best next month. The competitive landscape changes rapidly.
Each has unique strengths:
- I use Claude most often for coding (R and Python) and high-quality writing
- I use ChatGPT’s Deep Research for lengthy, high-quality exploratory research
- I use Gemini for working with large document collections
This will all be outdated soon: The specific model capabilities I’m describing will likely be different by the time you read this. The field moves that fast.
3.8 Key Technical Concepts
3.8.1 Context Window (Revisited)
Think of context window as the AI’s “working memory.” Larger windows allow for:
- More complex conversations
- Better understanding of document relationships
- Ability to maintain consistency across longer projects
3.8.2 Tokens
A token is the basic unit of text an LLM processes, typically a word or piece of a word. A rough conversion: 1 token ≈ 0.75 words in English. So 1 million tokens ≈ 750,000 words ≈ 1,500 pages of single-spaced text.
3.8.3 Model Versions
Providers regularly release new model versions. Pay attention to:
- Performance improvements: Better accuracy, reasoning, or specialized capabilities
- Cost changes: New models may be more or less expensive
- Feature additions: New capabilities like image analysis or coding tools
3.9 Making Your Choice
For this workshop, we’ll use Google Gemini because:
- It’s excellent for document-heavy academic work
- The citation features support good research practices
- The large context window enables ambitious projects
- NotebookLM provides unique research capabilities
However, I encourage you to experiment with all three providers. They each have strengths, and the best choice depends on your specific research needs, technical comfort level, and budget.
3.10 Cost Considerations
3.10.1 Free Tiers
All providers offer free access with limitations:
- Usage caps (messages per day/hour)
- Access to older or less capable models
- Fewer features
3.10.2 Paid Tiers ($15-30/month typically)
- Higher usage limits
- Access to latest models
- Better data privacy protections
- Priority access during high-demand periods
3.10.3 API Pricing
For programmatic use, you pay per token processed, with input and output tokens usually priced separately. Costs vary by model and provider, but typically range from about $0.25 to $15 per million tokens.
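To see what per-token pricing means for a project budget, here is a rough estimate in Python. The per-token prices and token counts below are assumed placeholders within the range just quoted, not current quotes; check your provider’s pricing page before budgeting.

```python
# Rough API cost estimate for a document-classification project.
# Prices are illustrative placeholders, not current provider rates.
INPUT_PRICE_PER_M = 1.25    # dollars per million input tokens (assumed)
OUTPUT_PRICE_PER_M = 10.00  # dollars per million output tokens (assumed)

n_documents = 18_000        # e.g., the Chinese lending projects mentioned earlier
tokens_in_per_doc = 2_000   # prompt plus document text (assumed)
tokens_out_per_doc = 100    # a short classification label (assumed)

input_cost = n_documents * tokens_in_per_doc / 1_000_000 * INPUT_PRICE_PER_M
output_cost = n_documents * tokens_out_per_doc / 1_000_000 * OUTPUT_PRICE_PER_M

print(f"Estimated total: ${input_cost + output_cost:,.2f}")
# 36M input tokens and 1.8M output tokens -> $45.00 + $18.00 = $63.00
```

Under these assumptions, classifying 18,000 documents costs tens of dollars rather than thousands, which is why APIs become cost-effective at scale.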
3.11 Getting Started
For this workshop, you’ll need a free Google account and access to Gemini. We’ll walk through the setup process and begin exploring how these tools can enhance your research workflow.
Remember: the goal isn’t to become an expert in any particular tool, but to understand how to evaluate and use these capabilities effectively for your research. The specific tools will continue evolving, but the principles we’re learning will remain relevant.
In our next section, we’ll move from theory to practice with hands-on prompt engineering—the skill that transforms mediocre AI outputs into genuinely useful research assistance.