8 Coding Assistance: Benefits, Pitfalls, and Best Practices

Tentative time: 12 minutes

8.1 Learning Objectives

By the end of this section, you will be able to:

Understand the “jagged frontier” of AI coding capabilities and recognize where LLMs excel versus where they struggle
Apply best practices for AI-assisted coding whether you’re a beginner or experienced programmer
Avoid common pitfalls like “vibe coding” and over-reliance on AI-generated solutions
Use LLMs strategically to enhance your data analysis workflow and research reproducibility
Recognize when AI coding assistance can help bridge the gap to programmatic approaches for larger projects

8.2 Why This Matters for Your Research

Many researchers work with data analysis code—whether in R, Python, Stata, SPSS, or MATLAB. You might write code yourself, supervise research assistants who code, or wish you could move from Excel to more reproducible analytical workflows.

LLMs have become surprisingly capable coding assistants, but they come with significant caveats. When used well, they can dramatically reduce the time spent on routine coding tasks, help you learn new approaches, and make software development best practices more accessible. When used poorly, they can create more problems than they solve.

This chapter will help you navigate both the promise and the pitfalls of AI-assisted coding.

8.3 The Jagged Frontier: Where LLMs Excel and Struggle in Coding

Let’s apply our jagged frontier concept to coding tasks:

Inside the Frontier (LLMs excel):

Writing standard data manipulation and analysis code
Translating between programming languages (R to Python, etc.)
Explaining what existing code does, line by line
Generating unit tests and documentation
Creating first drafts of visualization code
Debugging common error messages
Implementing standard statistical methods
Writing repetitive or boilerplate code

Outside the Frontier (LLMs struggle):

Understanding your specific research context and data quirks
Making architectural decisions for complex projects
Optimizing code for performance at scale
Handling edge cases specific to your domain
Choosing the right statistical approach for your research question
Understanding implicit assumptions in your analytical workflow

The Jagged Edge (Be Extra Careful):

Complex, multi-step analyses where small errors compound
Domain-specific packages with limited training data
Code that handles sensitive or proprietary data
Statistical methods that require deep contextual understanding

8.4 The Promise: What LLMs Can Do for Research Coding

8.4.1 Dramatic Speed Improvements for Routine Tasks

LLMs can handle many coding tasks in seconds that might take hours to research and implement manually:

Data cleaning and transformation: “Convert this wide survey data to long format and handle missing values”
Statistical analysis: “Run a fixed-effects regression with clustered standard errors”
Visualization: “Create a publication-ready plot showing trends by region and year”
Code translation: “Convert this Stata code to R using tidyverse approaches”

8.4.2 Learning Acceleration

LLMs can serve as patient, knowledgeable tutors:

Explain unfamiliar code: Understand what a colleague’s analysis script actually does
Learn new languages: Bridge from Stata to R, or R to Python, with guided examples
Discover best practices: Learn about code style, testing, and documentation approaches
Understand error messages: Get help diagnosing and fixing coding problems

8.4.3 Enhanced Research Reproducibility

LLMs can help implement software development best practices that make research more robust:

Unit testing: Automatically generate tests to verify your code works as expected
Documentation: Create clear explanations of what your analysis code does
Code organization: Structure projects following established conventions
Version control: Learn to use Git for tracking changes in your analysis

8.5 The Perils: “Vibe Coding” and Common Pitfalls

8.5.1 The “Vibe Coding” Problem

What is “Vibe Coding”?

“Vibe coding” refers to the phenomenon where AI confidently generates code that looks correct and professional, giving users a false sense that everything is working properly. The code might run without errors but produce incorrect results, or contain subtle bugs that only become apparent later.

The code often appears sophisticated and well-structured, making it difficult to spot errors, especially for less experienced programmers.

Common manifestations:

Code that runs but produces incorrect statistical results
Functions that don’t exist (AI “hallucinations”)
Mixing syntax from different programming languages
Solutions that work for simple cases but fail with real data complexity

8.5.2 The 0-to-90% Problem

One of the most frustrating aspects of AI coding assistance is what I call the “0-to-90% problem.” LLMs can quickly generate code that appears to solve 90% of your problem, but then you spend hours debugging the remaining 10%.

Why this happens:

AI lacks context about your specific data quirks
Complex analyses have interdependencies AI doesn’t understand
Edge cases and real-world data messiness aren’t captured in training data
AI can’t validate its own outputs against your research objectives

8.5.3 Language-Specific Limitations

AI coding performance varies significantly by programming language, reflecting the training data available:

Strong Performance:

Python (extensive open-source examples)
R (large community, well-documented packages)
JavaScript (massive web presence)

Weaker Performance:

Stata (limited online documentation compared to open-source alternatives)
SPSS (proprietary, fewer public examples)
MATLAB (specialized, less public code)

For languages with limited training data, AI is more likely to hallucinate syntax or suggest outdated approaches.

However, there’s an important caveat: while LLMs excel at open-source languages, these languages also evolve rapidly. LLMs may have outdated code in their training data and might not know about recent changes in package functionality. Fortunately, many newer models now include internet search capabilities, allowing them to learn about recent updates that occurred after their training cutoff.

8.6 Best Practices for AI-Assisted Coding

8.6.1 Use the Most Advanced Models Available

Model capabilities are improving rapidly, and using the most advanced models makes a significant difference in code quality and reliability. This typically requires paid subscriptions ($20/month), but it’s a worthwhile investment. As one researcher put it: “Twenty dollars a month is peanuts for freeing you to walk your dog instead of being stuck in debugging hell.”

The free versions of AI models are often significantly less capable than their paid counterparts, leading to more errors and frustration.

8.6.2 For Beginners: From Excel to Code

If you’re new to coding or primarily use Excel for data analysis:

Start Small:

Use AI to help transition simple Excel tasks to code
Ask for heavily commented code that explains each step
Focus on reproducible analyses rather than one-off calculations

Example Prompt:

I currently analyze survey data in Excel but want to learn R. I have a dataset with household income by region and year. Can you show me how to:
1. Load the data from a CSV file
2. Calculate mean income by region
3. Create a simple bar chart
4. Export the results

Please include comments explaining what each line does, and use tidyverse style.

8.6.3 For Intermediate Users: Enhance Your Skills

If you already code but want to improve:

Iterative Development:

Break complex tasks into smaller steps
Test each component before combining
Use AI to explain unfamiliar code patterns

Quality Improvements:

Ask AI to review your code for potential improvements
Request help with testing and documentation
Learn about style guides and best practices

8.6.4 For Advanced Users: Accelerate Development

If you’re already comfortable coding:

Rapid Prototyping:

Use AI for boilerplate code and standard patterns
Get help with unfamiliar packages or methods
Explore alternative approaches to complex problems

Cross-Language Translation:

Convert analyses between R, Python, and Stata
Learn new programming paradigms
Adapt code from different domains

8.6.5 Effective Debugging with AI

When you encounter errors, AI can be incredibly helpful if you provide complete information:

What to include in your debugging request:

Full error message: Copy the entire error output, not just the summary
Complete code: Share the code that’s causing the problem
System information: In R, include sessionInfo() output; in Python, include package versions
Expected vs. actual behavior: Explain what you thought should happen
Screenshots: Sometimes visual context helps, especially with data display issues

Example debugging prompt:

I'm getting an error in R when trying to merge two datasets. Here's the error message:

[paste full error message]

Here's my code:
[paste complete code block]

Here's my sessionInfo():
[paste sessionInfo() output]

I expected this to create a merged dataset with 1000 rows, but instead I'm getting this error. What's going wrong?

This comprehensive approach helps AI understand your specific context and provide more accurate solutions.

8.7 Creating Your Coding Assistant

Here’s a refined version of a coding assistant prompt that works well for research:

# [Your Name] - Research Coding Assistant

## About Me
I'm a [your role] working on [your research area]. I primarily use [your main language] for data analysis and have [your experience level] programming experience.

## My Coding Context
- **Primary language**: [R/Python/Stata/etc.]
- **Typical tasks**: [Survey data analysis, econometric models, etc.]
- **Style preferences**: [Tidyverse, PEP 8, etc.]
- **Current project**: [Brief description]

## How to Help Me Code
1. **Follow style guides**: Use [tidyverse style guide/PEP 8/etc.]
2. **Break down complex tasks**: Divide large projects into manageable steps
3. **Explain your logic**: Help me understand the approach, not just the syntax
4. **Ask clarifying questions**: Don't assume what I want - ask for specifics
5. **Use best practices**: Include error handling, comments, and testing where appropriate
6. **Prefer functional approaches**: Use purrr/map functions over loops when possible

## What I Need Most Help With
- [Specific coding challenges you face]
- [Areas where you want to improve]
- [Types of analyses you do frequently]

When providing code, please:
- Include comments explaining key steps
- Use clear variable names
- Suggest alternative approaches when relevant
- Point out potential pitfalls or limitations

8.8 Hands-On Activity: AI-Assisted Data Analysis

Let’s practice using AI for a common research task:

Scenario: You have survey data with household income, education levels, and geographic regions. You want to analyze income inequality patterns.

Step 1: Plan Your Analysis Before asking AI for code, outline what you want to do: - Load and examine the data - Calculate summary statistics by region - Create visualizations - Run statistical tests

Step 2: Iterative Development Instead of asking for everything at once, try:

I have survey data with columns: household_id, income, education, region. 
First, help me write R code to:
1. Load the data and examine its structure
2. Check for missing values and outliers
3. Calculate basic summary statistics

Please use tidyverse style and include comments.

Step 3: Build Complexity Once the basic code works, add layers:

Now help me create a visualization showing income distribution by region, 
using ggplot2. I want to compare medians and show the spread of the data.

Step 4: Validation Always test AI-generated code: - Run it on a subset of your data first - Check that outputs make sense - Verify calculations manually for simple cases

8.9 Moving from Chat to IDE: Advanced Coding Tools

8.9.1 Specialized AI Coding Environments

While chatbots are great for learning and quick questions, specialized coding environments offer more sophisticated assistance:

Cursor - An AI-native code editor that can:

Edit code in place rather than generating separate blocks
Understand your entire project context
Run code and iterate based on results
Provide inline suggestions as you type

Other Tools:

GitHub Copilot - Autocomplete suggestions within your existing IDE
Claude Code - Anthropic’s command-line coding assistant
Windsurf - Another AI-powered development environment

8.9.2 Benefits of IDE-Based Assistance

Context Awareness:

Sees your entire project, not just individual prompts
Maintains consistency across files
Understands project structure and dependencies

Iterative Development:

Can run code and learn from errors
Makes targeted edits rather than rewriting everything
Preserves your existing code while making improvements

Professional Workflow:

Integrates with version control (Git)
Supports debugging and testing
Enables collaborative development

8.10 Bridge to Programmatic Approaches

Understanding AI coding assistance helps prepare you for the programmatic approaches we’ll discuss in our advanced section. When you’re ready to process thousands of documents or run large-scale analyses, you’ll use APIs rather than web interfaces.

The Connection:

Web coding practice → API development skills
Small-scale analysis → Large-scale automation
Interactive debugging → Robust, reproducible pipelines

If you become comfortable with AI-assisted coding through chat interfaces, you’ll be better prepared to work with research assistants or collaborators who can help you scale up to API-based approaches.

8.11 What Can Go Wrong (and How to Fix It)

8.11.1 Common Issues:

Over-reliance on AI output

Problem: Accepting code without understanding how it works
Solution: Always ask AI to explain the code and test it yourself

“Vibe coding” pitfalls

Problem: Code that looks professional but contains subtle errors
Solution: Validate outputs, especially statistical results

Getting stuck in debugging loops

Problem: AI fixes create new problems, leading to endless iterations
Solution: Start over with a simpler approach, or ask for human help

Language-specific limitations

Problem: Poor performance with specialized tools like Stata
Solution: Be extra careful with verification, consider alternative approaches

8.11.2 Validation Strategies:

Test with known data: Use datasets where you know the expected results
Manual verification: Check AI calculations against hand calculations
Cross-platform validation: Compare results across different tools
Peer review: Have colleagues examine AI-generated code
Documentation: Keep detailed notes about what the code is supposed to do

8.12 What You Can Do Now

For Beginners:

Try a simple data task - Use AI to help convert an Excel analysis to code
Focus on understanding - Ask AI to explain every line of code it generates
Start with small datasets - Practice with manageable examples before tackling large projects

For Intermediate Users:

Create your coding assistant using the template above
Experiment with code review - Ask AI to critique and improve your existing code
Learn new approaches - Use AI to explore unfamiliar packages or methods

For Advanced Users:

Try specialized tools - Experiment with Cursor or GitHub Copilot
Focus on best practices - Use AI to improve code documentation and testing
Explore cross-language translation - Convert analyses between R, Python, and Stata

8.13 The Bigger Picture

AI coding assistance is most valuable when it amplifies your existing skills rather than replacing your judgment. The goal isn’t to become an AI expert, but to use these tools strategically to:

Reduce time on routine tasks so you can focus on analysis and interpretation
Learn new approaches that enhance your research capabilities
Improve code quality through better documentation and testing
Enable more ambitious projects by making complex analyses more accessible

Remember: AI is a coding co-pilot, not an autopilot. It can help you fly faster and explore new territories, but you remain the pilot responsible for navigation and safe landing.

As you become more comfortable with AI-assisted coding, you’ll be better prepared to consider programmatic approaches for larger-scale research projects—the topic of our advanced section.

Up next: Case Study - Large-Scale Text Classification with APIs