8  Coding Assistance: Benefits, Pitfalls, and Best Practices

Tentative time: 12 minutes

8.1 Learning Objectives

By the end of this section, you will be able to:

  • Understand the “jagged frontier” of AI coding capabilities and recognize where LLMs excel versus where they struggle
  • Apply best practices for AI-assisted coding whether you’re a beginner or experienced programmer
  • Avoid common pitfalls like “vibe coding” and over-reliance on AI-generated solutions
  • Use LLMs strategically to enhance your data analysis workflow and research reproducibility
  • Recognize when AI coding assistance can help bridge the gap to programmatic approaches for larger projects

8.2 Why This Matters for Your Research

Many researchers work with data analysis code—whether in R, Python, Stata, SPSS, or MATLAB. You might write code yourself, supervise research assistants who code, or wish you could move from Excel to more reproducible analytical workflows.

LLMs have become surprisingly capable coding assistants, but they come with significant caveats. When used well, they can dramatically reduce the time spent on routine coding tasks, help you learn new approaches, and make software development best practices more accessible. When used poorly, they can create more problems than they solve.

This chapter will help you navigate both the promise and the pitfalls of AI-assisted coding.

8.3 The Jagged Frontier: Where LLMs Excel and Struggle in Coding

Let’s apply our jagged frontier concept to coding tasks:

Inside the Frontier (LLMs excel):

  • Writing standard data manipulation and analysis code
  • Translating between programming languages (R to Python, etc.)
  • Explaining what existing code does, line by line
  • Generating unit tests and documentation
  • Creating first drafts of visualization code
  • Debugging common error messages
  • Implementing standard statistical methods
  • Writing repetitive or boilerplate code

Outside the Frontier (LLMs struggle):

  • Understanding your specific research context and data quirks
  • Making architectural decisions for complex projects
  • Optimizing code for performance at scale
  • Handling edge cases specific to your domain
  • Choosing the right statistical approach for your research question
  • Understanding implicit assumptions in your analytical workflow

The Jagged Edge (Be Extra Careful):

  • Complex, multi-step analyses where small errors compound
  • Domain-specific packages with limited training data
  • Code that handles sensitive or proprietary data
  • Statistical methods that require deep contextual understanding

8.4 The Promise: What LLMs Can Do for Research Coding

8.4.1 Dramatic Speed Improvements for Routine Tasks

LLMs can handle many coding tasks in seconds that might take hours to research and implement manually:

  • Data cleaning and transformation: “Convert this wide survey data to long format and handle missing values”
  • Statistical analysis: “Run a fixed-effects regression with clustered standard errors”
  • Visualization: “Create a publication-ready plot showing trends by region and year”
  • Code translation: “Convert this Stata code to R using tidyverse approaches”

8.4.2 Learning Acceleration

LLMs can serve as patient, knowledgeable tutors:

  • Explain unfamiliar code: Understand what a colleague’s analysis script actually does
  • Learn new languages: Bridge from Stata to R, or R to Python, with guided examples
  • Discover best practices: Learn about code style, testing, and documentation approaches
  • Understand error messages: Get help diagnosing and fixing coding problems

8.4.3 Enhanced Research Reproducibility

LLMs can help implement software development best practices that make research more robust:

  • Unit testing: Automatically generate tests to verify your code works as expected
  • Documentation: Create clear explanations of what your analysis code does
  • Code organization: Structure projects following established conventions
  • Version control: Learn to use Git for tracking changes in your analysis

8.5 The Perils: “Vibe Coding” and Common Pitfalls

8.5.1 The “Vibe Coding” Problem

What is “Vibe Coding”?

“Vibe coding” refers to the phenomenon where AI confidently generates code that looks correct and professional, giving users a false sense that everything is working properly. The code might run without errors but produce incorrect results, or contain subtle bugs that only become apparent later.

The code often appears sophisticated and well-structured, making it difficult to spot errors, especially for less experienced programmers.

Common manifestations:

  • Code that runs but produces incorrect statistical results
  • Functions that don’t exist (AI “hallucinations”)
  • Mixing syntax from different programming languages
  • Solutions that work for simple cases but fail with real data complexity

8.5.2 The 0-to-90% Problem

One of the most frustrating aspects of AI coding assistance is what I call the “0-to-90% problem.” LLMs can quickly generate code that appears to solve 90% of your problem, but then you spend hours debugging the remaining 10%.

Why this happens:

  • AI lacks context about your specific data quirks
  • Complex analyses have interdependencies AI doesn’t understand
  • Edge cases and real-world data messiness aren’t captured in training data
  • AI can’t validate its own outputs against your research objectives

8.5.3 Language-Specific Limitations

AI coding performance varies significantly by programming language, reflecting the training data available:

Strong Performance:

  • Python (extensive open-source examples)
  • R (large community, well-documented packages)
  • JavaScript (massive web presence)

Weaker Performance:

  • Stata (limited online documentation compared to open-source alternatives)
  • SPSS (proprietary, fewer public examples)
  • MATLAB (specialized, less public code)

For languages with limited training data, AI is more likely to hallucinate syntax or suggest outdated approaches.

However, there’s an important caveat: while LLMs excel at open-source languages, these languages also evolve rapidly. LLMs may have outdated code in their training data and might not know about recent changes in package functionality. Fortunately, many newer models now include internet search capabilities, allowing them to learn about recent updates that occurred after their training cutoff.

8.6 Best Practices for AI-Assisted Coding

8.6.1 Use the Most Advanced Models Available

Model capabilities are improving rapidly, and using the most advanced models makes a significant difference in code quality and reliability. This typically requires paid subscriptions ($20/month), but it’s a worthwhile investment. As one researcher put it: “Twenty dollars a month is peanuts for freeing you to walk your dog instead of being stuck in debugging hell.”

The free versions of AI models are often significantly less capable than their paid counterparts, leading to more errors and frustration.

8.6.2 For Beginners: From Excel to Code

If you’re new to coding or primarily use Excel for data analysis:

Start Small:

  • Use AI to help transition simple Excel tasks to code
  • Ask for heavily commented code that explains each step
  • Focus on reproducible analyses rather than one-off calculations

Example Prompt:

I currently analyze survey data in Excel but want to learn R. I have a dataset with household income by region and year. Can you show me how to:
1. Load the data from a CSV file
2. Calculate mean income by region
3. Create a simple bar chart
4. Export the results

Please include comments explaining what each line does, and use tidyverse style.

8.6.3 For Intermediate Users: Enhance Your Skills

If you already code but want to improve:

Iterative Development:

  • Break complex tasks into smaller steps
  • Test each component before combining
  • Use AI to explain unfamiliar code patterns

Quality Improvements:

  • Ask AI to review your code for potential improvements
  • Request help with testing and documentation
  • Learn about style guides and best practices

8.6.4 For Advanced Users: Accelerate Development

If you’re already comfortable coding:

Rapid Prototyping:

  • Use AI for boilerplate code and standard patterns
  • Get help with unfamiliar packages or methods
  • Explore alternative approaches to complex problems

Cross-Language Translation:

  • Convert analyses between R, Python, and Stata
  • Learn new programming paradigms
  • Adapt code from different domains

8.6.5 Effective Debugging with AI

When you encounter errors, AI can be incredibly helpful if you provide complete information:

What to include in your debugging request:

  • Full error message: Copy the entire error output, not just the summary
  • Complete code: Share the code that’s causing the problem
  • System information: In R, include sessionInfo() output; in Python, include package versions
  • Expected vs. actual behavior: Explain what you thought should happen
  • Screenshots: Sometimes visual context helps, especially with data display issues

Example debugging prompt:

I'm getting an error in R when trying to merge two datasets. Here's the error message:

[paste full error message]

Here's my code:
[paste complete code block]

Here's my sessionInfo():
[paste sessionInfo() output]

I expected this to create a merged dataset with 1000 rows, but instead I'm getting this error. What's going wrong?

This comprehensive approach helps AI understand your specific context and provide more accurate solutions.

8.7 Creating Your Coding Assistant

Here’s a refined version of a coding assistant prompt that works well for research:

# [Your Name] - Research Coding Assistant

## About Me
I'm a [your role] working on [your research area]. I primarily use [your main language] for data analysis and have [your experience level] programming experience.

## My Coding Context
- **Primary language**: [R/Python/Stata/etc.]
- **Typical tasks**: [Survey data analysis, econometric models, etc.]
- **Style preferences**: [Tidyverse, PEP 8, etc.]
- **Current project**: [Brief description]

## How to Help Me Code
1. **Follow style guides**: Use [tidyverse style guide/PEP 8/etc.]
2. **Break down complex tasks**: Divide large projects into manageable steps
3. **Explain your logic**: Help me understand the approach, not just the syntax
4. **Ask clarifying questions**: Don't assume what I want - ask for specifics
5. **Use best practices**: Include error handling, comments, and testing where appropriate
6. **Prefer functional approaches**: Use purrr/map functions over loops when possible

## What I Need Most Help With
- [Specific coding challenges you face]
- [Areas where you want to improve]
- [Types of analyses you do frequently]

When providing code, please:
- Include comments explaining key steps
- Use clear variable names
- Suggest alternative approaches when relevant
- Point out potential pitfalls or limitations

8.8 Hands-On Activity: AI-Assisted Data Analysis

Let’s practice using AI for a common research task:

Scenario: You have survey data with household income, education levels, and geographic regions. You want to analyze income inequality patterns.

Step 1: Plan Your Analysis Before asking AI for code, outline what you want to do: - Load and examine the data - Calculate summary statistics by region - Create visualizations - Run statistical tests

Step 2: Iterative Development Instead of asking for everything at once, try:

I have survey data with columns: household_id, income, education, region. 
First, help me write R code to:
1. Load the data and examine its structure
2. Check for missing values and outliers
3. Calculate basic summary statistics

Please use tidyverse style and include comments.

Step 3: Build Complexity Once the basic code works, add layers:

Now help me create a visualization showing income distribution by region, 
using ggplot2. I want to compare medians and show the spread of the data.

Step 4: Validation Always test AI-generated code: - Run it on a subset of your data first - Check that outputs make sense - Verify calculations manually for simple cases

8.9 Moving from Chat to IDE: Advanced Coding Tools

8.9.1 Specialized AI Coding Environments

While chatbots are great for learning and quick questions, specialized coding environments offer more sophisticated assistance:

Cursor - An AI-native code editor that can:

  • Edit code in place rather than generating separate blocks
  • Understand your entire project context
  • Run code and iterate based on results
  • Provide inline suggestions as you type

Other Tools:

  • GitHub Copilot - Autocomplete suggestions within your existing IDE
  • Claude Code - Anthropic’s command-line coding assistant
  • Windsurf - Another AI-powered development environment

8.9.2 Benefits of IDE-Based Assistance

Context Awareness:

  • Sees your entire project, not just individual prompts
  • Maintains consistency across files
  • Understands project structure and dependencies

Iterative Development:

  • Can run code and learn from errors
  • Makes targeted edits rather than rewriting everything
  • Preserves your existing code while making improvements

Professional Workflow:

  • Integrates with version control (Git)
  • Supports debugging and testing
  • Enables collaborative development

8.10 Bridge to Programmatic Approaches

Understanding AI coding assistance helps prepare you for the programmatic approaches we’ll discuss in our advanced section. When you’re ready to process thousands of documents or run large-scale analyses, you’ll use APIs rather than web interfaces.

The Connection:

  • Web coding practiceAPI development skills
  • Small-scale analysisLarge-scale automation
  • Interactive debuggingRobust, reproducible pipelines

If you become comfortable with AI-assisted coding through chat interfaces, you’ll be better prepared to work with research assistants or collaborators who can help you scale up to API-based approaches.

8.11 What Can Go Wrong (and How to Fix It)

8.11.1 Common Issues:

Over-reliance on AI output

  • Problem: Accepting code without understanding how it works
  • Solution: Always ask AI to explain the code and test it yourself

“Vibe coding” pitfalls

  • Problem: Code that looks professional but contains subtle errors
  • Solution: Validate outputs, especially statistical results

Getting stuck in debugging loops

  • Problem: AI fixes create new problems, leading to endless iterations
  • Solution: Start over with a simpler approach, or ask for human help

Language-specific limitations

  • Problem: Poor performance with specialized tools like Stata
  • Solution: Be extra careful with verification, consider alternative approaches

8.11.2 Validation Strategies:

  1. Test with known data: Use datasets where you know the expected results
  2. Manual verification: Check AI calculations against hand calculations
  3. Cross-platform validation: Compare results across different tools
  4. Peer review: Have colleagues examine AI-generated code
  5. Documentation: Keep detailed notes about what the code is supposed to do

8.12 What You Can Do Now

For Beginners:

  1. Try a simple data task - Use AI to help convert an Excel analysis to code
  2. Focus on understanding - Ask AI to explain every line of code it generates
  3. Start with small datasets - Practice with manageable examples before tackling large projects

For Intermediate Users:

  1. Create your coding assistant using the template above
  2. Experiment with code review - Ask AI to critique and improve your existing code
  3. Learn new approaches - Use AI to explore unfamiliar packages or methods

For Advanced Users:

  1. Try specialized tools - Experiment with Cursor or GitHub Copilot
  2. Focus on best practices - Use AI to improve code documentation and testing
  3. Explore cross-language translation - Convert analyses between R, Python, and Stata

8.13 The Bigger Picture

AI coding assistance is most valuable when it amplifies your existing skills rather than replacing your judgment. The goal isn’t to become an AI expert, but to use these tools strategically to:

  • Reduce time on routine tasks so you can focus on analysis and interpretation
  • Learn new approaches that enhance your research capabilities
  • Improve code quality through better documentation and testing
  • Enable more ambitious projects by making complex analyses more accessible

Remember: AI is a coding co-pilot, not an autopilot. It can help you fly faster and explore new territories, but you remain the pilot responsible for navigation and safe landing.

As you become more comfortable with AI-assisted coding, you’ll be better prepared to consider programmatic approaches for larger-scale research projects—the topic of our advanced section.


Up next: Case Study - Large-Scale Text Classification with APIs