MCP: Connecting AI to Your Data Science Tools
Series: AI Agents for Data Scientists (Part 2 of 4)
In Part 1, we explored how Claude Code can help you write better code. But what if your AI assistant could go beyond code generation and actually interact with your data infrastructure—querying databases, calling ML APIs, or reading files directly?
Enter Model Context Protocol (MCP), a standardized way to connect AI assistants to your tools and data sources.
What is Model Context Protocol?
Model Context Protocol (MCP) is an open protocol that enables AI assistants to securely connect to external tools, resources, and data sources. Think of it as a universal adapter that lets your AI assistant interact with:
- Databases (PostgreSQL, MySQL, BigQuery)
- APIs (REST endpoints, ML model services)
- File systems (local files, cloud storage)
- Development tools (Git, Docker, CI/CD)
Instead of copying data or writing custom integration code, MCP provides a standardized interface for AI assistants to access these resources directly.
Architecture
```mermaid
graph LR
    A[AI Assistant<br/>Claude Code] <-->|MCP Protocol| B[MCP Server]
    B <-->|Tools| C[Database]
    B <-->|Tools| D[ML API]
    B <-->|Resources| E[File System]
    B <-->|Resources| F[Cloud Storage]
    style A fill:#e1f5ff
    style B fill:#fff4e1
    style C fill:#e8f5e9
    style D fill:#e8f5e9
    style E fill:#e8f5e9
    style F fill:#e8f5e9
```
The MCP architecture consists of three key components:
- AI Assistant (e.g., Claude Code) - Makes requests through the protocol
- MCP Server - Translates protocol requests into actual tool/resource calls
- Tools & Resources - Your actual data sources and services
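Under the hood, MCP messages are JSON-RPC 2.0. Here is a rough sketch of what a single tool invocation might look like on the wire; the method name comes from the MCP specification, but the tool name (`query_database`) and its arguments are hypothetical placeholders, not a real server's schema:

```python
import json

# The assistant asks the MCP server to invoke a tool via a "tools/call"
# request. The tool name and arguments below are illustrative only.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "query_database",
        "arguments": {"sql": "SELECT COUNT(*) FROM transactions"},
    },
}
wire_message = json.dumps(request)

# The server replies with a result carrying the same request id, so the
# assistant can match responses to requests.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "15234"}]},
}
assert response["id"] == request["id"]
```

The AI assistant never talks to your database directly; it only exchanges messages like these with the MCP server, which performs the actual call.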
Why It Matters for Data Scientists
1. Database Connections
Query your databases directly from AI conversations:
You: "What's the average transaction value for customers in the last 30 days?"
AI: [Queries your PostgreSQL database via MCP]
"The average transaction value is $127.43, based on 15,234 transactions."
No more exporting CSVs or writing SQL manually—just ask questions in natural language.
2. API Integrations
Call your ML model APIs or data services:
You: "Run inference on the latest batch of user features using our churn model"
AI: [Calls your ML API via MCP]
"Processed 1,000 users. 23 flagged as high churn risk."
3. File System Access
Read data files, configuration files, or results directly:
You: "Analyze the feature importance from last week's model training"
AI: [Reads your model artifacts via MCP]
"Top 5 features: user_engagement_score (0.34), days_since_signup (0.21)..."
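Stripped of the protocol plumbing, a file-system resource read like the one above boils down to loading an artifact and summarizing it. A minimal local sketch, where the artifact file name, format, and feature values are all hypothetical:

```python
import json
import os
import tempfile

# Hypothetical training artifact: feature importances saved as JSON.
artifact = {
    "user_engagement_score": 0.34,
    "days_since_signup": 0.21,
    "avg_session_length": 0.15,
}
path = os.path.join(tempfile.gettempdir(), "feature_importance.json")
with open(path, "w") as f:
    json.dump(artifact, f)

# What the MCP resource read amounts to: load the file and rank features.
with open(path) as f:
    importances = json.load(f)

top = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
print(top[0])  # ('user_engagement_score', 0.34)
```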
Core Concepts
| Concept | Description | Example |
|---|---|---|
| MCP Server | A service that exposes tools/resources via the protocol | PostgreSQL MCP server, GitHub MCP server |
| Tools | Actions the AI can perform (functions with side effects) | query_database(), run_inference(), create_file() |
| Resources | Read-only data sources the AI can access | Database schemas, file contents, API documentation |
Tools are for actions (queries, API calls, writes), while Resources are for reading data without modification.
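The tools-versus-resources split can be made concrete with a toy registry. This is not the MCP SDK, just an illustration of the idea: tools are callable actions, resources are read-only lookups keyed by a URI, and the function names here are hypothetical:

```python
# Toy illustration of the tools-vs-resources split (not the real MCP SDK).
tools = {}
resources = {}

def tool(fn):
    """Register a function as an action the assistant may invoke."""
    tools[fn.__name__] = fn
    return fn

def resource(uri):
    """Register a read-only data source under a URI."""
    def wrap(fn):
        resources[uri] = fn
        return fn
    return wrap

@tool
def run_inference(features):
    # Hypothetical action with side effects (calls a model service).
    return {"churn_risk": 0.23}

@resource("schema://transactions")
def transactions_schema():
    # Read-only: describes data without modifying anything.
    return "product_name TEXT, revenue NUMERIC, date DATE"

print(sorted(tools))      # ['run_inference']
print(sorted(resources))  # ['schema://transactions']
```

Real MCP servers declare tools and resources with schemas so the assistant knows what arguments each accepts, but the action/read-only distinction is the same.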
Practical Example: Database Integration
Let’s say you want Claude Code to query your analytics database. Here’s how MCP makes it possible:
Setup
- Install an MCP server (e.g., @modelcontextprotocol/server-postgres)
- Configure the connection in your IDE settings
- Start querying in natural language
Example Workflow
You: "Show me the top 10 products by revenue this month"
AI: [Uses MCP tool: query_database]
SELECT product_name, SUM(revenue) as total_revenue
FROM transactions
WHERE date >= '2025-01-01'
GROUP BY product_name
ORDER BY total_revenue DESC
LIMIT 10
Results:
1. Premium Subscription - $45,230
2. Enterprise License - $38,900
...
The AI can:
- Generate SQL queries based on your schema
- Execute them safely (with your approval)
- Format results in tables or visualizations
- Answer follow-up questions using the same data
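"Execute them safely" deserves a concrete shape. Below is a crude guard for illustration, using an in-memory SQLite table standing in for your analytics database: it rejects anything that isn't a single SELECT before executing. A real deployment should also rely on a read-only database role rather than string checks alone:

```python
import sqlite3

def run_readonly_query(conn, sql):
    """Execute sql only if it is a single SELECT statement.

    A crude illustrative guard; production setups should pair this with
    a read-only database user.
    """
    stripped = sql.strip().rstrip(";")
    if ";" in stripped or not stripped.lower().startswith("select"):
        raise ValueError("only single SELECT statements are allowed")
    return conn.execute(stripped).fetchall()

# In-memory stand-in for the transactions table from the example above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (product_name TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("Premium Subscription", 45230.0), ("Enterprise License", 38900.0)],
)

rows = run_readonly_query(
    conn,
    "SELECT product_name, SUM(revenue) FROM transactions "
    "GROUP BY product_name ORDER BY 2 DESC LIMIT 10",
)
print(rows[0])  # ('Premium Subscription', 45230.0)
```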
For ML Pipelines
You: "Check the latest model performance metrics from MLflow"
AI: [Uses MCP tool: read_mlflow_metrics]
Latest run (2025-01-24):
- Accuracy: 0.89
- F1 Score: 0.85
- AUC: 0.92
Compared to baseline: +2.3% accuracy improvement
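The baseline comparison in that reply is simple arithmetic over two metric dictionaries. A small sketch of the idea, where both the baseline numbers and the helper name are hypothetical (a real setup would fetch these from your tracking server):

```python
def compare_metrics(latest, baseline):
    """Return per-metric deltas between two runs (illustrative helper)."""
    return {k: round(latest[k] - baseline[k], 4) for k in latest if k in baseline}

# Hypothetical values; the baseline is chosen so accuracy improves by
# 2.3 percentage points, matching the example above.
latest = {"accuracy": 0.89, "f1": 0.85, "auc": 0.92}
baseline = {"accuracy": 0.867, "f1": 0.84, "auc": 0.91}

deltas = compare_metrics(latest, baseline)
print(deltas["accuracy"])  # 0.023
```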
Getting Started
Step 1: Choose Your MCP Server
Popular options for data scientists:
- PostgreSQL/MySQL - Database querying
- GitHub - Code repository access
- File System - Local file reading
- Custom - Build your own for internal APIs
Step 2: Configure in Your IDE
Most MCP-enabled editors (like Cursor) support MCP servers through configuration files. Add your server:
```json
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres"],
      "env": {
        "POSTGRES_URL": "postgresql://user:pass@localhost/db"
      }
    }
  }
}
```
Step 3: Start Using
Once configured, simply ask your AI assistant questions that require data access. The MCP server handles the connection automatically.
Security Considerations
MCP servers run with the permissions you grant them. Best practices:
- ✅ Use read-only database users when possible
- ✅ Limit file system access to specific directories
- ✅ Review AI-generated queries before execution
- ✅ Use environment variables for credentials (never hardcode)
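The last point is worth making concrete. Instead of embedding the connection string in code or config committed to version control, read it from the environment at runtime. A minimal sketch, reusing the POSTGRES_URL variable name from the config example earlier (the fallback value here is purely illustrative):

```python
import os

# For this sketch only: seed the variable so the example runs standalone.
# In practice you would export POSTGRES_URL in your shell or secrets manager.
os.environ.setdefault(
    "POSTGRES_URL", "postgresql://readonly_user:secret@localhost/db"
)

def get_database_url():
    """Fetch the connection string from the environment, failing loudly."""
    url = os.environ.get("POSTGRES_URL")
    if url is None:
        raise RuntimeError("POSTGRES_URL is not set")
    return url

print(get_database_url().startswith("postgresql://"))  # True
```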
What’s Next?
MCP unlocks powerful capabilities, but there’s more to explore. In Part 3, we’ll dive into building custom MCP servers for your internal ML infrastructure and APIs.
This is Part 2 of the “AI Agents for Data Scientists” series.