MCP: Connecting AI to Your Data Science Tools
Series: AI Agents for Data Scientists (Part 2 of 4)
In Part 1, we explored how Claude Code can help you write better code. But what if your AI assistant could go beyond code generation and actually interact with your data infrastructure—querying databases, calling ML APIs, or reading files directly?
Enter Model Context Protocol (MCP), a standardized way to connect AI assistants to your tools and data sources.
What is Model Context Protocol?
Model Context Protocol (MCP) is an open protocol that enables AI assistants to securely connect to external tools, resources, and data sources. Think of it as a universal adapter that lets your AI assistant interact with:
- Databases (PostgreSQL, MySQL, BigQuery)
- APIs (REST endpoints, ML model services)
- File systems (local files, cloud storage)
- Development tools (Git, Docker, CI/CD)
Instead of copying data or writing custom integration code, MCP provides a standardized interface for AI assistants to access these resources directly.
Architecture
```mermaid
graph LR
    A[AI Assistant<br/>Claude Code] <-->|MCP Protocol| B[MCP Server]
    B <-->|Tools| C[Database]
    B <-->|Tools| D[ML API]
    B <-->|Resources| E[File System]
    B <-->|Resources| F[Cloud Storage]
    style A fill:#e1f5ff
    style B fill:#fff4e1
    style C fill:#e8f5e9
    style D fill:#e8f5e9
    style E fill:#e8f5e9
    style F fill:#e8f5e9
```
The MCP architecture consists of three key components:
- AI Assistant (e.g., Claude Code) - Makes requests through the protocol
- MCP Server - Translates protocol requests into actual tool/resource calls
- Tools & Resources - Your actual data sources and services
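Under the hood, MCP messages are JSON-RPC 2.0. Here is a rough sketch of what a single tool invocation might look like on the wire; the method name comes from the MCP specification, but the tool name (`query_database`) and its arguments are hypothetical placeholders, not a real server's schema:

```python
import json

# The assistant asks the MCP server to invoke a tool via a "tools/call"
# request. The tool name and arguments below are illustrative only.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "query_database",
        "arguments": {"sql": "SELECT COUNT(*) FROM transactions"},
    },
}
wire_message = json.dumps(request)

# The server replies with a result carrying the same request id, so the
# assistant can match responses to requests.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "15234"}]},
}
assert response["id"] == request["id"]
```

The AI assistant never talks to your database directly; it only exchanges messages like these with the MCP server, which performs the actual call.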
Why It Matters for Data Scientists
1. Database Connections
Query your databases directly from AI conversations:
You: "What's the average transaction value for customers in the last 30 days?"
AI: [Queries your PostgreSQL database via MCP]
"The average transaction value is $127.43, based on 15,234 transactions."
No more exporting CSVs or writing SQL manually—just ask questions in natural language.
2. API Integrations
Call your ML model APIs or data services:
You: "Run inference on the latest batch of user features using our churn model"
AI: [Calls your ML API via MCP]
"Processed 1,000 users. 23 flagged as high churn risk."
3. File System Access
Read data files, configuration files, or results directly:
You: "Analyze the feature importance from last week's model training"
AI: [Reads your model artifacts via MCP]
"Top 5 features: user_engagement_score (0.34), days_since_signup (0.21)..."
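Stripped of the protocol plumbing, a file-system resource read like the one above boils down to loading an artifact and summarizing it. A minimal local sketch, where the artifact file name, format, and feature values are all hypothetical:

```python
import json
import os
import tempfile

# Hypothetical training artifact: feature importances saved as JSON.
artifact = {
    "user_engagement_score": 0.34,
    "days_since_signup": 0.21,
    "avg_session_length": 0.15,
}
path = os.path.join(tempfile.gettempdir(), "feature_importance.json")
with open(path, "w") as f:
    json.dump(artifact, f)

# What the MCP resource read amounts to: load the file and rank features.
with open(path) as f:
    importances = json.load(f)

top = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
print(top[0])  # ('user_engagement_score', 0.34)
```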
Core Concepts
| Concept | Description | Example |
|---|---|---|
| MCP Server | A service that exposes tools/resources via the protocol | PostgreSQL MCP server, GitHub MCP server |
| Tools | Actions the AI can perform (functions with side effects) | query_database(), run_inference(), create_file() |
| Resources | Read-only data sources the AI can access | Database schemas, file contents, API documentation |
Tools are for actions (queries, API calls, writes), while Resources are for reading data without modification.
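The tools-versus-resources split can be made concrete with a toy registry. This is not the MCP SDK, just an illustration of the idea: tools are callable actions, resources are read-only lookups keyed by a URI, and the function names here are hypothetical:

```python
# Toy illustration of the tools-vs-resources split (not the real MCP SDK).
tools = {}
resources = {}

def tool(fn):
    """Register a function as an action the assistant may invoke."""
    tools[fn.__name__] = fn
    return fn

def resource(uri):
    """Register a read-only data source under a URI."""
    def wrap(fn):
        resources[uri] = fn
        return fn
    return wrap

@tool
def run_inference(features):
    # Hypothetical action with side effects (calls a model service).
    return {"churn_risk": 0.23}

@resource("schema://transactions")
def transactions_schema():
    # Read-only: describes data without modifying anything.
    return "product_name TEXT, revenue NUMERIC, date DATE"

print(sorted(tools))      # ['run_inference']
print(sorted(resources))  # ['schema://transactions']
```

Real MCP servers declare tools and resources with schemas so the assistant knows what arguments each accepts, but the action/read-only distinction is the same.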
Practical Example: Database Integration
Let’s say you want Claude Code to query your analytics database. Here’s how MCP makes it possible:
Setup
- Install an MCP server (e.g., @modelcontextprotocol/server-postgres)
- Configure the connection in your IDE settings
- Start querying in natural language
Example Workflow
You: "Show me the top 10 products by revenue this month"
AI: [Uses MCP tool: query_database]
SELECT product_name, SUM(revenue) as total_revenue
FROM transactions
WHERE date >= '2025-01-01'
GROUP BY product_name
ORDER BY total_revenue DESC
LIMIT 10
Results:
1. Premium Subscription - $45,230
2. Enterprise License - $38,900
...
The AI can:
- Generate SQL queries based on your schema
- Execute them safely (with your approval)
- Format results in tables or visualizations
- Answer follow-up questions using the same data
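"Execute them safely" deserves a concrete shape. Below is a crude guard for illustration, using an in-memory SQLite table standing in for your analytics database: it rejects anything that isn't a single SELECT before executing. A real deployment should also rely on a read-only database role rather than string checks alone:

```python
import sqlite3

def run_readonly_query(conn, sql):
    """Execute sql only if it is a single SELECT statement.

    A crude illustrative guard; production setups should pair this with
    a read-only database user.
    """
    stripped = sql.strip().rstrip(";")
    if ";" in stripped or not stripped.lower().startswith("select"):
        raise ValueError("only single SELECT statements are allowed")
    return conn.execute(stripped).fetchall()

# In-memory stand-in for the transactions table from the example above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (product_name TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("Premium Subscription", 45230.0), ("Enterprise License", 38900.0)],
)

rows = run_readonly_query(
    conn,
    "SELECT product_name, SUM(revenue) FROM transactions "
    "GROUP BY product_name ORDER BY 2 DESC LIMIT 10",
)
print(rows[0])  # ('Premium Subscription', 45230.0)
```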
For ML Pipelines
You: "Check the latest model performance metrics from MLflow"
AI: [Uses MCP tool: read_mlflow_metrics]
Latest run (2025-01-24):
- Accuracy: 0.89
- F1 Score: 0.85
- AUC: 0.92
Compared to baseline: +2.3% accuracy improvement
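The baseline comparison in that reply is simple arithmetic over two metric dictionaries. A small sketch of the idea, where both the baseline numbers and the helper name are hypothetical (a real setup would fetch these from your tracking server):

```python
def compare_metrics(latest, baseline):
    """Return per-metric deltas between two runs (illustrative helper)."""
    return {k: round(latest[k] - baseline[k], 4) for k in latest if k in baseline}

# Hypothetical values; the baseline is chosen so accuracy improves by
# 2.3 percentage points, matching the example above.
latest = {"accuracy": 0.89, "f1": 0.85, "auc": 0.92}
baseline = {"accuracy": 0.867, "f1": 0.84, "auc": 0.91}

deltas = compare_metrics(latest, baseline)
print(deltas["accuracy"])  # 0.023
```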
Getting Started
Step 1: Choose Your MCP Server
Popular options for data scientists:
- PostgreSQL/MySQL - Database querying
- GitHub - Code repository access
- File System - Local file reading
- Custom - Build your own for internal APIs
Step 2: Configure in Your IDE
Most MCP-enabled editors (like Cursor) support MCP servers through configuration files. Add your server:
```json
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres"],
      "env": {
        "POSTGRES_URL": "postgresql://user:pass@localhost/db"
      }
    }
  }
}
```
Step 3: Start Using
Once configured, simply ask your AI assistant questions that require data access. The MCP server handles the connection automatically.
Security Considerations
MCP servers run with the permissions you grant them. Best practices:
- ✅ Use read-only database users when possible
- ✅ Limit file system access to specific directories
- ✅ Review AI-generated queries before execution
- ✅ Use environment variables for credentials (never hardcode)
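The last point is worth making concrete. Instead of embedding the connection string in code or config committed to version control, read it from the environment at runtime. A minimal sketch, reusing the POSTGRES_URL variable name from the config example earlier (the fallback value here is purely illustrative):

```python
import os

# For this sketch only: seed the variable so the example runs standalone.
# In practice you would export POSTGRES_URL in your shell or secrets manager.
os.environ.setdefault(
    "POSTGRES_URL", "postgresql://readonly_user:secret@localhost/db"
)

def get_database_url():
    """Fetch the connection string from the environment, failing loudly."""
    url = os.environ.get("POSTGRES_URL")
    if url is None:
        raise RuntimeError("POSTGRES_URL is not set")
    return url

print(get_database_url().startswith("postgresql://"))  # True
```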
What’s Next?
MCP unlocks powerful capabilities, but there’s more to explore. In Part 3, we’ll dive into building custom MCP servers for your internal ML infrastructure and APIs.
This is Part 2 of the “AI Agents for Data Scientists” series.