Compare commits


57 Commits

Author SHA1 Message Date
kevinwatt
0e5a30d10c fix: prevent server hang and output corruption in spawn handling (#23)
- Add process 'error' event handler to catch spawn failures (e.g., yt-dlp not installed)
- Separate stdout/stderr to prevent yt-dlp warnings from corrupting parsed output
- Add try-catch for RegExp construction from YTDLP_SANITIZE_ILLEGAL_CHARS env var
- Add NaN validation for YTDLP_MAX_FILENAME_LENGTH env var
- Sync VERSION constant with package.json (0.8.4)
- Update tests for new output format and null handling
- Add version sync guidance to CLAUDE.md
2026-01-05 01:47:59 +08:00
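The env-var hardening in this commit can be sketched in a few lines. This is a hypothetical reconstruction: the variable names come from the commit message, but the fallback values and exact logic are assumed, not taken from the repo.

```javascript
// Build the sanitize regex defensively: an invalid pattern supplied via
// YTDLP_SANITIZE_ILLEGAL_CHARS must not crash the server at startup.
function buildSanitizeRegex(pattern, fallback = /[<>:"\/\\|?*]/g) {
  if (!pattern) return fallback;
  try {
    return new RegExp(pattern, "g");
  } catch {
    // Invalid user-supplied pattern: fall back to a safe default.
    return fallback;
  }
}

// Parse YTDLP_MAX_FILENAME_LENGTH, rejecting NaN and non-positive values.
function parseMaxFilenameLength(raw, fallback = 200) {
  const n = Number.parseInt(raw ?? "", 10);
  return Number.isNaN(n) || n <= 0 ? fallback : n;
}
```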
kevinwatt
47da207c57 docs(cookies): add deno requirement warning for cookie authentication
When using cookie authentication, YouTube uses authenticated API endpoints
that require JavaScript challenge solving. Without deno installed, downloads
will fail with "n challenge solving failed" error.
2025-12-25 04:31:33 +08:00
kevinwatt
910133f382 chore(release): v0.8.3 2025-12-25 04:10:12 +08:00
kevinwatt
7a8fb5f6ff chore(release): v0.8.2 2025-12-25 03:49:13 +08:00
kevinwatt
5e098dfaf1 docs(readme): add uploadDateFilter parameter to search tool 2025-12-25 03:41:14 +08:00
kevinwatt
af0137b2bd feat(search): add uploadDateFilter parameter for date-based filtering
Add optional uploadDateFilter parameter to ytdlp_search_videos tool
that allows filtering search results by upload date using YouTube's
native date filtering (sp parameter).

Options: hour, today, week, month, year
Default: no filter (all dates)

Closes #21
2025-12-25 03:41:14 +08:00
kevinwatt
020c57d40c fix(validation): check validateUrl return value before proceeding
Previously validateUrl() was called but its boolean return value was
ignored in audio.ts, metadata.ts, and video.ts. Invalid URLs would
pass through and only fail later during yt-dlp execution.

Now properly throw 'Invalid or unsupported URL format' error early.
2025-12-25 03:41:14 +08:00
Peter Keffer
bbda4d2857 fix(tests): improve comments test stability and CI compatibility
- Use delete for Python env vars instead of empty string assignment
- Make integration tests opt-in via RUN_INTEGRATION_TESTS=1 env var
- Fix regex null coalescing for author matching in tests
2025-12-25 03:41:14 +08:00
Peter Keffer
2e2888cccc feat(comments): add YouTube video comments extraction tools
Add two new MCP tools for extracting video comments:
- ytdlp_get_video_comments: Extract comments as structured JSON with
  author info, likes, timestamps, and reply threading
- ytdlp_get_video_comments_summary: Get human-readable summary of top comments

Features:
- Support for sorting by "top" (most liked) or "new" (newest first)
- Configurable comment limit (1-100 comments)
- Includes author verification status, pinned comments, and uploader replies
- Comprehensive error handling for disabled comments, private videos, etc.
- Comprehensive test suite
2025-12-25 03:41:14 +08:00
kevinwatt
d7f5ec0f62 refactor: move integration tests to tests/ directory
- Create tests/ directory for integration/manual test scripts
- Move test-mcp.mjs, test-real-video.mjs, test-bilibili.mjs to tests/
- Keep unit tests in src/__tests__/ (Jest convention)
- Update CHANGELOG.md
2025-12-16 04:36:55 +08:00
kevinwatt
6fa13c1204 chore: add Claude Code settings to gitignore and update guidelines
- Add .claude/ directory and CLAUDE.md to .gitignore
- Add development guideline to always update CHANGELOG.md
- Update CHANGELOG.md with unreleased changes
2025-12-16 04:33:53 +08:00
kevinwatt
748255fe01 docs(readme): add MCP client configuration section
Add comprehensive setup instructions for multiple MCP clients:
- Dive (featured at top)
- Claude Code, Claude Desktop
- Cursor, VS Code/Copilot, Windsurf
- Cline, Warp, JetBrains AI Assistant

Use @latest tag for consistent auto-updates.
2025-12-16 03:29:29 +08:00
kevinwatt
11464c6c24 fix(schema): use z.coerce.number() for MCP string serialization
MCP protocol serializes numeric parameters as strings during transport.
Using z.coerce.number() instead of z.number() to handle this gracefully.

Fixes #20
2025-12-08 04:32:35 +08:00
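`z.coerce.number()` solves this because it runs JavaScript's `Number()` on the input before validating, so a transported `"42"` passes where `z.number()` would reject it. A dependency-free sketch of the same idea (hypothetical helper, not the project's code):

```javascript
// Mimics what z.coerce.number() does for MCP parameters that arrive as
// strings over transport: coerce first, then validate.
function coerceNumber(value) {
  const n = Number(value); // Number("42") -> 42, Number("abc") -> NaN
  if (Number.isNaN(n)) {
    throw new TypeError(`Expected a number, received: ${JSON.stringify(value)}`);
  }
  return n;
}
```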
kevinwatt
87ba2f8494 feat(cookies): add cookie support for authenticated access
- Add YTDLP_COOKIES_FILE and YTDLP_COOKIES_FROM_BROWSER env vars
- Support all yt-dlp cookie methods (file, browser extraction)
- Validate browser names (brave, chrome, chromium, edge, firefox, opera, safari, vivaldi, whale)
- Cookie file takes precedence over browser extraction
- Add getCookieArgs() helper function
- Integrate cookie args into all modules (video, audio, subtitle, search, metadata)
- Add comprehensive cookie documentation (docs/cookies.md)
- Add 12 unit tests for cookie configuration
- Fix search.test.ts function signature issue

Closes #19
2025-12-06 18:42:25 +08:00
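The `getCookieArgs()` helper and its precedence rule might look roughly like this (assumed shape; the browser list is from the commit message, everything else is illustrative):

```javascript
const SUPPORTED_BROWSERS = [
  "brave", "chrome", "chromium", "edge", "firefox",
  "opera", "safari", "vivaldi", "whale",
];

function getCookieArgs(env = process.env) {
  // Cookie file takes precedence over browser extraction.
  if (env.YTDLP_COOKIES_FILE) {
    return ["--cookies", env.YTDLP_COOKIES_FILE];
  }
  const browser = env.YTDLP_COOKIES_FROM_BROWSER;
  if (browser) {
    // yt-dlp accepts BROWSER[+KEYRING][:PROFILE]; validate the browser part.
    if (!SUPPORTED_BROWSERS.includes(browser.split(/[:+]/)[0])) {
      throw new Error(`Unsupported browser for cookie extraction: ${browser}`);
    }
    return ["--cookies-from-browser", browser];
  }
  return []; // No cookie configuration: yt-dlp runs unauthenticated.
}
```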
kevinwatt
26b2137751 chore: release v0.7.0 - MCP Best Practices & Quality Improvements
Major release with comprehensive MCP best practices implementation:

Added:
- Tool name prefixes (ytdlp_) for all 8 tools to avoid naming conflicts
- Zod schema validation with runtime input validation
- Tool annotations (readOnlyHint, destructiveHint, idempotentHint, openWorldHint)
- Response format options (JSON/Markdown) for search tools
- Pagination support with offset parameter
- Character limits (25K standard, 50K for transcripts) with smart truncation
- Actionable error messages with platform-specific guidance

🔧 Improved:
- Comprehensive tool descriptions with usage examples
- Enhanced configuration system with limits
- Better TypeScript type safety
- Professional README with badges and tables

🐛 Fixed:
- JSON parsing issue in metadata truncation
- Maintained valid JSON structure when truncated

🧪 Tested:
- YouTube platform (Rick Astley video)
- Bilibili platform (Chinese content)
- Multi-language support verified
- All 8 tools tested with real API calls

📖 Documentation:
- Created comprehensive CHANGELOG.md
- Redesigned README.md with professional formatting
- Added migration guide for v0.6.x users

🌍 Platform Support:
- Verified: YouTube, Bilibili
- Theory: 1000+ platforms via yt-dlp
2025-10-19 01:52:22 +08:00
kevinwatt
c5e84c326e chore: release v0.6.28 2025-08-13 15:49:49 +08:00
kevinwatt
b19dbb67a5 feat(metadata): add get_video_metadata & get_video_metadata_summary; docs(api); tests(metadata) 2025-08-13 15:49:43 +08:00
kevinwatt
9d14f6bc01 remove unused test-utils.ts 2025-08-03 00:47:53 +08:00
kevinwatt
fa879ab9ab fix: update prepare script to skip lib check
- Replace shx with standard chmod command
- Add --skipLibCheck flag to resolve type dependencies issue
- Ensure smooth npm publish process
2025-07-28 04:46:20 +08:00
kevinwatt
5aecaa3b20 feat: add video search functionality
- Add new search_videos tool for YouTube video search
- Support configurable search result count (1-50)
- Return formatted results with title, channel, duration, and URL
- Add comprehensive test coverage with real yt-dlp integration
- Update documentation with search examples
- Fix dependency security vulnerabilities
- Bump version to 0.6.27

Resolves: kevinwatt/yt-dlp-mcp#14
2025-07-28 04:45:37 +08:00
Kevin Watt
9ba39128aa
Merge pull request #15 from seszele64/implement-trimmed-download
Implement trimmed download

Thanks for the great work on this PR! 🙌

The trimmed download feature looks solid - good implementation with proper tests and documentation. This will be really useful for users who need to download video segments.

Appreciate the contribution!

LGTM 👍
2025-07-28 04:21:22 +08:00
seszele64
353bc8fd22 feat(api): add start and end time docs and examples 2025-07-22 19:14:18 +02:00
seszele64
53437dc472 feat(readme): add start and end time params for trimming 2025-07-22 19:13:25 +02:00
seszele64
cc2b9ec8b6 feat(video): add start and end time params for trimming 2025-07-22 19:13:13 +02:00
seszele64
7278b672f4 test(video): add tests for video download trimming 2025-07-22 19:12:49 +02:00
seszele64
83a2eb9bb8 feat(video): add support for trimming video downloads 2025-07-22 19:12:38 +02:00
Kevin Watt
bbc0e6aa93
Merge pull request #11 from hesreallyhim/fix/add-ignore-config
fix: add `--ignore-config` flag

Seems good, Thanks.
2025-07-16 00:29:56 +08:00
Really Him
8cf7b3f5dc fix: fix contributing doc 2025-06-17 21:08:38 -04:00
Really Him
01709a778b fix: add ignore-config flag 2025-06-17 21:06:25 -04:00
Kevin Watt
da7e4666ed
Merge pull request #9 from kevinwatt/revert-8-revert-7-feature/transcript-download
Revert "Revert "feat: add transcript download functionality""
2025-05-30 12:03:50 +08:00
Kevin Watt
f27d22eb81
Revert "Revert "feat: add transcript download functionality"" 2025-05-30 12:03:04 +08:00
Kevin Watt
0de9308a41
Merge pull request #8 from kevinwatt/revert-7-feature/transcript-download
Revert "feat: add transcript download functionality"
2025-05-30 11:59:39 +08:00
Kevin Watt
c79766c241
Revert "feat: add transcript download functionality" 2025-05-30 11:57:52 +08:00
Kevin Watt
4171abc6d0
Merge pull request #7 from msuch/feature/transcript-download
Looks great to me. Thanks for the commit.

feat: add transcript download functionality
2025-05-30 11:56:47 +08:00
m
7900a9b4e1 feat: add transcript download functionality
- Add cleanSubtitleToTranscript utility to strip SRT formatting, timestamps, and HTML tags
- Implement downloadTranscript function using yt-dlp with subtitle cleaning
- Add download_transcript MCP tool with language support (defaults to English)
- Include comprehensive tests for both utility and download functionality
- Update README documentation with tool description and usage examples

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-05-27 12:00:31 +02:00
kevinwatt
944b0211c6 feat: add random filename fallback when filename cannot be retrieved
- Add generateRandomFilename utility function
- Modify downloadVideo to use random filename when yt-dlp fails to get filename
- Update version to 0.6.26
2025-02-23 05:53:58 +08:00
kevinwatt
c39fd8785c update README.md 2025-02-22 03:24:44 +08:00
kevinwatt
e9a0e55762 feat: major improvements and version bump to 0.6.24
- Remove prompts functionality (prompts.ts and tests)
- Improve error handling with VideoDownloadError class
- Move configuration to dedicated file
- Add URL validation and security checks
- Reorganize code into modules
- Add comprehensive unit tests
- Enhance documentation with JSDoc and examples
2025-02-22 00:43:15 +08:00
kevinwatt
21689391bd add download_audio tool to README.md 2025-02-21 17:43:33 +08:00
kevinwatt
5152ad4d17 fix yt-dlp error handling for audio download 2025-02-21 17:40:35 +08:00
kevinwatt
c4dcc0eda2 v0.6.23 2025-02-21 17:22:47 +08:00
kevin
12fa5dbffe v0.6.22 2025-02-21 17:19:13 +08:00
kevin
b3e8ed5f58 feat: improve audio download support
- Add support for various audio formats (m4a/mp3)
- Update audio download format selection logic
- Improve error handling and filename display
- Bump version to 0.6.22
2025-02-21 17:14:28 +08:00
kevin
576549bc2c update description in README.md 2025-02-21 16:56:45 +08:00
kevin
9c25179fab more descriptive description 2025-02-21 16:55:31 +08:00
kevin
7710184faf fix: update description 2025-02-21 16:54:25 +08:00
kevin
adf1b7178c new description for package.json 2025-02-21 16:52:56 +08:00
kevin
58384bb1a2 fix: improve subtitle handling and tool names
- Rename list_video_subtitles to list_subtitle_languages for clarity
- Update tool descriptions to better reflect functionality
- Improve subtitle listing output format
- Simplify subtitle download parameters
- Add verbose logging for better debugging
- Bump version to 0.6.21
2025-02-21 16:42:12 +08:00
kevin
5523b1dedd fix: improve subtitle download reliability
- Use --write-sub --write-auto-sub combination for better subtitle support
- Simplify subtitle download logic to handle both regular and auto-generated subtitles
- Add debug logging for better troubleshooting
- Filter only .srt files as final output
- Bump version to 0.6.20
2025-02-21 15:54:11 +08:00
kevin
5b96dff785 fix: update version 2025-02-21 15:52:42 +08:00
kevin
b0eeb5f831 fix: improve subtitle download reliability
- Use --write-sub --write-auto-sub combination for better subtitle support
- Simplify subtitle download logic to handle both regular and auto-generated subtitles
- Add debug logging for better troubleshooting
- Filter only .srt files as final output
- Bump version to 0.6.19
2025-02-21 15:52:13 +08:00
kevin
f9c93a0463 feat: improve auto-generated subtitles support
- Update downloadSubtitles to properly handle auto-generated subtitles
- Update listSubtitles to show all available subtitles including auto-generated ones
- Update tool descriptions to clearly indicate auto-generated subtitles support
- Simplify error handling to show direct yt-dlp messages
- Bump version to 0.6.17
2025-02-21 15:46:33 +08:00
kevin
e1d09fc3ca fix: improve subtitle listing
- Fix listSubtitles to properly show auto-generated subtitles
- Remove redundant error handling in subtitle listing
- Pass through yt-dlp messages directly
- Bump version to 0.6.16
2025-02-21 15:42:58 +08:00
kevin
7537cd3326 feat: simplify error handling
- Remove custom error types and error codes
- Pass through yt-dlp error messages directly
- Simplify subtitle file filtering
- Bump version to 0.6.15
2025-02-21 15:40:57 +08:00
kevin
614d865b05 feat: rename download_video_srt to download_video_subtitles
- Rename tool from download_video_srt to download_video_subtitles
- Update tool description to better reflect subtitle format support
- Bump version to 0.6.14
2025-02-21 15:36:47 +08:00
kevin
c8d2199486 fix: improve subtitle download functionality
- Fixed YouTube subtitle download functionality
- Added support for VTT subtitle format
- Improved subtitle download error handling
- Enhanced auto-generated subtitle detection
2025-02-21 15:12:19 +08:00
kevin
cbf82eee32 chore: bump version to 0.6.12 2025-02-21 14:55:30 +08:00
42 changed files with 10607 additions and 1517 deletions


@@ -0,0 +1,202 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


@@ -0,0 +1,328 @@
---
name: mcp-builder
description: Guide for creating high-quality MCP (Model Context Protocol) servers that enable LLMs to interact with external services through well-designed tools. Use when building MCP servers to integrate external APIs or services, whether in Python (FastMCP) or Node/TypeScript (MCP SDK).
license: Complete terms in LICENSE.txt
---
# MCP Server Development Guide
## Overview
Use this skill to create high-quality MCP (Model Context Protocol) servers that enable LLMs to interact effectively with external services. An MCP server provides tools that allow LLMs to access external services and APIs. The quality of an MCP server is measured by how well it enables LLMs to accomplish real-world tasks using the tools provided.
---
# Process
## 🚀 High-Level Workflow
Creating a high-quality MCP server involves four main phases:
### Phase 1: Deep Research and Planning
#### 1.1 Understand Agent-Centric Design Principles
Before diving into implementation, understand how to design tools for AI agents by reviewing these principles:
**Build for Workflows, Not Just API Endpoints:**
- Don't simply wrap existing API endpoints - build thoughtful, high-impact workflow tools
- Consolidate related operations (e.g., `schedule_event` that both checks availability and creates event)
- Focus on tools that enable complete tasks, not just individual API calls
- Consider what workflows agents actually need to accomplish
**Optimize for Limited Context:**
- Agents have constrained context windows - make every token count
- Return high-signal information, not exhaustive data dumps
- Provide "concise" vs "detailed" response format options
- Default to human-readable identifiers over technical codes (names over IDs)
- Consider the agent's context budget as a scarce resource
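The concise-versus-detailed idea above can be sketched as a tiny formatter (the names and fields are hypothetical, not from any SDK):

```javascript
// One tool, two response formats: "concise" returns only high-signal
// fields; "detailed" spends more of the agent's context budget.
function formatVideo(video, detail = "concise") {
  if (detail === "concise") {
    return `${video.title} by ${video.channel}`;
  }
  return [
    `Title: ${video.title}`,
    `Channel: ${video.channel}`,
    `Duration: ${video.duration}`,
    `URL: ${video.url}`,
  ].join("\n");
}
```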
**Design Actionable Error Messages:**
- Error messages should guide agents toward correct usage patterns
- Suggest specific next steps: "Try using filter='active_only' to reduce results"
- Make errors educational, not just diagnostic
- Help agents learn proper tool usage through clear feedback
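For example, an error that teaches rather than just reports (the filter name echoes the suggestion above and is purely illustrative):

```javascript
// Educational error message: state what failed, then suggest a concrete
// next step the agent can take.
function tooManyResultsError(count, limit) {
  return (
    `Query returned ${count} results, which exceeds the limit of ${limit}. ` +
    `Try using filter='active_only' or a narrower date range to reduce results.`
  );
}
```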
**Follow Natural Task Subdivisions:**
- Tool names should reflect how humans think about tasks
- Group related tools with consistent prefixes for discoverability
- Design tools around natural workflows, not just API structure
**Use Evaluation-Driven Development:**
- Create realistic evaluation scenarios early
- Let agent feedback drive tool improvements
- Prototype quickly and iterate based on actual agent performance
#### 1.3 Study MCP Protocol Documentation
**Fetch the latest MCP protocol documentation:**
Use WebFetch to load: `https://modelcontextprotocol.io/llms-full.txt`
This comprehensive document contains the complete MCP specification and guidelines.
#### 1.4 Study Framework Documentation
**Load and read the following reference files:**
- **MCP Best Practices**: [📋 View Best Practices](./reference/mcp_best_practices.md) - Core guidelines for all MCP servers
**For Python implementations, also load:**
- **Python SDK Documentation**: Use WebFetch to load `https://raw.githubusercontent.com/modelcontextprotocol/python-sdk/main/README.md`
- [🐍 Python Implementation Guide](./reference/python_mcp_server.md) - Python-specific best practices and examples
**For Node/TypeScript implementations, also load:**
- **TypeScript SDK Documentation**: Use WebFetch to load `https://raw.githubusercontent.com/modelcontextprotocol/typescript-sdk/main/README.md`
- [⚡ TypeScript Implementation Guide](./reference/node_mcp_server.md) - Node/TypeScript-specific best practices and examples
#### 1.5 Exhaustively Study API Documentation
To integrate a service, read through **ALL** available API documentation:
- Official API reference documentation
- Authentication and authorization requirements
- Rate limiting and pagination patterns
- Error responses and status codes
- Available endpoints and their parameters
- Data models and schemas
**To gather comprehensive information, use web search and the WebFetch tool as needed.**
#### 1.6 Create a Comprehensive Implementation Plan
Based on your research, create a detailed plan that includes:
**Tool Selection:**
- List the most valuable endpoints/operations to implement
- Prioritize tools that enable the most common and important use cases
- Consider which tools work together to enable complex workflows
**Shared Utilities and Helpers:**
- Identify common API request patterns
- Plan pagination helpers
- Design filtering and formatting utilities
- Plan error handling strategies
**Input/Output Design:**
- Define input validation models (Pydantic for Python, Zod for TypeScript)
- Design consistent response formats (e.g., JSON or Markdown), and configurable levels of detail (e.g., Detailed or Concise)
- Plan for large-scale usage (thousands of users/resources)
- Implement character limits and truncation strategies (e.g., 25,000 characters)
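A truncation strategy along these lines might look like the following sketch (the limit value and the offset hint are illustrative, not a spec requirement):

```javascript
const CHARACTER_LIMIT = 25000;

// Cap a response at the character limit, preferring a clean line break,
// and tell the agent how to fetch the remainder.
function truncate(text, limit = CHARACTER_LIMIT) {
  if (text.length <= limit) return text;
  const cut = text.lastIndexOf("\n", limit);
  const kept = text.slice(0, cut > 0 ? cut : limit);
  return `${kept}\n[Truncated: response exceeded ${limit} characters. ` +
    `Use the offset parameter to fetch the remainder.]`;
}
```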
**Error Handling Strategy:**
- Plan graceful failure modes
- Design clear, actionable, LLM-friendly, natural language error messages which prompt further action
- Consider rate limiting and timeout scenarios
- Handle authentication and authorization errors
---
### Phase 2: Implementation
Now that you have a comprehensive plan, begin implementation following language-specific best practices.
#### 2.1 Set Up Project Structure
**For Python:**
- Create a single `.py` file or organize into modules if complex (see [🐍 Python Guide](./reference/python_mcp_server.md))
- Use the MCP Python SDK for tool registration
- Define Pydantic models for input validation
**For Node/TypeScript:**
- Create proper project structure (see [⚡ TypeScript Guide](./reference/node_mcp_server.md))
- Set up `package.json` and `tsconfig.json`
- Use MCP TypeScript SDK
- Define Zod schemas for input validation
#### 2.2 Implement Core Infrastructure First
**To begin implementation, create shared utilities before implementing tools:**
- API request helper functions
- Error handling utilities
- Response formatting functions (JSON and Markdown)
- Pagination helpers
- Authentication/token management
#### 2.3 Implement Tools Systematically
For each tool in the plan:
**Define Input Schema:**
- Use Pydantic (Python) or Zod (TypeScript) for validation
- Include proper constraints (min/max length, regex patterns, min/max values, ranges)
- Provide clear, descriptive field descriptions
- Include diverse examples in field descriptions
**Write Comprehensive Docstrings/Descriptions:**
- One-line summary of what the tool does
- Detailed explanation of purpose and functionality
- Explicit parameter types with examples
- Complete return type schema
- Usage examples (when to use, when not to use)
- Error handling documentation, which outlines how to proceed given specific errors
**Implement Tool Logic:**
- Use shared utilities to avoid code duplication
- Follow async/await patterns for all I/O
- Implement proper error handling
- Support multiple response formats (JSON and Markdown)
- Respect pagination parameters
- Check character limits and truncate appropriately
**Add Tool Annotations:**
- `readOnlyHint`: true (for read-only operations)
- `destructiveHint`: false (for non-destructive operations)
- `idempotentHint`: true (if repeated calls have same effect)
- `openWorldHint`: true (if interacting with external systems)
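Put together, the annotations for a read-only search tool could look like this (the tool title is hypothetical; the hint fields follow the list above):

```javascript
// Annotations object for a hypothetical read-only search tool.
const searchToolAnnotations = {
  title: "Search Videos",
  readOnlyHint: true,     // does not modify anything
  destructiveHint: false, // no irreversible side effects
  idempotentHint: true,   // repeating the same query has the same effect
  openWorldHint: true,    // talks to an external service
};
```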
#### 2.4 Follow Language-Specific Best Practices
**At this point, load the appropriate language guide:**
**For Python: Load [🐍 Python Implementation Guide](./reference/python_mcp_server.md) and ensure the following:**
- Using MCP Python SDK with proper tool registration
- Pydantic v2 models with `model_config`
- Type hints throughout
- Async/await for all I/O operations
- Proper imports organization
- Module-level constants (CHARACTER_LIMIT, API_BASE_URL)
**For Node/TypeScript: Load [⚡ TypeScript Implementation Guide](./reference/node_mcp_server.md) and ensure the following:**
- Using `server.registerTool` properly
- Zod schemas with `.strict()`
- TypeScript strict mode enabled
- No `any` types - use proper types
- Explicit Promise<T> return types
- Build process configured (`npm run build`)
---
### Phase 3: Review and Refine
After initial implementation:
#### 3.1 Code Quality Review
To ensure quality, review the code for:
- **DRY Principle**: No duplicated code between tools
- **Composability**: Shared logic extracted into functions
- **Consistency**: Similar operations return similar formats
- **Error Handling**: All external calls have error handling
- **Type Safety**: Full type coverage (Python type hints, TypeScript types)
- **Documentation**: Every tool has comprehensive docstrings/descriptions
#### 3.2 Test and Build
**Important:** MCP servers are long-running processes that wait for requests over stdio or SSE/HTTP. Running one directly in your main process (e.g., `python server.py` or `node dist/index.js`) will hang that process indefinitely.
**Safe ways to test the server:**
- Use the evaluation harness (see Phase 4) - recommended approach
- Run the server in tmux to keep it outside your main process
- Use a timeout when testing: `timeout 5s python server.py`
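The timeout approach can also be scripted; a minimal smoke test along these lines (the command is illustrative) relies on the fact that a healthy stdio server blocks waiting for requests, so hitting the timeout is the "alive" signal, while an immediate exit usually means a startup crash:

```python
import subprocess
import sys


def smoke_test(cmd: list[str], timeout: float = 5.0) -> str:
    """Run a server command briefly and report whether it stayed up.

    A stdio MCP server should block waiting for requests, so
    TimeoutExpired here is the healthy outcome.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "still-running"
    # Immediate exit: likely an import error or crash; inspect stderr
    return f"exited with code {proc.returncode}"
```

Usage: `smoke_test([sys.executable, "server.py"])` returns `"still-running"` for a working stdio server.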
**For Python:**
- Verify Python syntax: `python -m py_compile your_server.py`
- Check imports work correctly by reviewing the file
- To manually test: Run server in tmux, then test with evaluation harness in main process
- Or use the evaluation harness directly (it manages the server for stdio transport)
**For Node/TypeScript:**
- Run `npm run build` and ensure it completes without errors
- Verify dist/index.js is created
- To manually test: Run server in tmux, then test with evaluation harness in main process
- Or use the evaluation harness directly (it manages the server for stdio transport)
#### 3.3 Use Quality Checklist
To verify implementation quality, load the appropriate checklist from the language-specific guide:
- Python: see "Quality Checklist" in [🐍 Python Guide](./reference/python_mcp_server.md)
- Node/TypeScript: see "Quality Checklist" in [⚡ TypeScript Guide](./reference/node_mcp_server.md)
---
### Phase 4: Create Evaluations
After implementing your MCP server, create comprehensive evaluations to test its effectiveness.
**Load [✅ Evaluation Guide](./reference/evaluation.md) for complete evaluation guidelines.**
#### 4.1 Understand Evaluation Purpose
Evaluations test whether LLMs can effectively use your MCP server to answer realistic, complex questions.
#### 4.2 Create 10 Evaluation Questions
To create effective evaluations, follow the process outlined in the evaluation guide:
1. **Tool Inspection**: List available tools and understand their capabilities
2. **Content Exploration**: Use READ-ONLY operations to explore available data
3. **Question Generation**: Create 10 complex, realistic questions
4. **Answer Verification**: Solve each question yourself to verify answers
#### 4.3 Evaluation Requirements
Each question must be:
- **Independent**: Not dependent on other questions
- **Read-only**: Only non-destructive operations required
- **Complex**: Requiring multiple tool calls and deep exploration
- **Realistic**: Based on real use cases humans would care about
- **Verifiable**: Single, clear answer that can be verified by string comparison
- **Stable**: Answer won't change over time
#### 4.4 Output Format
Create an XML file with this structure:
```xml
<evaluation>
<qa_pair>
<question>Find discussions about AI model launches with animal codenames. One model needed a specific safety designation that uses the format ASL-X. What number X was being determined for the model named after a spotted wild cat?</question>
<answer>3</answer>
</qa_pair>
<!-- More qa_pairs... -->
</evaluation>
```
---
# Reference Files
## 📚 Documentation Library
Load these resources as needed during development:
### Core MCP Documentation (Load First)
- **MCP Protocol**: Fetch from `https://modelcontextprotocol.io/llms-full.txt` - Complete MCP specification
- [📋 MCP Best Practices](./reference/mcp_best_practices.md) - Universal MCP guidelines including:
- Server and tool naming conventions
- Response format guidelines (JSON vs Markdown)
- Pagination best practices
- Character limits and truncation strategies
- Tool development guidelines
- Security and error handling standards
### SDK Documentation (Load During Phase 1/2)
- **Python SDK**: Fetch from `https://raw.githubusercontent.com/modelcontextprotocol/python-sdk/main/README.md`
- **TypeScript SDK**: Fetch from `https://raw.githubusercontent.com/modelcontextprotocol/typescript-sdk/main/README.md`
### Language-Specific Implementation Guides (Load During Phase 2)
- [🐍 Python Implementation Guide](./reference/python_mcp_server.md) - Complete Python/FastMCP guide with:
- Server initialization patterns
- Pydantic model examples
- Tool registration with `@mcp.tool`
- Complete working examples
- Quality checklist
- [⚡ TypeScript Implementation Guide](./reference/node_mcp_server.md) - Complete TypeScript guide with:
- Project structure
- Zod schema patterns
- Tool registration with `server.registerTool`
- Complete working examples
- Quality checklist
### Evaluation Guide (Load During Phase 4)
- [✅ Evaluation Guide](./reference/evaluation.md) - Complete evaluation creation guide with:
- Question creation guidelines
- Answer verification strategies
- XML format specifications
- Example questions and answers
- Running an evaluation with the provided scripts
---
# MCP Server Evaluation Guide
## Overview
This document provides guidance on creating comprehensive evaluations for MCP servers. Evaluations test whether LLMs can effectively use your MCP server to answer realistic, complex questions using only the tools provided.
---
## Quick Reference
### Evaluation Requirements
- Create 10 human-readable questions
- Questions must be READ-ONLY, INDEPENDENT, NON-DESTRUCTIVE
- Each question requires multiple tool calls (potentially dozens)
- Answers must be single, verifiable values
- Answers must be STABLE (won't change over time)
### Output Format
```xml
<evaluation>
<qa_pair>
<question>Your question here</question>
<answer>Single verifiable answer</answer>
</qa_pair>
</evaluation>
```
---
## Purpose of Evaluations
The measure of quality of an MCP server is NOT how comprehensively it implements tools, but how well those implementations (input/output schemas, docstrings/descriptions, functionality) enable an LLM, with no other context and access ONLY to the MCP server, to answer realistic and difficult questions.
## Evaluation Overview
Create 10 human-readable questions requiring ONLY READ-ONLY, INDEPENDENT, NON-DESTRUCTIVE, and IDEMPOTENT operations to answer. Each question should be:
- Realistic
- Clear and concise
- Unambiguous
- Complex, requiring potentially dozens of tool calls or steps
- Answerable with a single, verifiable value that you identify in advance
## Question Guidelines
### Core Requirements
1. **Questions MUST be independent**
- Each question should NOT depend on the answer to any other question
- Should not assume prior write operations from processing another question
2. **Questions MUST require ONLY NON-DESTRUCTIVE AND IDEMPOTENT tool use**
- Should not instruct or require modifying state to arrive at the correct answer
3. **Questions must be REALISTIC, CLEAR, CONCISE, and COMPLEX**
- Must require another LLM to use multiple (potentially dozens of) tools or steps to answer
### Complexity and Depth
4. **Questions must require deep exploration**
- Consider multi-hop questions requiring multiple sub-questions and sequential tool calls
- Each step should benefit from information found in previous steps
5. **Questions may require extensive paging**
- May need paging through multiple pages of results
- May require querying old data (1-2 years old) to find niche information
- The questions must be DIFFICULT
6. **Questions must require deep understanding**
- Rather than surface-level knowledge
- May pose complex ideas as True/False questions requiring evidence
- May use multiple-choice format where LLM must search different hypotheses
7. **Questions must not be solvable with straightforward keyword search**
- Do not include specific keywords from the target content
- Use synonyms, related concepts, or paraphrases
- Require multiple searches, analyzing multiple related items, extracting context, then deriving the answer
### Tool Testing
8. **Questions should stress-test tool return values**
- May elicit tools returning large JSON objects or lists, overwhelming the LLM
- Should require understanding multiple modalities of data:
- IDs and names
- Timestamps and datetimes (months, days, years, seconds)
- File IDs, names, extensions, and mimetypes
- URLs, GIDs, etc.
- Should probe the tool's ability to return all useful forms of data
9. **Questions should MOSTLY reflect real human use cases**
- The kinds of information retrieval tasks that HUMANS assisted by an LLM would care about
10. **Questions may require dozens of tool calls**
- This challenges LLMs with limited context
- Encourages MCP server tools to reduce information returned
11. **Include ambiguous questions**
- May be ambiguous OR require difficult decisions on which tools to call
- Force the LLM to potentially make mistakes or misinterpret
- Ensure that despite AMBIGUITY, there is STILL A SINGLE VERIFIABLE ANSWER
### Stability
12. **Questions must be designed so the answer DOES NOT CHANGE**
- Do not ask questions that rely on "current state" which is dynamic
- For example, do not count:
- Number of reactions to a post
- Number of replies to a thread
- Number of members in a channel
13. **DO NOT let the MCP server RESTRICT the kinds of questions you create**
- Create challenging and complex questions
- Some may not be solvable with the available MCP server tools
- Questions may require specific output formats (datetime vs. epoch time, JSON vs. MARKDOWN)
- Questions may require dozens of tool calls to complete
## Answer Guidelines
### Verification
1. **Answers must be VERIFIABLE via direct string comparison**
- If the answer can be re-written in many formats, clearly specify the output format in the QUESTION
- Examples: "Use YYYY/MM/DD.", "Respond True or False.", "Answer A, B, C, or D and nothing else."
- Answer should be a single VERIFIABLE value such as:
- User ID, user name, display name, first name, last name
- Channel ID, channel name
- Message ID, string
- URL, title
- Numerical quantity
- Timestamp, datetime
- Boolean (for True/False questions)
- Email address, phone number
- File ID, file name, file extension
- Multiple choice answer
- Answers must not require special formatting or complex, structured output
- Answer will be verified using DIRECT STRING COMPARISON
### Readability
2. **Answers should generally prefer HUMAN-READABLE formats**
- Examples: names, first name, last name, datetime, file name, message string, URL, yes/no, true/false, a/b/c/d
- Rather than opaque IDs (though IDs are acceptable)
- The VAST MAJORITY of answers should be human-readable
### Stability
3. **Answers must be STABLE/STATIONARY**
- Look at old content (e.g., conversations that have ended, projects that have launched, questions answered)
- Create QUESTIONS based on "closed" concepts that will always return the same answer
- Questions may ask to consider a fixed time window to insulate from non-stationary answers
- Rely on context UNLIKELY to change
- Example: if finding a paper name, be SPECIFIC enough so answer is not confused with papers published later
4. **Answers must be CLEAR and UNAMBIGUOUS**
- Questions must be designed so there is a single, clear answer
- Answer can be derived from using the MCP server tools
### Diversity
5. **Answers must be DIVERSE**
- Answer should be a single VERIFIABLE value in diverse modalities and formats
- User concept: user ID, user name, display name, first name, last name, email address, phone number
- Channel concept: channel ID, channel name, channel topic
- Message concept: message ID, message string, timestamp, month, day, year
6. **Answers must NOT be complex structures**
- Not a list of values
- Not a complex object
- Not a list of IDs or strings
- Not natural language text
- UNLESS the answer can be straightforwardly verified using DIRECT STRING COMPARISON
- And can be realistically reproduced
- It should be unlikely that an LLM would return the same list in any other order or format
## Evaluation Process
### Step 1: Documentation Inspection
Read the documentation of the target API to understand:
- Available endpoints and functionality
- If ambiguity exists, fetch additional information from the web
- Parallelize this step AS MUCH AS POSSIBLE
- Ensure each subagent is ONLY examining documentation from the file system or on the web
### Step 2: Tool Inspection
List the tools available in the MCP server:
- Inspect the MCP server directly
- Understand input/output schemas, docstrings, and descriptions
- WITHOUT calling the tools themselves at this stage
### Step 3: Developing Understanding
Repeat steps 1 & 2 until you have a good understanding:
- Iterate multiple times
- Think about the kinds of tasks you want to create
- Refine your understanding
- At NO stage should you READ the code of the MCP server implementation itself
- Use your intuition and understanding to create reasonable, realistic, but VERY challenging tasks
### Step 4: Read-Only Content Inspection
After understanding the API and tools, USE the MCP server tools:
- Inspect content using READ-ONLY and NON-DESTRUCTIVE operations ONLY
- Goal: identify specific content (e.g., users, channels, messages, projects, tasks) for creating realistic questions
- Should NOT call any tools that modify state
- Will NOT read the code of the MCP server implementation itself
- Parallelize this step with individual sub-agents pursuing independent explorations
- Ensure each subagent is only performing READ-ONLY, NON-DESTRUCTIVE, and IDEMPOTENT operations
- BE CAREFUL: SOME TOOLS may return LOTS OF DATA which would cause you to run out of CONTEXT
- Make INCREMENTAL, SMALL, AND TARGETED tool calls for exploration
- In all tool call requests, use the `limit` parameter to limit results (<10)
- Use pagination
### Step 5: Task Generation
After inspecting the content, create 10 human-readable questions:
- An LLM should be able to answer these with the MCP server
- Follow all question and answer guidelines above
## Output Format
Each QA pair consists of a question and an answer. The output should be an XML file with this structure:
```xml
<evaluation>
<qa_pair>
<question>Find the project created in Q2 2024 with the highest number of completed tasks. What is the project name?</question>
<answer>Website Redesign</answer>
</qa_pair>
<qa_pair>
<question>Search for issues labeled as "bug" that were closed in March 2024. Which user closed the most issues? Provide their username.</question>
<answer>sarah_dev</answer>
</qa_pair>
<qa_pair>
<question>Look for pull requests that modified files in the /api directory and were merged between January 1 and January 31, 2024. How many different contributors worked on these PRs?</question>
<answer>7</answer>
</qa_pair>
<qa_pair>
<question>Find the repository with the most stars that was created before 2023. What is the repository name?</question>
<answer>data-pipeline</answer>
</qa_pair>
</evaluation>
```
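The harness consumes this file with standard XML parsing and grades by direct string comparison; a minimal sketch of that loop follows (the function names are illustrative, and the exact normalization the provided script applies is not specified here; this sketch trims whitespace and ignores case):

```python
import xml.etree.ElementTree as ET


def load_qa_pairs(xml_text: str) -> list[tuple[str, str]]:
    """Parse <qa_pair> elements into (question, answer) tuples."""
    root = ET.fromstring(xml_text)
    return [
        (pair.findtext("question", "").strip(), pair.findtext("answer", "").strip())
        for pair in root.iter("qa_pair")
    ]


def is_correct(expected: str, actual: str) -> bool:
    """Grade by direct string comparison, trimmed and case-insensitive."""
    return expected.strip().lower() == actual.strip().lower()
```

Because grading is this simple, any answer that admits multiple valid orderings or formats will be marked wrong, which is why the guidelines above insist on single, unambiguous values.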
## Evaluation Examples
### Good Questions
**Example 1: Multi-hop question requiring deep exploration (GitHub MCP)**
```xml
<qa_pair>
<question>Find the repository that was archived in Q3 2023 and had previously been the most forked project in the organization. What was the primary programming language used in that repository?</question>
<answer>Python</answer>
</qa_pair>
```
This question is good because:
- Requires multiple searches to find archived repositories
- Needs to identify which had the most forks before archival
- Requires examining repository details for the language
- Answer is a simple, verifiable value
- Based on historical (closed) data that won't change
**Example 2: Requires understanding context without keyword matching (Project Management MCP)**
```xml
<qa_pair>
<question>Locate the initiative focused on improving customer onboarding that was completed in late 2023. The project lead created a retrospective document after completion. What was the lead's role title at that time?</question>
<answer>Product Manager</answer>
</qa_pair>
```
This question is good because:
- Doesn't use specific project name ("initiative focused on improving customer onboarding")
- Requires finding completed projects from specific timeframe
- Needs to identify the project lead and their role
- Requires understanding context from retrospective documents
- Answer is human-readable and stable
- Based on completed work (won't change)
**Example 3: Complex aggregation requiring multiple steps (Issue Tracker MCP)**
```xml
<qa_pair>
<question>Among all bugs reported in January 2024 that were marked as critical priority, which assignee resolved the highest percentage of their assigned bugs within 48 hours? Provide the assignee's username.</question>
<answer>alex_eng</answer>
</qa_pair>
```
This question is good because:
- Requires filtering bugs by date, priority, and status
- Needs to group by assignee and calculate resolution rates
- Requires understanding timestamps to determine 48-hour windows
- Tests pagination (potentially many bugs to process)
- Answer is a single username
- Based on historical data from specific time period
**Example 4: Requires synthesis across multiple data types (CRM MCP)**
```xml
<qa_pair>
<question>Find the account that upgraded from the Starter to Enterprise plan in Q4 2023 and had the highest annual contract value. What industry does this account operate in?</question>
<answer>Healthcare</answer>
</qa_pair>
```
This question is good because:
- Requires understanding subscription tier changes
- Needs to identify upgrade events in specific timeframe
- Requires comparing contract values
- Must access account industry information
- Answer is simple and verifiable
- Based on completed historical transactions
### Poor Questions
**Example 1: Answer changes over time**
```xml
<qa_pair>
<question>How many open issues are currently assigned to the engineering team?</question>
<answer>47</answer>
</qa_pair>
```
This question is poor because:
- The answer will change as issues are created, closed, or reassigned
- Not based on stable/stationary data
- Relies on "current state" which is dynamic
**Example 2: Too easy with keyword search**
```xml
<qa_pair>
<question>Find the pull request with title "Add authentication feature" and tell me who created it.</question>
<answer>developer123</answer>
</qa_pair>
```
This question is poor because:
- Can be solved with a straightforward keyword search for exact title
- Doesn't require deep exploration or understanding
- No synthesis or analysis needed
**Example 3: Ambiguous answer format**
```xml
<qa_pair>
<question>List all the repositories that have Python as their primary language.</question>
<answer>repo1, repo2, repo3, data-pipeline, ml-tools</answer>
</qa_pair>
```
This question is poor because:
- Answer is a list that could be returned in any order
- Difficult to verify with direct string comparison
- LLM might format differently (JSON array, comma-separated, newline-separated)
- Better to ask for a specific aggregate (count) or superlative (most stars)
## Verification Process
After creating evaluations:
1. **Examine the XML file** to understand the schema
2. **Load each task instruction** and, working in parallel, identify the correct answer by attempting to solve the task YOURSELF using the MCP server and tools
3. **Flag any operations** that require WRITE or DESTRUCTIVE operations
4. **Accumulate all CORRECT answers** and replace any incorrect answers in the document
5. **Remove any `<qa_pair>`** that require WRITE or DESTRUCTIVE operations
Remember to parallelize solving tasks to avoid running out of context, then accumulate all answers and make changes to the file at the end.
## Tips for Creating Quality Evaluations
1. **Think Hard and Plan Ahead** before generating tasks
2. **Parallelize Where Opportunity Arises** to speed up the process and manage context
3. **Focus on Realistic Use Cases** that humans would actually want to accomplish
4. **Create Challenging Questions** that test the limits of the MCP server's capabilities
5. **Ensure Stability** by using historical data and closed concepts
6. **Verify Answers** by solving the questions yourself using the MCP server tools
7. **Iterate and Refine** based on what you learn during the process
---
# Running Evaluations
After creating your evaluation file, you can use the provided evaluation harness to test your MCP server.
## Setup
1. **Install Dependencies**
```bash
pip install -r scripts/requirements.txt
```
Or install manually:
```bash
pip install anthropic mcp
```
2. **Set API Key**
```bash
export ANTHROPIC_API_KEY=your_api_key_here
```
## Evaluation File Format
Evaluation files use XML format with `<qa_pair>` elements:
```xml
<evaluation>
<qa_pair>
<question>Find the project created in Q2 2024 with the highest number of completed tasks. What is the project name?</question>
<answer>Website Redesign</answer>
</qa_pair>
<qa_pair>
<question>Search for issues labeled as "bug" that were closed in March 2024. Which user closed the most issues? Provide their username.</question>
<answer>sarah_dev</answer>
</qa_pair>
</evaluation>
```
## Running Evaluations
The evaluation script (`scripts/evaluation.py`) supports three transport types:
**Important:**
- **stdio transport**: The evaluation script automatically launches and manages the MCP server process for you. Do not run the server manually.
- **sse/http transports**: You must start the MCP server separately before running the evaluation. The script connects to the already-running server at the specified URL.
### 1. Local STDIO Server
For locally-run MCP servers (script launches the server automatically):
```bash
python scripts/evaluation.py \
-t stdio \
-c python \
-a my_mcp_server.py \
evaluation.xml
```
With environment variables:
```bash
python scripts/evaluation.py \
-t stdio \
-c python \
-a my_mcp_server.py \
-e API_KEY=abc123 \
-e DEBUG=true \
evaluation.xml
```
### 2. Server-Sent Events (SSE)
For SSE-based MCP servers (you must start the server first):
```bash
python scripts/evaluation.py \
-t sse \
-u https://example.com/mcp \
-H "Authorization: Bearer token123" \
-H "X-Custom-Header: value" \
evaluation.xml
```
### 3. HTTP (Streamable HTTP)
For HTTP-based MCP servers (you must start the server first):
```bash
python scripts/evaluation.py \
-t http \
-u https://example.com/mcp \
-H "Authorization: Bearer token123" \
evaluation.xml
```
## Command-Line Options
```
usage: evaluation.py [-h] [-t {stdio,sse,http}] [-m MODEL] [-c COMMAND]
[-a ARGS [ARGS ...]] [-e ENV [ENV ...]] [-u URL]
[-H HEADERS [HEADERS ...]] [-o OUTPUT]
eval_file
positional arguments:
eval_file Path to evaluation XML file
optional arguments:
-h, --help Show help message
-t, --transport Transport type: stdio, sse, or http (default: stdio)
-m, --model Claude model to use (default: claude-3-7-sonnet-20250219)
-o, --output Output file for report (default: print to stdout)
stdio options:
-c, --command Command to run MCP server (e.g., python, node)
-a, --args Arguments for the command (e.g., server.py)
-e, --env Environment variables in KEY=VALUE format
sse/http options:
-u, --url MCP server URL
-H, --header HTTP headers in 'Key: Value' format
```
## Output
The evaluation script generates a detailed report including:
- **Summary Statistics**:
- Accuracy (correct/total)
- Average task duration
- Average tool calls per task
- Total tool calls
- **Per-Task Results**:
- Prompt and expected response
- Actual response from the agent
- Whether the answer was correct (✅/❌)
- Duration and tool call details
- Agent's summary of its approach
- Agent's feedback on the tools
### Save Report to File
```bash
python scripts/evaluation.py \
-t stdio \
-c python \
-a my_server.py \
-o evaluation_report.md \
evaluation.xml
```
## Complete Example Workflow
Here's a complete example of creating and running an evaluation:
1. **Create your evaluation file** (`my_evaluation.xml`):
```xml
<evaluation>
<qa_pair>
<question>Find the user who created the most issues in January 2024. What is their username?</question>
<answer>alice_developer</answer>
</qa_pair>
<qa_pair>
<question>Among all pull requests merged in Q1 2024, which repository had the highest number? Provide the repository name.</question>
<answer>backend-api</answer>
</qa_pair>
<qa_pair>
<question>Find the project that was completed in December 2023 and had the longest duration from start to finish. How many days did it take?</question>
<answer>127</answer>
</qa_pair>
</evaluation>
```
2. **Install dependencies**:
```bash
pip install -r scripts/requirements.txt
export ANTHROPIC_API_KEY=your_api_key
```
3. **Run evaluation**:
```bash
python scripts/evaluation.py \
-t stdio \
-c python \
-a github_mcp_server.py \
-e GITHUB_TOKEN=ghp_xxx \
-o github_eval_report.md \
my_evaluation.xml
```
4. **Review the report** in `github_eval_report.md` to:
- See which questions passed/failed
- Read the agent's feedback on your tools
- Identify areas for improvement
- Iterate on your MCP server design
## Troubleshooting
### Connection Errors
If you get connection errors:
- **STDIO**: Verify the command and arguments are correct
- **SSE/HTTP**: Check the URL is accessible and headers are correct
- Ensure any required API keys are set in environment variables or headers
### Low Accuracy
If many evaluations fail:
- Review the agent's feedback for each task
- Check if tool descriptions are clear and comprehensive
- Verify input parameters are well-documented
- Consider whether tools return too much or too little data
- Ensure error messages are actionable
### Timeout Issues
If tasks are timing out:
- Use a more capable model (e.g., `claude-3-7-sonnet-20250219`)
- Check if tools are returning too much data
- Verify pagination is working correctly
- Consider simplifying complex questions

---
# MCP Server Development Best Practices and Guidelines
## Overview
This document compiles essential best practices and guidelines for building Model Context Protocol (MCP) servers. It covers naming conventions, tool design, response formats, pagination, error handling, security, and compliance requirements.
---
## Quick Reference
### Server Naming
- **Python**: `{service}_mcp` (e.g., `slack_mcp`)
- **Node/TypeScript**: `{service}-mcp-server` (e.g., `slack-mcp-server`)
### Tool Naming
- Use snake_case with service prefix
- Format: `{service}_{action}_{resource}`
- Example: `slack_send_message`, `github_create_issue`
### Response Formats
- Support both JSON and Markdown formats
- JSON for programmatic processing
- Markdown for human readability
### Pagination
- Always respect `limit` parameter
- Return `has_more`, `next_offset`, `total_count`
- Default to 20-50 items
### Character Limits
- Set CHARACTER_LIMIT constant (typically 25,000)
- Truncate gracefully with clear messages
- Provide guidance on filtering
---
## Table of Contents
1. Server Naming Conventions
2. Tool Naming and Design
3. Response Format Guidelines
4. Pagination Best Practices
5. Character Limits and Truncation
6. Transport Options
7. Tool Development Best Practices
8. Testing Requirements
9. OAuth and Security Best Practices
10. Resource Management Best Practices
11. Prompt Management Best Practices
12. Error Handling Standards
13. Documentation Requirements
14. Compliance and Monitoring
---
## 1. Server Naming Conventions
Follow these standardized naming patterns for MCP servers:
**Python**: Use format `{service}_mcp` (lowercase with underscores)
- Examples: `slack_mcp`, `github_mcp`, `jira_mcp`, `stripe_mcp`
**Node/TypeScript**: Use format `{service}-mcp-server` (lowercase with hyphens)
- Examples: `slack-mcp-server`, `github-mcp-server`, `jira-mcp-server`
The name should be:
- General (not tied to specific features)
- Descriptive of the service/API being integrated
- Easy to infer from the task description
- Without version numbers or dates
---
## 2. Tool Naming and Design
### Tool Naming Best Practices
1. **Use snake_case**: `search_users`, `create_project`, `get_channel_info`
2. **Include service prefix**: Anticipate that your MCP server may be used alongside other MCP servers
- Use `slack_send_message` instead of just `send_message`
- Use `github_create_issue` instead of just `create_issue`
- Use `asana_list_tasks` instead of just `list_tasks`
3. **Be action-oriented**: Start with verbs (get, list, search, create, etc.)
4. **Be specific**: Avoid generic names that could conflict with other servers
5. **Maintain consistency**: Use consistent naming patterns within your server
### Tool Design Guidelines
- Tool descriptions must narrowly and unambiguously describe functionality
- Descriptions must precisely match actual functionality
- Should not create confusion with other MCP servers
- Should provide tool annotations (readOnlyHint, destructiveHint, idempotentHint, openWorldHint)
- Keep tool operations focused and atomic
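These conventions are easy to enforce mechanically; here is a hedged sketch of a name check whose regex encodes the `{service}_{action}_{resource}` shape described above (real servers may need a looser pattern for legitimate two-word names):

```python
import re

# snake_case with at least three segments: service, action, resource
TOOL_NAME_PATTERN = re.compile(r"^[a-z][a-z0-9]*(?:_[a-z0-9]+){2,}$")


def is_valid_tool_name(name: str) -> bool:
    """Check snake_case and the {service}_{action}_{resource} shape."""
    return bool(TOOL_NAME_PATTERN.fullmatch(name))
```

A check like this can run in CI or at server startup so that an unprefixed name such as `send_message` never ships.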
---
## 3. Response Format Guidelines
All tools that return data should support multiple formats for flexibility:
### JSON Format (`response_format="json"`)
- Machine-readable structured data
- Include all available fields and metadata
- Consistent field names and types
- Suitable for programmatic processing
- Use when LLMs need to process data further
### Markdown Format (`response_format="markdown"`, typically default)
- Human-readable formatted text
- Use headers, lists, and formatting for clarity
- Convert timestamps to human-readable format (e.g., "2024-01-15 10:30:00 UTC" instead of epoch)
- Show display names with IDs in parentheses (e.g., "@john.doe (U123456)")
- Omit verbose metadata (e.g., show only one profile image URL, not all sizes)
- Group related information logically
- Use when presenting information to users
---
## 4. Pagination Best Practices
For tools that list resources:
- **Always respect the `limit` parameter**: Never load all results when a limit is specified
- **Implement pagination**: Use `offset` or cursor-based pagination
- **Return pagination metadata**: Include `has_more`, `next_offset`/`next_cursor`, `total_count`
- **Never load all results into memory**: Especially important for large datasets
- **Default to reasonable limits**: 20-50 items is typical
- **Include clear pagination info in responses**: Make it easy for LLMs to request more data
Example pagination response structure:
```json
{
"total": 150,
"count": 20,
"offset": 0,
"items": [...],
"has_more": true,
"next_offset": 20
}
```
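A small helper that produces this structure might look like the following sketch (swap the in-memory slice for your API's offset or cursor parameters):

```python
from typing import Any


def paginate(items: list[Any], offset: int = 0, limit: int = 20) -> dict[str, Any]:
    """Slice a result set and attach standard pagination metadata."""
    page = items[offset : offset + limit]
    has_more = offset + len(page) < len(items)
    return {
        "total": len(items),
        "count": len(page),
        "offset": offset,
        "items": page,
        "has_more": has_more,
        "next_offset": offset + len(page) if has_more else None,
    }
```

Returning `next_offset` explicitly means the LLM never has to compute the next request itself, which reduces off-by-one mistakes when paging.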
---
## 5. Character Limits and Truncation
To prevent overwhelming responses with too much data:
- **Define CHARACTER_LIMIT constant**: Typically 25,000 characters at module level
- **Check response size before returning**: Measure the final response length
- **Truncate gracefully with clear indicators**: Let the LLM know data was truncated
- **Provide guidance on filtering**: Suggest how to use parameters to reduce results
- **Include truncation metadata**: Show what was truncated and how to get more
Example truncation handling:
```python
import json

CHARACTER_LIMIT = 25000  # module-level constant

response = {"items": data}
result = json.dumps(response)
if len(result) > CHARACTER_LIMIT:
    # Keep half the items and tell the LLM how to get the rest
    truncated_data = data[: max(1, len(data) // 2)]
    response["items"] = truncated_data
    response["truncated"] = True
    response["truncation_message"] = (
        f"Response truncated from {len(data)} to {len(truncated_data)} items. "
        f"Use 'offset' parameter or add filters to see more results."
    )
```
---
## 6. Transport Options
MCP servers support multiple transport mechanisms for different deployment scenarios:
### Stdio Transport
**Best for**: Command-line tools, local integrations, subprocess execution
**Characteristics**:
- Standard input/output stream communication
- Simple setup, no network configuration needed
- Runs as a subprocess of the client
- Ideal for desktop applications and CLI tools
**Use when**:
- Building tools for local development environments
- Integrating with desktop applications (e.g., Claude Desktop)
- Creating command-line utilities
- Single-user, single-session scenarios
### HTTP Transport
**Best for**: Web services, remote access, multi-client scenarios
**Characteristics**:
- Request-response pattern over HTTP
- Supports multiple simultaneous clients
- Can be deployed as a web service
- Requires network configuration and security considerations
**Use when**:
- Serving multiple clients simultaneously
- Deploying as a cloud service
- Integration with web applications
- Need for load balancing or scaling
### Server-Sent Events (SSE) Transport
**Best for**: Real-time updates, push notifications, streaming data
**Characteristics**:
- One-way server-to-client streaming over HTTP
- Enables real-time updates without polling
- Long-lived connections for continuous data flow
- Built on standard HTTP infrastructure
**Use when**:
- Clients need real-time data updates
- Implementing push notifications
- Streaming logs or monitoring data
- Progressive result delivery for long operations
### Transport Selection Criteria
| Criterion | Stdio | HTTP | SSE |
|-----------|-------|------|-----|
| **Deployment** | Local | Remote | Remote |
| **Clients** | Single | Multiple | Multiple |
| **Communication** | Bidirectional | Request-Response | Server-Push |
| **Complexity** | Low | Medium | Medium-High |
| **Real-time** | No | No | Yes |
---
## 7. Tool Development Best Practices
### General Guidelines
1. Tool names should be descriptive and action-oriented
2. Use parameter validation with detailed JSON schemas
3. Include examples in tool descriptions
4. Implement proper error handling and validation
5. Use progress reporting for long operations
6. Keep tool operations focused and atomic
7. Document expected return value structures
8. Implement proper timeouts
9. Consider rate limiting for resource-intensive operations
10. Log tool usage for debugging and monitoring
### Security Considerations for Tools
#### Input Validation
- Validate all parameters against schema
- Sanitize file paths and system commands
- Validate URLs and external identifiers
- Check parameter sizes and ranges
- Prevent command injection
#### Access Control
- Implement authentication where needed
- Use appropriate authorization checks
- Audit tool usage
- Rate limit requests
- Monitor for abuse
#### Error Handling
- Don't expose internal errors to clients
- Log security-relevant errors
- Handle timeouts appropriately
- Clean up resources after errors
- Validate return values
### Tool Annotations
- Provide readOnlyHint and destructiveHint annotations
- Remember annotations are hints, not security guarantees
- Clients should not make security-critical decisions based solely on annotations
---
## 8. Transport Best Practices
### General Transport Guidelines
1. Handle connection lifecycle properly
2. Implement proper error handling
3. Use appropriate timeout values
4. Implement connection state management
5. Clean up resources on disconnection
### Security Best Practices for Transport
- Follow security considerations for DNS rebinding attacks
- Implement proper authentication mechanisms
- Validate message formats
- Handle malformed messages gracefully
### Stdio Transport Specific
- Local MCP servers should NOT log to stdout (interferes with protocol)
- Use stderr for logging messages
- Handle standard I/O streams properly
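As a minimal sketch of stderr-only logging (the helper names here are illustrative, not part of the MCP SDK):

```typescript
// Format a log line; kept pure so it can be tested independently of I/O.
function formatLogLine(level: string, message: string): string {
  return `[${new Date().toISOString()}] [${level.toUpperCase()}] ${message}`;
}

// Write diagnostics to stderr only -- stdout is reserved for protocol messages.
function log(level: "debug" | "info" | "error", message: string): void {
  process.stderr.write(formatLogLine(level, message) + "\n");
}

log("info", "server started");
```

Using `console.error` instead of `console.log` achieves the same separation for quick debugging.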
---
## 9. Testing Requirements
A comprehensive testing strategy should cover:
### Functional Testing
- Verify correct execution with valid/invalid inputs
### Integration Testing
- Test interaction with external systems
### Security Testing
- Validate auth, input sanitization, rate limiting
### Performance Testing
- Check behavior under load, timeouts
### Error Handling
- Ensure proper error reporting and cleanup
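A functional test becomes straightforward when the tool's logic is extracted into a pure function. The sketch below mirrors the `calculate_sum` example used elsewhere in this document and uses bare assertions rather than a specific test framework:

```typescript
// Tool logic extracted as a pure function so it can be unit tested
// without spinning up a server or transport.
function calculateSum(args: { a?: unknown; b?: unknown }): { isError?: boolean; text: string } {
  if (typeof args.a !== "number" || typeof args.b !== "number") {
    // Invalid input is reported in the result, not thrown (see Error Handling)
    return { isError: true, text: "Error: 'a' and 'b' must be numbers" };
  }
  return { text: String(args.a + args.b) };
}
```

A test then covers both the valid-input path and the invalid-input path, verifying that errors surface in the result object.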
---
## 10. OAuth and Security Best Practices
### Authentication and Authorization
MCP servers that connect to external services should implement proper authentication:
**OAuth 2.1 Implementation:**
- Use secure OAuth 2.1 with certificates from recognized authorities
- Validate access tokens before processing requests
- Only accept tokens specifically intended for your server
- Reject tokens without proper audience claims
- Never pass through tokens received from MCP clients
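A simplified sketch of the audience check, assuming the token has already been signature-verified and decoded by a proper JWT library (the field and function names are illustrative):

```typescript
// Audience/expiry check on an ALREADY signature-verified, decoded token payload.
// NOTE: this does not verify the signature -- do that first with a JWT library.
interface TokenPayload {
  aud?: string | string[]; // intended audience(s)
  exp?: number;            // expiry, seconds since epoch
}

function isTokenAcceptable(payload: TokenPayload, myAudience: string, nowSeconds: number): boolean {
  const audiences = Array.isArray(payload.aud) ? payload.aud : payload.aud ? [payload.aud] : [];
  if (!audiences.includes(myAudience)) return false;          // not meant for this server
  if (payload.exp !== undefined && payload.exp <= nowSeconds) return false; // expired
  return true;
}
```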
**API Key Management:**
- Store API keys in environment variables, never in code
- Validate keys on server startup
- Provide clear error messages when authentication fails
- Use secure transmission for sensitive credentials
### Input Validation and Security
**Always validate inputs:**
- Sanitize file paths to prevent directory traversal
- Validate URLs and external identifiers
- Check parameter sizes and ranges
- Prevent command injection in system calls
- Use schema validation (Pydantic/Zod) for all inputs
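A minimal sketch of a directory-traversal guard using only Node's `path` module (the function name is illustrative):

```typescript
import * as path from "node:path";

// Reject any requested path that escapes the allowed base directory.
// Resolving first defeats "../" sequences and absolute-path tricks.
function isPathInside(baseDir: string, requested: string): boolean {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, requested);
  return target === base || target.startsWith(base + path.sep);
}
```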
**Error handling security:**
- Don't expose internal errors to clients
- Log security-relevant errors server-side
- Provide helpful but not revealing error messages
- Clean up resources after errors
### Privacy and Data Protection
**Data collection principles:**
- Only collect data strictly necessary for functionality
- Don't collect extraneous conversation data
- Don't collect PII unless explicitly required for the tool's purpose
- Provide clear information about what data is accessed
**Data transmission:**
- Don't send data to servers outside your organization without disclosure
- Use secure transmission (HTTPS) for all network communication
- Validate certificates for external services
---
## 11. Resource Management Best Practices
1. Only suggest necessary resources
2. Use clear, descriptive names for roots
3. Handle resource boundaries properly
4. Respect client control over resources
5. Use model-controlled primitives (tools) for automatic data exposure
---
## 12. Prompt Management Best Practices
- Clients should show users proposed prompts
- Users should be able to modify or reject prompts
- Clients should show users completions
- Users should be able to modify or reject completions
- Consider costs when using sampling
---
## 13. Error Handling Standards
- Use standard JSON-RPC error codes
- Report tool errors within result objects (not protocol-level)
- Provide helpful, specific error messages
- Don't expose internal implementation details
- Clean up resources properly on errors
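For reference, a protocol-level JSON-RPC error response uses the standard error-code range (`-32700` parse error, `-32600` invalid request, `-32601` method not found, `-32602` invalid params, `-32603` internal error); the message text below is illustrative:

```json
{
  "jsonrpc": "2.0",
  "id": 42,
  "error": {
    "code": -32602,
    "message": "Invalid params: 'query' must be a string"
  }
}
```

Tool execution failures, by contrast, go in the tool result with `isError: true` so the model can see and react to them.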
---
## 14. Documentation Requirements
- Provide clear documentation of all tools and capabilities
- Include working examples (at least 3 per major feature)
- Document security considerations
- Specify required permissions and access levels
- Document rate limits and performance characteristics
---
## 15. Compliance and Monitoring
- Implement logging for debugging and monitoring
- Track tool usage patterns
- Monitor for potential abuse
- Maintain audit trails for security-relevant operations
- Be prepared for ongoing compliance reviews
---
## Summary
These best practices represent the comprehensive guidelines for building secure, efficient, and compliant MCP servers that work well within the ecosystem. Developers should follow these guidelines to ensure their MCP servers meet the standards for inclusion in the MCP directory and provide a safe, reliable experience for users.
----------
# Tools
> Enable LLMs to perform actions through your server
Tools are a powerful primitive in the Model Context Protocol (MCP) that enable servers to expose executable functionality to clients. Through tools, LLMs can interact with external systems, perform computations, and take actions in the real world.
<Note>
Tools are designed to be **model-controlled**, meaning that tools are exposed from servers to clients with the intention of the AI model being able to automatically invoke them (with a human in the loop to grant approval).
</Note>
## Overview
Tools in MCP allow servers to expose executable functions that can be invoked by clients and used by LLMs to perform actions. Key aspects of tools include:
* **Discovery**: Clients can obtain a list of available tools by sending a `tools/list` request
* **Invocation**: Tools are called using the `tools/call` request, where servers perform the requested operation and return results
* **Flexibility**: Tools can range from simple calculations to complex API interactions
Like [resources](/docs/concepts/resources), tools are identified by unique names and can include descriptions to guide their usage. However, unlike resources, tools represent dynamic operations that can modify state or interact with external systems.
## Tool definition structure
Each tool is defined with the following structure:
```typescript
{
name: string; // Unique identifier for the tool
description?: string; // Human-readable description
inputSchema: { // JSON Schema for the tool's parameters
type: "object",
properties: { ... } // Tool-specific parameters
},
annotations?: { // Optional hints about tool behavior
title?: string; // Human-readable title for the tool
readOnlyHint?: boolean; // If true, the tool does not modify its environment
destructiveHint?: boolean; // If true, the tool may perform destructive updates
idempotentHint?: boolean; // If true, repeated calls with same args have no additional effect
openWorldHint?: boolean; // If true, tool interacts with external entities
}
}
```
## Implementing tools
Here's an example of implementing a basic tool in an MCP server:
<Tabs>
<Tab title="TypeScript">
```typescript
const server = new Server({
name: "example-server",
version: "1.0.0"
}, {
capabilities: {
tools: {}
}
});
// Define available tools
server.setRequestHandler(ListToolsRequestSchema, async () => {
return {
tools: [{
name: "calculate_sum",
description: "Add two numbers together",
inputSchema: {
type: "object",
properties: {
a: { type: "number" },
b: { type: "number" }
},
required: ["a", "b"]
}
}]
};
});
// Handle tool execution
server.setRequestHandler(CallToolRequestSchema, async (request) => {
if (request.params.name === "calculate_sum") {
const { a, b } = request.params.arguments;
return {
content: [
{
type: "text",
text: String(a + b)
}
]
};
}
throw new Error("Tool not found");
});
```
</Tab>
<Tab title="Python">
```python
app = Server("example-server")
@app.list_tools()
async def list_tools() -> list[types.Tool]:
return [
types.Tool(
name="calculate_sum",
description="Add two numbers together",
inputSchema={
"type": "object",
"properties": {
"a": {"type": "number"},
"b": {"type": "number"}
},
"required": ["a", "b"]
}
)
]
@app.call_tool()
async def call_tool(
name: str,
arguments: dict
) -> list[types.TextContent | types.ImageContent | types.EmbeddedResource]:
if name == "calculate_sum":
a = arguments["a"]
b = arguments["b"]
result = a + b
return [types.TextContent(type="text", text=str(result))]
raise ValueError(f"Tool not found: {name}")
```
</Tab>
</Tabs>
## Example tool patterns
Here are some examples of types of tools that a server could provide:
### System operations
Tools that interact with the local system:
```typescript
{
name: "execute_command",
description: "Run a shell command",
inputSchema: {
type: "object",
properties: {
command: { type: "string" },
args: { type: "array", items: { type: "string" } }
}
}
}
```
### API integrations
Tools that wrap external APIs:
```typescript
{
name: "github_create_issue",
description: "Create a GitHub issue",
inputSchema: {
type: "object",
properties: {
title: { type: "string" },
body: { type: "string" },
labels: { type: "array", items: { type: "string" } }
}
}
}
```
### Data processing
Tools that transform or analyze data:
```typescript
{
name: "analyze_csv",
description: "Analyze a CSV file",
inputSchema: {
type: "object",
properties: {
filepath: { type: "string" },
operations: {
type: "array",
items: {
enum: ["sum", "average", "count"]
}
}
}
}
}
```
## Best practices
When implementing tools:
1. Provide clear, descriptive names and descriptions
2. Use detailed JSON Schema definitions for parameters
3. Include examples in tool descriptions to demonstrate how the model should use them
4. Implement proper error handling and validation
5. Use progress reporting for long operations
6. Keep tool operations focused and atomic
7. Document expected return value structures
8. Implement proper timeouts
9. Consider rate limiting for resource-intensive operations
10. Log tool usage for debugging and monitoring
### Tool name conflicts
MCP client applications and MCP server proxies may encounter tool name conflicts when building their own tool lists. For example, two connected MCP servers `web1` and `web2` may both expose a tool named `search_web`.
Applications may disambiguate tools with one of the following strategies (among others; this is not an exhaustive list):
* Concatenating a unique, user-defined server name with the tool name, e.g. `web1___search_web` and `web2___search_web`. This strategy may be preferable when unique server names are already provided by the user in a configuration file.
* Generating a random prefix for the tool name, e.g. `jrwxs___search_web` and `6cq52___search_web`. This strategy may be preferable in server proxies where user-defined unique names are not available.
* Using the server URI as a prefix for the tool name, e.g. `web1.example.com:search_web` and `web2.example.com:search_web`. This strategy may be suitable when working with remote MCP servers.
Note that the server-provided name from the initialization flow is not guaranteed to be unique and is not generally suitable for disambiguation purposes.
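The first strategy can be sketched as a small mapping from prefixed names back to the originals (the `___` separator is one convention; any unambiguous separator works):

```typescript
// Prefix tool names with a unique, user-defined server name,
// e.g. "search_web" from server "web1" becomes "web1___search_web".
function prefixToolNames(serverName: string, toolNames: string[]): Map<string, string> {
  const prefixed = new Map<string, string>(); // prefixed name -> original name
  for (const name of toolNames) {
    prefixed.set(`${serverName}___${name}`, name);
  }
  return prefixed;
}
```

When a call comes in for a prefixed name, the proxy looks up the original name and routes the request to the corresponding server.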
## Security considerations
When exposing tools:
### Input validation
* Validate all parameters against the schema
* Sanitize file paths and system commands
* Validate URLs and external identifiers
* Check parameter sizes and ranges
* Prevent command injection
### Access control
* Implement authentication where needed
* Use appropriate authorization checks
* Audit tool usage
* Rate limit requests
* Monitor for abuse
### Error handling
* Don't expose internal errors to clients
* Log security-relevant errors
* Handle timeouts appropriately
* Clean up resources after errors
* Validate return values
## Tool discovery and updates
MCP supports dynamic tool discovery:
1. Clients can list available tools at any time
2. Servers can notify clients when tools change using `notifications/tools/list_changed`
3. Tools can be added or removed during runtime
4. Tool definitions can be updated (though this should be done carefully)
## Error handling
Tool errors should be reported within the result object, not as MCP protocol-level errors. This allows the LLM to see and potentially handle the error. When a tool encounters an error:
1. Set `isError` to `true` in the result
2. Include error details in the `content` array
Here's an example of proper error handling for tools:
<Tabs>
<Tab title="TypeScript">
```typescript
try {
// Tool operation
const result = performOperation();
return {
content: [
{
type: "text",
text: `Operation successful: ${result}`
}
]
};
} catch (error) {
return {
isError: true,
content: [
{
type: "text",
text: `Error: ${error.message}`
}
]
};
}
```
</Tab>
<Tab title="Python">
```python
try:
# Tool operation
result = perform_operation()
return types.CallToolResult(
content=[
types.TextContent(
type="text",
text=f"Operation successful: {result}"
)
]
)
except Exception as error:
return types.CallToolResult(
isError=True,
content=[
types.TextContent(
type="text",
text=f"Error: {str(error)}"
)
]
)
```
</Tab>
</Tabs>
This approach allows the LLM to see that an error occurred and potentially take corrective action or request human intervention.
## Tool annotations
Tool annotations provide additional metadata about a tool's behavior, helping clients understand how to present and manage tools. These annotations are hints that describe the nature and impact of a tool, but should not be relied upon for security decisions.
### Purpose of tool annotations
Tool annotations serve several key purposes:
1. Provide UX-specific information without affecting model context
2. Help clients categorize and present tools appropriately
3. Convey information about a tool's potential side effects
4. Assist in developing intuitive interfaces for tool approval
### Available tool annotations
The MCP specification defines the following annotations for tools:
| Annotation | Type | Default | Description |
| ----------------- | ------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------ |
| `title` | string | - | A human-readable title for the tool, useful for UI display |
| `readOnlyHint` | boolean | false | If true, indicates the tool does not modify its environment |
| `destructiveHint` | boolean | true | If true, the tool may perform destructive updates (only meaningful when `readOnlyHint` is false) |
| `idempotentHint` | boolean | false | If true, calling the tool repeatedly with the same arguments has no additional effect (only meaningful when `readOnlyHint` is false) |
| `openWorldHint` | boolean | true | If true, the tool may interact with an "open world" of external entities |
### Example usage
Here's how to define tools with annotations for different scenarios:
```typescript
// A read-only search tool
{
name: "web_search",
description: "Search the web for information",
inputSchema: {
type: "object",
properties: {
query: { type: "string" }
},
required: ["query"]
},
annotations: {
title: "Web Search",
readOnlyHint: true,
openWorldHint: true
}
}
// A destructive file deletion tool
{
name: "delete_file",
description: "Delete a file from the filesystem",
inputSchema: {
type: "object",
properties: {
path: { type: "string" }
},
required: ["path"]
},
annotations: {
title: "Delete File",
readOnlyHint: false,
destructiveHint: true,
idempotentHint: true,
openWorldHint: false
}
}
// A non-destructive database record creation tool
{
name: "create_record",
description: "Create a new record in the database",
inputSchema: {
type: "object",
properties: {
table: { type: "string" },
data: { type: "object" }
},
required: ["table", "data"]
},
annotations: {
title: "Create Database Record",
readOnlyHint: false,
destructiveHint: false,
idempotentHint: false,
openWorldHint: false
}
}
```
### Integrating annotations in server implementation
<Tabs>
<Tab title="TypeScript">
```typescript
server.setRequestHandler(ListToolsRequestSchema, async () => {
return {
tools: [{
name: "calculate_sum",
description: "Add two numbers together",
inputSchema: {
type: "object",
properties: {
a: { type: "number" },
b: { type: "number" }
},
required: ["a", "b"]
},
annotations: {
title: "Calculate Sum",
readOnlyHint: true,
openWorldHint: false
}
}]
};
});
```
</Tab>
<Tab title="Python">
```python
from mcp.server.fastmcp import FastMCP
mcp = FastMCP("example-server")
@mcp.tool(
annotations={
"title": "Calculate Sum",
"readOnlyHint": True,
"openWorldHint": False
}
)
async def calculate_sum(a: float, b: float) -> str:
"""Add two numbers together.
Args:
a: First number to add
b: Second number to add
"""
result = a + b
return str(result)
```
</Tab>
</Tabs>
### Best practices for tool annotations
1. **Be accurate about side effects**: Clearly indicate whether a tool modifies its environment and whether those modifications are destructive.
2. **Use descriptive titles**: Provide human-friendly titles that clearly describe the tool's purpose.
3. **Indicate idempotency properly**: Mark tools as idempotent only if repeated calls with the same arguments truly have no additional effect.
4. **Set appropriate open/closed world hints**: Indicate whether a tool interacts with a closed system (like a database) or an open system (like the web).
5. **Remember annotations are hints**: All properties in ToolAnnotations are hints and not guaranteed to provide a faithful description of tool behavior. Clients should never make security-critical decisions based solely on annotations.
## Testing tools
A comprehensive testing strategy for MCP tools should cover:
* **Functional testing**: Verify tools execute correctly with valid inputs and handle invalid inputs appropriately
* **Integration testing**: Test tool interaction with external systems using both real and mocked dependencies
* **Security testing**: Validate authentication, authorization, input sanitization, and rate limiting
* **Performance testing**: Check behavior under load, timeout handling, and resource cleanup
* **Error handling**: Ensure tools properly report errors through the MCP protocol and clean up resources
----------
# Node/TypeScript MCP Server Implementation Guide
## Overview
This document provides Node/TypeScript-specific best practices and examples for implementing MCP servers using the MCP TypeScript SDK. It covers project structure, server setup, tool registration patterns, input validation with Zod, error handling, and complete working examples.
---
## Quick Reference
### Key Imports
```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import axios, { AxiosError } from "axios";
```
### Server Initialization
```typescript
const server = new McpServer({
name: "service-mcp-server",
version: "1.0.0"
});
```
### Tool Registration Pattern
```typescript
server.registerTool("tool_name", {...config}, async (params) => {
// Implementation
});
```
---
## MCP TypeScript SDK
The official MCP TypeScript SDK provides:
- `McpServer` class for server initialization
- `registerTool` method for tool registration
- Zod schema integration for runtime input validation
- Type-safe tool handler implementations
See the MCP SDK documentation in the references for complete details.
## Server Naming Convention
Node/TypeScript MCP servers must follow this naming pattern:
- **Format**: `{service}-mcp-server` (lowercase with hyphens)
- **Examples**: `github-mcp-server`, `jira-mcp-server`, `stripe-mcp-server`
The name should be:
- General (not tied to specific features)
- Descriptive of the service/API being integrated
- Easy to infer from the task description
- Without version numbers or dates
## Project Structure
Create the following structure for Node/TypeScript MCP servers:
```
{service}-mcp-server/
├── package.json
├── tsconfig.json
├── README.md
├── src/
│ ├── index.ts # Main entry point with McpServer initialization
│ ├── types.ts # TypeScript type definitions and interfaces
│ ├── tools/ # Tool implementations (one file per domain)
│ ├── services/ # API clients and shared utilities
│ ├── schemas/ # Zod validation schemas
│ └── constants.ts # Shared constants (API_URL, CHARACTER_LIMIT, etc.)
└── dist/ # Built JavaScript files (entry point: dist/index.js)
```
## Tool Implementation
### Tool Naming
Use snake_case for tool names (e.g., "search_users", "create_project", "get_channel_info") with clear, action-oriented names.
**Avoid Naming Conflicts**: Include the service context to prevent overlaps:
- Use "slack_send_message" instead of just "send_message"
- Use "github_create_issue" instead of just "create_issue"
- Use "asana_list_tasks" instead of just "list_tasks"
### Tool Structure
Tools are registered using the `registerTool` method with the following requirements:
- Use Zod schemas for runtime input validation and type safety
- The `description` field must be explicitly provided - JSDoc comments are NOT automatically extracted
- Explicitly provide `title`, `description`, `inputSchema`, and `annotations`
- The `inputSchema` must be a Zod schema object (not a JSON schema)
- Type all parameters and return values explicitly
```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";
const server = new McpServer({
name: "example-mcp-server",
version: "1.0.0"
});
// Output format options referenced by the schema below
enum ResponseFormat {
  MARKDOWN = "markdown",
  JSON = "json"
}

// Zod schema for input validation
const UserSearchInputSchema = z.object({
query: z.string()
.min(2, "Query must be at least 2 characters")
.max(200, "Query must not exceed 200 characters")
.describe("Search string to match against names/emails"),
limit: z.number()
.int()
.min(1)
.max(100)
.default(20)
.describe("Maximum results to return"),
offset: z.number()
.int()
.min(0)
.default(0)
.describe("Number of results to skip for pagination"),
response_format: z.nativeEnum(ResponseFormat)
.default(ResponseFormat.MARKDOWN)
.describe("Output format: 'markdown' for human-readable or 'json' for machine-readable")
}).strict();
// Type definition from Zod schema
type UserSearchInput = z.infer<typeof UserSearchInputSchema>;
server.registerTool(
"example_search_users",
{
title: "Search Example Users",
description: `Search for users in the Example system by name, email, or team.
This tool searches across all user profiles in the Example platform, supporting partial matches and various search filters. It does NOT create or modify users, only searches existing ones.
Args:
- query (string): Search string to match against names/emails
- limit (number): Maximum results to return, between 1-100 (default: 20)
- offset (number): Number of results to skip for pagination (default: 0)
- response_format ('markdown' | 'json'): Output format (default: 'markdown')
Returns:
For JSON format: Structured data with schema:
{
"total": number, // Total number of matches found
"count": number, // Number of results in this response
"offset": number, // Current pagination offset
"users": [
{
"id": string, // User ID (e.g., "U123456789")
"name": string, // Full name (e.g., "John Doe")
"email": string, // Email address
"team": string, // Team name (optional)
"active": boolean // Whether user is active
}
],
"has_more": boolean, // Whether more results are available
"next_offset": number // Offset for next page (if has_more is true)
}
Examples:
- Use when: "Find all marketing team members" -> params with query="team:marketing"
- Use when: "Search for John's account" -> params with query="john"
- Don't use when: You need to create a user (use example_create_user instead)
Error Handling:
- Returns "Error: Rate limit exceeded" if too many requests (429 status)
- Returns "No users found matching '<query>'" if search returns empty`,
inputSchema: UserSearchInputSchema,
annotations: {
readOnlyHint: true,
destructiveHint: false,
idempotentHint: true,
openWorldHint: true
}
},
async (params: UserSearchInput) => {
try {
      // Input validation is handled by the Zod schema above.
      // makeApiRequest and handleApiError are shared helpers
      // (see "Shared Utilities" and "Error Handling" below).
const data = await makeApiRequest<any>(
"users/search",
"GET",
undefined,
{
q: params.query,
limit: params.limit,
offset: params.offset
}
);
const users = data.users || [];
const total = data.total || 0;
if (!users.length) {
return {
content: [{
type: "text",
text: `No users found matching '${params.query}'`
}]
};
}
// Format response based on requested format
let result: string;
if (params.response_format === ResponseFormat.MARKDOWN) {
// Human-readable markdown format
const lines: string[] = [`# User Search Results: '${params.query}'`, ""];
lines.push(`Found ${total} users (showing ${users.length})`);
lines.push("");
for (const user of users) {
lines.push(`## ${user.name} (${user.id})`);
lines.push(`- **Email**: ${user.email}`);
if (user.team) {
lines.push(`- **Team**: ${user.team}`);
}
lines.push("");
}
result = lines.join("\n");
} else {
// Machine-readable JSON format
const response: any = {
total,
count: users.length,
offset: params.offset,
users: users.map((user: any) => ({
id: user.id,
name: user.name,
email: user.email,
...(user.team ? { team: user.team } : {}),
active: user.active ?? true
}))
};
// Add pagination info if there are more results
if (total > params.offset + users.length) {
response.has_more = true;
response.next_offset = params.offset + users.length;
}
result = JSON.stringify(response, null, 2);
}
return {
content: [{
type: "text",
text: result
}]
};
} catch (error) {
return {
content: [{
type: "text",
text: handleApiError(error)
}]
};
}
}
);
```
## Zod Schemas for Input Validation
Zod provides runtime type validation:
```typescript
import { z } from "zod";
// Basic schema with validation
const CreateUserSchema = z.object({
name: z.string()
.min(1, "Name is required")
.max(100, "Name must not exceed 100 characters"),
email: z.string()
.email("Invalid email format"),
age: z.number()
.int("Age must be a whole number")
.min(0, "Age cannot be negative")
.max(150, "Age cannot be greater than 150")
}).strict(); // Use .strict() to forbid extra fields
// Enums
enum ResponseFormat {
MARKDOWN = "markdown",
JSON = "json"
}
const SearchSchema = z.object({
response_format: z.nativeEnum(ResponseFormat)
.default(ResponseFormat.MARKDOWN)
.describe("Output format")
});
// Optional fields with defaults
const PaginationSchema = z.object({
limit: z.number()
.int()
.min(1)
.max(100)
.default(20)
.describe("Maximum results to return"),
offset: z.number()
.int()
.min(0)
.default(0)
.describe("Number of results to skip")
});
```
## Response Format Options
Support multiple output formats for flexibility:
```typescript
enum ResponseFormat {
MARKDOWN = "markdown",
JSON = "json"
}
const inputSchema = z.object({
query: z.string(),
response_format: z.nativeEnum(ResponseFormat)
.default(ResponseFormat.MARKDOWN)
.describe("Output format: 'markdown' for human-readable or 'json' for machine-readable")
});
```
**Markdown format**:
- Use headers, lists, and formatting for clarity
- Convert timestamps to human-readable format
- Show display names with IDs in parentheses
- Omit verbose metadata
- Group related information logically
**JSON format**:
- Return complete, structured data suitable for programmatic processing
- Include all available fields and metadata
- Use consistent field names and types
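The "human-readable timestamps" guideline for markdown output can be sketched as a small helper (the function name is illustrative; JSON output would keep the raw epoch value instead):

```typescript
// Convert an epoch timestamp (whole seconds) to a human-readable UTC string
// for markdown output, e.g. 0 -> "1970-01-01 00:00:00 UTC".
function humanReadableUtc(epochSeconds: number): string {
  return new Date(epochSeconds * 1000)
    .toISOString()
    .replace("T", " ")
    .replace(".000Z", " UTC");
}
```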
## Pagination Implementation
For tools that list resources:
```typescript
const ListSchema = z.object({
limit: z.number().int().min(1).max(100).default(20),
offset: z.number().int().min(0).default(0)
});
async function listItems(params: z.infer<typeof ListSchema>) {
const data = await apiRequest(params.limit, params.offset);
const response = {
total: data.total,
count: data.items.length,
offset: params.offset,
items: data.items,
has_more: data.total > params.offset + data.items.length,
next_offset: data.total > params.offset + data.items.length
? params.offset + data.items.length
: undefined
};
return JSON.stringify(response, null, 2);
}
```
## Character Limits and Truncation
Add a CHARACTER_LIMIT constant to prevent overwhelming responses:
```typescript
// At module level in constants.ts
export const CHARACTER_LIMIT = 25000; // Maximum response size in characters
async function searchTool(params: SearchInput) {
let result = generateResponse(data);
// Check character limit and truncate if needed
if (result.length > CHARACTER_LIMIT) {
    const truncatedData = data.slice(0, Math.max(1, Math.floor(data.length / 2)));
response.data = truncatedData;
response.truncated = true;
response.truncation_message =
`Response truncated from ${data.length} to ${truncatedData.length} items. ` +
`Use 'offset' parameter or add filters to see more results.`;
result = JSON.stringify(response, null, 2);
}
return result;
}
```
## Error Handling
Provide clear, actionable error messages:
```typescript
import axios, { AxiosError } from "axios";
function handleApiError(error: unknown): string {
if (error instanceof AxiosError) {
if (error.response) {
switch (error.response.status) {
case 404:
return "Error: Resource not found. Please check the ID is correct.";
case 403:
return "Error: Permission denied. You don't have access to this resource.";
case 429:
return "Error: Rate limit exceeded. Please wait before making more requests.";
default:
return `Error: API request failed with status ${error.response.status}`;
}
} else if (error.code === "ECONNABORTED") {
return "Error: Request timed out. Please try again.";
}
}
return `Error: Unexpected error occurred: ${error instanceof Error ? error.message : String(error)}`;
}
```
## Shared Utilities
Extract common functionality into reusable functions:
```typescript
// Shared API request function
async function makeApiRequest<T>(
  endpoint: string,
  method: "GET" | "POST" | "PUT" | "DELETE" = "GET",
  data?: unknown,
  params?: Record<string, unknown>
): Promise<T> {
  // Let errors propagate; callers format them with handleApiError
  const response = await axios({
    method,
    url: `${API_BASE_URL}/${endpoint}`,
    data,
    params,
    timeout: 30000,
    headers: {
      "Content-Type": "application/json",
      "Accept": "application/json"
    }
  });
  return response.data;
}
```
## Async/Await Best Practices
Always use async/await for network requests and I/O operations:
```typescript
// Good: Async network request
async function fetchData(resourceId: string): Promise<ResourceData> {
const response = await axios.get(`${API_URL}/resource/${resourceId}`);
return response.data;
}
// Bad: Promise chains
function fetchData(resourceId: string): Promise<ResourceData> {
return axios.get(`${API_URL}/resource/${resourceId}`)
.then(response => response.data); // Harder to read and maintain
}
```
## TypeScript Best Practices
1. **Use Strict TypeScript**: Enable strict mode in tsconfig.json
2. **Define Interfaces**: Create clear interface definitions for all data structures
3. **Avoid `any`**: Use proper types or `unknown` instead of `any`
4. **Zod for Runtime Validation**: Use Zod schemas to validate external data
5. **Type Guards**: Create type guard functions for complex type checking
6. **Error Handling**: Always use try-catch with proper error type checking
7. **Null Safety**: Use optional chaining (`?.`) and nullish coalescing (`??`)
```typescript
// Good: Type-safe with Zod and interfaces
interface UserResponse {
id: string;
name: string;
email: string;
team?: string;
active: boolean;
}
const UserSchema = z.object({
id: z.string(),
name: z.string(),
email: z.string().email(),
team: z.string().optional(),
active: z.boolean()
});
type User = z.infer<typeof UserSchema>;
async function getUser(id: string): Promise<User> {
const data = await apiCall(`/users/${id}`);
return UserSchema.parse(data); // Runtime validation
}
// Bad: Using any
async function getUser(id: string): Promise<any> {
return await apiCall(`/users/${id}`); // No type safety
}
```
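Item 5 above (type guards) can look like the sketch below; it repeats the `UserResponse` interface from the previous example so the snippet stands alone:

```typescript
interface UserResponse {
  id: string;
  name: string;
  email: string;
  team?: string;
  active: boolean;
}

// Type guard: narrows `unknown` to UserResponse after structural checks
function isUser(value: unknown): value is UserResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === "string" &&
    typeof v.name === "string" &&
    typeof v.email === "string" &&
    (v.team === undefined || typeof v.team === "string") &&
    typeof v.active === "boolean"
  );
}

// Usage: validate data of unknown shape before touching its fields
const payload: unknown = JSON.parse(
  '{"id":"U1","name":"Ada","email":"ada@example.com","active":true}'
);
if (isUser(payload)) {
  console.log(payload.name); // TypeScript now knows the full shape
}
```

For data crossing a network boundary, prefer the Zod approach shown earlier; a hand-written guard is useful when you want zero runtime dependencies.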
## Package Configuration
### package.json
```json
{
"name": "{service}-mcp-server",
"version": "1.0.0",
"description": "MCP server for {Service} API integration",
"type": "module",
"main": "dist/index.js",
"scripts": {
"start": "node dist/index.js",
"dev": "tsx watch src/index.ts",
"build": "tsc",
"clean": "rm -rf dist"
},
"engines": {
"node": ">=18"
},
"dependencies": {
"@modelcontextprotocol/sdk": "^1.6.1",
"axios": "^1.7.9",
"zod": "^3.23.8"
},
"devDependencies": {
"@types/node": "^22.10.0",
"tsx": "^4.19.2",
"typescript": "^5.7.2"
}
}
```
### tsconfig.json
```json
{
"compilerOptions": {
"target": "ES2022",
"module": "Node16",
"moduleResolution": "Node16",
"lib": ["ES2022"],
"outDir": "./dist",
"rootDir": "./src",
"strict": true,
"esModuleInterop": true,
"skipLibCheck": true,
"forceConsistentCasingInFileNames": true,
"declaration": true,
"declarationMap": true,
"sourceMap": true,
"allowSyntheticDefaultImports": true
},
"include": ["src/**/*"],
"exclude": ["node_modules", "dist"]
}
```
## Complete Example
```typescript
#!/usr/bin/env node
/**
* MCP Server for Example Service.
*
* This server provides tools to interact with Example API, including user search,
* project management, and data export capabilities.
*/
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import axios, { AxiosError } from "axios";
// Constants
const API_BASE_URL = "https://api.example.com/v1";
const CHARACTER_LIMIT = 25000;
// Enums
enum ResponseFormat {
MARKDOWN = "markdown",
JSON = "json"
}
// Zod schemas
const UserSearchInputSchema = z.object({
query: z.string()
.min(2, "Query must be at least 2 characters")
.max(200, "Query must not exceed 200 characters")
.describe("Search string to match against names/emails"),
limit: z.number()
.int()
.min(1)
.max(100)
.default(20)
.describe("Maximum results to return"),
offset: z.number()
.int()
.min(0)
.default(0)
.describe("Number of results to skip for pagination"),
response_format: z.nativeEnum(ResponseFormat)
.default(ResponseFormat.MARKDOWN)
.describe("Output format: 'markdown' for human-readable or 'json' for machine-readable")
}).strict();
type UserSearchInput = z.infer<typeof UserSearchInputSchema>;
// Shared utility functions
async function makeApiRequest<T>(
  endpoint: string,
  method: "GET" | "POST" | "PUT" | "DELETE" = "GET",
  data?: unknown,
  params?: Record<string, unknown>
): Promise<T> {
  // Let errors propagate; callers format them with handleApiError
  const response = await axios({
    method,
    url: `${API_BASE_URL}/${endpoint}`,
    data,
    params,
    timeout: 30000,
    headers: {
      "Content-Type": "application/json",
      "Accept": "application/json"
    }
  });
  return response.data;
}
function handleApiError(error: unknown): string {
if (error instanceof AxiosError) {
if (error.response) {
switch (error.response.status) {
case 404:
return "Error: Resource not found. Please check the ID is correct.";
case 403:
return "Error: Permission denied. You don't have access to this resource.";
case 429:
return "Error: Rate limit exceeded. Please wait before making more requests.";
default:
return `Error: API request failed with status ${error.response.status}`;
}
} else if (error.code === "ECONNABORTED") {
return "Error: Request timed out. Please try again.";
}
}
return `Error: Unexpected error occurred: ${error instanceof Error ? error.message : String(error)}`;
}
// Create MCP server instance
const server = new McpServer({
name: "example-mcp",
version: "1.0.0"
});
// Register tools
server.registerTool(
"example_search_users",
{
title: "Search Example Users",
description: `[Full description as shown above]`,
inputSchema: UserSearchInputSchema,
annotations: {
readOnlyHint: true,
destructiveHint: false,
idempotentHint: true,
openWorldHint: true
}
},
async (params: UserSearchInput) => {
// Implementation as shown above
}
);
// Main function
async function main() {
// Verify environment variables if needed
if (!process.env.EXAMPLE_API_KEY) {
console.error("ERROR: EXAMPLE_API_KEY environment variable is required");
process.exit(1);
}
// Create transport
const transport = new StdioServerTransport();
// Connect server to transport
await server.connect(transport);
console.error("Example MCP server running via stdio");
}
// Run the server
main().catch((error) => {
console.error("Server error:", error);
process.exit(1);
});
```
---
## Advanced MCP Features
### Resource Registration
Expose data as resources for efficient, URI-based access:
```typescript
import { ResourceTemplate } from "@modelcontextprotocol/sdk/server/mcp.js";
// Register a resource with a URI template; the optional `list` callback
// lets clients discover the available documents dynamically
server.registerResource(
  "document",
  new ResourceTemplate("file://documents/{name}", {
    list: async () => {
      const documents = await getAvailableDocuments();
      return {
        resources: documents.map(doc => ({
          uri: `file://documents/${doc.name}`,
          name: doc.name,
          mimeType: "text/plain",
          description: doc.description
        }))
      };
    }
  }),
  {
    title: "Document Resource",
    description: "Access documents by name",
    mimeType: "text/plain"
  },
  async (uri, { name }) => {
    const content = await loadDocument(String(name));
    return {
      contents: [{
        uri: uri.href,
        mimeType: "text/plain",
        text: content
      }]
    };
  }
);
```
**When to use Resources vs Tools:**
- **Resources**: For data access with simple URI-based parameters
- **Tools**: For complex operations requiring validation and business logic
- **Resources**: When data is relatively static or template-based
- **Tools**: When operations have side effects or complex workflows
### Multiple Transport Options
The TypeScript SDK supports different transport mechanisms:
```typescript
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { SSEServerTransport } from "@modelcontextprotocol/sdk/server/sse.js";
// Stdio transport (default - for CLI tools)
const stdioTransport = new StdioServerTransport();
await server.connect(stdioTransport);
// SSE transport (for real-time web updates)
const sseTransport = new SSEServerTransport("/message", response);
await server.connect(sseTransport);
// HTTP transport (for web services)
// Configure based on your HTTP framework integration
```
**Transport selection guide:**
- **Stdio**: Command-line tools, subprocess integration, local development
- **HTTP**: Web services, remote access, multiple simultaneous clients
- **SSE**: Real-time updates, server-push notifications, web dashboards
### Notification Support
Notify clients when server state changes:
```typescript
// Notify clients that the tools list changed
server.sendToolListChanged();
// Notify clients that the resources list changed
server.sendResourceListChanged();
```
Use notifications sparingly - only when server capabilities genuinely change.
---
## Code Best Practices
### Code Composability and Reusability
Your implementation MUST prioritize composability and code reuse:
1. **Extract Common Functionality**:
- Create reusable helper functions for operations used across multiple tools
- Build shared API clients for HTTP requests instead of duplicating code
- Centralize error handling logic in utility functions
- Extract business logic into dedicated functions that can be composed
- Extract shared markdown or JSON field selection & formatting functionality
2. **Avoid Duplication**:
- NEVER copy-paste similar code between tools
- If you find yourself writing similar logic twice, extract it into a function
- Common operations like pagination, filtering, field selection, and formatting should be shared
- Authentication/authorization logic should be centralized
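As a concrete sketch of the extraction pattern (all names here are illustrative, not part of any SDK), one shared formatter serves two read-style tools instead of each duplicating layout code:

```typescript
interface User {
  id: string;
  name: string;
  email: string;
}

// Shared formatter: every tool that returns user lists reuses this
function formatUsersMarkdown(title: string, users: User[]): string {
  const lines = [`# ${title}`, ""];
  for (const user of users) {
    lines.push(`## ${user.name} (${user.id})`);
    lines.push(`- **Email**: ${user.email}`);
    lines.push("");
  }
  return lines.join("\n");
}

// Two hypothetical tool handlers composing the same helper
function searchUsers(query: string, users: User[]): string {
  const matches = users.filter(u =>
    u.name.toLowerCase().includes(query.toLowerCase())
  );
  return formatUsersMarkdown(`Search Results: '${query}'`, matches);
}

function listTeamMembers(team: string, users: User[]): string {
  return formatUsersMarkdown(`Team: ${team}`, users);
}
```

A layout change (say, adding a **Team** line) now happens in exactly one place.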
## Building and Running
Always build your TypeScript code before running:
```bash
# Build the project
npm run build
# Run the server
npm start
# Development with auto-reload
npm run dev
```
Always ensure `npm run build` completes successfully before considering the implementation complete.
## Quality Checklist
Before finalizing your Node/TypeScript MCP server implementation, ensure:
### Strategic Design
- [ ] Tools enable complete workflows, not just API endpoint wrappers
- [ ] Tool names reflect natural task subdivisions
- [ ] Response formats optimize for agent context efficiency
- [ ] Human-readable identifiers used where appropriate
- [ ] Error messages guide agents toward correct usage
### Implementation Quality
- [ ] FOCUSED IMPLEMENTATION: Most important and valuable tools implemented
- [ ] All tools registered using `registerTool` with complete configuration
- [ ] All tools include `title`, `description`, `inputSchema`, and `annotations`
- [ ] Annotations correctly set (readOnlyHint, destructiveHint, idempotentHint, openWorldHint)
- [ ] All tools use Zod schemas for runtime input validation with `.strict()` enforcement
- [ ] All Zod schemas have proper constraints and descriptive error messages
- [ ] All tools have comprehensive descriptions with explicit input/output types
- [ ] Descriptions include return value examples and complete schema documentation
- [ ] Error messages are clear, actionable, and educational
### TypeScript Quality
- [ ] TypeScript interfaces are defined for all data structures
- [ ] Strict TypeScript is enabled in tsconfig.json
- [ ] No use of `any` type - use `unknown` or proper types instead
- [ ] All async functions have explicit Promise<T> return types
- [ ] Error handling uses proper type guards (e.g., `axios.isAxiosError`, `z.ZodError`)
### Advanced Features (where applicable)
- [ ] Resources registered for appropriate data endpoints
- [ ] Appropriate transport configured (stdio, HTTP, SSE)
- [ ] Notifications implemented for dynamic server capabilities
- [ ] Type-safe with SDK interfaces
### Project Configuration
- [ ] Package.json includes all necessary dependencies
- [ ] Build script produces working JavaScript in dist/ directory
- [ ] Main entry point is properly configured as dist/index.js
- [ ] Server name follows format: `{service}-mcp-server`
- [ ] tsconfig.json properly configured with strict mode
### Code Quality
- [ ] Pagination is properly implemented where applicable
- [ ] Large responses check CHARACTER_LIMIT constant and truncate with clear messages
- [ ] Filtering options are provided for potentially large result sets
- [ ] All network operations handle timeouts and connection errors gracefully
- [ ] Common functionality is extracted into reusable functions
- [ ] Return types are consistent across similar operations
### Testing and Build
- [ ] `npm run build` completes successfully without errors
- [ ] dist/index.js created and executable
- [ ] Server runs: `node dist/index.js --help`
- [ ] All imports resolve correctly
- [ ] Sample tool calls work as expected

---
# Python MCP Server Implementation Guide
## Overview
This document provides Python-specific best practices and examples for implementing MCP servers using the MCP Python SDK. It covers server setup, tool registration patterns, input validation with Pydantic, error handling, and complete working examples.
---
## Quick Reference
### Key Imports
```python
from mcp.server.fastmcp import FastMCP
from pydantic import BaseModel, Field, field_validator, ConfigDict
from typing import Optional, List, Dict, Any
from enum import Enum
import httpx
```
### Server Initialization
```python
mcp = FastMCP("service_mcp")
```
### Tool Registration Pattern
```python
@mcp.tool(name="tool_name", annotations={...})
async def tool_function(params: InputModel) -> str:
# Implementation
pass
```
---
## MCP Python SDK and FastMCP
The official MCP Python SDK includes FastMCP, a high-level framework for building MCP servers. FastMCP provides:
- Automatic description and inputSchema generation from function signatures and docstrings
- Pydantic model integration for input validation
- Decorator-based tool registration with `@mcp.tool`
**For complete SDK documentation, use WebFetch to load:**
`https://raw.githubusercontent.com/modelcontextprotocol/python-sdk/main/README.md`
## Server Naming Convention
Python MCP servers must follow this naming pattern:
- **Format**: `{service}_mcp` (lowercase with underscores)
- **Examples**: `github_mcp`, `jira_mcp`, `stripe_mcp`
The name should be:
- General (not tied to specific features)
- Descriptive of the service/API being integrated
- Easy to infer from the task description
- Without version numbers or dates
## Tool Implementation
### Tool Naming
Use snake_case for tool names (e.g., "search_users", "create_project", "get_channel_info") with clear, action-oriented names.
**Avoid Naming Conflicts**: Include the service context to prevent overlaps:
- Use "slack_send_message" instead of just "send_message"
- Use "github_create_issue" instead of just "create_issue"
- Use "asana_list_tasks" instead of just "list_tasks"
### Tool Structure with FastMCP
Tools are defined using the `@mcp.tool` decorator with Pydantic models for input validation:
```python
from pydantic import BaseModel, Field, ConfigDict
from mcp.server.fastmcp import FastMCP
# Initialize the MCP server
mcp = FastMCP("example_mcp")
# Define Pydantic model for input validation
class ServiceToolInput(BaseModel):
'''Input model for service tool operation.'''
model_config = ConfigDict(
str_strip_whitespace=True, # Auto-strip whitespace from strings
validate_assignment=True, # Validate on assignment
extra='forbid' # Forbid extra fields
)
param1: str = Field(..., description="First parameter description (e.g., 'user123', 'project-abc')", min_length=1, max_length=100)
param2: Optional[int] = Field(default=None, description="Optional integer parameter with constraints", ge=0, le=1000)
    tags: Optional[List[str]] = Field(default_factory=list, description="List of tags to apply", max_length=10)
@mcp.tool(
name="service_tool_name",
annotations={
"title": "Human-Readable Tool Title",
"readOnlyHint": True, # Tool does not modify environment
"destructiveHint": False, # Tool does not perform destructive operations
"idempotentHint": True, # Repeated calls have no additional effect
"openWorldHint": False # Tool does not interact with external entities
}
)
async def service_tool_name(params: ServiceToolInput) -> str:
'''Tool description automatically becomes the 'description' field.
This tool performs a specific operation on the service. It validates all inputs
using the ServiceToolInput Pydantic model before processing.
Args:
params (ServiceToolInput): Validated input parameters containing:
- param1 (str): First parameter description
- param2 (Optional[int]): Optional parameter with default
- tags (Optional[List[str]]): List of tags
Returns:
str: JSON-formatted response containing operation results
'''
# Implementation here
pass
```
## Pydantic v2 Key Features
- Use `model_config` instead of nested `Config` class
- Use `field_validator` instead of deprecated `validator`
- Use `model_dump()` instead of deprecated `dict()`
- Validators require `@classmethod` decorator
- Type hints are required for validator methods
```python
from pydantic import BaseModel, Field, field_validator, ConfigDict
class CreateUserInput(BaseModel):
model_config = ConfigDict(
str_strip_whitespace=True,
validate_assignment=True
)
name: str = Field(..., description="User's full name", min_length=1, max_length=100)
email: str = Field(..., description="User's email address", pattern=r'^[\w\.-]+@[\w\.-]+\.\w+$')
age: int = Field(..., description="User's age", ge=0, le=150)
@field_validator('email')
@classmethod
def validate_email(cls, v: str) -> str:
if not v.strip():
raise ValueError("Email cannot be empty")
return v.lower()
```
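A self-contained sketch of these v2 APIs in action (mirroring the `CreateUserInput` model above; requires Pydantic v2):

```python
from pydantic import BaseModel, Field, ConfigDict, field_validator

class CreateUserInput(BaseModel):
    model_config = ConfigDict(
        str_strip_whitespace=True,
        validate_assignment=True
    )
    name: str = Field(..., min_length=1, max_length=100)
    email: str = Field(...)
    age: int = Field(..., ge=0, le=150)

    @field_validator('email')
    @classmethod
    def validate_email(cls, v: str) -> str:
        return v.lower()

user = CreateUserInput(name="  Ada Lovelace  ", email="ADA@Example.com", age=36)
print(user.name)          # whitespace stripped by str_strip_whitespace
print(user.email)         # lowercased by the field_validator
print(user.model_dump())  # v2 replacement for the deprecated .dict()
```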
## Response Format Options
Support multiple output formats for flexibility:
```python
from enum import Enum
class ResponseFormat(str, Enum):
'''Output format for tool responses.'''
MARKDOWN = "markdown"
JSON = "json"
class UserSearchInput(BaseModel):
query: str = Field(..., description="Search query")
response_format: ResponseFormat = Field(
default=ResponseFormat.MARKDOWN,
description="Output format: 'markdown' for human-readable or 'json' for machine-readable"
)
```
**Markdown format**:
- Use headers, lists, and formatting for clarity
- Convert timestamps to human-readable format (e.g., "2024-01-15 10:30:00 UTC" instead of epoch)
- Show display names with IDs in parentheses (e.g., "@john.doe (U123456)")
- Omit verbose metadata (e.g., show only one profile image URL, not all sizes)
- Group related information logically
**JSON format**:
- Return complete, structured data suitable for programmatic processing
- Include all available fields and metadata
- Use consistent field names and types
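These guidelines are easiest to follow consistently when the formatting lives in one shared helper; a minimal sketch (function and field names are illustrative):

```python
import json
from datetime import datetime, timezone

def format_timestamp(epoch: float) -> str:
    '''Convert an epoch timestamp to the human-readable form suggested above.'''
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")

def format_user(user: dict, response_format: str) -> str:
    '''Render one user as trimmed markdown or complete JSON.'''
    if response_format == "json":
        # JSON format: complete structured data, all fields included
        return json.dumps(user, indent=2)
    # Markdown format: display name with ID, readable timestamp, no verbose metadata
    return "\n".join([
        f"## @{user['name']} ({user['id']})",
        f"- **Email**: {user['email']}",
        f"- **Joined**: {format_timestamp(user['created_at'])}",
    ])

user = {"id": "U123456", "name": "john.doe", "email": "john@example.com", "created_at": 1705314600}
print(format_user(user, "markdown"))
```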
## Pagination Implementation
For tools that list resources:
```python
class ListInput(BaseModel):
limit: Optional[int] = Field(default=20, description="Maximum results to return", ge=1, le=100)
offset: Optional[int] = Field(default=0, description="Number of results to skip for pagination", ge=0)
async def list_items(params: ListInput) -> str:
# Make API request with pagination
data = await api_request(limit=params.limit, offset=params.offset)
# Return pagination info
response = {
"total": data["total"],
"count": len(data["items"]),
"offset": params.offset,
"items": data["items"],
"has_more": data["total"] > params.offset + len(data["items"]),
"next_offset": params.offset + len(data["items"]) if data["total"] > params.offset + len(data["items"]) else None
}
return json.dumps(response, indent=2)
```
## Character Limits and Truncation
Add a CHARACTER_LIMIT constant to prevent overwhelming responses:
```python
# At module level
CHARACTER_LIMIT = 25000 # Maximum response size in characters
async def search_tool(params: SearchInput) -> str:
result = generate_response(data)
# Check character limit and truncate if needed
if len(result) > CHARACTER_LIMIT:
# Truncate data and add notice
truncated_data = data[:max(1, len(data) // 2)]
response["data"] = truncated_data
response["truncated"] = True
response["truncation_message"] = (
f"Response truncated from {len(data)} to {len(truncated_data)} items. "
f"Use 'offset' parameter or add filters to see more results."
)
result = json.dumps(response, indent=2)
return result
```
## Error Handling
Provide clear, actionable error messages:
```python
def _handle_api_error(e: Exception) -> str:
'''Consistent error formatting across all tools.'''
if isinstance(e, httpx.HTTPStatusError):
if e.response.status_code == 404:
return "Error: Resource not found. Please check the ID is correct."
elif e.response.status_code == 403:
return "Error: Permission denied. You don't have access to this resource."
elif e.response.status_code == 429:
return "Error: Rate limit exceeded. Please wait before making more requests."
return f"Error: API request failed with status {e.response.status_code}"
elif isinstance(e, httpx.TimeoutException):
return "Error: Request timed out. Please try again."
return f"Error: Unexpected error occurred: {type(e).__name__}"
```
## Shared Utilities
Extract common functionality into reusable functions:
```python
# Shared API request function
async def _make_api_request(endpoint: str, method: str = "GET", **kwargs) -> dict:
'''Reusable function for all API calls.'''
async with httpx.AsyncClient() as client:
response = await client.request(
method,
f"{API_BASE_URL}/{endpoint}",
timeout=30.0,
**kwargs
)
response.raise_for_status()
return response.json()
```
## Async/Await Best Practices
Always use async/await for network requests and I/O operations:
```python
# Good: Async network request
async def fetch_data(resource_id: str) -> dict:
async with httpx.AsyncClient() as client:
response = await client.get(f"{API_URL}/resource/{resource_id}")
response.raise_for_status()
return response.json()
# Bad: Synchronous request
def fetch_data(resource_id: str) -> dict:
response = requests.get(f"{API_URL}/resource/{resource_id}") # Blocks
return response.json()
```
## Type Hints
Use type hints throughout:
```python
from typing import Optional, List, Dict, Any
async def get_user(user_id: str) -> Dict[str, Any]:
data = await fetch_user(user_id)
return {"id": data["id"], "name": data["name"]}
```
## Tool Docstrings
Every tool must have comprehensive docstrings with explicit type information:
```python
async def search_users(params: UserSearchInput) -> str:
'''
Search for users in the Example system by name, email, or team.
This tool searches across all user profiles in the Example platform,
supporting partial matches and various search filters. It does NOT
create or modify users, only searches existing ones.
Args:
params (UserSearchInput): Validated input parameters containing:
- query (str): Search string to match against names/emails (e.g., "john", "@example.com", "team:marketing")
- limit (Optional[int]): Maximum results to return, between 1-100 (default: 20)
- offset (Optional[int]): Number of results to skip for pagination (default: 0)
Returns:
str: JSON-formatted string containing search results with the following schema:
Success response:
{
"total": int, # Total number of matches found
"count": int, # Number of results in this response
"offset": int, # Current pagination offset
"users": [
{
"id": str, # User ID (e.g., "U123456789")
"name": str, # Full name (e.g., "John Doe")
"email": str, # Email address (e.g., "john@example.com")
"team": str # Team name (e.g., "Marketing") - optional
}
]
}
Error response:
"Error: <error message>" or "No users found matching '<query>'"
Examples:
- Use when: "Find all marketing team members" -> params with query="team:marketing"
- Use when: "Search for John's account" -> params with query="john"
- Don't use when: You need to create a user (use example_create_user instead)
- Don't use when: You have a user ID and need full details (use example_get_user instead)
Error Handling:
- Input validation errors are handled by Pydantic model
- Returns "Error: Rate limit exceeded" if too many requests (429 status)
- Returns "Error: Invalid API authentication" if API key is invalid (401 status)
- Returns formatted list of results or "No users found matching 'query'"
'''
```
## Complete Example
See below for a complete Python MCP server example:
```python
#!/usr/bin/env python3
'''
MCP Server for Example Service.
This server provides tools to interact with Example API, including user search,
project management, and data export capabilities.
'''
from typing import Optional, List, Dict, Any
from enum import Enum
import httpx
from pydantic import BaseModel, Field, field_validator, ConfigDict
from mcp.server.fastmcp import FastMCP
# Initialize the MCP server
mcp = FastMCP("example_mcp")
# Constants
API_BASE_URL = "https://api.example.com/v1"
CHARACTER_LIMIT = 25000 # Maximum response size in characters
# Enums
class ResponseFormat(str, Enum):
'''Output format for tool responses.'''
MARKDOWN = "markdown"
JSON = "json"
# Pydantic Models for Input Validation
class UserSearchInput(BaseModel):
'''Input model for user search operations.'''
model_config = ConfigDict(
str_strip_whitespace=True,
validate_assignment=True
)
query: str = Field(..., description="Search string to match against names/emails", min_length=2, max_length=200)
limit: Optional[int] = Field(default=20, description="Maximum results to return", ge=1, le=100)
offset: Optional[int] = Field(default=0, description="Number of results to skip for pagination", ge=0)
response_format: ResponseFormat = Field(default=ResponseFormat.MARKDOWN, description="Output format")
@field_validator('query')
@classmethod
def validate_query(cls, v: str) -> str:
if not v.strip():
raise ValueError("Query cannot be empty or whitespace only")
return v.strip()
# Shared utility functions
async def _make_api_request(endpoint: str, method: str = "GET", **kwargs) -> dict:
'''Reusable function for all API calls.'''
async with httpx.AsyncClient() as client:
response = await client.request(
method,
f"{API_BASE_URL}/{endpoint}",
timeout=30.0,
**kwargs
)
response.raise_for_status()
return response.json()
def _handle_api_error(e: Exception) -> str:
'''Consistent error formatting across all tools.'''
if isinstance(e, httpx.HTTPStatusError):
if e.response.status_code == 404:
return "Error: Resource not found. Please check the ID is correct."
elif e.response.status_code == 403:
return "Error: Permission denied. You don't have access to this resource."
elif e.response.status_code == 429:
return "Error: Rate limit exceeded. Please wait before making more requests."
return f"Error: API request failed with status {e.response.status_code}"
elif isinstance(e, httpx.TimeoutException):
return "Error: Request timed out. Please try again."
return f"Error: Unexpected error occurred: {type(e).__name__}"
# Tool definitions
@mcp.tool(
name="example_search_users",
annotations={
"title": "Search Example Users",
"readOnlyHint": True,
"destructiveHint": False,
"idempotentHint": True,
"openWorldHint": True
}
)
async def example_search_users(params: UserSearchInput) -> str:
'''Search for users in the Example system by name, email, or team.
[Full docstring as shown above]
'''
try:
# Make API request using validated parameters
data = await _make_api_request(
"users/search",
params={
"q": params.query,
"limit": params.limit,
"offset": params.offset
}
)
users = data.get("users", [])
total = data.get("total", 0)
if not users:
return f"No users found matching '{params.query}'"
# Format response based on requested format
if params.response_format == ResponseFormat.MARKDOWN:
lines = [f"# User Search Results: '{params.query}'", ""]
lines.append(f"Found {total} users (showing {len(users)})")
lines.append("")
for user in users:
lines.append(f"## {user['name']} ({user['id']})")
lines.append(f"- **Email**: {user['email']}")
if user.get('team'):
lines.append(f"- **Team**: {user['team']}")
lines.append("")
return "\n".join(lines)
else:
# Machine-readable JSON format
import json
response = {
"total": total,
"count": len(users),
"offset": params.offset,
"users": users
}
return json.dumps(response, indent=2)
except Exception as e:
return _handle_api_error(e)
if __name__ == "__main__":
mcp.run()
```
---
## Advanced FastMCP Features
### Context Parameter Injection
FastMCP can automatically inject a `Context` parameter into tools for advanced capabilities like logging, progress reporting, resource reading, and user interaction:
```python
from pydantic import BaseModel
from mcp.server.fastmcp import FastMCP, Context
mcp = FastMCP("example_mcp")

@mcp.tool()
async def advanced_search(query: str, ctx: Context) -> str:
    '''Advanced tool with context access for logging and progress.'''
    # Report progress for long operations (current, total)
    await ctx.report_progress(25, 100)
    # Log information for debugging
    await ctx.info(f"Processing query: {query}")
    # Perform search
    results = await search_api(query)
    await ctx.report_progress(75, 100)
    # Access server configuration
    server_name = ctx.fastmcp.name
    return format_results(results)

@mcp.tool()
async def interactive_tool(resource_id: str, ctx: Context) -> str:
    '''Tool that can request additional input from users.'''
    # Define the shape of the input we need
    class ConfirmAction(BaseModel):
        confirm: bool

    # Ask the user for structured confirmation before proceeding
    result = await ctx.elicit(
        message=f"Proceed with the operation on {resource_id}?",
        schema=ConfirmAction
    )
    if result.action == "accept" and result.data and result.data.confirm:
        return await perform_operation(resource_id)  # hypothetical helper
    return "Operation cancelled by user"
```
**Context capabilities:**
- `ctx.report_progress(progress, total)` - Report progress for long operations
- `ctx.info(message)` / `ctx.error()` / `ctx.debug()` - Logging
- `ctx.elicit(message, schema)` - Request structured input from users
- `ctx.fastmcp.name` - Access server configuration
- `ctx.read_resource(uri)` - Read MCP resources
### Resource Registration
Expose data as resources for efficient, template-based access:
```python
@mcp.resource("file://documents/{name}")
async def get_document(name: str) -> str:
'''Expose documents as MCP resources.
Resources are useful for static or semi-static data that doesn't
require complex parameters. They use URI templates for flexible access.
'''
document_path = f"./docs/{name}"
with open(document_path, "r") as f:
return f.read()
@mcp.resource("config://settings/{key}")
async def get_setting(key: str, ctx: Context) -> str:
'''Expose configuration as resources with context.'''
settings = await load_settings()
return json.dumps(settings.get(key, {}))
```
**When to use Resources vs Tools:**
- **Resources**: For data access with simple parameters (URI templates)
- **Tools**: For complex operations with validation and business logic
### Structured Output Types
FastMCP supports multiple return types beyond strings:
```python
from typing import TypedDict, Dict, Any
from datetime import datetime
from pydantic import BaseModel
# TypedDict for structured returns
class UserData(TypedDict):
id: str
name: str
email: str
@mcp.tool()
async def get_user_typed(user_id: str) -> UserData:
'''Returns structured data - FastMCP handles serialization.'''
return {"id": user_id, "name": "John Doe", "email": "john@example.com"}
# Pydantic models for complex validation
class DetailedUser(BaseModel):
id: str
name: str
email: str
created_at: datetime
metadata: Dict[str, Any]
@mcp.tool()
async def get_user_detailed(user_id: str) -> DetailedUser:
'''Returns Pydantic model - automatically generates schema.'''
user = await fetch_user(user_id)
return DetailedUser(**user)
```
### Lifespan Management
Initialize resources that persist across requests:
```python
from contextlib import asynccontextmanager
@asynccontextmanager
async def app_lifespan(server: FastMCP):
    '''Manage resources that live for the server's lifetime.'''
    # Initialize connections, load config, etc.
    db = await connect_to_database()
    config = load_configuration()
    try:
        # Make available to all tools via the request context
        yield {"db": db, "config": config}
    finally:
        # Cleanup on shutdown
        await db.close()
mcp = FastMCP("example_mcp", lifespan=app_lifespan)
@mcp.tool()
async def query_data(query: str, ctx: Context) -> str:
'''Access lifespan resources through context.'''
    db = ctx.request_context.lifespan_context["db"]
results = await db.query(query)
return format_results(results)
```
### Multiple Transport Options
FastMCP supports different transport mechanisms:
```python
# Default: Stdio transport (for CLI tools)
if __name__ == "__main__":
mcp.run()
# HTTP transport (for web services)
if __name__ == "__main__":
mcp.run(transport="streamable_http", port=8000)
# SSE transport (for real-time updates)
if __name__ == "__main__":
mcp.run(transport="sse", port=8000)
```
**Transport selection:**
- **Stdio**: Command-line tools, subprocess integration
- **HTTP**: Web services, remote access, multiple clients
- **SSE**: Real-time updates, push notifications
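The transport choice is often driven by deployment environment rather than hard-coded. One pattern — sketched here with an illustrative `MCP_TRANSPORT` environment variable and helper name that are not part of FastMCP itself — is to validate the transport once before calling `mcp.run(transport=...)`:

```python
import os

# Assumed convention: MCP_TRANSPORT env var selects the transport at deploy
# time; the variable name and helper are illustrative, not a FastMCP API.
VALID_TRANSPORTS = {"stdio", "streamable_http", "sse"}

def select_transport(default: str = "stdio") -> str:
    """Return a validated transport name suitable for mcp.run(transport=...)."""
    transport = os.environ.get("MCP_TRANSPORT", default).lower()
    if transport not in VALID_TRANSPORTS:
        raise ValueError(
            f"Unsupported transport {transport!r}; "
            f"expected one of {sorted(VALID_TRANSPORTS)}"
        )
    return transport
```

This keeps the same server code runnable as a CLI subprocess in development (stdio) and as a web service in production (streamable HTTP) without edits.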
---
## Code Best Practices
### Code Composability and Reusability
Your implementation MUST prioritize composability and code reuse:
1. **Extract Common Functionality**:
- Create reusable helper functions for operations used across multiple tools
- Build shared API clients for HTTP requests instead of duplicating code
- Centralize error handling logic in utility functions
- Extract business logic into dedicated functions that can be composed
- Extract shared markdown or JSON field selection & formatting functionality
2. **Avoid Duplication**:
- NEVER copy-paste similar code between tools
- If you find yourself writing similar logic twice, extract it into a function
- Common operations like pagination, filtering, field selection, and formatting should be shared
- Authentication/authorization logic should be centralized
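As a concrete illustration of extracting shared logic (the helper name and metadata fields below are one possible design, not a prescribed API), pagination can live in a single function that every list-style tool calls instead of re-implementing the slice-and-count logic:

```python
from typing import Any

# Shared helper: pagination extracted once, reused by every list-style tool.
def paginate(items: list[Any], offset: int = 0, limit: int = 10) -> dict[str, Any]:
    """Slice a result list and attach standard pagination metadata."""
    page = items[offset : offset + limit]
    next_offset = offset + limit
    return {
        "items": page,
        "total": len(items),
        "has_more": next_offset < len(items),
        "next_offset": next_offset if next_offset < len(items) else None,
    }
```

Every tool that returns a collection then produces identical pagination metadata, so agents learn one contract instead of eight.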
### Python-Specific Best Practices
1. **Use Type Hints**: Always include type annotations for function parameters and return values
2. **Pydantic Models**: Define clear Pydantic models for all input validation
3. **Avoid Manual Validation**: Let Pydantic handle input validation with constraints
4. **Proper Imports**: Group imports (standard library, third-party, local)
5. **Error Handling**: Use specific exception types (httpx.HTTPStatusError, not generic Exception)
6. **Async Context Managers**: Use `async with` for resources that need cleanup
7. **Constants**: Define module-level constants in UPPER_CASE
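Several of these practices can be seen together in a small sketch (the constant value and function are illustrative only): an UPPER_CASE module constant, full type hints, and a specific exception type instead of a bare `except Exception`:

```python
import json

# Illustrative constant; match it to your server's actual limit.
MAX_PAYLOAD_CHARS: int = 25_000

def parse_payload(raw: str) -> dict:
    """Parse a JSON payload, raising a clear, actionable error on bad input."""
    if len(raw) > MAX_PAYLOAD_CHARS:
        raise ValueError(
            f"Payload exceeds {MAX_PAYLOAD_CHARS} characters; request fewer results"
        )
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:  # specific exception, not bare Exception
        raise ValueError(f"Invalid JSON payload: {exc}") from exc
```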
## Quality Checklist
Before finalizing your Python MCP server implementation, ensure:
### Strategic Design
- [ ] Tools enable complete workflows, not just API endpoint wrappers
- [ ] Tool names reflect natural task subdivisions
- [ ] Response formats optimize for agent context efficiency
- [ ] Human-readable identifiers used where appropriate
- [ ] Error messages guide agents toward correct usage
### Implementation Quality
- [ ] FOCUSED IMPLEMENTATION: Most important and valuable tools implemented
- [ ] All tools have descriptive names and documentation
- [ ] Return types are consistent across similar operations
- [ ] Error handling is implemented for all external calls
- [ ] Server name follows format: `{service}_mcp`
- [ ] All network operations use async/await
- [ ] Common functionality is extracted into reusable functions
- [ ] Error messages are clear, actionable, and educational
- [ ] Outputs are properly validated and formatted
### Tool Configuration
- [ ] All tools implement 'name' and 'annotations' in the decorator
- [ ] Annotations correctly set (readOnlyHint, destructiveHint, idempotentHint, openWorldHint)
- [ ] All tools use Pydantic BaseModel for input validation with Field() definitions
- [ ] All Pydantic Fields have explicit types and descriptions with constraints
- [ ] All tools have comprehensive docstrings with explicit input/output types
- [ ] Docstrings include complete schema structure for dict/JSON returns
- [ ] Pydantic models handle input validation (no manual validation needed)
### Advanced Features (where applicable)
- [ ] Context injection used for logging, progress, or elicitation
- [ ] Resources registered for appropriate data endpoints
- [ ] Lifespan management implemented for persistent connections
- [ ] Structured output types used (TypedDict, Pydantic models)
- [ ] Appropriate transport configured (stdio, HTTP, SSE)
### Code Quality
- [ ] File includes proper imports including Pydantic imports
- [ ] Pagination is properly implemented where applicable
- [ ] Large responses check CHARACTER_LIMIT and truncate with clear messages
- [ ] Filtering options are provided for potentially large result sets
- [ ] All async functions are properly defined with `async def`
- [ ] HTTP client usage follows async patterns with proper context managers
- [ ] Type hints are used throughout the code
- [ ] Constants are defined at module level in UPPER_CASE
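The CHARACTER_LIMIT check above can be sketched as a small helper (the limit value and message wording are assumptions for illustration): oversized responses are cut, and the truncation message tells the agent what happened and how to get the rest:

```python
CHARACTER_LIMIT = 25_000  # illustrative; align with your server's constant

def truncate_response(text: str, limit: int = CHARACTER_LIMIT) -> str:
    """Cap response size, replacing the tail with an explicit truncation notice."""
    if len(text) <= limit:
        return text
    notice = f"\n\n[Output truncated at {limit} characters; refine filters or paginate]"
    return text[: limit - len(notice)] + notice
```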
### Testing
- [ ] Server runs successfully: `python your_server.py --help`
- [ ] All imports resolve correctly
- [ ] Sample tool calls work as expected
- [ ] Error scenarios handled gracefully


@@ -0,0 +1,151 @@
"""Lightweight connection handling for MCP servers."""
from abc import ABC, abstractmethod
from contextlib import AsyncExitStack
from typing import Any
from mcp import ClientSession, StdioServerParameters
from mcp.client.sse import sse_client
from mcp.client.stdio import stdio_client
from mcp.client.streamable_http import streamablehttp_client
class MCPConnection(ABC):
"""Base class for MCP server connections."""
def __init__(self):
self.session = None
self._stack = None
@abstractmethod
def _create_context(self):
"""Create the connection context based on connection type."""
async def __aenter__(self):
"""Initialize MCP server connection."""
self._stack = AsyncExitStack()
await self._stack.__aenter__()
try:
ctx = self._create_context()
result = await self._stack.enter_async_context(ctx)
if len(result) == 2:
read, write = result
elif len(result) == 3:
read, write, _ = result
else:
raise ValueError(f"Unexpected context result: {result}")
session_ctx = ClientSession(read, write)
self.session = await self._stack.enter_async_context(session_ctx)
await self.session.initialize()
return self
except BaseException:
await self._stack.__aexit__(None, None, None)
raise
async def __aexit__(self, exc_type, exc_val, exc_tb):
"""Clean up MCP server connection resources."""
if self._stack:
await self._stack.__aexit__(exc_type, exc_val, exc_tb)
self.session = None
self._stack = None
async def list_tools(self) -> list[dict[str, Any]]:
"""Retrieve available tools from the MCP server."""
response = await self.session.list_tools()
return [
{
"name": tool.name,
"description": tool.description,
"input_schema": tool.inputSchema,
}
for tool in response.tools
]
async def call_tool(self, tool_name: str, arguments: dict[str, Any]) -> Any:
"""Call a tool on the MCP server with provided arguments."""
result = await self.session.call_tool(tool_name, arguments=arguments)
return result.content
class MCPConnectionStdio(MCPConnection):
"""MCP connection using standard input/output."""
def __init__(self, command: str, args: list[str] = None, env: dict[str, str] = None):
super().__init__()
self.command = command
self.args = args or []
self.env = env
def _create_context(self):
return stdio_client(
StdioServerParameters(command=self.command, args=self.args, env=self.env)
)
class MCPConnectionSSE(MCPConnection):
"""MCP connection using Server-Sent Events."""
def __init__(self, url: str, headers: dict[str, str] = None):
super().__init__()
self.url = url
self.headers = headers or {}
def _create_context(self):
return sse_client(url=self.url, headers=self.headers)
class MCPConnectionHTTP(MCPConnection):
"""MCP connection using Streamable HTTP."""
def __init__(self, url: str, headers: dict[str, str] = None):
super().__init__()
self.url = url
self.headers = headers or {}
def _create_context(self):
return streamablehttp_client(url=self.url, headers=self.headers)
def create_connection(
transport: str,
command: str = None,
args: list[str] = None,
env: dict[str, str] = None,
url: str = None,
headers: dict[str, str] = None,
) -> MCPConnection:
"""Factory function to create the appropriate MCP connection.
Args:
transport: Connection type ("stdio", "sse", or "http")
command: Command to run (stdio only)
args: Command arguments (stdio only)
env: Environment variables (stdio only)
url: Server URL (sse and http only)
headers: HTTP headers (sse and http only)
Returns:
MCPConnection instance
"""
transport = transport.lower()
if transport == "stdio":
if not command:
raise ValueError("Command is required for stdio transport")
return MCPConnectionStdio(command=command, args=args, env=env)
elif transport == "sse":
if not url:
raise ValueError("URL is required for sse transport")
return MCPConnectionSSE(url=url, headers=headers)
elif transport in ["http", "streamable_http", "streamable-http"]:
if not url:
raise ValueError("URL is required for http transport")
return MCPConnectionHTTP(url=url, headers=headers)
else:
raise ValueError(f"Unsupported transport type: {transport}. Use 'stdio', 'sse', or 'http'")


@@ -0,0 +1,373 @@
"""MCP Server Evaluation Harness
This script evaluates MCP servers by running test questions against them using Claude.
"""
import argparse
import asyncio
import json
import re
import sys
import time
import traceback
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Any
from anthropic import Anthropic
from connections import create_connection
EVALUATION_PROMPT = """You are an AI assistant with access to tools.
When given a task, you MUST:
1. Use the available tools to complete the task
2. Provide a summary of each step in your approach, wrapped in <summary> tags
3. Provide feedback on the tools provided, wrapped in <feedback> tags
4. Provide your final response, wrapped in <response> tags
Summary Requirements:
- In your <summary> tags, you must explain:
- The steps you took to complete the task
- Which tools you used, in what order, and why
- The inputs you provided to each tool
- The outputs you received from each tool
- A summary for how you arrived at the response
Feedback Requirements:
- In your <feedback> tags, provide constructive feedback on the tools:
- Comment on tool names: Are they clear and descriptive?
- Comment on input parameters: Are they well-documented? Are required vs optional parameters clear?
- Comment on descriptions: Do they accurately describe what the tool does?
- Comment on any errors encountered during tool usage: Did the tool fail to execute? Did the tool return too many tokens?
- Identify specific areas for improvement and explain WHY they would help
- Be specific and actionable in your suggestions
Response Requirements:
- Your response should be concise and directly address what was asked
- Always wrap your final response in <response> tags
- If you cannot solve the task, return <response>NOT_FOUND</response>
- For numeric responses, provide just the number
- For IDs, provide just the ID
- For names or text, provide the exact text requested
- Your response should go last"""
def parse_evaluation_file(file_path: Path) -> list[dict[str, Any]]:
"""Parse XML evaluation file with qa_pair elements."""
try:
tree = ET.parse(file_path)
root = tree.getroot()
evaluations = []
for qa_pair in root.findall(".//qa_pair"):
question_elem = qa_pair.find("question")
answer_elem = qa_pair.find("answer")
if question_elem is not None and answer_elem is not None:
evaluations.append({
"question": (question_elem.text or "").strip(),
"answer": (answer_elem.text or "").strip(),
})
return evaluations
except Exception as e:
print(f"Error parsing evaluation file {file_path}: {e}")
return []
def extract_xml_content(text: str, tag: str) -> str | None:
"""Extract content from XML tags."""
pattern = rf"<{tag}>(.*?)</{tag}>"
matches = re.findall(pattern, text, re.DOTALL)
return matches[-1].strip() if matches else None
async def agent_loop(
client: Anthropic,
model: str,
question: str,
tools: list[dict[str, Any]],
connection: Any,
) -> tuple[str, dict[str, Any]]:
"""Run the agent loop with MCP tools."""
messages = [{"role": "user", "content": question}]
response = await asyncio.to_thread(
client.messages.create,
model=model,
max_tokens=4096,
system=EVALUATION_PROMPT,
messages=messages,
tools=tools,
)
messages.append({"role": "assistant", "content": response.content})
tool_metrics = {}
while response.stop_reason == "tool_use":
tool_use = next(block for block in response.content if block.type == "tool_use")
tool_name = tool_use.name
tool_input = tool_use.input
tool_start_ts = time.time()
try:
tool_result = await connection.call_tool(tool_name, tool_input)
tool_response = json.dumps(tool_result) if isinstance(tool_result, (dict, list)) else str(tool_result)
except Exception as e:
tool_response = f"Error executing tool {tool_name}: {str(e)}\n"
tool_response += traceback.format_exc()
tool_duration = time.time() - tool_start_ts
if tool_name not in tool_metrics:
tool_metrics[tool_name] = {"count": 0, "durations": []}
tool_metrics[tool_name]["count"] += 1
tool_metrics[tool_name]["durations"].append(tool_duration)
messages.append({
"role": "user",
"content": [{
"type": "tool_result",
"tool_use_id": tool_use.id,
"content": tool_response,
}]
})
response = await asyncio.to_thread(
client.messages.create,
model=model,
max_tokens=4096,
system=EVALUATION_PROMPT,
messages=messages,
tools=tools,
)
messages.append({"role": "assistant", "content": response.content})
response_text = next(
(block.text for block in response.content if hasattr(block, "text")),
None,
)
    return response_text or "", tool_metrics
async def evaluate_single_task(
client: Anthropic,
model: str,
qa_pair: dict[str, Any],
tools: list[dict[str, Any]],
connection: Any,
task_index: int,
) -> dict[str, Any]:
"""Evaluate a single QA pair with the given tools."""
start_time = time.time()
print(f"Task {task_index + 1}: Running task with question: {qa_pair['question']}")
response, tool_metrics = await agent_loop(client, model, qa_pair["question"], tools, connection)
response_value = extract_xml_content(response, "response")
summary = extract_xml_content(response, "summary")
feedback = extract_xml_content(response, "feedback")
duration_seconds = time.time() - start_time
return {
"question": qa_pair["question"],
"expected": qa_pair["answer"],
"actual": response_value,
"score": int(response_value == qa_pair["answer"]) if response_value else 0,
"total_duration": duration_seconds,
"tool_calls": tool_metrics,
"num_tool_calls": sum(len(metrics["durations"]) for metrics in tool_metrics.values()),
"summary": summary,
"feedback": feedback,
}
REPORT_HEADER = """
# Evaluation Report
## Summary
- **Accuracy**: {correct}/{total} ({accuracy:.1f}%)
- **Average Task Duration**: {average_duration_s:.2f}s
- **Average Tool Calls per Task**: {average_tool_calls:.2f}
- **Total Tool Calls**: {total_tool_calls}
---
"""
TASK_TEMPLATE = """
### Task {task_num}
**Question**: {question}
**Ground Truth Answer**: `{expected_answer}`
**Actual Answer**: `{actual_answer}`
**Correct**: {correct_indicator}
**Duration**: {total_duration:.2f}s
**Tool Calls**: {tool_calls}
**Summary**
{summary}
**Feedback**
{feedback}
---
"""
async def run_evaluation(
eval_path: Path,
connection: Any,
model: str = "claude-3-7-sonnet-20250219",
) -> str:
"""Run evaluation with MCP server tools."""
print("🚀 Starting Evaluation")
client = Anthropic()
tools = await connection.list_tools()
print(f"📋 Loaded {len(tools)} tools from MCP server")
qa_pairs = parse_evaluation_file(eval_path)
print(f"📋 Loaded {len(qa_pairs)} evaluation tasks")
results = []
for i, qa_pair in enumerate(qa_pairs):
print(f"Processing task {i + 1}/{len(qa_pairs)}")
result = await evaluate_single_task(client, model, qa_pair, tools, connection, i)
results.append(result)
correct = sum(r["score"] for r in results)
accuracy = (correct / len(results)) * 100 if results else 0
average_duration_s = sum(r["total_duration"] for r in results) / len(results) if results else 0
average_tool_calls = sum(r["num_tool_calls"] for r in results) / len(results) if results else 0
total_tool_calls = sum(r["num_tool_calls"] for r in results)
report = REPORT_HEADER.format(
correct=correct,
total=len(results),
accuracy=accuracy,
average_duration_s=average_duration_s,
average_tool_calls=average_tool_calls,
total_tool_calls=total_tool_calls,
)
report += "".join([
TASK_TEMPLATE.format(
task_num=i + 1,
question=qa_pair["question"],
expected_answer=qa_pair["answer"],
actual_answer=result["actual"] or "N/A",
            correct_indicator="✅" if result["score"] else "❌",
total_duration=result["total_duration"],
tool_calls=json.dumps(result["tool_calls"], indent=2),
summary=result["summary"] or "N/A",
feedback=result["feedback"] or "N/A",
)
for i, (qa_pair, result) in enumerate(zip(qa_pairs, results))
])
return report
def parse_headers(header_list: list[str]) -> dict[str, str]:
"""Parse header strings in format 'Key: Value' into a dictionary."""
headers = {}
if not header_list:
return headers
for header in header_list:
if ":" in header:
key, value = header.split(":", 1)
headers[key.strip()] = value.strip()
else:
print(f"Warning: Ignoring malformed header: {header}")
return headers
def parse_env_vars(env_list: list[str]) -> dict[str, str]:
"""Parse environment variable strings in format 'KEY=VALUE' into a dictionary."""
env = {}
if not env_list:
return env
for env_var in env_list:
if "=" in env_var:
key, value = env_var.split("=", 1)
env[key.strip()] = value.strip()
else:
print(f"Warning: Ignoring malformed environment variable: {env_var}")
return env
async def main():
parser = argparse.ArgumentParser(
description="Evaluate MCP servers using test questions",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
# Evaluate a local stdio MCP server
python evaluation.py -t stdio -c python -a my_server.py eval.xml
# Evaluate an SSE MCP server
python evaluation.py -t sse -u https://example.com/mcp -H "Authorization: Bearer token" eval.xml
# Evaluate an HTTP MCP server with custom model
python evaluation.py -t http -u https://example.com/mcp -m claude-3-5-sonnet-20241022 eval.xml
""",
)
parser.add_argument("eval_file", type=Path, help="Path to evaluation XML file")
parser.add_argument("-t", "--transport", choices=["stdio", "sse", "http"], default="stdio", help="Transport type (default: stdio)")
parser.add_argument("-m", "--model", default="claude-3-7-sonnet-20250219", help="Claude model to use (default: claude-3-7-sonnet-20250219)")
stdio_group = parser.add_argument_group("stdio options")
stdio_group.add_argument("-c", "--command", help="Command to run MCP server (stdio only)")
stdio_group.add_argument("-a", "--args", nargs="+", help="Arguments for the command (stdio only)")
stdio_group.add_argument("-e", "--env", nargs="+", help="Environment variables in KEY=VALUE format (stdio only)")
remote_group = parser.add_argument_group("sse/http options")
remote_group.add_argument("-u", "--url", help="MCP server URL (sse/http only)")
remote_group.add_argument("-H", "--header", nargs="+", dest="headers", help="HTTP headers in 'Key: Value' format (sse/http only)")
parser.add_argument("-o", "--output", type=Path, help="Output file for evaluation report (default: stdout)")
args = parser.parse_args()
if not args.eval_file.exists():
print(f"Error: Evaluation file not found: {args.eval_file}")
sys.exit(1)
headers = parse_headers(args.headers) if args.headers else None
env_vars = parse_env_vars(args.env) if args.env else None
try:
connection = create_connection(
transport=args.transport,
command=args.command,
args=args.args,
env=env_vars,
url=args.url,
headers=headers,
)
except ValueError as e:
print(f"Error: {e}")
sys.exit(1)
print(f"🔗 Connecting to MCP server via {args.transport}...")
async with connection:
print("✅ Connected successfully")
report = await run_evaluation(args.eval_file, connection, args.model)
if args.output:
args.output.write_text(report)
print(f"\n✅ Report saved to {args.output}")
else:
print("\n" + report)
if __name__ == "__main__":
asyncio.run(main())


@@ -0,0 +1,22 @@
<evaluation>
<qa_pair>
<question>Calculate the compound interest on $10,000 invested at 5% annual interest rate, compounded monthly for 3 years. What is the final amount in dollars (rounded to 2 decimal places)?</question>
<answer>11614.72</answer>
</qa_pair>
<qa_pair>
<question>A projectile is launched at a 45-degree angle with an initial velocity of 50 m/s. Calculate the total distance (in meters) it has traveled from the launch point after 2 seconds, assuming g=9.8 m/s². Round to 2 decimal places.</question>
<answer>87.25</answer>
</qa_pair>
<qa_pair>
<question>A sphere has a volume of 500 cubic meters. Calculate its surface area in square meters. Round to 2 decimal places.</question>
<answer>304.65</answer>
</qa_pair>
<qa_pair>
<question>Calculate the population standard deviation of this dataset: [12, 15, 18, 22, 25, 30, 35]. Round to 2 decimal places.</question>
<answer>7.61</answer>
</qa_pair>
<qa_pair>
<question>Calculate the pH of a solution with a hydrogen ion concentration of 3.5 × 10^-5 M. Round to 2 decimal places.</question>
<answer>4.46</answer>
</qa_pair>
</evaluation>


@@ -0,0 +1,2 @@
anthropic>=0.39.0
mcp>=1.1.0

.gitignore

@@ -28,9 +28,11 @@ node_modules
lib
test-dist
docs
# Claude Code
.claude/
CLAUDE.md
# WebStorm
.idea/
plan/
dist/

CHANGELOG.md

@@ -0,0 +1,250 @@
# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
---
## [0.8.4] - 2026-01-04
### Fixed
- **Critical**: Added missing process error handler in `_spawnPromise()` to prevent server hang when yt-dlp is not installed or fails to spawn ([#23](https://github.com/kevinwatt/yt-dlp-mcp/issues/23))
- **Critical**: Fixed stdout/stderr mixing in `_spawnPromise()` that caused yt-dlp warnings to corrupt parsed output
- Fixed VERSION constant mismatch (was `0.7.0`, now synced with package.json)
- Added try-catch for RegExp construction from `YTDLP_SANITIZE_ILLEGAL_CHARS` env var to prevent startup crash on invalid regex
- Added validation for `YTDLP_MAX_FILENAME_LENGTH` env var to handle NaN values gracefully
- Fixed test expectations for search output format and metadata `creators` field null handling
### Changed
- **Documentation**: Added warning about JavaScript runtime (deno) requirement when using cookie authentication
- YouTube authenticated API endpoints require JS challenge solving
- Without deno, downloads will fail with "n challenge solving failed" error
- **Documentation**: Added version sync guidance to CLAUDE.md (package.json + src/index.mts)
---
## [0.8.3] - 2025-12-25
### Added
- **Video Comments Extraction**: New tools for extracting YouTube video comments
- `ytdlp_get_video_comments`: Extract comments in structured JSON format with author info, likes, timestamps, and reply threading
- `ytdlp_get_video_comments_summary`: Get human-readable summary of top comments
- Supports sorting by "top" (most liked) or "new" (newest first)
- Configurable comment limit (1-100 comments)
- Includes author verification status, pinned comments, and uploader replies
- Comprehensive test suite for comments functionality
- **Upload Date Filter**: New `uploadDateFilter` parameter for `ytdlp_search_videos` tool ([#21](https://github.com/kevinwatt/yt-dlp-mcp/issues/21))
- Filter search results by upload date: `hour`, `today`, `week`, `month`, `year`
- Uses YouTube's native date filtering for efficient searches
- Optional parameter - defaults to no filtering (all dates)
### Changed
- Add Claude Code settings (.claude/, CLAUDE.md) to .gitignore
- Add development guideline to always update CHANGELOG.md
- Move integration test scripts to `tests/` directory for cleaner root
- Comments integration tests are now opt-in via `RUN_INTEGRATION_TESTS=1` env var for CI stability
### Fixed
- Fixed `validateUrl()` return value not being checked in `audio.ts`, `metadata.ts`, and `video.ts`
- Fixed comments test Python environment handling (use `delete` instead of empty string assignment)
- Fixed regex null coalescing in comments test for author matching
---
## [0.7.0] - 2025-10-19
### 🎉 Major Release - MCP Best Practices & Quality Improvements
This release represents a significant upgrade with comprehensive MCP best practices implementation, following the official MCP server development guidelines.
### ✨ Added
#### Tool Naming & Organization
- **Tool Name Prefixes**: All tools now have `ytdlp_` prefix to avoid naming conflicts with other MCP servers
`search_videos` → `ytdlp_search_videos`
`download_video` → `ytdlp_download_video`
`get_video_metadata` → `ytdlp_get_video_metadata`
- And all other tools similarly prefixed
#### Input Validation
- **Zod Schema Validation**: Implemented runtime input validation for all 8 tools
- URL validation with proper format checking
- String length constraints (min/max)
- Number range validation
- Regex patterns for language codes and time formats
- Enum validation for resolution and format options
- `.strict()` mode to prevent unexpected fields
#### Tool Annotations
- **MCP Tool Hints**: Added comprehensive annotations to all tools
- `readOnlyHint: true` for read-only operations (search, list, get)
- `readOnlyHint: false` for file-creating operations (downloads)
- `destructiveHint: false` for all tools (no destructive updates)
- `idempotentHint: true/false` based on operation type
- `openWorldHint: true` for all tools (external API interactions)
#### Response Formats
- **Flexible Output Formats**: Added `response_format` parameter to search tools
- JSON format: Structured data for programmatic processing
- Markdown format: Human-readable display (default)
- Both formats include pagination metadata
#### Pagination Support
- **Search Pagination**: Added offset parameter to `ytdlp_search_videos`
- `offset` parameter for skipping results
- `has_more` indicator in responses
- `next_offset` for easy pagination
- Works with both JSON and Markdown formats
#### Character Limits & Truncation
- **Response Size Protection**: Implemented character limits to prevent context overflow
- Standard limit: 25,000 characters
- Transcript limit: 50,000 characters (larger for text content)
- Automatic truncation with clear messages
- Smart truncation that preserves JSON validity
#### Error Messages
- **Actionable Error Guidance**: Improved error messages across all modules
- Platform-specific errors (Unsupported URL, Video unavailable, etc.)
- Network error guidance with retry suggestions
- Language availability hints (e.g., "Use ytdlp_list_subtitle_languages to check options")
- Rate limit handling with wait time suggestions
### 🔧 Improved
#### Tool Descriptions
- **Comprehensive Documentation**: Enhanced all tool descriptions with:
- Clear purpose statements
- Detailed parameter descriptions with examples
- Complete return value schemas
- "Use when" / "Don't use when" guidance
- Error handling documentation
- Example use cases
#### Configuration
- **Enhanced Config System**: Added new configuration options
- `limits.characterLimit`: Maximum response size (25,000)
- `limits.maxTranscriptLength`: Maximum transcript size (50,000)
- Environment variable support for all settings
#### Code Quality
- **Better Type Safety**: Improved TypeScript types throughout
- Proper type definitions for metadata with truncation fields
- Explicit Promise return types
- Better error type handling
### 🐛 Fixed
- **JSON Parsing Issue**: Fixed metadata truncation that was breaking JSON format
- Truncation messages now inside JSON objects instead of appended
- Prevents "Unexpected non-whitespace character" errors
- Maintains valid JSON structure even when truncated
### 🧪 Testing
- **Real-World Validation**: Comprehensive testing with actual videos
- ✅ YouTube platform fully tested (Rick Astley - Never Gonna Give You Up)
- ✅ Bilibili platform fully tested (Chinese content)
- ✅ Multi-language support verified (English, Chinese)
- ✅ All 8 tools tested with real API calls
- ✅ MCP protocol compatibility verified
### 📖 Documentation
- **Enhanced README**: Completely redesigned README.md with:
- Professional badges and visual formatting
- Comprehensive feature tables
- Detailed tool documentation
- Usage examples by category
- Configuration guide
- Architecture overview
- Multi-language support demonstration
### 🌍 Platform Support
- **Verified Platforms**:
- ✅ YouTube (fully tested)
- ✅ Bilibili (哔哩哔哩) (fully tested)
- 🎯 1000+ other platforms supported via yt-dlp
### 📊 Statistics
- 8 tools with complete Zod validation
- 8 tools with proper annotations
- 8 tools with comprehensive descriptions
- 2 platforms tested and verified
- 5/5 YouTube tests passing
- 3/3 Bilibili tests passing
- 0 critical bugs remaining
### 🔄 Migration Guide
If upgrading from 0.6.x:
1. **Tool Names**: Update all tool names to include `ytdlp_` prefix
```diff
- "search_videos"
+ "ytdlp_search_videos"
```
2. **Search Parameters**: New optional parameters available
```javascript
{
query: "tutorial",
maxResults: 10,
offset: 0, // NEW: pagination support
response_format: "json" // NEW: format control
}
```
3. **Error Handling**: Error messages are more descriptive now
- Update any error parsing logic to handle new formats
### 🙏 Acknowledgments
This release follows the [MCP Server Development Best Practices](https://modelcontextprotocol.io) and incorporates feedback from the MCP community.
---
## [0.6.28] - 2025-08-13
### Added
- Video metadata extraction with `get_video_metadata` and `get_video_metadata_summary`
- Comprehensive test suite
- API documentation
### Changed
- Improved metadata extraction performance
- Updated dependencies
### Fixed
- Various bug fixes and stability improvements
---
## [0.6.0] - 2025-08-01
### Added
- Initial MCP server implementation
- YouTube video search functionality
- Video download with resolution control
- Audio extraction
- Subtitle download and transcript generation
- Integration with yt-dlp
### Features
- 8 core tools for video content management
- Support for multiple video platforms
- Configurable downloads directory
- Automatic filename sanitization
- Cross-platform compatibility (Windows, macOS, Linux)
---
[0.7.0]: https://github.com/kevinwatt/yt-dlp-mcp/compare/v0.6.28...v0.7.0
[0.6.28]: https://github.com/kevinwatt/yt-dlp-mcp/compare/v0.6.0...v0.6.28
[0.6.0]: https://github.com/kevinwatt/yt-dlp-mcp/releases/tag/v0.6.0

CLAUDE.md

@@ -0,0 +1,107 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Development Guidelines
- **Always update CHANGELOG.md** when making changes to the codebase
- **Version updates require TWO files**:
1. `package.json` - line 3: `"version": "x.x.x"`
2. `src/index.mts` - line 24: `const VERSION = 'x.x.x'`
## Development Commands
### Build and Prepare
```bash
npm run prepare # Compile TypeScript and make binary executable
```
### Testing
```bash
npm test # Run Jest tests with ESM support
```
### Manual Testing
```bash
npx @kevinwatt/yt-dlp-mcp # Start MCP server manually
```
## Code Architecture
### MCP Server Implementation
This is an MCP (Model Context Protocol) server that integrates with `yt-dlp` for video/audio downloading. The server:
- **Entry point**: `src/index.mts` - Main MCP server implementation with tool handlers
- **Modular design**: Each feature lives in `src/modules/` (video.ts, audio.ts, subtitle.ts, search.ts, metadata.ts)
- **Configuration**: `src/config.ts` - Centralized config with environment variable support and validation
- **Utility functions**: `src/modules/utils.ts` - Shared spawn and cleanup utilities
### Tool Architecture
The server exposes 8 MCP tools:
1. `search_videos` - YouTube video search
2. `list_subtitle_languages` - List available subtitles
3. `download_video_subtitles` - Download subtitle files
4. `download_video` - Download videos with resolution/trimming options
5. `download_audio` - Extract and download audio
6. `download_transcript` - Generate clean text transcripts
7. `get_video_metadata` - Extract comprehensive video metadata (JSON format)
8. `get_video_metadata_summary` - Get human-readable metadata summary
### Key Patterns
- **Unified error handling**: `handleToolExecution()` wrapper for consistent error responses
- **Spawn management**: All external tool calls go through `_spawnPromise()` with cleanup
- **Configuration-driven**: All defaults and behavior configurable via environment variables
- **ESM modules**: Uses `.mts` extension and ESM imports throughout
- **Filename sanitization**: Cross-platform safe filename handling with length limits
- **Metadata extraction**: Uses `yt-dlp --dump-json` for comprehensive video information without downloading content
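As a rough illustration of the first pattern (the shape is an assumption based on this file, not the actual `src/index.mts` source), a unified wrapper like `handleToolExecution()` typically converts thrown errors into consistent MCP-style tool responses:

```typescript
// Hypothetical sketch of a unified error-handling wrapper; the real
// handleToolExecution() in src/index.mts may differ in shape.
type ToolResult = {
  content: { type: "text"; text: string }[];
  isError?: boolean;
};

async function handleToolExecution(
  action: () => Promise<string>
): Promise<ToolResult> {
  try {
    // Run the tool body and wrap its text output in an MCP content block.
    const text = await action();
    return { content: [{ type: "text", text }] };
  } catch (err) {
    // Normalize any thrown value into a consistent error response.
    const message = err instanceof Error ? err.message : String(err);
    return {
      content: [{ type: "text", text: `Error: ${message}` }],
      isError: true,
    };
  }
}
```

Each tool handler can then be registered as `handleToolExecution(() => downloadVideo(url, config))`, so every tool reports failures the same way.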
### Dependencies
- **Required external**: `yt-dlp` must be installed and in PATH
- **Core MCP**: `@modelcontextprotocol/sdk` for server implementation
- **Process management**: `spawn-rx` for async process spawning
- **File operations**: `rimraf` for cleanup
### Configuration System
`CONFIG` object loaded from `config.ts` supports:
- Download directory customization (defaults to ~/Downloads)
- Resolution/format preferences
- Filename sanitization rules
- Temporary directory management
- Environment variable overrides (YTDLP_* prefix)
### Testing Setup
- **Jest with ESM**: Custom config for TypeScript + ESM support
- **Test isolation**: Tests run in separate environment with mocked dependencies
- **Coverage**: Tests for each module in `src/__tests__/`
### TypeScript Configuration
- **Strict mode**: All strict TypeScript checks enabled
- **ES2020 target**: Modern JavaScript features
- **Declaration generation**: Types exported to `lib/` for consumption
- **Source maps**: Enabled for debugging
### Build Output
- **Compiled code**: `lib/` directory with .js, .d.ts, and .map files
- **Executable**: `lib/index.mjs` with shebang for direct execution
- **Module structure**: Preserves source module organization
## Metadata Module Details
### VideoMetadata Interface
The `metadata.ts` module exports a comprehensive `VideoMetadata` interface containing fields like:
- Basic info: `id`, `title`, `description`, `duration`, `upload_date`
- Channel info: `channel`, `channel_id`, `channel_url`, `uploader`
- Analytics: `view_count`, `like_count`, `comment_count`
- Technical: `formats`, `thumbnails`, `subtitles`
- Content: `tags`, `categories`, `series`, `episode` data
### Key Functions
- `getVideoMetadata(url, fields?, config?)` - Extract full or filtered metadata as JSON
- `getVideoMetadataSummary(url, config?)` - Generate human-readable summary
### Testing
Comprehensive test suite in `src/__tests__/metadata.test.ts` covers:
- Field filtering and extraction
- Error handling for invalid URLs
- Format validation
- Real-world integration with YouTube videos

README.md (98 → 541 lines)
# 🎬 yt-dlp-mcp
<div align="center">
**A powerful MCP server that brings video platform capabilities to your AI agents**
[![npm version](https://img.shields.io/npm/v/@kevinwatt/yt-dlp-mcp.svg)](https://www.npmjs.com/package/@kevinwatt/yt-dlp-mcp)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Node.js Version](https://img.shields.io/badge/node-%3E%3D18-brightgreen)](https://nodejs.org/)
[![TypeScript](https://img.shields.io/badge/TypeScript-5.6-blue)](https://www.typescriptlang.org/)
Integrate yt-dlp with Claude, Dive, and other MCP-compatible AI systems. Download videos, extract metadata, get transcripts, and more, all through natural language.
[Features](#-features) • [Installation](#-installation) • [Tools](#-available-tools) • [Usage](#-usage-examples) • [Documentation](#-documentation)
</div>
---
## ✨ Features
<table>
<tr>
<td width="50%">
### 🔍 **Search & Discovery**
- Search YouTube with pagination
- JSON or Markdown output formats
- Filter by relevance and quality
### 📊 **Metadata Extraction**
- Comprehensive video information
- Channel details and statistics
- Upload dates, tags, categories
- No content download required
### 📝 **Transcript & Subtitles**
- Download subtitles in VTT format
- Generate clean text transcripts
- Multi-language support
- Auto-generated captions
</td>
<td width="50%">
### 🎥 **Video Downloads**
- Resolution control (480p-1080p)
- Video trimming support
- Platform-agnostic (YouTube, Facebook, etc.)
- Saved to Downloads folder
### 🎵 **Audio Extraction**
- Best quality audio (M4A/MP3)
- Direct audio-only downloads
- Perfect for podcasts & music
### 🛡️ **Privacy & Safety**
- No tracking or analytics
- Direct downloads via yt-dlp
- Zod schema validation
- Character limits for LLM safety
</td>
</tr>
</table>
---
## 🚀 Installation
### Prerequisites
**Install yt-dlp** on your system:
<table>
<tr>
<th>Platform</th>
<th>Command</th>
</tr>
<tr>
<td>🪟 <strong>Windows</strong></td>
<td><code>winget install yt-dlp</code></td>
</tr>
<tr>
<td>🍎 <strong>macOS</strong></td>
<td><code>brew install yt-dlp</code></td>
</tr>
<tr>
<td>🐧 <strong>Linux</strong></td>
<td><code>pip install yt-dlp</code></td>
</tr>
</table>
### Getting Started
Add the following config to your MCP client:
```json
{
  "mcpServers": {
    "yt-dlp": {
      "command": "npx",
      "args": ["-y", "@kevinwatt/yt-dlp-mcp@latest"]
    }
  }
}
```
### MCP Client Configuration
<details open>
<summary><strong>Dive</strong></summary>
1. Open [Dive Desktop](https://github.com/OpenAgentPlatform/Dive)
2. Click **"+ Add MCP Server"**
3. Paste the config provided above
4. Click **"Save"** and you're ready!
</details>
<details>
<summary><strong>Claude Code</strong></summary>
Use the Claude Code CLI to add the yt-dlp MCP server ([guide](https://docs.anthropic.com/en/docs/claude-code/mcp)):
```bash
claude mcp add yt-dlp npx @kevinwatt/yt-dlp-mcp@latest
```
</details>
<details>
<summary><strong>Claude Desktop</strong></summary>
Add to your `claude_desktop_config.json`:
- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%\Claude\claude_desktop_config.json`
- Linux: `~/.config/Claude/claude_desktop_config.json`
```json
{
  "mcpServers": {
    "yt-dlp": {
      "command": "npx",
      "args": ["-y", "@kevinwatt/yt-dlp-mcp@latest"]
    }
  }
}
```
</details>
<details>
<summary><strong>Cursor</strong></summary>
Go to `Cursor Settings` -> `MCP` -> `New MCP Server`. Use the config provided above.
</details>
<details>
<summary><strong>VS Code / Copilot</strong></summary>
Install via the VS Code CLI:
```bash
code --add-mcp '{"name":"yt-dlp","command":"npx","args":["-y","@kevinwatt/yt-dlp-mcp@latest"]}'
```
Or follow the [MCP install guide](https://code.visualstudio.com/docs/copilot/chat/mcp-servers#_add-an-mcp-server) with the standard config from above.
</details>
<details>
<summary><strong>Windsurf</strong></summary>
Follow the [configure MCP guide](https://docs.windsurf.com/windsurf/cascade/mcp#mcp-config-json) using the standard config from above.
</details>
<details>
<summary><strong>Cline</strong></summary>
Follow [Cline MCP configuration guide](https://docs.cline.bot/mcp/configuring-mcp-servers) and use the config provided above.
</details>
<details>
<summary><strong>Warp</strong></summary>
Go to `Settings | AI | Manage MCP Servers` -> `+ Add` to [add an MCP Server](https://docs.warp.dev/knowledge-and-collaboration/mcp#adding-an-mcp-server). Use the config provided above.
</details>
<details>
<summary><strong>JetBrains AI Assistant</strong></summary>
Go to `Settings | Tools | AI Assistant | Model Context Protocol (MCP)` -> `Add`. Use the config provided above.
</details>
### Manual Installation
```bash
npm install -g @kevinwatt/yt-dlp-mcp
```
---
## 🛠️ Available Tools
All tools are prefixed with `ytdlp_` to avoid naming conflicts with other MCP servers.
### 🔍 Search & Discovery
<table>
<tr>
<th width="30%">Tool</th>
<th width="70%">Description</th>
</tr>
<tr>
<td><code>ytdlp_search_videos</code></td>
<td>
Search YouTube with pagination and date filtering support
- **Parameters**: `query`, `maxResults`, `offset`, `response_format`, `uploadDateFilter`
- **Date Filter**: `hour`, `today`, `week`, `month`, `year` (optional)
- **Returns**: Video list with titles, channels, durations, URLs
- **Supports**: JSON and Markdown formats
</td>
</tr>
</table>
### 📝 Subtitles & Transcripts
<table>
<tr>
<th width="30%">Tool</th>
<th width="70%">Description</th>
</tr>
<tr>
<td><code>ytdlp_list_subtitle_languages</code></td>
<td>
List all available subtitle languages for a video
- **Parameters**: `url`
- **Returns**: Available languages, formats, auto-generated status
</td>
</tr>
<tr>
<td><code>ytdlp_download_video_subtitles</code></td>
<td>
Download subtitles in VTT format with timestamps
- **Parameters**: `url`, `language` (optional)
- **Returns**: Raw VTT subtitle content
</td>
</tr>
<tr>
<td><code>ytdlp_download_transcript</code></td>
<td>
Generate clean plain text transcript
- **Parameters**: `url`, `language` (optional)
- **Returns**: Cleaned text without timestamps or formatting
</td>
</tr>
</table>
### 🎥 Video & Audio Downloads
<table>
<tr>
<th width="30%">Tool</th>
<th width="70%">Description</th>
</tr>
<tr>
<td><code>ytdlp_download_video</code></td>
<td>
Download video to Downloads folder
- **Parameters**: `url`, `resolution`, `startTime`, `endTime`
- **Resolutions**: 480p, 720p, 1080p, best
- **Supports**: Video trimming
</td>
</tr>
<tr>
<td><code>ytdlp_download_audio</code></td>
<td>
Extract and download audio only
- **Parameters**: `url`
- **Format**: Best quality M4A/MP3
</td>
</tr>
</table>
### 📊 Metadata
<table>
<tr>
<th width="30%">Tool</th>
<th width="70%">Description</th>
</tr>
<tr>
<td><code>ytdlp_get_video_metadata</code></td>
<td>
Extract comprehensive video metadata in JSON
- **Parameters**: `url`, `fields` (optional array)
- **Returns**: Complete metadata or filtered fields
- **Includes**: Views, likes, upload date, tags, formats, etc.
</td>
</tr>
<tr>
<td><code>ytdlp_get_video_metadata_summary</code></td>
<td>
Get human-readable metadata summary
- **Parameters**: `url`
- **Returns**: Formatted text with key information
</td>
</tr>
</table>
---
## 💡 Usage Examples
### Search Videos
```
"Search for Python programming tutorials"
"Find the top 20 machine learning videos"
"Search for 'react hooks tutorial' and show results 10-20"
"Search for JavaScript courses in JSON format"
```
### Get Metadata
```
"Get metadata for https://youtube.com/watch?v=..."
"Show me the title, channel, and view count for this video"
"Extract just the duration and upload date"
"Give me a quick summary of this video's info"
```
### Download Subtitles & Transcripts
```
"List available subtitles for https://youtube.com/watch?v=..."
"Download English subtitles from this video"
"Get a clean transcript of this video in Spanish"
"Download Chinese (zh-Hant) transcript"
```
### Download Content
```
"Download this video in 1080p: https://youtube.com/watch?v=..."
"Download audio from this YouTube video"
"Download this video from 1:30 to 2:45"
"Save this Facebook video to my Downloads"
```
---
## 📖 Documentation
- **[API Reference](./docs/api.md)** - Detailed tool documentation
- **[Configuration](./docs/configuration.md)** - Environment variables and settings
- **[Cookie Configuration](./docs/cookies.md)** - Authentication and private video access
- **[Error Handling](./docs/error-handling.md)** - Common errors and solutions
- **[Contributing](./docs/contributing.md)** - How to contribute
---
## 🔧 Configuration
### Environment Variables
```bash
# Downloads directory (default: ~/Downloads)
YTDLP_DOWNLOADS_DIR=/path/to/downloads
# Default resolution (default: 720p)
YTDLP_DEFAULT_RESOLUTION=1080p
# Default subtitle language (default: en)
YTDLP_DEFAULT_SUBTITLE_LANG=en
# Character limit (default: 25000)
YTDLP_CHARACTER_LIMIT=25000
# Max transcript length (default: 50000)
YTDLP_MAX_TRANSCRIPT_LENGTH=50000
```
### Cookie Configuration
To access private videos, age-restricted content, or avoid rate limits, configure cookies:
> ⚠️ **Important**: Cookie authentication requires a JavaScript runtime (deno) to be installed. When using cookies, YouTube uses authenticated API endpoints that require JavaScript challenge solving. Without deno, downloads will fail with an "n challenge solving failed" error.
>
> Install deno: https://docs.deno.com/runtime/getting_started/installation/
```bash
# Extract cookies from browser (recommended)
YTDLP_COOKIES_FROM_BROWSER=chrome
# Or use a cookie file
YTDLP_COOKIES_FILE=/path/to/cookies.txt
```
**MCP Configuration with cookies:**
```json
{
"mcpServers": {
"yt-dlp": {
"command": "npx",
"args": ["-y", "@kevinwatt/yt-dlp-mcp@latest"],
"env": {
"YTDLP_COOKIES_FROM_BROWSER": "chrome"
}
}
}
}
```
Supported browsers: `brave`, `chrome`, `chromium`, `edge`, `firefox`, `opera`, `safari`, `vivaldi`, `whale`
See [Cookie Configuration Guide](./docs/cookies.md) for detailed setup instructions.
---
## 🏗️ Architecture
### Built With
- **[yt-dlp](https://github.com/yt-dlp/yt-dlp)** - Video extraction engine
- **[MCP SDK](https://github.com/modelcontextprotocol/typescript-sdk)** - Model Context Protocol
- **[Zod](https://github.com/colinhacks/zod)** - TypeScript-first schema validation
- **TypeScript** - Type safety and developer experience
### Key Features
- ✅ **Type-Safe**: Full TypeScript with strict mode
- ✅ **Validated Inputs**: Zod schemas for runtime validation
- ✅ **Character Limits**: Automatic truncation to prevent context overflow
- ✅ **Tool Annotations**: readOnly, destructive, idempotent hints
- ✅ **Error Guidance**: Actionable error messages for LLMs
- ✅ **Modular Design**: Clean separation of concerns
---
## 📊 Response Formats
### JSON Format
Perfect for programmatic processing:
```json
{
"total": 50,
"count": 10,
"offset": 0,
"videos": [...],
"has_more": true,
"next_offset": 10
}
```
### Markdown Format
Human-readable display:
```markdown
Found 50 videos (showing 10):
1. **Video Title**
📺 Channel: Creator Name
⏱️ Duration: 10:30
🔗 URL: https://...
```
---
## 🔒 Privacy & Security
- **No Tracking**: Direct downloads, no analytics
- **Input Validation**: Zod schemas prevent injection
- **URL Validation**: Strict URL format checking
- **Character Limits**: Prevents context overflow attacks
- **Read-Only by Default**: Most tools don't modify system state
---
## 🤝 Contributing
Contributions are welcome! Please check out our [Contributing Guide](./docs/contributing.md).
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
---
## 📝 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
---
## 🙏 Acknowledgments
- [yt-dlp](https://github.com/yt-dlp/yt-dlp) - The amazing video extraction tool
- [Anthropic](https://www.anthropic.com/) - For the Model Context Protocol
- [Dive](https://github.com/OpenAgentPlatform/Dive) - MCP-compatible AI platform
---
## 📚 Related Projects
- [MCP Servers](https://github.com/modelcontextprotocol/servers) - Official MCP server implementations
- [yt-dlp](https://github.com/yt-dlp/yt-dlp) - Command-line video downloader
- [Dive Desktop](https://github.com/OpenAgentPlatform/Dive) - AI agent platform
---
<div align="center">
[⬆ Back to Top](#-yt-dlp-mcp)
</div>

docs/api.md (new file, 194 lines)
# API Reference
## Video Operations
### downloadVideo(url: string, config?: Config, resolution?: string, startTime?: string, endTime?: string): Promise<string>
Downloads a video from the specified URL with optional trimming.
**Parameters:**
- `url`: The URL of the video to download
- `config`: (Optional) Configuration object
- `resolution`: (Optional) Preferred video resolution ('480p', '720p', '1080p', 'best')
- `startTime`: (Optional) Start time for trimming (format: HH:MM:SS[.ms])
- `endTime`: (Optional) End time for trimming (format: HH:MM:SS[.ms])
**Returns:**
- Promise resolving to a success message with the downloaded file path
**Example:**
```javascript
import { downloadVideo } from '@kevinwatt/yt-dlp-mcp';
// Download with default settings
const result = await downloadVideo('https://www.youtube.com/watch?v=jNQXAC9IVRw');
console.log(result);
// Download with specific resolution
const hdResult = await downloadVideo(
'https://www.youtube.com/watch?v=jNQXAC9IVRw',
undefined,
'1080p'
);
console.log(hdResult);
// Download with trimming
const trimmedResult = await downloadVideo(
'https://www.youtube.com/watch?v=jNQXAC9IVRw',
undefined,
'720p',
'00:01:30',
'00:02:45'
);
console.log(trimmedResult);
// Download with fractional seconds
const preciseTrim = await downloadVideo(
'https://www.youtube.com/watch?v=jNQXAC9IVRw',
undefined,
'720p',
'00:01:30.500',
'00:02:45.250'
);
console.log(preciseTrim);
```
## Audio Operations
### downloadAudio(url: string, config?: Config): Promise<string>
Downloads audio from the specified URL in the best available quality.
**Parameters:**
- `url`: The URL of the video to extract audio from
- `config`: (Optional) Configuration object
**Returns:**
- Promise resolving to a success message with the downloaded file path
**Example:**
```javascript
import { downloadAudio } from '@kevinwatt/yt-dlp-mcp';
const result = await downloadAudio('https://www.youtube.com/watch?v=jNQXAC9IVRw');
console.log(result);
```
## Subtitle Operations
### listSubtitles(url: string): Promise<string>
Lists all available subtitles for a video.
**Parameters:**
- `url`: The URL of the video
**Returns:**
- Promise resolving to a string containing the list of available subtitles
**Example:**
```javascript
import { listSubtitles } from '@kevinwatt/yt-dlp-mcp';
const subtitles = await listSubtitles('https://www.youtube.com/watch?v=jNQXAC9IVRw');
console.log(subtitles);
```
### downloadSubtitles(url: string, language: string): Promise<string>
Downloads subtitles for a video in the specified language.
**Parameters:**
- `url`: The URL of the video
- `language`: Language code (e.g., 'en', 'zh-Hant', 'ja')
**Returns:**
- Promise resolving to the subtitle content
**Example:**
```javascript
import { downloadSubtitles } from '@kevinwatt/yt-dlp-mcp';
const subtitles = await downloadSubtitles(
'https://www.youtube.com/watch?v=jNQXAC9IVRw',
'en'
);
console.log(subtitles);
```
## Metadata Operations
### getVideoMetadata(url: string, fields?: string[], config?: Config): Promise<string>
Extract comprehensive video metadata using yt-dlp without downloading the content.
**Parameters:**
- `url`: The URL of the video to extract metadata from
- `fields`: (Optional) Specific metadata fields to extract (e.g., `['id', 'title', 'description', 'channel']`). If omitted, returns all available metadata. If provided as an empty array `[]`, returns `{}`.
- `config`: (Optional) Configuration object
**Returns:**
- Promise resolving to a JSON string of metadata (pretty-printed)
**Example:**
```javascript
import { getVideoMetadata } from '@kevinwatt/yt-dlp-mcp';
// Get all metadata
const all = await getVideoMetadata('https://www.youtube.com/watch?v=jNQXAC9IVRw');
console.log(all);
// Get specific fields only
const subset = await getVideoMetadata(
'https://www.youtube.com/watch?v=jNQXAC9IVRw',
['id', 'title', 'description', 'channel']
);
console.log(subset);
```
### getVideoMetadataSummary(url: string, config?: Config): Promise<string>
Get a human-readable summary of key video metadata fields.
**Parameters:**
- `url`: The URL of the video
- `config`: (Optional) Configuration object
**Returns:**
- Promise resolving to a formatted text summary (title, channel, duration, views, upload date, description preview, etc.)
**Example:**
```javascript
import { getVideoMetadataSummary } from '@kevinwatt/yt-dlp-mcp';
const summary = await getVideoMetadataSummary('https://www.youtube.com/watch?v=jNQXAC9IVRw');
console.log(summary);
```
## Configuration
### Config Interface
```typescript
interface Config {
file: {
maxFilenameLength: number;
downloadsDir: string;
tempDirPrefix: string;
sanitize: {
replaceChar: string;
truncateSuffix: string;
illegalChars: RegExp;
reservedNames: readonly string[];
};
};
tools: {
required: readonly string[];
};
download: {
defaultResolution: "480p" | "720p" | "1080p" | "best";
defaultAudioFormat: "m4a" | "mp3";
defaultSubtitleLanguage: string;
};
}
```
For detailed configuration options, see [Configuration Guide](./configuration.md).

docs/configuration.md (new file, 169 lines)
# Configuration Guide
## Overview
The yt-dlp-mcp package can be configured through environment variables or by passing a configuration object to the functions.
## Configuration Object
```typescript
interface Config {
file: {
maxFilenameLength: number;
downloadsDir: string;
tempDirPrefix: string;
sanitize: {
replaceChar: string;
truncateSuffix: string;
illegalChars: RegExp;
reservedNames: readonly string[];
};
};
tools: {
required: readonly string[];
};
download: {
defaultResolution: "480p" | "720p" | "1080p" | "best";
defaultAudioFormat: "m4a" | "mp3";
defaultSubtitleLanguage: string;
};
}
```
## Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| `YTDLP_MAX_FILENAME_LENGTH` | Maximum length for filenames | 50 |
| `YTDLP_DOWNLOADS_DIR` | Download directory path | `~/Downloads` |
| `YTDLP_TEMP_DIR_PREFIX` | Prefix for temporary directories | `ytdlp-` |
| `YTDLP_SANITIZE_REPLACE_CHAR` | Character to replace illegal characters | `_` |
| `YTDLP_SANITIZE_TRUNCATE_SUFFIX` | Suffix for truncated filenames | `...` |
| `YTDLP_SANITIZE_ILLEGAL_CHARS` | Regex pattern for illegal characters | `/[<>:"/\\|?*\x00-\x1F]/g` |
| `YTDLP_SANITIZE_RESERVED_NAMES` | Comma-separated list of reserved names | `CON,PRN,AUX,...` |
| `YTDLP_DEFAULT_RESOLUTION` | Default video resolution | `720p` |
| `YTDLP_DEFAULT_AUDIO_FORMAT` | Default audio format | `m4a` |
| `YTDLP_DEFAULT_SUBTITLE_LANG` | Default subtitle language | `en` |
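Numeric variables such as `YTDLP_MAX_FILENAME_LENGTH` should be validated when parsed, so a malformed value falls back to the documented default rather than producing `NaN`. A minimal sketch (the helper name is illustrative, not the package's actual code):

```typescript
// Hypothetical helper: parse a numeric environment value, falling back
// to the documented default when the value is unset or not a number.
function parseIntEnv(raw: string | undefined, fallback: number): number {
  if (raw === undefined) return fallback;
  const parsed = Number.parseInt(raw, 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

// With process.env.YTDLP_MAX_FILENAME_LENGTH as `raw`, an unset or
// non-numeric value falls back to the documented default of 50.
```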
## File Configuration
### Download Directory
The download directory can be configured in two ways:
1. Environment variable:
```bash
export YTDLP_DOWNLOADS_DIR="/path/to/downloads"
```
2. Configuration object:
```javascript
const config = {
file: {
downloadsDir: "/path/to/downloads"
}
};
```
### Filename Sanitization
Control how filenames are sanitized:
```javascript
const config = {
file: {
maxFilenameLength: 100,
sanitize: {
replaceChar: '-',
truncateSuffix: '___',
illegalChars: /[<>:"/\\|?*\x00-\x1F]/g,
reservedNames: ['CON', 'PRN', 'AUX', 'NUL']
}
}
};
```
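To show how those options fit together, here is a hedged sketch of a sanitizer that applies them in order (illegal-character replacement, reserved-name handling, then truncation); the function name and exact behavior are assumptions, not the package's actual implementation:

```typescript
// Illustrative sanitizer, not the package's real code.
interface SanitizeOptions {
  maxLength: number;
  replaceChar: string;
  truncateSuffix: string;
  illegalChars: RegExp;
  reservedNames: readonly string[];
}

function sanitizeFilename(name: string, opts: SanitizeOptions): string {
  // Replace characters that are illegal on some filesystem.
  let safe = name.replace(opts.illegalChars, opts.replaceChar);
  // Windows reserved device names (CON, PRN, ...) get a prefix.
  if (opts.reservedNames.includes(safe.toUpperCase())) {
    safe = opts.replaceChar + safe;
  }
  // Enforce the length limit, marking truncation with the suffix.
  if (safe.length > opts.maxLength) {
    safe = safe.slice(0, opts.maxLength - opts.truncateSuffix.length)
      + opts.truncateSuffix;
  }
  return safe;
}
```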
## Download Configuration
### Video Resolution
Set default video resolution:
```javascript
const config = {
download: {
defaultResolution: "1080p" // "480p" | "720p" | "1080p" | "best"
}
};
```
### Audio Format
Configure audio format preferences:
```javascript
const config = {
download: {
defaultAudioFormat: "m4a" // "m4a" | "mp3"
}
};
```
### Subtitle Language
Set default subtitle language:
```javascript
const config = {
download: {
defaultSubtitleLanguage: "en"
}
};
```
## Tools Configuration
Configure required external tools:
```javascript
const config = {
tools: {
required: ['yt-dlp']
}
};
```
## Complete Configuration Example
```javascript
import { CONFIG } from '@kevinwatt/yt-dlp-mcp';
const customConfig = {
file: {
maxFilenameLength: 100,
downloadsDir: '/custom/downloads',
tempDirPrefix: 'ytdlp-temp-',
sanitize: {
replaceChar: '-',
truncateSuffix: '___',
illegalChars: /[<>:"/\\|?*\x00-\x1F]/g,
reservedNames: [
'CON', 'PRN', 'AUX', 'NUL',
'COM1', 'COM2', 'COM3', 'COM4', 'COM5',
'LPT1', 'LPT2', 'LPT3'
]
}
},
tools: {
required: ['yt-dlp']
},
download: {
defaultResolution: '1080p',
defaultAudioFormat: 'm4a',
defaultSubtitleLanguage: 'en'
}
};
// Use the custom configuration
const result = await downloadVideo(url, customConfig);
```

docs/contributing.md (new file, 198 lines)
# Contributing Guide
## Getting Started
1. Fork the repository
2. Clone your fork:
```bash
git clone https://github.com/your-username/yt-dlp-mcp.git
cd yt-dlp-mcp
```
3. Install dependencies:
```bash
npm install
```
4. Create a new branch:
```bash
git checkout -b feature/your-feature-name
```
## Development Setup
### Prerequisites
- Node.js 18.x or higher
- yt-dlp installed on your system
- TypeScript knowledge
- Jest for testing
### Building
```bash
npm run prepare
```
### Running Tests
```bash
npm test
```
For specific test files:
```bash
npm test -- src/__tests__/video.test.ts
```
## Code Style
We use TypeScript and follow these conventions:
- Use meaningful variable and function names
- Add JSDoc comments for public APIs
- Follow the existing code style
- Use async/await for promises
- Handle errors appropriately
### TypeScript Guidelines
```typescript
// Use explicit types
function downloadVideo(url: string, config?: Config): Promise<string> {
// Implementation
}
// Use interfaces for complex types
interface DownloadOptions {
resolution: string;
format: string;
output: string;
}
// Use enums for fixed values
enum Resolution {
SD = "480p",
HD = "720p",
FHD = "1080p",
BEST = "best",
}
```
## Testing
### Writing Tests
- Place tests in `src/__tests__` directory
- Name test files with `.test.ts` suffix
- Use descriptive test names
- Test both success and error cases
Example:
```typescript
describe("downloadVideo", () => {
test("downloads video successfully", async () => {
const result = await downloadVideo(testUrl);
expect(result).toMatch(/Video successfully downloaded/);
});
test("handles invalid URL", async () => {
await expect(downloadVideo("invalid-url")).rejects.toThrow(
"Invalid or unsupported URL"
);
});
});
```
### Test Coverage
Aim for high test coverage:
```bash
npm run test:coverage
```
## Documentation
### JSDoc Comments
Add comprehensive JSDoc comments for all public APIs:
````typescript
/**
* Downloads a video from the specified URL.
*
* @param url - The URL of the video to download
* @param config - Optional configuration object
* @param resolution - Preferred video resolution
* @returns Promise resolving to success message with file path
* @throws {Error} When URL is invalid or download fails
*
* @example
* ```typescript
* const result = await downloadVideo('https://youtube.com/watch?v=...', config);
* console.log(result);
* ```
*/
export async function downloadVideo(
url: string,
config?: Config,
resolution?: string
): Promise<string> {
// Implementation
}
````
### README Updates
- Update README.md for new features
- Keep examples up to date
- Document breaking changes
## Pull Request Process
1. Update tests and documentation
2. Run all tests and linting
3. Update CHANGELOG.md
4. Create detailed PR description
5. Reference related issues
### PR Checklist
- [ ] Tests added/updated
- [ ] Documentation updated
- [ ] CHANGELOG.md updated
- [ ] Code follows style guidelines
- [ ] All tests passing
- [ ] No linting errors
## Release Process
1. Update version in package.json
2. Update CHANGELOG.md
3. Create release commit
4. Tag release
5. Push to main branch
### Version Numbers
Follow semantic versioning:
- MAJOR: Breaking changes
- MINOR: New features
- PATCH: Bug fixes
## Community
- Be respectful and inclusive
- Help others when possible
- Report bugs with detailed information
- Suggest improvements
- Share success stories
For more information, see the [README](./README.md) and [API Reference](./api.md).

docs/cookies.md (new file, 302 lines)
# Cookies Configuration Guide
## Why Do You Need Cookies?
You need to configure cookies for yt-dlp-mcp in the following situations:
- **Access private videos**: Videos that require login to view
- **Age-restricted content**: Content requiring account age verification
- **Bypass CAPTCHA**: Some websites require verification
- **Avoid rate limiting**: Reduce HTTP 429 (Too Many Requests) errors
- **YouTube Premium features**: Access premium-exclusive content and quality
## Configuration Methods
yt-dlp-mcp supports two cookie configuration methods via environment variables.
### Method 1: Extract from Browser (Recommended)
This is the simplest approach. yt-dlp reads cookies directly from your browser.
```bash
YTDLP_COOKIES_FROM_BROWSER=chrome
```
#### Supported Browsers
| Browser | Value |
|---------|-------|
| Google Chrome | `chrome` |
| Chromium | `chromium` |
| Microsoft Edge | `edge` |
| Mozilla Firefox | `firefox` |
| Brave | `brave` |
| Opera | `opera` |
| Safari (macOS) | `safari` |
| Vivaldi | `vivaldi` |
| Whale | `whale` |
#### Advanced Configuration
```bash
# Specify Chrome Profile
YTDLP_COOKIES_FROM_BROWSER=chrome:Profile 1
# Specify Firefox Container
YTDLP_COOKIES_FROM_BROWSER=firefox::work
# Flatpak-installed Chrome (Linux)
YTDLP_COOKIES_FROM_BROWSER=chrome:~/.var/app/com.google.Chrome/
# Full format: BROWSER:PROFILE::CONTAINER
YTDLP_COOKIES_FROM_BROWSER=chrome:Profile 1::personal
```
### Method 2: Use Cookie File
If you prefer using a fixed cookie file, or automatic extraction doesn't work:
```bash
YTDLP_COOKIES_FILE=/path/to/cookies.txt
```
The cookie file must be in Netscape/Mozilla format with the first line:
```
# Netscape HTTP Cookie File
```
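For reference, each cookie entry after that header line is a single line of seven tab-separated fields: domain, include-subdomains flag, path, secure flag, expiry (Unix timestamp), name, and value. A minimal illustrative file (the cookie value is a placeholder):

```
# Netscape HTTP Cookie File
.youtube.com	TRUE	/	TRUE	1767225600	SID	placeholder-value
```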
## Exporting Cookies
### Using yt-dlp (Recommended)
This is the most reliable method, ensuring correct format:
```bash
# Export from Chrome
yt-dlp --cookies-from-browser chrome --cookies cookies.txt "https://www.youtube.com"
# Export from Firefox
yt-dlp --cookies-from-browser firefox --cookies cookies.txt "https://www.youtube.com"
```
> **Note**: This command exports ALL website cookies from your browser. Keep this file secure.
### Using Browser Extensions
| Browser | Extension |
|---------|-----------|
| Chrome | [Get cookies.txt LOCALLY](https://chrome.google.com/webstore/detail/get-cookiestxt-locally/cclelndahbckbenkjhflpdbgdldlbecc) |
| Firefox | [cookies.txt](https://addons.mozilla.org/firefox/addon/cookies-txt/) |
> **Warning**: Only use the recommended extensions above. Some cookie export extensions may be malware.
## MCP Configuration Examples
### Claude Desktop
Edit `claude_desktop_config.json`:
**Using Browser Cookies:**
```json
{
"mcpServers": {
"yt-dlp": {
"command": "npx",
"args": ["@kevinwatt/yt-dlp-mcp"],
"env": {
"YTDLP_COOKIES_FROM_BROWSER": "chrome"
}
}
}
}
```
**Using Cookie File:**
```json
{
"mcpServers": {
"yt-dlp": {
"command": "npx",
"args": ["@kevinwatt/yt-dlp-mcp"],
"env": {
"YTDLP_COOKIES_FILE": "/Users/username/.config/yt-dlp/cookies.txt"
}
}
}
}
```
### Configuration File Locations
| OS | Claude Desktop Config Location |
|----|-------------------------------|
| macOS | `~/Library/Application Support/Claude/claude_desktop_config.json` |
| Windows | `%APPDATA%\Claude\claude_desktop_config.json` |
| Linux | `~/.config/Claude/claude_desktop_config.json` |
## Priority Order
When both `YTDLP_COOKIES_FILE` and `YTDLP_COOKIES_FROM_BROWSER` are set:
1. `YTDLP_COOKIES_FILE` is used first
2. If no file is set, `YTDLP_COOKIES_FROM_BROWSER` is used
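This precedence mirrors what the server's `getCookieArgs` helper passes to yt-dlp. A simplified sketch of the logic (illustrative only, not the project's actual source):

```javascript
// Simplified sketch of the cookie-argument precedence (illustrative,
// not the project's actual getCookieArgs implementation).
function getCookieArgs(config) {
  const { file, fromBrowser } = config.cookies ?? {};
  if (file) return ['--cookies', file]; // file always wins
  if (fromBrowser) return ['--cookies-from-browser', fromBrowser];
  return []; // no cookies configured
}

console.log(getCookieArgs({ cookies: { file: '/tmp/cookies.txt', fromBrowser: 'chrome' } }));
// → [ '--cookies', '/tmp/cookies.txt' ]
```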
## Security Best Practices
### Cookie File Security
1. **Keep it safe**: Cookie files contain your login credentials; leakage may lead to account compromise
2. **Never share**: Never share cookie files with others or upload to public locations
3. **Version control**: Add `cookies.txt` to `.gitignore`
4. **File permissions**:
```bash
chmod 600 cookies.txt # Owner read/write only
```
### Browser Cookie Extraction
- Ensure your browser is up to date
- You may need to close the browser temporarily during extraction
- Some browser security features may block cookie extraction
### Regular Updates
- Browser cookies expire
- Re-export cookies periodically
- If you encounter authentication errors, try updating cookies
## Troubleshooting
### Error: Cookie file not found
```
Error: Cookie file not found: /path/to/cookies.txt
```
**Solutions:**
1. Verify the file path is correct
2. Confirm the file exists
3. Ensure the MCP service has permission to read the file
### Error: Browser cookies could not be loaded
```
Error: Could not load cookies from chrome
```
**Solutions:**
1. Verify browser name spelling is correct
2. Try closing the browser and retry
3. Make sure no other browser instances are running
4. Check if browser has password-protected cookie storage
### Error: Invalid cookie file format
```
Error: Invalid cookie file format
```
**Solutions:**
1. Ensure first line is `# Netscape HTTP Cookie File` or `# HTTP Cookie File`
2. Check line ending format (Unix uses LF, Windows uses CRLF)
3. Re-export cookies using yt-dlp
### Still Cannot Access Private Videos
1. **Confirm login**: Verify you're logged in to the video platform in your browser
2. **Refresh page**: Refresh the video page in browser before exporting
3. **Re-export**: Re-export cookies
4. **Check permissions**: Confirm your account has permission to access the video
### HTTP 400: Bad Request
This usually indicates incorrect line ending format in the cookie file.
**Linux/macOS:**
```bash
# Convert to Unix line endings
sed -i 's/\r$//' cookies.txt
```
**Windows:**
Use Notepad++ or VS Code to convert line endings to LF.
## YouTube JavaScript Runtime Requirement
YouTube requires a JavaScript runtime for yt-dlp to function properly. Without it, you may see errors like:
```
WARNING: [youtube] Signature solving failed: Some formats may be missing
ERROR: Requested format is not available
```
### Installing EJS (Recommended)
EJS is a lightweight JavaScript runtime specifically designed for yt-dlp.
**Linux (Debian/Ubuntu):**
```bash
# Install Node.js if not already installed
sudo apt install nodejs
# Install EJS globally
sudo npm install -g @aspect-build/ejs
```
**Linux (Arch):**
```bash
sudo pacman -S nodejs npm
sudo npm install -g @aspect-build/ejs
```
**macOS:**
```bash
brew install node
npm install -g @aspect-build/ejs
```
**Windows:**
```powershell
# Install Node.js from https://nodejs.org/
npm install -g @aspect-build/ejs
```
### Alternative: PhantomJS
If EJS doesn't work, you can try PhantomJS:
**Linux:**
```bash
sudo apt install phantomjs
```
**macOS:**
```bash
brew install phantomjs
```
### Verifying Installation
Test that yt-dlp can use the JavaScript runtime:
```bash
yt-dlp --dump-json "https://www.youtube.com/watch?v=dQw4w9WgXcQ" 2>&1 | head -1
```
If successful, you should see JSON output starting with `{`.
### Additional Dependencies for Cookie Extraction (Linux)
On Linux, cookie extraction from browsers requires the `secretstorage` module:
```bash
python3 -m pip install secretstorage
```
This is needed to decrypt cookies stored by Chromium-based browsers.
## Related Links
- [yt-dlp Cookie FAQ](https://github.com/yt-dlp/yt-dlp/wiki/FAQ#how-do-i-pass-cookies-to-yt-dlp)
- [yt-dlp EJS Wiki](https://github.com/yt-dlp/yt-dlp/wiki/EJS)
- [yt-dlp Documentation](https://github.com/yt-dlp/yt-dlp#readme)

docs/error-handling.md Normal file

@ -0,0 +1,175 @@
# Error Handling Guide
## Common Errors
### Invalid URL
When providing an invalid or unsupported URL:
```javascript
try {
await downloadVideo('invalid-url');
} catch (error) {
if (error.message.includes('Invalid or unsupported URL')) {
console.error('Please provide a valid YouTube or supported platform URL');
}
}
```
### Missing Subtitles
When trying to download unavailable subtitles:
```javascript
try {
await downloadSubtitles(url, 'en');
} catch (error) {
if (error.message.includes('No subtitle files found')) {
console.warn('No subtitles available in the requested language');
}
}
```
### yt-dlp Command Failures
When yt-dlp command execution fails:
```javascript
try {
await downloadVideo(url);
} catch (error) {
if (error.message.includes('Failed with exit code')) {
console.error('yt-dlp command failed:', error.message);
// Check if yt-dlp is installed and up to date
}
}
```
### File System Errors
When encountering file system issues:
```javascript
try {
await downloadVideo(url);
} catch (error) {
if (error.message.includes('No write permission')) {
console.error('Cannot write to downloads directory. Check permissions.');
} else if (error.message.includes('Cannot create temporary directory')) {
console.error('Cannot create temporary directory. Check system temp directory permissions.');
}
}
```
## Comprehensive Error Handler
Here's a comprehensive error handler that covers most common scenarios:
```javascript
async function handleDownload(url, options = {}) {
try {
// Attempt the download
const result = await downloadVideo(url, options);
return result;
} catch (error) {
// URL validation errors
if (error.message.includes('Invalid or unsupported URL')) {
throw new Error(`Invalid URL: ${url}. Please provide a valid video URL.`);
}
// File system errors
if (error.message.includes('No write permission')) {
throw new Error(`Permission denied: Cannot write to ${options.file?.downloadsDir || '~/Downloads'}`);
}
if (error.message.includes('Cannot create temporary directory')) {
throw new Error('Cannot create temporary directory. Check system permissions.');
}
// yt-dlp related errors
if (error.message.includes('Failed with exit code')) {
if (error.message.includes('This video is unavailable')) {
throw new Error('Video is unavailable or has been removed.');
}
if (error.message.includes('Video is private')) {
throw new Error('This video is private and cannot be accessed.');
}
throw new Error('Download failed. Please check if yt-dlp is installed and up to date.');
}
// Subtitle related errors
if (error.message.includes('No subtitle files found')) {
throw new Error(`No subtitles available in ${options.language || 'the requested language'}.`);
}
// Unknown errors
throw new Error(`Unexpected error: ${error.message}`);
}
}
```
## Error Prevention
### URL Validation
Always validate URLs before processing:
```javascript
import { validateUrl, isYouTubeUrl } from '@kevinwatt/yt-dlp-mcp';
function validateVideoUrl(url) {
if (!validateUrl(url)) {
throw new Error('Invalid URL format');
}
if (!isYouTubeUrl(url)) {
console.warn('URL is not from YouTube, some features might not work');
}
}
```
### Configuration Validation
Validate configuration before use:
```javascript
function validateConfig(config) {
if (!config.file.downloadsDir) {
throw new Error('Downloads directory must be specified');
}
if (config.file.maxFilenameLength < 5) {
throw new Error('Filename length must be at least 5 characters');
}
if (!['480p', '720p', '1080p', 'best'].includes(config.download.defaultResolution)) {
throw new Error('Invalid resolution specified');
}
}
```
### Safe Cleanup
Always use safe cleanup for temporary files:
```javascript
import { safeCleanup } from '@kevinwatt/yt-dlp-mcp';
try {
// Your download code here
} catch (error) {
console.error('Download failed:', error);
} finally {
await safeCleanup(tempDir);
}
```
## Best Practices
1. Always wrap async operations in try-catch blocks
2. Validate inputs before processing
3. Use specific error types for different scenarios
4. Clean up temporary files in finally blocks
5. Log errors appropriately for debugging
6. Provide meaningful error messages to users
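Putting practices 1, 2, and 4 together, a minimal skeleton might look like this (`doWork` stands in for the real download call and is purely illustrative):

```javascript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Skeleton combining the practices above: validate input first, wrap the
// async work in try-catch-equivalent flow, and clean up in finally.
async function safeDownload(url, doWork) {
  if (typeof url !== 'string' || !/^https?:\/\//.test(url)) {
    throw new Error('Invalid URL format'); // validate before processing
  }
  const tempDir = await fs.promises.mkdtemp(path.join(os.tmpdir(), 'yt-dlp-'));
  try {
    return await doWork(url, tempDir);
  } finally {
    // Runs whether doWork resolved or threw, so temp files never leak.
    await fs.promises.rm(tempDir, { recursive: true, force: true });
  }
}
```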
For more information about specific errors and their solutions, see the [API Reference](./api.md).


@ -0,0 +1,77 @@
# Search Feature Demo
The search functionality has been successfully added to yt-dlp-mcp! This feature allows you to search for videos on YouTube using keywords and get formatted results with video information.
## New Tool: `search_videos`
### Description
Search for videos on YouTube using keywords. Returns title, uploader, duration, and URL for each result.
### Parameters
- `query` (string, required): Search keywords or phrase
- `maxResults` (number, optional): Maximum number of results to return (1-50, default: 10)
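In MCP terms, the arguments object for a `search_videos` call might look like this (values are illustrative):

```json
{
  "query": "javascript tutorial",
  "maxResults": 3
}
```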
### Example Usage
Ask your LLM to:
```
"Search for Python tutorial videos"
"Find JavaScript courses and show me the top 5 results"
"Search for machine learning tutorials with 15 results"
```
### Example Output
When searching for "javascript tutorial" with 3 results, you'll get:
```
Found 3 videos:
1. **JavaScript Tutorial Full Course - Beginner to Pro**
📺 Channel: Traversy Media
⏱️ Duration: 15663
🔗 URL: https://www.youtube.com/watch?v=EerdGm-ehJQ
🆔 ID: EerdGm-ehJQ
2. **JavaScript Course for Beginners**
📺 Channel: FreeCodeCamp.org
⏱️ Duration: 12402
🔗 URL: https://www.youtube.com/watch?v=W6NZfCO5SIk
🆔 ID: W6NZfCO5SIk
3. **JavaScript Full Course for free 🌐 (2024)**
📺 Channel: Bro Code
⏱️ Duration: 43200
🔗 URL: https://www.youtube.com/watch?v=lfmg-EJ8gm4
🆔 ID: lfmg-EJ8gm4
💡 You can use any URL to download videos, audio, or subtitles!
```
## Integration with Existing Features
After searching for videos, you can directly use the returned URLs with other tools:
1. **Download video**: Use the URL with `download_video`
2. **Download audio**: Use the URL with `download_audio`
3. **Get subtitles**: Use the URL with `list_subtitle_languages` or `download_video_subtitles`
4. **Get transcript**: Use the URL with `download_transcript`
## Test Results
All search functionality tests pass:
- ✅ Successfully search and format results
- ✅ Reject empty search queries
- ✅ Validate maxResults parameter range
- ✅ Handle search with different result counts
- ✅ Return properly formatted results
- ✅ Handle obscure search terms gracefully
## Implementation Details
The search feature uses yt-dlp's built-in search capability with the syntax:
- `ytsearch[N]:[query]` where N is the number of results
- Uses `--print` options to extract: title, id, uploader, duration
- Results are formatted in a user-friendly way with emojis and clear structure
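Outside of MCP, the equivalent raw yt-dlp invocation would look roughly like the following (the exact fields printed by the server may differ):

```bash
yt-dlp --skip-download \
  --print title --print id --print uploader --print duration \
  "ytsearch3:javascript tutorial"
```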
This addresses the feature request from [Issue #14](https://github.com/kevinwatt/yt-dlp-mcp/issues/14) and provides a seamless search experience for users.

package-lock.json generated

File diff suppressed because it is too large


@ -1,7 +1,7 @@
{
"name": "@kevinwatt/yt-dlp-mcp",
"version": "0.6.11",
"description": "yt-dlp MCP Server - Download video content via Model Context Protocol",
"version": "0.8.4",
"description": "An MCP server implementation that integrates with yt-dlp, providing video and audio content download capabilities (e.g. YouTube, Facebook, Tiktok, etc.) for LLMs.",
"keywords": [
"mcp",
"youtube",
@ -26,8 +26,8 @@
],
"main": "./lib/index.mjs",
"scripts": {
"prepare": "tsc && shx chmod +x ./lib/index.mjs",
"test": "node --experimental-vm-modules node_modules/jest/bin/jest.js --detectOpenHandles --forceExit"
"prepare": "tsc --skipLibCheck && chmod +x ./lib/index.mjs",
"test": "PYTHONPATH= PYTHONHOME= node --experimental-vm-modules node_modules/jest/bin/jest.js --detectOpenHandles --forceExit"
},
"author": "Dewei Yen <k@funmula.com>",
"license": "MIT",
@ -40,7 +40,8 @@
"dependencies": {
"@modelcontextprotocol/sdk": "0.7.0",
"rimraf": "^6.0.1",
"spawn-rx": "^4.0.0"
"spawn-rx": "^4.0.0",
"zod": "^4.1.12"
},
"devDependencies": {
"@jest/globals": "^29.7.0",


@ -0,0 +1,43 @@
// @ts-nocheck
// @jest-environment node
import { describe, test, expect } from '@jest/globals';
import * as os from 'os';
import * as path from 'path';
import { downloadAudio } from '../modules/audio.js';
import { CONFIG } from '../config.js';
import * as fs from 'fs';
describe('downloadAudio', () => {
const testUrl = 'https://www.youtube.com/watch?v=jNQXAC9IVRw';
const testConfig = {
...CONFIG,
file: {
...CONFIG.file,
downloadsDir: path.join(os.tmpdir(), 'yt-dlp-test-downloads'),
tempDirPrefix: 'yt-dlp-test-'
}
};
beforeAll(async () => {
await fs.promises.mkdir(testConfig.file.downloadsDir, { recursive: true });
});
afterAll(async () => {
await fs.promises.rm(testConfig.file.downloadsDir, { recursive: true, force: true });
});
test('downloads audio successfully from YouTube', async () => {
const result = await downloadAudio(testUrl, testConfig);
expect(result).toContain('Audio successfully downloaded');
const files = await fs.promises.readdir(testConfig.file.downloadsDir);
expect(files.length).toBeGreaterThan(0);
expect(files[0]).toMatch(/\.m4a$/);
}, 30000);
test('handles invalid URL', async () => {
await expect(downloadAudio('invalid-url', testConfig))
.rejects
.toThrow();
});
});


@ -0,0 +1,180 @@
// @ts-nocheck
// @jest-environment node
import { describe, test, expect, beforeAll } from '@jest/globals';
import { getVideoComments, getVideoCommentsSummary } from '../modules/comments.js';
import type { CommentsResponse } from '../modules/comments.js';
import { CONFIG } from '../config.js';
// Clear Python environment to avoid yt-dlp issues
delete process.env.PYTHONPATH;
delete process.env.PYTHONHOME;
// Integration tests require network access - opt-in via RUN_INTEGRATION_TESTS=1
const RUN_INTEGRATION = process.env.RUN_INTEGRATION_TESTS === '1';
(RUN_INTEGRATION ? describe : describe.skip)('Video Comments Extraction', () => {
// Using a popular video that should have comments enabled
const testUrl = 'https://www.youtube.com/watch?v=jNQXAC9IVRw';
describe('getVideoComments', () => {
test('should extract comments from YouTube video', async () => {
const commentsJson = await getVideoComments(testUrl, 5, 'top', CONFIG);
const data: CommentsResponse = JSON.parse(commentsJson);
// Verify response structure
expect(data).toHaveProperty('count');
expect(data).toHaveProperty('has_more');
expect(data).toHaveProperty('comments');
expect(Array.isArray(data.comments)).toBe(true);
expect(data.count).toBeGreaterThan(0);
expect(data.count).toBeLessThanOrEqual(5);
}, 60000);
test('should return comments with expected fields', async () => {
const commentsJson = await getVideoComments(testUrl, 3, 'top', CONFIG);
const data: CommentsResponse = JSON.parse(commentsJson);
if (data.comments.length > 0) {
const comment = data.comments[0];
// These fields should typically be present
expect(comment).toHaveProperty('text');
expect(comment).toHaveProperty('author');
// Verify text is a string
if (comment.text !== undefined) {
expect(typeof comment.text).toBe('string');
}
if (comment.author !== undefined) {
expect(typeof comment.author).toBe('string');
}
}
}, 60000);
test('should respect maxComments parameter', async () => {
const commentsJson = await getVideoComments(testUrl, 3, 'top', CONFIG);
const data: CommentsResponse = JSON.parse(commentsJson);
expect(data.comments.length).toBeLessThanOrEqual(3);
}, 60000);
test('should support different sort orders', async () => {
// Just verify both sort orders work without error
const topComments = await getVideoComments(testUrl, 2, 'top', CONFIG);
const topData: CommentsResponse = JSON.parse(topComments);
expect(topData).toHaveProperty('comments');
const newComments = await getVideoComments(testUrl, 2, 'new', CONFIG);
const newData: CommentsResponse = JSON.parse(newComments);
expect(newData).toHaveProperty('comments');
}, 90000);
test('should throw error for invalid URL', async () => {
await expect(getVideoComments('invalid-url', 5, 'top', CONFIG)).rejects.toThrow();
});
test('should throw error for unsupported URL', async () => {
await expect(getVideoComments('https://example.com/video', 5, 'top', CONFIG)).rejects.toThrow();
}, 30000);
});
describe('getVideoCommentsSummary', () => {
test('should generate human-readable summary', async () => {
const summary = await getVideoCommentsSummary(testUrl, 5, CONFIG);
expect(typeof summary).toBe('string');
expect(summary.length).toBeGreaterThan(0);
// Should contain header
expect(summary).toContain('Video Comments');
// Should have formatted content
expect(summary).toContain('Author:');
}, 60000);
test('should respect maxComments parameter', async () => {
const summary = await getVideoCommentsSummary(testUrl, 3, CONFIG);
// Count occurrences of "Author:" to verify number of comments
const authorMatches = summary.match(/Author:/g) ?? [];
expect(authorMatches.length).toBeLessThanOrEqual(3);
}, 60000);
test('should throw error for invalid URL', async () => {
await expect(getVideoCommentsSummary('invalid-url', 5, CONFIG)).rejects.toThrow();
});
test('should handle videos with different comment counts', async () => {
const summary = await getVideoCommentsSummary(testUrl, 10, CONFIG);
// Summary should be a valid string
expect(typeof summary).toBe('string');
expect(summary.trim().length).toBeGreaterThan(0);
}, 60000);
});
describe('Error Handling', () => {
test('should provide helpful error message for unavailable video', async () => {
const unavailableUrl = 'https://www.youtube.com/watch?v=invalid_video_id_xyz123';
await expect(getVideoComments(unavailableUrl, 5, 'top', CONFIG)).rejects.toThrow();
}, 30000);
test('should handle unsupported URLs gracefully', async () => {
const unsupportedUrl = 'https://example.com/not-a-video';
await expect(getVideoComments(unsupportedUrl, 5, 'top', CONFIG)).rejects.toThrow();
}, 30000);
});
describe('Comment Fields', () => {
test('should include author information when available', async () => {
const commentsJson = await getVideoComments(testUrl, 5, 'top', CONFIG);
const data: CommentsResponse = JSON.parse(commentsJson);
if (data.comments.length > 0) {
const comment = data.comments[0];
// Author fields
if (comment.author !== undefined) {
expect(typeof comment.author).toBe('string');
}
if (comment.author_id !== undefined) {
expect(typeof comment.author_id).toBe('string');
}
}
}, 60000);
test('should include engagement metrics when available', async () => {
const commentsJson = await getVideoComments(testUrl, 5, 'top', CONFIG);
const data: CommentsResponse = JSON.parse(commentsJson);
if (data.comments.length > 0) {
// At least one top comment should have like_count
const hasLikes = data.comments.some(c =>
c.like_count !== undefined && typeof c.like_count === 'number'
);
// This is optional - some comments may not have likes
expect(hasLikes || data.comments.length > 0).toBe(true);
}
}, 60000);
test('should handle boolean flags correctly', async () => {
const commentsJson = await getVideoComments(testUrl, 10, 'top', CONFIG);
const data: CommentsResponse = JSON.parse(commentsJson);
for (const comment of data.comments) {
// Boolean flags should be boolean or undefined
if (comment.is_pinned !== undefined) {
expect(typeof comment.is_pinned).toBe('boolean');
}
if (comment.author_is_uploader !== undefined) {
expect(typeof comment.author_is_uploader).toBe('boolean');
}
if (comment.author_is_verified !== undefined) {
expect(typeof comment.author_is_verified).toBe('boolean');
}
}
}, 60000);
});
});


@ -0,0 +1,179 @@
// @ts-nocheck
// @jest-environment node
import { describe, test, expect, beforeEach, afterEach, jest } from '@jest/globals';
import * as fs from 'fs';
import * as path from 'path';
import * as os from 'os';
// Store original env
const originalEnv = { ...process.env };
describe('Cookie Configuration', () => {
beforeEach(() => {
// Reset environment before each test
process.env = { ...originalEnv };
// Clear module cache to reload config
jest.resetModules();
});
afterEach(() => {
process.env = originalEnv;
});
describe('getCookieArgs', () => {
test('returns empty array when no cookies configured', async () => {
const { getCookieArgs, loadConfig } = await import('../config.js');
const config = loadConfig();
const args = getCookieArgs(config);
expect(args).toEqual([]);
});
test('returns --cookies args when file is configured', async () => {
// Create a temporary cookie file
const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'cookie-test-'));
const cookieFile = path.join(tempDir, 'cookies.txt');
fs.writeFileSync(cookieFile, '# Netscape HTTP Cookie File\n');
process.env.YTDLP_COOKIES_FILE = cookieFile;
const { getCookieArgs, loadConfig } = await import('../config.js');
const config = loadConfig();
const args = getCookieArgs(config);
expect(args).toEqual(['--cookies', cookieFile]);
// Cleanup
fs.rmSync(tempDir, { recursive: true, force: true });
});
test('returns --cookies-from-browser args when browser is configured', async () => {
process.env.YTDLP_COOKIES_FROM_BROWSER = 'chrome';
const { getCookieArgs, loadConfig } = await import('../config.js');
const config = loadConfig();
const args = getCookieArgs(config);
expect(args).toEqual(['--cookies-from-browser', 'chrome']);
});
test('file takes precedence over browser', async () => {
// Create a temporary cookie file
const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'cookie-test-'));
const cookieFile = path.join(tempDir, 'cookies.txt');
fs.writeFileSync(cookieFile, '# Netscape HTTP Cookie File\n');
process.env.YTDLP_COOKIES_FILE = cookieFile;
process.env.YTDLP_COOKIES_FROM_BROWSER = 'chrome';
const { getCookieArgs, loadConfig } = await import('../config.js');
const config = loadConfig();
const args = getCookieArgs(config);
expect(args).toEqual(['--cookies', cookieFile]);
// Cleanup
fs.rmSync(tempDir, { recursive: true, force: true });
});
test('supports browser with profile', async () => {
process.env.YTDLP_COOKIES_FROM_BROWSER = 'chrome:Profile 1';
const { getCookieArgs, loadConfig } = await import('../config.js');
const config = loadConfig();
const args = getCookieArgs(config);
expect(args).toEqual(['--cookies-from-browser', 'chrome:Profile 1']);
});
test('supports browser with container', async () => {
process.env.YTDLP_COOKIES_FROM_BROWSER = 'firefox::work';
const { getCookieArgs, loadConfig } = await import('../config.js');
const config = loadConfig();
const args = getCookieArgs(config);
expect(args).toEqual(['--cookies-from-browser', 'firefox::work']);
});
});
describe('Cookie Validation', () => {
test('clears invalid cookie file path with warning', async () => {
const consoleSpy = jest.spyOn(console, 'warn').mockImplementation(() => {});
process.env.YTDLP_COOKIES_FILE = '/nonexistent/path/cookies.txt';
const { loadConfig } = await import('../config.js');
const config = loadConfig();
expect(config.cookies.file).toBeUndefined();
expect(consoleSpy).toHaveBeenCalledWith(
expect.stringContaining('Cookie file not found')
);
consoleSpy.mockRestore();
});
test('accepts valid browser names', async () => {
const validBrowsers = ['brave', 'chrome', 'chromium', 'edge', 'firefox', 'opera', 'safari', 'vivaldi', 'whale'];
for (const browser of validBrowsers) {
jest.resetModules();
process.env = { ...originalEnv };
process.env.YTDLP_COOKIES_FROM_BROWSER = browser;
const { loadConfig } = await import('../config.js');
const config = loadConfig();
expect(config.cookies.fromBrowser).toBe(browser);
}
});
test('clears invalid browser name with warning', async () => {
const consoleSpy = jest.spyOn(console, 'warn').mockImplementation(() => {});
process.env.YTDLP_COOKIES_FROM_BROWSER = 'invalidbrowser';
const { loadConfig } = await import('../config.js');
const config = loadConfig();
expect(config.cookies.fromBrowser).toBeUndefined();
expect(consoleSpy).toHaveBeenCalledWith(
expect.stringContaining('Invalid browser name')
);
consoleSpy.mockRestore();
});
test('accepts valid browser with custom path (Flatpak style)', async () => {
// Path format is valid for Flatpak installations
process.env.YTDLP_COOKIES_FROM_BROWSER = 'chrome:~/.var/app/com.google.Chrome/';
const { loadConfig } = await import('../config.js');
const config = loadConfig();
expect(config.cookies.fromBrowser).toBe('chrome:~/.var/app/com.google.Chrome/');
});
test('accepts valid browser with empty profile', async () => {
// chrome: is valid (empty profile means default)
process.env.YTDLP_COOKIES_FROM_BROWSER = 'chrome:';
const { loadConfig } = await import('../config.js');
const config = loadConfig();
expect(config.cookies.fromBrowser).toBe('chrome:');
});
});
describe('VALID_BROWSERS constant', () => {
test('exports valid browsers list', async () => {
const { VALID_BROWSERS } = await import('../config.js');
expect(VALID_BROWSERS).toContain('chrome');
expect(VALID_BROWSERS).toContain('firefox');
expect(VALID_BROWSERS).toContain('edge');
expect(VALID_BROWSERS).toContain('safari');
expect(VALID_BROWSERS.length).toBe(9);
});
});
});


@ -1,79 +1,56 @@
// @ts-nocheck
// @jest-environment node
import { jest } from '@jest/globals';
import { describe, test, expect, beforeAll, afterAll, beforeEach } from '@jest/globals';
import * as path from 'path';
import { describe, test, expect } from '@jest/globals';
import * as os from 'os';
import * as path from 'path';
import { downloadVideo } from '../modules/video.js';
import { CONFIG } from '../config.js';
import * as fs from 'fs';
// Simplified mock
jest.mock('spawn-rx', () => ({
spawnPromise: jest.fn().mockImplementation(async (cmd, args) => {
if (args.includes('--get-filename')) {
return 'mock_video.mp4';
}
return 'Download completed';
})
}));
jest.mock('rimraf', () => ({
rimraf: { sync: jest.fn() }
}));
import { downloadVideo } from '../index.mts';
// Set up Python environment
process.env.PYTHONPATH = '';
process.env.PYTHONHOME = '';
describe('downloadVideo', () => {
const mockTimestamp = '2024-03-20_12-30-00';
let originalDateToISOString: () => string;
const testUrl = 'https://www.youtube.com/watch?v=jNQXAC9IVRw';
const testConfig = {
...CONFIG,
file: {
...CONFIG.file,
downloadsDir: path.join(os.tmpdir(), 'yt-dlp-test-downloads'),
tempDirPrefix: 'yt-dlp-test-'
}
};
// Global cleanup
afterAll(done => {
// Clean up all timers
jest.useRealTimers();
// Ensure all async operations complete
process.nextTick(done);
beforeEach(async () => {
await fs.promises.mkdir(testConfig.file.downloadsDir, { recursive: true });
});
beforeAll(() => {
originalDateToISOString = Date.prototype.toISOString;
Date.prototype.toISOString = jest.fn(() => '2024-03-20T12:30:00.000Z');
});
afterAll(() => {
Date.prototype.toISOString = originalDateToISOString;
});
beforeEach(() => {
jest.clearAllMocks();
afterEach(async () => {
await fs.promises.rm(testConfig.file.downloadsDir, { recursive: true, force: true });
});
test('downloads video successfully with correct format', async () => {
const result = await downloadVideo('https://www.youtube.com/watch?v=dQw4w9WgXcQ');
const result = await downloadVideo(testUrl, testConfig);
expect(result).toContain('Video successfully downloaded');
// Verify basic functionality
expect(result).toMatch(/Video successfully downloaded as/);
expect(result).toContain(mockTimestamp);
expect(result).toContain(os.homedir());
expect(result).toContain('Downloads');
});
test('handles special characters in video URL', async () => {
// Use a valid video ID, but include characters that need encoding
const result = await downloadVideo('https://www.youtube.com/watch?v=dQw4w9WgXcQ&title=特殊字符');
expect(result).toMatch(/Video successfully downloaded as/);
expect(result).toContain(mockTimestamp);
});
const files = await fs.promises.readdir(testConfig.file.downloadsDir);
expect(files.length).toBeGreaterThan(0);
expect(files[0]).toMatch(/\.(mp4|webm|mkv)$/);
}, 30000);
test('uses correct resolution format', async () => {
const resolutions = ['480p', '720p', '1080p', 'best'];
const result = await downloadVideo(testUrl, testConfig, '1080p');
expect(result).toContain('Video successfully downloaded');
// Run the tests in parallel with Promise.all
const results = await Promise.all(resolutions.map(resolution => downloadVideo(
'https://www.youtube.com/watch?v=dQw4w9WgXcQ',
resolution
)));
results.forEach(result => {
expect(result).toMatch(/Video successfully downloaded as/);
});
const files = await fs.promises.readdir(testConfig.file.downloadsDir);
expect(files.length).toBeGreaterThan(0);
expect(files[0]).toMatch(/\.(mp4|webm|mkv)$/);
}, 30000);
test('handles invalid URL', async () => {
await expect(downloadVideo('invalid-url', testConfig))
.rejects
.toThrow();
});
});


@ -0,0 +1,193 @@
// @ts-nocheck
// @jest-environment node
import { describe, test, expect, beforeAll } from '@jest/globals';
import { getVideoMetadata, getVideoMetadataSummary } from '../modules/metadata.js';
import type { VideoMetadata } from '../modules/metadata.js';
import { CONFIG } from '../config.js';
// Set up Python environment
process.env.PYTHONPATH = '';
process.env.PYTHONHOME = '';
describe('Video Metadata Extraction', () => {
const testUrl = 'https://www.youtube.com/watch?v=jNQXAC9IVRw';
describe('getVideoMetadata', () => {
test('should extract basic metadata from YouTube video', async () => {
const metadataJson = await getVideoMetadata(testUrl);
const metadata: VideoMetadata = JSON.parse(metadataJson);
// Verify the basic fields exist
expect(metadata).toHaveProperty('id');
expect(metadata).toHaveProperty('title');
expect(metadata).toHaveProperty('uploader');
expect(metadata).toHaveProperty('duration');
expect(metadata.id).toBe('jNQXAC9IVRw');
expect(typeof metadata.title).toBe('string');
expect(typeof metadata.uploader).toBe('string');
expect(typeof metadata.duration).toBe('number');
});
test('should extract specific fields when requested', async () => {
const fields = ['id', 'title', 'description', 'channel', 'timestamp'];
const metadataJson = await getVideoMetadata(testUrl, fields);
const metadata = JSON.parse(metadataJson);
// Should only contain the requested fields
expect(Object.keys(metadata)).toEqual(expect.arrayContaining(fields.filter(f => metadata[f] !== undefined)));
// Should not contain other fields (if they exist in the raw data)
expect(metadata).not.toHaveProperty('formats');
expect(metadata).not.toHaveProperty('thumbnails');
});
test('should handle empty fields array gracefully', async () => {
const metadataJson = await getVideoMetadata(testUrl, []);
const metadata = JSON.parse(metadataJson);
// An empty array should return an empty object
expect(metadata).toEqual({});
});
test('should handle non-existent fields gracefully', async () => {
const fields = ['id', 'title', 'non_existent_field', 'another_fake_field'];
const metadataJson = await getVideoMetadata(testUrl, fields);
const metadata = JSON.parse(metadataJson);
// Should contain the fields that exist
expect(metadata).toHaveProperty('id');
expect(metadata).toHaveProperty('title');
// Should not contain the non-existent fields
expect(metadata).not.toHaveProperty('non_existent_field');
expect(metadata).not.toHaveProperty('another_fake_field');
});
test('should throw error for invalid URL', async () => {
await expect(getVideoMetadata('invalid-url')).rejects.toThrow();
await expect(getVideoMetadata('https://invalid-domain.com/video')).rejects.toThrow();
});
test('should include requested metadata fields from issue #16', async () => {
const fields = ['id', 'title', 'description', 'creators', 'timestamp', 'channel', 'channel_id', 'channel_url'];
const metadataJson = await getVideoMetadata(testUrl, fields);
const metadata = JSON.parse(metadataJson);
// Verify the fields requested in issue #16
expect(metadata).toHaveProperty('id');
expect(metadata).toHaveProperty('title');
expect(metadata.id).toBe('jNQXAC9IVRw');
expect(typeof metadata.title).toBe('string');
// These fields may or may not be present, depending on the video
if (metadata.description !== undefined) {
expect(typeof metadata.description).toBe('string');
}
if (metadata.creators !== undefined && metadata.creators !== null) {
// creators can be an array or a string depending on the video
expect(Array.isArray(metadata.creators) || typeof metadata.creators === 'string').toBe(true);
}
if (metadata.timestamp !== undefined) {
expect(typeof metadata.timestamp).toBe('number');
}
if (metadata.channel !== undefined) {
expect(typeof metadata.channel).toBe('string');
}
if (metadata.channel_id !== undefined) {
expect(typeof metadata.channel_id).toBe('string');
}
if (metadata.channel_url !== undefined) {
expect(typeof metadata.channel_url).toBe('string');
}
});
});
describe('getVideoMetadataSummary', () => {
test('should generate human-readable summary', async () => {
const summary = await getVideoMetadataSummary(testUrl);
expect(typeof summary).toBe('string');
expect(summary.length).toBeGreaterThan(0);
// Should contain basic information
expect(summary).toMatch(/Title:/);
// Other fields that may be present
const commonFields = ['Channel:', 'Duration:', 'Views:', 'Upload Date:'];
const hasAtLeastOneField = commonFields.some(field => summary.includes(field));
expect(hasAtLeastOneField).toBe(true);
});
test('should handle videos with different metadata availability', async () => {
const summary = await getVideoMetadataSummary(testUrl);
// The summary should be a valid string
expect(typeof summary).toBe('string');
expect(summary.trim().length).toBeGreaterThan(0);
// Each line should have a meaningful format (field: value), but note that some titles may contain special characters
const lines = summary.split('\n').filter(line => line.trim());
expect(lines.length).toBeGreaterThan(0);
// At least one line should contain a colon (formatted as "field: value")
const hasFormattedLines = lines.some(line => line.includes(':'));
expect(hasFormattedLines).toBe(true);
}, 30000);
test('should throw error for invalid URL', async () => {
await expect(getVideoMetadataSummary('invalid-url')).rejects.toThrow();
}, 30000);
});
describe('Error Handling', () => {
test('should provide helpful error message for unavailable video', async () => {
const unavailableUrl = 'https://www.youtube.com/watch?v=invalid_video_id_123456789';
await expect(getVideoMetadata(unavailableUrl)).rejects.toThrow(/unavailable|private|not available/i);
});
test('should handle network errors gracefully', async () => {
// Use a URL that should trigger a network error
const badNetworkUrl = 'https://httpstat.us/500';
await expect(getVideoMetadata(badNetworkUrl)).rejects.toThrow();
});
test('should handle unsupported URLs', async () => {
const unsupportedUrl = 'https://example.com/not-a-video';
await expect(getVideoMetadata(unsupportedUrl)).rejects.toThrow();
}, 10000);
});
describe('Real-world Integration', () => {
test('should work with different video platforms supported by yt-dlp', async () => {
// Only test YouTube, since the availability of other platforms may change
const youtubeUrl = 'https://www.youtube.com/watch?v=jNQXAC9IVRw';
const metadataJson = await getVideoMetadata(youtubeUrl, ['id', 'title', 'extractor']);
const metadata = JSON.parse(metadataJson);
expect(metadata.extractor).toMatch(/youtube/i);
expect(metadata.id).toBe('jNQXAC9IVRw');
});
test('should extract metadata that matches issue #16 requirements', async () => {
const requiredFields = ['id', 'title', 'description', 'creators', 'timestamp', 'channel', 'channel_id', 'channel_url'];
const metadataJson = await getVideoMetadata(testUrl, requiredFields);
const metadata = JSON.parse(metadataJson);
// Verify at least the basic fields are present
expect(metadata).toHaveProperty('id');
expect(metadata).toHaveProperty('title');
// Log the fields actually returned, for debugging
console.log('Available metadata fields for issue #16:', Object.keys(metadata));
// Check whether each requested field exists or has a reasonable substitute
const availableFields = Object.keys(metadata);
const hasRequiredBasics = availableFields.includes('id') && availableFields.includes('title');
expect(hasRequiredBasics).toBe(true);
});
});
});


@@ -0,0 +1,69 @@
// @ts-nocheck
// @jest-environment node
import { describe, test, expect } from '@jest/globals';
import { searchVideos } from '../modules/search.js';
import { CONFIG } from '../config.js';
describe('Search functionality tests', () => {
describe('searchVideos', () => {
test('should successfully search for JavaScript tutorials', async () => {
const result = await searchVideos('javascript tutorial', 3, 0, 'markdown', CONFIG);
expect(result).toContain('Found 3 videos');
expect(result).toContain('Channel:');
expect(result).toContain('Duration:');
expect(result).toContain('URL:');
expect(result).toContain('ID:');
expect(result).toContain('https://www.youtube.com/watch?v=');
expect(result).toContain('You can use any URL to download videos, audio, or subtitles!');
}, 30000); // Increase timeout for real network calls
test('should reject empty search queries', async () => {
await expect(searchVideos('', 10, 0, 'markdown', CONFIG)).rejects.toThrow('Search query cannot be empty');
await expect(searchVideos(' ', 10, 0, 'markdown', CONFIG)).rejects.toThrow('Search query cannot be empty');
});
test('should validate maxResults parameter range', async () => {
await expect(searchVideos('test', 0, 0, 'markdown', CONFIG)).rejects.toThrow('Number of results must be between 1 and 50');
await expect(searchVideos('test', 51, 0, 'markdown', CONFIG)).rejects.toThrow('Number of results must be between 1 and 50');
});
test('should handle search with different result counts', async () => {
const result1 = await searchVideos('python programming', 1, 0, 'markdown', CONFIG);
const result5 = await searchVideos('python programming', 5, 0, 'markdown', CONFIG);
expect(result1).toContain('Found 1 video');
expect(result5).toContain('Found 5 videos');
// Count number of video entries (each video has a numbered entry)
const count1 = (result1.match(/^\d+\./gm) || []).length;
const count5 = (result5.match(/^\d+\./gm) || []).length;
expect(count1).toBe(1);
expect(count5).toBe(5);
}, 30000);
test('should return properly formatted results', async () => {
const result = await searchVideos('react tutorial', 2, 0, 'markdown', CONFIG);
// Check for proper formatting
expect(result).toMatch(/Found \d+ videos? \(showing \d+\):/);
expect(result).toMatch(/\d+\. \*\*.*\*\*/); // Numbered list with bold titles
expect(result).toMatch(/📺 Channel: .+/);
expect(result).toMatch(/⏱️ Duration: .+/);
expect(result).toMatch(/🔗 URL: https:\/\/www\.youtube\.com\/watch\?v=.+/);
expect(result).toMatch(/🆔 ID: .+/);
}, 30000);
test('should handle obscure search terms gracefully', async () => {
// Using a very specific and unlikely search term
const result = await searchVideos('asdfghjklqwertyuiopzxcvbnm12345', 1, 0, 'markdown', CONFIG);
// Even obscure terms should return some results, as YouTube's search is quite broad
// But if no results, it should be handled gracefully
expect(typeof result).toBe('string');
expect(result.length).toBeGreaterThan(0);
}, 30000);
});
});


@@ -0,0 +1,111 @@
// @ts-nocheck
// @jest-environment node
import { describe, test, expect, beforeEach, afterEach } from '@jest/globals';
import * as os from 'os';
import * as path from 'path';
import { listSubtitles, downloadSubtitles, downloadTranscript } from '../modules/subtitle.js';
import { cleanSubtitleToTranscript } from '../modules/utils.js';
import { CONFIG } from '../config.js';
import * as fs from 'fs';
describe('Subtitle Functions', () => {
const testUrl = 'https://www.youtube.com/watch?v=jNQXAC9IVRw';
const testConfig = {
...CONFIG,
file: {
...CONFIG.file,
downloadsDir: path.join(os.tmpdir(), 'yt-dlp-test-downloads'),
tempDirPrefix: 'yt-dlp-test-'
}
};
beforeEach(async () => {
await fs.promises.mkdir(testConfig.file.downloadsDir, { recursive: true });
});
afterEach(async () => {
await fs.promises.rm(testConfig.file.downloadsDir, { recursive: true, force: true });
});
describe('listSubtitles', () => {
test('lists available subtitles', async () => {
const result = await listSubtitles(testUrl);
expect(result).toContain('Language');
}, 30000);
test('handles invalid URL', async () => {
await expect(listSubtitles('invalid-url'))
.rejects
.toThrow();
});
});
describe('downloadSubtitles', () => {
test('downloads auto-generated subtitles successfully', async () => {
const result = await downloadSubtitles(testUrl, 'en', testConfig);
expect(result).toContain('WEBVTT');
}, 30000);
test('handles missing language', async () => {
await expect(downloadSubtitles(testUrl, 'xx', testConfig))
.rejects
.toThrow();
});
});
describe('downloadTranscript', () => {
test('downloads and cleans transcript successfully', async () => {
const result = await downloadTranscript(testUrl, 'en', testConfig);
expect(typeof result).toBe('string');
expect(result.length).toBeGreaterThan(0);
expect(result).not.toContain('WEBVTT');
expect(result).not.toContain('-->');
expect(result).not.toMatch(/^\d+$/m);
}, 30000);
test('handles invalid URL', async () => {
await expect(downloadTranscript('invalid-url', 'en', testConfig))
.rejects
.toThrow();
});
});
describe('cleanSubtitleToTranscript', () => {
test('cleans SRT content correctly', () => {
const srtContent = `1
00:00:01,000 --> 00:00:03,000
Hello <i>world</i>
2
00:00:04,000 --> 00:00:06,000
This is a test
3
00:00:07,000 --> 00:00:09,000
<b>Bold text</b> here`;
const result = cleanSubtitleToTranscript(srtContent);
expect(result).toBe('Hello world This is a test Bold text here');
});
test('handles empty content', () => {
const result = cleanSubtitleToTranscript('');
expect(result).toBe('');
});
test('removes timestamps and sequence numbers', () => {
const srtContent = `1
00:00:01,000 --> 00:00:03,000
First line
2
00:00:04,000 --> 00:00:06,000
Second line`;
const result = cleanSubtitleToTranscript(srtContent);
expect(result).not.toContain('00:00');
expect(result).not.toMatch(/^\d+$/);
expect(result).toBe('First line Second line');
});
});
});


@@ -0,0 +1,68 @@
// @ts-nocheck
// @jest-environment node
import { describe, test, expect, beforeEach, afterEach } from '@jest/globals';
import * as os from 'os';
import * as path from 'path';
import { downloadVideo } from '../modules/video.js';
import { CONFIG } from '../config.js';
import * as fs from 'fs';
// Set up the Python environment
process.env.PYTHONPATH = '';
process.env.PYTHONHOME = '';
describe('downloadVideo with trimming', () => {
const testUrl = 'https://www.youtube.com/watch?v=jNQXAC9IVRw';
const testConfig = {
...CONFIG,
file: {
...CONFIG.file,
downloadsDir: path.join(os.tmpdir(), 'yt-dlp-test-downloads'),
tempDirPrefix: 'yt-dlp-test-'
}
};
beforeEach(async () => {
await fs.promises.mkdir(testConfig.file.downloadsDir, { recursive: true });
});
afterEach(async () => {
await fs.promises.rm(testConfig.file.downloadsDir, { recursive: true, force: true });
});
test('downloads video with start time trimming', async () => {
const result = await downloadVideo(testUrl, testConfig, '720p', '00:00:10');
expect(result).toContain('Video successfully downloaded');
const files = await fs.promises.readdir(testConfig.file.downloadsDir);
expect(files.length).toBeGreaterThan(0);
expect(files[0]).toMatch(/\.(mp4|webm|mkv)$/);
}, 30000);
test('downloads video with end time trimming', async () => {
const result = await downloadVideo(testUrl, testConfig, '720p', undefined, '00:00:20');
expect(result).toContain('Video successfully downloaded');
const files = await fs.promises.readdir(testConfig.file.downloadsDir);
expect(files.length).toBeGreaterThan(0);
expect(files[0]).toMatch(/\.(mp4|webm|mkv)$/);
}, 30000);
test('downloads video with both start and end time trimming', async () => {
const result = await downloadVideo(testUrl, testConfig, '720p', '00:00:10', '00:00:20');
expect(result).toContain('Video successfully downloaded');
const files = await fs.promises.readdir(testConfig.file.downloadsDir);
expect(files.length).toBeGreaterThan(0);
expect(files[0]).toMatch(/\.(mp4|webm|mkv)$/);
}, 30000);
test('downloads video without trimming when no times provided', async () => {
const result = await downloadVideo(testUrl, testConfig, '720p');
expect(result).toContain('Video successfully downloaded');
const files = await fs.promises.readdir(testConfig.file.downloadsDir);
expect(files.length).toBeGreaterThan(0);
expect(files[0]).toMatch(/\.(mp4|webm|mkv)$/);
}, 30000);
});

329
src/config.ts Normal file

@@ -0,0 +1,329 @@
import * as os from "os";
import * as path from "path";
import * as fs from "fs";
type DeepPartial<T> = {
[P in keyof T]?: T[P] extends object ? DeepPartial<T[P]> : T[P];
};
/**
* Valid browser names for cookie extraction
*/
export const VALID_BROWSERS = [
'brave', 'chrome', 'chromium', 'edge',
'firefox', 'opera', 'safari', 'vivaldi', 'whale'
] as const;
export type ValidBrowser = typeof VALID_BROWSERS[number];
/**
* Configuration type definitions
*/
export interface Config {
// File-related configuration
file: {
maxFilenameLength: number;
downloadsDir: string;
tempDirPrefix: string;
// Filename processing configuration
sanitize: {
// Character to replace illegal characters
replaceChar: string;
// Suffix when truncating filenames
truncateSuffix: string;
// Regular expression for illegal characters
illegalChars: RegExp;
// List of reserved names
reservedNames: readonly string[];
};
};
// Tool-related configuration
tools: {
required: readonly string[];
};
// Download-related configuration
download: {
defaultResolution: "480p" | "720p" | "1080p" | "best";
defaultAudioFormat: "m4a" | "mp3";
defaultSubtitleLanguage: string;
};
// Response limits
limits: {
characterLimit: number;
maxTranscriptLength: number;
};
// Cookie configuration for authenticated access
cookies: {
// Path to Netscape format cookie file
file?: string;
// Browser name and settings (format: BROWSER[:PROFILE][::CONTAINER])
fromBrowser?: string;
};
}
/**
* Default configuration
*/
const defaultConfig: Config = {
file: {
maxFilenameLength: 50,
downloadsDir: path.join(os.homedir(), "Downloads"),
tempDirPrefix: "ytdlp-",
sanitize: {
replaceChar: '_',
truncateSuffix: '...',
illegalChars: /[<>:"/\\|?*\x00-\x1F]/g, // Windows illegal characters
reservedNames: [
'CON', 'PRN', 'AUX', 'NUL', 'COM1', 'COM2', 'COM3', 'COM4',
'COM5', 'COM6', 'COM7', 'COM8', 'COM9', 'LPT1', 'LPT2',
'LPT3', 'LPT4', 'LPT5', 'LPT6', 'LPT7', 'LPT8', 'LPT9'
]
}
},
tools: {
required: ['yt-dlp']
},
download: {
defaultResolution: "720p",
defaultAudioFormat: "m4a",
defaultSubtitleLanguage: "en"
},
limits: {
characterLimit: 25000, // Standard MCP character limit
maxTranscriptLength: 50000 // Transcripts can be larger
},
cookies: {
file: undefined,
fromBrowser: undefined
}
};
/**
* Load configuration from environment variables
*/
function loadEnvConfig(): DeepPartial<Config> {
const envConfig: DeepPartial<Config> = {};
// File configuration
const fileConfig: DeepPartial<Config['file']> = {
sanitize: {
replaceChar: process.env.YTDLP_SANITIZE_REPLACE_CHAR,
truncateSuffix: process.env.YTDLP_SANITIZE_TRUNCATE_SUFFIX,
illegalChars: (() => {
if (!process.env.YTDLP_SANITIZE_ILLEGAL_CHARS) return undefined;
try {
return new RegExp(process.env.YTDLP_SANITIZE_ILLEGAL_CHARS);
} catch {
console.warn('[yt-dlp-mcp] Invalid regex in YTDLP_SANITIZE_ILLEGAL_CHARS, using default');
return undefined;
}
})(),
reservedNames: process.env.YTDLP_SANITIZE_RESERVED_NAMES?.split(',')
}
};
if (process.env.YTDLP_MAX_FILENAME_LENGTH) {
const parsed = parseInt(process.env.YTDLP_MAX_FILENAME_LENGTH, 10);
if (!isNaN(parsed) && parsed >= 5) {
fileConfig.maxFilenameLength = parsed;
} else {
console.warn('[yt-dlp-mcp] Invalid YTDLP_MAX_FILENAME_LENGTH, using default');
}
}
if (process.env.YTDLP_DOWNLOADS_DIR) {
fileConfig.downloadsDir = process.env.YTDLP_DOWNLOADS_DIR;
}
if (process.env.YTDLP_TEMP_DIR_PREFIX) {
fileConfig.tempDirPrefix = process.env.YTDLP_TEMP_DIR_PREFIX;
}
if (Object.keys(fileConfig).length > 0) {
envConfig.file = fileConfig;
}
// Download configuration
const downloadConfig: Partial<Config['download']> = {};
if (process.env.YTDLP_DEFAULT_RESOLUTION &&
['480p', '720p', '1080p', 'best'].includes(process.env.YTDLP_DEFAULT_RESOLUTION)) {
downloadConfig.defaultResolution = process.env.YTDLP_DEFAULT_RESOLUTION as Config['download']['defaultResolution'];
}
if (process.env.YTDLP_DEFAULT_AUDIO_FORMAT &&
['m4a', 'mp3'].includes(process.env.YTDLP_DEFAULT_AUDIO_FORMAT)) {
downloadConfig.defaultAudioFormat = process.env.YTDLP_DEFAULT_AUDIO_FORMAT as Config['download']['defaultAudioFormat'];
}
if (process.env.YTDLP_DEFAULT_SUBTITLE_LANG) {
downloadConfig.defaultSubtitleLanguage = process.env.YTDLP_DEFAULT_SUBTITLE_LANG;
}
if (Object.keys(downloadConfig).length > 0) {
envConfig.download = downloadConfig;
}
// Cookie configuration
const cookiesConfig: Partial<Config['cookies']> = {};
if (process.env.YTDLP_COOKIES_FILE) {
cookiesConfig.file = process.env.YTDLP_COOKIES_FILE;
}
if (process.env.YTDLP_COOKIES_FROM_BROWSER) {
cookiesConfig.fromBrowser = process.env.YTDLP_COOKIES_FROM_BROWSER;
}
if (Object.keys(cookiesConfig).length > 0) {
envConfig.cookies = cookiesConfig;
}
return envConfig;
}
/**
* Validate configuration
*/
function validateConfig(config: Config): void {
// Validate filename length
if (config.file.maxFilenameLength < 5) {
throw new Error('maxFilenameLength must be at least 5');
}
// Validate downloads directory
if (!config.file.downloadsDir) {
throw new Error('downloadsDir must be specified');
}
// Validate temporary directory prefix
if (!config.file.tempDirPrefix) {
throw new Error('tempDirPrefix must be specified');
}
// Validate default resolution
if (!['480p', '720p', '1080p', 'best'].includes(config.download.defaultResolution)) {
throw new Error('Invalid defaultResolution');
}
// Validate default audio format
if (!['m4a', 'mp3'].includes(config.download.defaultAudioFormat)) {
throw new Error('Invalid defaultAudioFormat');
}
// Validate default subtitle language
if (!/^[a-z]{2,3}(-[A-Z][a-z]{3})?(-[A-Z]{2})?$/i.test(config.download.defaultSubtitleLanguage)) {
throw new Error('Invalid defaultSubtitleLanguage');
}
// Validate cookies (lenient - warnings only)
validateCookiesConfig(config);
}
/**
* Validate cookie configuration (lenient - logs warnings but doesn't throw)
*/
function validateCookiesConfig(config: Config): void {
// Validate cookie file path
if (config.cookies.file) {
if (!fs.existsSync(config.cookies.file)) {
console.warn(`[yt-dlp-mcp] Cookie file not found: ${config.cookies.file}, continuing without cookies`);
config.cookies.file = undefined;
}
}
// Validate browser name only
// Format: BROWSER[:PROFILE_OR_PATH][::CONTAINER]
// We only validate browser name; yt-dlp will validate path/container
if (config.cookies.fromBrowser) {
const browserName = config.cookies.fromBrowser.split(':')[0].toLowerCase();
if (!VALID_BROWSERS.includes(browserName as ValidBrowser)) {
console.warn(`[yt-dlp-mcp] Invalid browser name: ${browserName}. Valid browsers: ${VALID_BROWSERS.join(', ')}`);
config.cookies.fromBrowser = undefined;
}
}
}
/**
* Merge configuration
*/
function mergeConfig(base: Config, override: DeepPartial<Config>): Config {
return {
file: {
maxFilenameLength: override.file?.maxFilenameLength || base.file.maxFilenameLength,
downloadsDir: override.file?.downloadsDir || base.file.downloadsDir,
tempDirPrefix: override.file?.tempDirPrefix || base.file.tempDirPrefix,
sanitize: {
replaceChar: override.file?.sanitize?.replaceChar || base.file.sanitize.replaceChar,
truncateSuffix: override.file?.sanitize?.truncateSuffix || base.file.sanitize.truncateSuffix,
illegalChars: (override.file?.sanitize?.illegalChars || base.file.sanitize.illegalChars) as RegExp,
reservedNames: (override.file?.sanitize?.reservedNames || base.file.sanitize.reservedNames) as readonly string[]
}
},
tools: {
required: (override.tools?.required || base.tools.required) as readonly string[]
},
download: {
defaultResolution: override.download?.defaultResolution || base.download.defaultResolution,
defaultAudioFormat: override.download?.defaultAudioFormat || base.download.defaultAudioFormat,
defaultSubtitleLanguage: override.download?.defaultSubtitleLanguage || base.download.defaultSubtitleLanguage
},
limits: {
characterLimit: override.limits?.characterLimit || base.limits.characterLimit,
maxTranscriptLength: override.limits?.maxTranscriptLength || base.limits.maxTranscriptLength
},
cookies: {
file: override.cookies?.file ?? base.cookies.file,
fromBrowser: override.cookies?.fromBrowser ?? base.cookies.fromBrowser
}
};
}
/**
* Load configuration
*/
export function loadConfig(): Config {
const envConfig = loadEnvConfig();
const config = mergeConfig(defaultConfig, envConfig);
validateConfig(config);
return config;
}
/**
* Safe filename processing function
*/
export function sanitizeFilename(filename: string, config: Config['file']): string {
// Remove illegal characters
let safe = filename.replace(config.sanitize.illegalChars, config.sanitize.replaceChar);
// Check reserved names
const basename = path.parse(safe).name.toUpperCase();
if (config.sanitize.reservedNames.includes(basename)) {
safe = `_${safe}`;
}
// Handle length limitation
if (safe.length > config.maxFilenameLength) {
const ext = path.extname(safe);
const name = safe.slice(0, config.maxFilenameLength - ext.length - config.sanitize.truncateSuffix.length);
safe = `${name}${config.sanitize.truncateSuffix}${ext}`;
}
return safe;
}
/**
* Get cookie-related yt-dlp arguments
* Priority: file > fromBrowser
* @param config Configuration object
* @returns Array of yt-dlp arguments for cookie handling
*/
export function getCookieArgs(config: Config): string[] {
// Guard against missing cookies config
if (!config.cookies) {
return [];
}
// Cookie file takes precedence over browser extraction
if (config.cookies.file) {
return ['--cookies', config.cookies.file];
}
if (config.cookies.fromBrowser) {
return ['--cookies-from-browser', config.cookies.fromBrowser];
}
return [];
}
// Export current configuration instance
export const CONFIG = loadConfig();

File diff suppressed because it is too large

81
src/modules/audio.ts Normal file

@@ -0,0 +1,81 @@
import { readdirSync } from "fs";
import * as path from "path";
import type { Config } from "../config.js";
import { sanitizeFilename, getCookieArgs } from "../config.js";
import { _spawnPromise, validateUrl, getFormattedTimestamp, isYouTubeUrl } from "./utils.js";
/**
* Downloads audio from a video URL in the best available quality.
*
* @param url - The URL of the video to extract audio from
* @param config - Configuration object for download settings
* @returns Promise resolving to a success message with the downloaded file path
* @throws {Error} When URL is invalid or download fails
*
* @example
* ```typescript
* // Download audio with default settings
* const result = await downloadAudio('https://youtube.com/watch?v=...');
* console.log(result);
*
* // Download audio with custom config
* const customResult = await downloadAudio('https://youtube.com/watch?v=...', {
* file: {
* downloadsDir: '/custom/path',
* // ... other config options
* }
* });
* console.log(customResult);
* ```
*/
export async function downloadAudio(url: string, config: Config): Promise<string> {
const timestamp = getFormattedTimestamp();
if (!validateUrl(url)) {
throw new Error("Invalid or unsupported URL format");
}
try {
const outputTemplate = path.join(
config.file.downloadsDir,
sanitizeFilename(`%(title)s [%(id)s] ${timestamp}`, config.file) + '.%(ext)s'
);
const format = isYouTubeUrl(url)
? "140/bestaudio[ext=m4a]/bestaudio"
: "bestaudio[ext=m4a]/bestaudio[ext=mp3]/bestaudio";
await _spawnPromise("yt-dlp", [
"--ignore-config",
"--no-check-certificate",
"--verbose",
"--progress",
"--newline",
"--no-mtime",
"-f", format,
"--output", outputTemplate,
...getCookieArgs(config),
url
]);
const files = readdirSync(config.file.downloadsDir);
const downloadedFile = files.find(file => file.includes(timestamp));
if (!downloadedFile) {
throw new Error("Download completed but file not found. Check Downloads folder permissions.");
}
return `Audio successfully downloaded as "${downloadedFile}" to ${config.file.downloadsDir}`;
} catch (error) {
if (error instanceof Error) {
if (error.message.includes("Unsupported URL") || error.message.includes("extractor")) {
throw new Error(`Unsupported platform or video URL: ${url}. Ensure the URL is from a supported platform.`);
}
if (error.message.includes("Video unavailable") || error.message.includes("private")) {
throw new Error(`Video is unavailable or private: ${url}. Check the URL and video privacy settings.`);
}
if (error.message.includes("network") || error.message.includes("Connection")) {
throw new Error("Network error during audio extraction. Check your internet connection and retry.");
}
}
throw error;
}
}

283
src/modules/comments.ts Normal file

@@ -0,0 +1,283 @@
import type { Config } from "../config.js";
import { getCookieArgs } from "../config.js";
import {
_spawnPromise,
validateUrl
} from "./utils.js";
/**
* Represents a single comment on a video
*/
export interface Comment {
/** Unique comment identifier */
id?: string;
/** Comment text content */
text?: string;
/** Comment author name */
author?: string;
/** Comment author channel ID */
author_id?: string;
/** Comment author channel URL */
author_url?: string;
/** Whether the author is the video uploader */
author_is_uploader?: boolean;
/** Whether author is verified */
author_is_verified?: boolean;
/** Comment like count */
like_count?: number;
/** Whether comment is pinned */
is_pinned?: boolean;
/** Whether comment is marked as favorite by uploader */
is_favorited?: boolean;
/** Parent comment ID (for replies) */
parent?: string;
/** Unix timestamp of comment */
timestamp?: number;
/** Human-readable time ago string */
time_text?: string;
/** Additional fields that might be present */
[key: string]: unknown;
}
/**
* Response structure for video comments
*/
export interface CommentsResponse {
/** Total number of comments returned */
count: number;
/** Whether there are more comments available */
has_more: boolean;
/** Array of comment objects */
comments: Comment[];
/** Truncation indicator */
_truncated?: boolean;
/** Truncation message */
_message?: string;
}
/**
* Sort order for comments
*/
export type CommentSortOrder = "top" | "new";
/**
* Extract video comments using yt-dlp.
* Uses yt-dlp's --write-comments and --dump-json flags to get comments.
*
* @param url - The URL of the video to extract comments from
* @param maxComments - Maximum number of comments to retrieve (default: 20)
* @param sortOrder - Sort order: "top" for most liked, "new" for newest (default: "top")
* @param config - Configuration object
* @returns Promise resolving to JSON string with comments data
* @throws {Error} When URL is invalid or comment extraction fails
*
* @example
* ```typescript
* // Get top 20 comments
* const comments = await getVideoComments('https://youtube.com/watch?v=...');
* console.log(comments);
*
* // Get newest 50 comments
* const newComments = await getVideoComments(
* 'https://youtube.com/watch?v=...',
* 50,
* 'new'
* );
* ```
*/
export async function getVideoComments(
url: string,
maxComments: number = 20,
sortOrder: CommentSortOrder = "top",
_config?: Config
): Promise<string> {
// Validate the URL
if (!validateUrl(url)) {
throw new Error("Invalid or unsupported URL format");
}
const args = [
"--dump-json",
"--no-warnings",
"--no-check-certificate",
"--write-comments",
"--extractor-args", `youtube:comment_sort=${sortOrder};max_comments=${maxComments},all,all`,
"--skip-download",
...(_config ? getCookieArgs(_config) : []),
url
];
try {
// Execute yt-dlp to get metadata with comments
const output = await _spawnPromise("yt-dlp", args);
// Parse the JSON output
const metadata = JSON.parse(output);
// Extract comments from metadata
const rawComments: Comment[] = metadata.comments || [];
// Limit to maxComments
const comments = rawComments.slice(0, maxComments);
// Build response
const response: CommentsResponse = {
count: comments.length,
has_more: rawComments.length > maxComments,
comments: comments.map(comment => ({
id: comment.id,
text: comment.text,
author: comment.author,
author_id: comment.author_id,
author_url: comment.author_url,
author_is_uploader: comment.author_is_uploader,
author_is_verified: comment.author_is_verified,
like_count: comment.like_count,
is_pinned: comment.is_pinned,
is_favorited: comment.is_favorited,
parent: comment.parent,
timestamp: comment.timestamp,
time_text: comment.time_text
}))
};
let result = JSON.stringify(response, null, 2);
// Check character limit
if (_config && result.length > _config.limits.characterLimit) {
// Reduce comments to fit within limit
let truncatedComments = [...response.comments];
while (result.length > _config.limits.characterLimit && truncatedComments.length > 1) {
truncatedComments = truncatedComments.slice(0, -1);
const truncatedResponse: CommentsResponse = {
count: truncatedComments.length,
has_more: true,
comments: truncatedComments,
_truncated: true,
_message: `Response truncated to ${truncatedComments.length} comments due to size limits. Use smaller maxComments value.`
};
result = JSON.stringify(truncatedResponse, null, 2);
}
}
return result;
} catch (error) {
if (error instanceof Error) {
// Handle common yt-dlp errors with actionable messages
if (error.message.includes("Video unavailable") || error.message.includes("private")) {
throw new Error(`Video is unavailable or private: ${url}. Check the URL and video privacy settings.`);
} else if (error.message.includes("Unsupported URL") || error.message.includes("extractor")) {
throw new Error(`Unsupported platform or video URL: ${url}. Comments extraction is primarily supported for YouTube.`);
} else if (error.message.includes("network") || error.message.includes("Connection")) {
throw new Error("Network error while extracting comments. Check your internet connection and retry.");
} else if (error.message.includes("comments are disabled") || error.message.includes("Comments are turned off")) {
throw new Error(`Comments are disabled for this video: ${url}`);
} else if (error.message.includes("Sign in") || error.message.includes("age")) {
throw new Error(`This video requires authentication to view comments. Configure cookies in your settings.`);
} else {
throw new Error(`Failed to extract video comments: ${error.message}. Verify the URL is correct.`);
}
}
throw new Error(`Failed to extract video comments from ${url}`);
}
}
/**
* Get a human-readable summary of video comments.
* This is useful for quick overview without overwhelming JSON output.
*
* @param url - The URL of the video to extract comments from
* @param maxComments - Maximum number of comments to include (default: 10)
* @param config - Configuration object
* @returns Promise resolving to a formatted summary string
* @throws {Error} When URL is invalid or comment extraction fails
*
* @example
* ```typescript
* const summary = await getVideoCommentsSummary('https://youtube.com/watch?v=...');
* console.log(summary);
* // Output:
* // Video Comments (10 shown)
 * // ──────────────────────────────
 * //
 * // Author: John Doe (2 days ago) - 1,234 likes
 * // This is an awesome video!
 * //
 * // Author: Jane Smith (1 week ago) - 567 likes
 * // Great content, keep it up!
* ```
*/
export async function getVideoCommentsSummary(
url: string,
maxComments: number = 10,
_config?: Config
): Promise<string> {
try {
// Get the comments
const commentsJson = await getVideoComments(url, maxComments, "top", _config);
const data: CommentsResponse = JSON.parse(commentsJson);
// Format comments into a readable summary
const lines: string[] = [];
lines.push(`Video Comments (${data.count} shown)`);
lines.push('─'.repeat(30));
lines.push('');
for (const comment of data.comments) {
// Build author line with indicators
let authorLine = `Author: ${comment.author || 'Unknown'}`;
if (comment.author_is_uploader) {
authorLine += ' [UPLOADER]';
}
if (comment.author_is_verified) {
authorLine += ' [VERIFIED]';
}
if (comment.is_pinned) {
authorLine += ' [PINNED]';
}
// Time info
if (comment.time_text) {
authorLine += ` (${comment.time_text})`;
}
// Likes
if (comment.like_count !== undefined && comment.like_count > 0) {
authorLine += ` - ${comment.like_count.toLocaleString()} likes`;
}
lines.push(authorLine);
// Comment text (truncate if too long)
if (comment.text) {
const text = comment.text.length > 300
? comment.text.substring(0, 300) + '...'
: comment.text;
lines.push(text);
}
// Note if this is a reply
if (comment.parent && comment.parent !== 'root') {
lines.push(`(Reply to comment ${comment.parent})`);
}
lines.push('');
}
if (data.has_more) {
lines.push('---');
lines.push('More comments available. Increase maxComments to see more.');
}
return lines.join('\n');
} catch (error) {
// Re-throw errors from getVideoComments with context
if (error instanceof Error) {
throw error;
}
throw new Error(`Failed to generate comments summary for ${url}`);
}
}
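The author-line rules above (uploader/verified/pinned badges, relative time, like count) can be isolated into a small helper; a sketch with illustrative sample data (the `CommentLike` shape mirrors the fields read from `CommentsResponse` entries, and is an assumption here):

```typescript
// Self-contained sketch of the author-line formatting used in
// getVideoCommentsSummary; sample data is illustrative.
interface CommentLike {
  author?: string;
  author_is_uploader?: boolean;
  author_is_verified?: boolean;
  is_pinned?: boolean;
  time_text?: string;
  like_count?: number;
}

function buildAuthorLine(comment: CommentLike): string {
  let line = `Author: ${comment.author || 'Unknown'}`;
  if (comment.author_is_uploader) line += ' [UPLOADER]';
  if (comment.author_is_verified) line += ' [VERIFIED]';
  if (comment.is_pinned) line += ' [PINNED]';
  if (comment.time_text) line += ` (${comment.time_text})`;
  if (comment.like_count !== undefined && comment.like_count > 0) {
    line += ` - ${comment.like_count.toLocaleString()} likes`;
  }
  return line;
}
```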

src/modules/metadata.ts (new file, 345 lines)
@@ -0,0 +1,345 @@
import type { Config } from "../config.js";
import { getCookieArgs } from "../config.js";
import {
_spawnPromise,
validateUrl
} from "./utils.js";
/**
* Video metadata interface containing all fields that can be extracted
*/
export interface VideoMetadata {
// Basic video information
id?: string;
title?: string;
fulltitle?: string;
description?: string;
alt_title?: string;
display_id?: string;
// Creator/uploader information
uploader?: string;
uploader_id?: string;
uploader_url?: string;
creators?: string[];
creator?: string;
// Channel information
channel?: string;
channel_id?: string;
channel_url?: string;
channel_follower_count?: number;
channel_is_verified?: boolean;
// Timestamps and dates
timestamp?: number;
upload_date?: string;
release_timestamp?: number;
release_date?: string;
release_year?: number;
modified_timestamp?: number;
modified_date?: string;
// Video properties
duration?: number;
duration_string?: string;
view_count?: number;
concurrent_view_count?: number;
like_count?: number;
dislike_count?: number;
repost_count?: number;
average_rating?: number;
comment_count?: number;
age_limit?: number;
// Content classification
live_status?: string;
is_live?: boolean;
was_live?: boolean;
playable_in_embed?: string;
availability?: string;
media_type?: string;
// Playlist information
playlist_id?: string;
playlist_title?: string;
playlist?: string;
playlist_count?: number;
playlist_index?: number;
playlist_autonumber?: number;
playlist_uploader?: string;
playlist_uploader_id?: string;
playlist_channel?: string;
playlist_channel_id?: string;
// URLs and technical info
webpage_url?: string;
webpage_url_domain?: string;
webpage_url_basename?: string;
original_url?: string;
filename?: string;
ext?: string;
// Content metadata
categories?: string[];
tags?: string[];
cast?: string[];
location?: string;
license?: string;
// Series/episode information
series?: string;
series_id?: string;
season?: string;
season_number?: number;
season_id?: string;
episode?: string;
episode_number?: number;
episode_id?: string;
// Music/track information
track?: string;
track_number?: number;
track_id?: string;
artists?: string[];
artist?: string;
genres?: string[];
genre?: string;
composers?: string[];
composer?: string;
album?: string;
album_type?: string;
album_artists?: string[];
album_artist?: string;
disc_number?: number;
// Technical metadata
extractor?: string;
epoch?: number;
// Additional fields that might be present
[key: string]: unknown;
}
/**
* Extract video metadata without downloading the actual video content.
* Uses yt-dlp's --dump-json flag to get comprehensive metadata.
*
* @param url - The URL of the video to extract metadata from
* @param fields - Optional array of specific fields to extract. If not provided, returns all available metadata
* @param config - Configuration object (used for cookie arguments and character limits)
* @returns Promise resolving to formatted metadata string or JSON object
* @throws {Error} When URL is invalid or metadata extraction fails
*
* @example
* ```typescript
* // Get all metadata
* const metadata = await getVideoMetadata('https://youtube.com/watch?v=...');
* console.log(metadata);
*
* // Get specific fields only
* const specificData = await getVideoMetadata(
* 'https://youtube.com/watch?v=...',
* ['id', 'title', 'description', 'channel']
* );
* console.log(specificData);
* ```
*/
export async function getVideoMetadata(
url: string,
fields?: string[],
_config?: Config
): Promise<string> {
// Validate the URL
if (!validateUrl(url)) {
throw new Error("Invalid or unsupported URL format");
}
const args = [
"--dump-json",
"--no-warnings",
"--no-check-certificate",
...(_config ? getCookieArgs(_config) : []),
url
];
try {
// Execute yt-dlp to get metadata
const output = await _spawnPromise("yt-dlp", args);
// Parse the JSON output
const metadata: VideoMetadata = JSON.parse(output);
// If specific fields are requested, filter the metadata
if (fields !== undefined && fields.length > 0) {
const filteredMetadata: Partial<VideoMetadata> & { _truncated?: boolean; _message?: string } = {};
for (const field of fields) {
if (metadata.hasOwnProperty(field)) {
filteredMetadata[field as keyof VideoMetadata] = metadata[field as keyof VideoMetadata];
}
}
let result = JSON.stringify(filteredMetadata, null, 2);
// Check character limit
if (_config && result.length > _config.limits.characterLimit) {
// Add truncation info inside JSON before truncating
filteredMetadata._truncated = true;
filteredMetadata._message = "Response truncated. Specify fewer fields to see complete data.";
result = JSON.stringify(filteredMetadata, null, 2);
// If still too long, truncate the string content
if (result.length > _config.limits.characterLimit) {
result = result.substring(0, _config.limits.characterLimit) + '\n... }';
}
}
return result;
}
// Return formatted JSON string with all metadata
let result = JSON.stringify(metadata, null, 2);
// Check character limit for full metadata
if (_config && result.length > _config.limits.characterLimit) {
// Try to return essential fields only
const essentialFields = ['id', 'title', 'description', 'channel', 'channel_id', 'uploader',
'duration', 'duration_string', 'view_count', 'like_count',
'upload_date', 'tags', 'categories', 'webpage_url'];
const essentialMetadata: Partial<VideoMetadata> & { _truncated?: boolean; _message?: string } = {};
for (const field of essentialFields) {
if (metadata.hasOwnProperty(field)) {
essentialMetadata[field as keyof VideoMetadata] = metadata[field as keyof VideoMetadata];
}
}
// Add truncation info inside the JSON object
essentialMetadata._truncated = true;
essentialMetadata._message = 'Full metadata truncated to essential fields. Use the "fields" parameter to request specific fields.';
result = JSON.stringify(essentialMetadata, null, 2);
}
return result;
} catch (error) {
if (error instanceof Error) {
// Handle common yt-dlp errors with actionable messages
if (error.message.includes("Video unavailable") || error.message.includes("private")) {
throw new Error(`Video is unavailable or private: ${url}. Check the URL and video privacy settings.`);
} else if (error.message.includes("Unsupported URL") || error.message.includes("extractor")) {
throw new Error(`Unsupported platform or video URL: ${url}. Ensure the URL is from a supported platform like YouTube.`);
} else if (error.message.includes("network") || error.message.includes("Connection")) {
throw new Error("Network error while extracting metadata. Check your internet connection and retry.");
} else {
throw new Error(`Failed to extract video metadata: ${error.message}. Verify the URL is correct.`);
}
}
throw new Error(`Failed to extract video metadata from ${url}`);
}
}
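The fields filtering step above boils down to picking the requested keys off the parsed JSON; a self-contained sketch of the same rule (the sample metadata object is illustrative):

```typescript
// Keep only the requested keys from a parsed metadata object,
// silently skipping fields that are not present.
function pickFields(
  metadata: Record<string, unknown>,
  fields: string[]
): Record<string, unknown> {
  const filtered: Record<string, unknown> = {};
  for (const field of fields) {
    if (Object.prototype.hasOwnProperty.call(metadata, field)) {
      filtered[field] = metadata[field];
    }
  }
  return filtered;
}
```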
/**
* Get a human-readable summary of key video metadata fields.
* This is useful for a quick overview without overwhelming JSON output.
*
* @param url - The URL of the video to extract metadata from
* @param config - Configuration object, passed through to getVideoMetadata for cookie and limit handling
* @returns Promise resolving to a formatted summary string
* @throws {Error} When URL is invalid or metadata extraction fails
*
* @example
* ```typescript
* const summary = await getVideoMetadataSummary('https://youtube.com/watch?v=...');
* console.log(summary);
* // Output:
* // Title: Example Video Title
* // Channel: Example Channel
* // Duration: 10:30
* // Views: 1,234,567
* // Upload Date: 2023-12-01
* // Description: This is an example video...
* ```
*/
export async function getVideoMetadataSummary(
url: string,
_config?: Config
): Promise<string> {
try {
// Get the full metadata first
const metadataJson = await getVideoMetadata(url, undefined, _config);
const metadata: VideoMetadata = JSON.parse(metadataJson);
// Format key fields into a readable summary
const lines: string[] = [];
if (metadata.title) {
lines.push(`Title: ${metadata.title}`);
}
if (metadata.channel) {
lines.push(`Channel: ${metadata.channel}`);
}
if (metadata.uploader && metadata.uploader !== metadata.channel) {
lines.push(`Uploader: ${metadata.uploader}`);
}
if (metadata.duration_string) {
lines.push(`Duration: ${metadata.duration_string}`);
} else if (metadata.duration) {
const hours = Math.floor(metadata.duration / 3600);
const minutes = Math.floor((metadata.duration % 3600) / 60);
const seconds = metadata.duration % 60;
const durationStr = hours > 0
? `${hours}:${minutes.toString().padStart(2, '0')}:${seconds.toString().padStart(2, '0')}`
: `${minutes}:${seconds.toString().padStart(2, '0')}`;
lines.push(`Duration: ${durationStr}`);
}
if (metadata.view_count !== undefined) {
lines.push(`Views: ${metadata.view_count.toLocaleString()}`);
}
if (metadata.like_count !== undefined) {
lines.push(`Likes: ${metadata.like_count.toLocaleString()}`);
}
if (metadata.upload_date) {
// Format YYYYMMDD to YYYY-MM-DD
const dateStr = metadata.upload_date;
if (dateStr.length === 8) {
const formatted = `${dateStr.substring(0, 4)}-${dateStr.substring(4, 6)}-${dateStr.substring(6, 8)}`;
lines.push(`Upload Date: ${formatted}`);
} else {
lines.push(`Upload Date: ${dateStr}`);
}
}
if (metadata.live_status && metadata.live_status !== 'not_live') {
lines.push(`Status: ${metadata.live_status.replace('_', ' ')}`);
}
if (metadata.tags && metadata.tags.length > 0) {
lines.push(`Tags: ${metadata.tags.slice(0, 5).join(', ')}${metadata.tags.length > 5 ? '...' : ''}`);
}
if (metadata.description) {
// Truncate description to first 200 characters
const desc = metadata.description.length > 200
? metadata.description.substring(0, 200) + '...'
: metadata.description;
lines.push(`Description: ${desc}`);
}
return lines.join('\n');
} catch (error) {
// Re-throw errors from getVideoMetadata with context
if (error instanceof Error) {
throw error;
}
throw new Error(`Failed to generate metadata summary for ${url}`);
}
}
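The summary's fallback duration formatting and the YYYYMMDD date conversion can be checked in isolation; a sketch restating the logic above as standalone helpers:

```typescript
// Format a duration in seconds as H:MM:SS (or M:SS below one hour),
// mirroring the fallback branch of getVideoMetadataSummary.
function formatDuration(totalSeconds: number): string {
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  return hours > 0
    ? `${hours}:${minutes.toString().padStart(2, '0')}:${seconds.toString().padStart(2, '0')}`
    : `${minutes}:${seconds.toString().padStart(2, '0')}`;
}

// yt-dlp reports upload_date as YYYYMMDD; convert to YYYY-MM-DD,
// passing anything else through unchanged.
function formatUploadDate(raw: string): string {
  return raw.length === 8
    ? `${raw.substring(0, 4)}-${raw.substring(4, 6)}-${raw.substring(6, 8)}`
    : raw;
}
```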

src/modules/search.ts (new file, 255 lines)
@@ -0,0 +1,255 @@
import { _spawnPromise } from "./utils.js";
import type { Config } from "../config.js";
import { getCookieArgs } from "../config.js";
/**
* Upload date filter type
*/
export type UploadDateFilter = "hour" | "today" | "week" | "month" | "year";
/**
* YouTube search result interface
*/
export interface SearchResult {
title: string;
id: string;
url: string;
uploader?: string;
duration?: string;
viewCount?: string;
uploadDate?: string;
}
/**
* Map upload date filter to YouTube's sp parameter
* These are base64-encoded protobuf parameters
*/
const UPLOAD_DATE_FILTER_MAP: Record<UploadDateFilter, string> = {
hour: "EgIIAQ%3D%3D", // Last hour
today: "EgIIAg%3D%3D", // Today
week: "EgIIAw%3D%3D", // This week
month: "EgIIBA%3D%3D", // This month
year: "EgIIBQ%3D%3D", // This year
};
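Combined with `encodeURIComponent`, these `sp` values produce the filtered results URL; a minimal sketch (only two filter values are copied here, and the query string is illustrative):

```typescript
// Build a YouTube results URL, optionally appending the pre-encoded
// sp parameter for date filtering. Values copied from the map above.
const FILTERS: Record<string, string> = {
  hour: "EgIIAQ%3D%3D",
  week: "EgIIAw%3D%3D",
};

function buildSearchUrl(query: string, filter?: string): string {
  const encoded = encodeURIComponent(query.trim());
  const base = `https://www.youtube.com/results?search_query=${encoded}`;
  const sp = filter ? FILTERS[filter] : undefined;
  // sp values are already percent-encoded, so they are appended as-is
  return sp ? `${base}&sp=${sp}` : base;
}
```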
/**
* Search YouTube videos
* @param query Search keywords
* @param maxResults Maximum number of results (1-50)
* @param offset Number of results to skip for pagination
* @param responseFormat Output format ('json' or 'markdown')
* @param config Configuration object
* @param uploadDateFilter Optional filter by upload date
* @returns Search results formatted as string
*/
export async function searchVideos(
query: string,
maxResults: number = 10,
offset: number = 0,
responseFormat: "json" | "markdown" = "markdown",
config: Config,
uploadDateFilter?: UploadDateFilter
): Promise<string> {
// Validate parameters
if (!query || query.trim().length === 0) {
throw new Error("Search query cannot be empty");
}
if (maxResults < 1 || maxResults > 50) {
throw new Error("Number of results must be between 1 and 50");
}
if (offset < 0) {
throw new Error("Offset cannot be negative");
}
const cleanQuery = query.trim();
// Request more results to support offset
const totalToFetch = maxResults + offset;
try {
let args: string[];
if (uploadDateFilter && UPLOAD_DATE_FILTER_MAP[uploadDateFilter]) {
// Use YouTube URL with sp parameter for date filtering
const encodedQuery = encodeURIComponent(cleanQuery);
const spParam = UPLOAD_DATE_FILTER_MAP[uploadDateFilter];
const searchUrl = `https://www.youtube.com/results?search_query=${encodedQuery}&sp=${spParam}`;
args = [
searchUrl,
"--flat-playlist",
"--print", "title",
"--print", "id",
"--print", "uploader",
"--print", "duration",
"--no-download",
"--quiet",
"--playlist-end", String(totalToFetch),
...getCookieArgs(config)
];
} else {
// Use ytsearch prefix for regular search
const searchQuery = `ytsearch${totalToFetch}:${cleanQuery}`;
args = [
searchQuery,
"--print", "title",
"--print", "id",
"--print", "uploader",
"--print", "duration",
"--no-download",
"--quiet",
...getCookieArgs(config)
];
}
const result = await _spawnPromise(config.tools.required[0], args);
if (!result || result.trim().length === 0) {
return "No videos found";
}
// Parse results
const lines = result.trim().split('\n');
const allResults: SearchResult[] = [];
// Each video has 4 lines of data: title, id, uploader, duration
for (let i = 0; i < lines.length; i += 4) {
if (i + 3 < lines.length) {
const title = lines[i]?.trim();
const id = lines[i + 1]?.trim();
const uploader = lines[i + 2]?.trim();
const duration = lines[i + 3]?.trim();
if (title && id) {
const url = `https://www.youtube.com/watch?v=${id}`;
allResults.push({
title,
id,
url,
uploader: uploader || "Unknown",
duration: duration || "Unknown"
});
}
}
}
// Apply offset and limit
const paginatedResults = allResults.slice(offset, offset + maxResults);
const hasMore = allResults.length > offset + maxResults;
if (paginatedResults.length === 0) {
return "No videos found";
}
// Format output based on response format
if (responseFormat === "json") {
const response = {
total: allResults.length,
count: paginatedResults.length,
offset: offset,
videos: paginatedResults,
has_more: hasMore,
...(hasMore && { next_offset: offset + maxResults }),
...(uploadDateFilter && { upload_date_filter: uploadDateFilter })
};
let output = JSON.stringify(response, null, 2);
// Check character limit
if (output.length > config.limits.characterLimit) {
// Truncate videos array
const truncatedCount = Math.ceil(paginatedResults.length / 2);
const truncatedResponse = {
...response,
count: truncatedCount,
videos: paginatedResults.slice(0, truncatedCount),
truncated: true,
truncation_message: `Response truncated from ${paginatedResults.length} to ${truncatedCount} results. Use offset parameter or reduce maxResults to see more.`
};
output = JSON.stringify(truncatedResponse, null, 2);
}
return output;
} else {
// Markdown format
let output = `Found ${allResults.length} video${allResults.length > 1 ? 's' : ''} (showing ${paginatedResults.length})`;
if (uploadDateFilter) {
const filterLabels: Record<UploadDateFilter, string> = {
hour: "last hour",
today: "today",
week: "this week",
month: "this month",
year: "this year"
};
output += ` from ${filterLabels[uploadDateFilter]}`;
}
output += `:\n\n`;
paginatedResults.forEach((video, index) => {
output += `${offset + index + 1}. **${video.title}**\n`;
output += ` 📺 Channel: ${video.uploader}\n`;
output += ` ⏱️ Duration: ${video.duration}\n`;
output += ` 🔗 URL: ${video.url}\n`;
output += ` 🆔 ID: ${video.id}\n\n`;
});
// Add pagination info
if (offset > 0 || hasMore) {
output += `\n📊 Pagination: Showing results ${offset + 1}-${offset + paginatedResults.length} of ${allResults.length}`;
if (hasMore) {
output += ` (${allResults.length - offset - paginatedResults.length} more available)`;
}
output += '\n';
}
output += "\n💡 You can use any URL to download videos, audio, or subtitles!";
// Check character limit
if (output.length > config.limits.characterLimit) {
output = output.substring(0, config.limits.characterLimit);
output += "\n\n⚠ Response truncated. Use offset parameter or reduce maxResults to see more results.";
}
return output;
}
} catch (error) {
if (error instanceof Error) {
// Provide more actionable error messages
if (error.message.includes("network") || error.message.includes("Network")) {
throw new Error("Network error while searching. Check your internet connection and retry.");
}
if (error.message.includes("429") || error.message.includes("rate limit")) {
throw new Error("YouTube rate limit exceeded. Wait 60 seconds before searching again.");
}
throw new Error(`Search failed: ${error.message}. Try a different query or reduce maxResults.`);
}
throw new Error(`Error searching videos: ${String(error)}`);
}
}
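The offset/limit pagination inside searchVideos can be factored out for clarity; a generic sketch of the same slicing rules (items stand in for parsed `SearchResult` rows):

```typescript
// Slice a result list by offset and page size, reporting whether more
// results remain and where the next page would start.
function paginate<T>(items: T[], offset: number, maxResults: number) {
  const page = items.slice(offset, offset + maxResults);
  const hasMore = items.length > offset + maxResults;
  return {
    page,
    hasMore,
    nextOffset: hasMore ? offset + maxResults : undefined,
  };
}
```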
/**
* Search videos on specific platform (future expansion feature)
* @param query Search keywords
* @param platform Platform name ('youtube', 'bilibili', etc.)
* @param maxResults Maximum number of results
* @param offset Number of results to skip
* @param responseFormat Output format
* @param config Configuration object
*/
export async function searchByPlatform(
query: string,
platform: string = 'youtube',
maxResults: number = 10,
offset: number = 0,
responseFormat: "json" | "markdown" = "markdown",
config: Config
): Promise<string> {
// Currently only supports YouTube, can be expanded to other platforms in the future
if (platform.toLowerCase() !== 'youtube') {
throw new Error(`Only YouTube search is currently supported; '${platform}' is not available yet`);
}
return searchVideos(query, maxResults, offset, responseFormat, config);
}

src/modules/subtitle.ts (new file, 229 lines)
@@ -0,0 +1,229 @@
import * as fs from "fs";
import * as path from "path";
import * as os from "os";
import type { Config } from '../config.js';
import { getCookieArgs } from '../config.js';
import { _spawnPromise, validateUrl, cleanSubtitleToTranscript } from "./utils.js";
/**
* Lists all available subtitles for a video.
*
* @param url - The URL of the video
* @param config - Configuration object (optional, for cookie support)
* @returns Promise resolving to a string containing the list of available subtitles
* @throws {Error} When URL is invalid or subtitle listing fails
*
* @example
* ```typescript
* try {
* const subtitles = await listSubtitles('https://youtube.com/watch?v=...', config);
* console.log('Available subtitles:', subtitles);
* } catch (error) {
* console.error('Failed to list subtitles:', error);
* }
* ```
*/
export async function listSubtitles(url: string, config?: Config): Promise<string> {
if (!validateUrl(url)) {
throw new Error('Invalid or unsupported URL format. Please provide a valid video URL (e.g., https://youtube.com/watch?v=...)');
}
try {
const args = [
'--ignore-config',
'--list-subs',
'--write-auto-sub',
'--skip-download',
'--verbose',
...(config ? getCookieArgs(config) : []),
url
];
const output = await _spawnPromise('yt-dlp', args);
return output;
} catch (error) {
if (error instanceof Error) {
if (error.message.includes("Unsupported URL") || error.message.includes("not supported")) {
throw new Error(`Unsupported platform or video URL: ${url}. Ensure the URL is from a supported platform like YouTube.`);
}
if (error.message.includes("Video unavailable") || error.message.includes("private")) {
throw new Error(`Video is unavailable or private: ${url}. Check the URL and video privacy settings.`);
}
if (error.message.includes("network") || error.message.includes("Connection")) {
throw new Error("Network error while fetching subtitles. Check your internet connection and retry.");
}
}
throw error;
}
}
/**
* Downloads subtitles for a video in the specified language.
*
* @param url - The URL of the video
* @param language - Language code (e.g., 'en', 'zh-Hant', 'ja')
* @param config - Configuration object
* @returns Promise resolving to the subtitle content
* @throws {Error} When URL is invalid, language is not available, or download fails
*
* @example
* ```typescript
* try {
* // Download English subtitles
* const enSubs = await downloadSubtitles('https://youtube.com/watch?v=...', 'en', config);
* console.log('English subtitles:', enSubs);
*
* // Download Traditional Chinese subtitles
* const zhSubs = await downloadSubtitles('https://youtube.com/watch?v=...', 'zh-Hant', config);
* console.log('Chinese subtitles:', zhSubs);
* } catch (error) {
* if (error.message.includes('No subtitle files found')) {
* console.warn('No subtitles available in the requested language');
* } else {
* console.error('Failed to download subtitles:', error);
* }
* }
* ```
*/
export async function downloadSubtitles(
url: string,
language: string,
config: Config
): Promise<string> {
if (!validateUrl(url)) {
throw new Error('Invalid or unsupported URL format. Please provide a valid video URL (e.g., https://youtube.com/watch?v=...)');
}
const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), config.file.tempDirPrefix));
try {
await _spawnPromise('yt-dlp', [
'--ignore-config',
'--write-sub',
'--write-auto-sub',
'--sub-lang', language,
'--skip-download',
'--output', path.join(tempDir, '%(title)s.%(ext)s'),
...getCookieArgs(config),
url
]);
const subtitleFiles = fs.readdirSync(tempDir)
.filter(file => file.endsWith('.vtt'));
if (subtitleFiles.length === 0) {
throw new Error(`No subtitle files found for language '${language}'. Use ytdlp_list_subtitle_languages to check available options.`);
}
let output = '';
for (const file of subtitleFiles) {
output += fs.readFileSync(path.join(tempDir, file), 'utf8');
}
// Check character limit
if (output.length > config.limits.characterLimit) {
output = output.substring(0, config.limits.characterLimit);
output += "\n\n⚠ Subtitle content truncated due to size. Consider using ytdlp_download_transcript for plain text.";
}
return output;
} catch (error) {
if (error instanceof Error) {
if (error.message.includes("Unsupported URL") || error.message.includes("not supported")) {
throw new Error(`Unsupported platform or video URL: ${url}. Ensure the URL is from a supported platform like YouTube.`);
}
if (error.message.includes("Video unavailable") || error.message.includes("private")) {
throw new Error(`Video is unavailable or private: ${url}. Check the URL and video privacy settings.`);
}
if (error.message.includes("network") || error.message.includes("Connection")) {
throw new Error("Network error while downloading subtitles. Check your internet connection and retry.");
}
}
throw error;
} finally {
fs.rmSync(tempDir, { recursive: true, force: true });
}
}
/**
* Downloads and cleans subtitles to produce a plain text transcript.
*
* @param url - The URL of the video
* @param language - Language code (e.g., 'en', 'zh-Hant', 'ja')
* @param config - Configuration object
* @returns Promise resolving to the cleaned transcript text
* @throws {Error} When URL is invalid, language is not available, or download fails
*
* @example
* ```typescript
* try {
* const transcript = await downloadTranscript('https://youtube.com/watch?v=...', 'en', config);
* console.log('Transcript:', transcript);
* } catch (error) {
* console.error('Failed to download transcript:', error);
* }
* ```
*/
export async function downloadTranscript(
url: string,
language: string,
config: Config
): Promise<string> {
if (!validateUrl(url)) {
throw new Error('Invalid or unsupported URL format. Please provide a valid video URL (e.g., https://youtube.com/watch?v=...)');
}
const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), config.file.tempDirPrefix));
try {
await _spawnPromise('yt-dlp', [
'--ignore-config',
'--skip-download',
'--write-subs',
'--write-auto-subs',
'--sub-lang', language,
'--sub-format', 'ttml',
'--convert-subs', 'srt',
'--output', path.join(tempDir, 'transcript.%(ext)s'),
...getCookieArgs(config),
url
]);
const srtFiles = fs.readdirSync(tempDir)
.filter(file => file.endsWith('.srt'));
if (srtFiles.length === 0) {
throw new Error(`No subtitle files found for transcript generation in language '${language}'. Use ytdlp_list_subtitle_languages to check available options.`);
}
let transcriptContent = '';
for (const file of srtFiles) {
const srtContent = fs.readFileSync(path.join(tempDir, file), 'utf8');
transcriptContent += cleanSubtitleToTranscript(srtContent) + ' ';
}
transcriptContent = transcriptContent.trim();
// Transcripts can be larger than standard limit
if (transcriptContent.length > config.limits.maxTranscriptLength) {
const truncated = transcriptContent.substring(0, config.limits.maxTranscriptLength);
transcriptContent = truncated + "\n\n⚠ Transcript truncated due to length. This is a partial transcript.";
}
return transcriptContent;
} catch (error) {
if (error instanceof Error) {
if (error.message.includes("Unsupported URL") || error.message.includes("not supported")) {
throw new Error(`Unsupported platform or video URL: ${url}. Ensure the URL is from a supported platform like YouTube.`);
}
if (error.message.includes("Video unavailable") || error.message.includes("private")) {
throw new Error(`Video is unavailable or private: ${url}. Check the URL and video privacy settings.`);
}
if (error.message.includes("network") || error.message.includes("Connection")) {
throw new Error("Network error while downloading transcript. Check your internet connection and retry.");
}
}
throw error;
} finally {
fs.rmSync(tempDir, { recursive: true, force: true });
}
}

src/modules/utils.ts (new file, 188 lines)
@@ -0,0 +1,188 @@
import * as fs from 'fs';
import { spawn } from 'child_process';
import { randomBytes } from 'crypto';
/**
* Validates if a given string is a valid URL.
*
* @param url - The URL string to validate
* @returns True if the URL is valid, false otherwise
*
* @example
* ```typescript
* if (validateUrl('https://youtube.com/watch?v=...')) {
* // URL is valid
* }
* ```
*/
export function validateUrl(url: string): boolean {
try {
new URL(url);
return true;
} catch {
return false;
}
}
/**
* Checks if a URL is from YouTube.
*
* @param url - The URL to check
* @returns True if the URL is from YouTube, false otherwise
*
* @example
* ```typescript
* if (isYouTubeUrl('https://youtube.com/watch?v=...')) {
* // URL is from YouTube
* }
* ```
*/
export function isYouTubeUrl(url: string): boolean {
try {
const host = new URL(url).hostname;
// Match exact hosts and subdomains only, so e.g. "notyoutube.com"
// or "youtube.com.evil.example" are rejected
return host === 'youtube.com' || host.endsWith('.youtube.com') ||
host === 'youtu.be' || host.endsWith('.youtu.be');
} catch {
return false;
}
}
/**
* Safely cleans up a directory and its contents.
*
* @param directory - Path to the directory to clean up
* @returns Promise that resolves when cleanup is complete
* @throws {Error} When directory cannot be removed
*
* @example
* ```typescript
* try {
* await safeCleanup('/path/to/temp/dir');
* } catch (error) {
* console.error('Cleanup failed:', error);
* }
* ```
*/
export async function safeCleanup(directory: string): Promise<void> {
try {
await fs.promises.rm(directory, { recursive: true, force: true });
} catch (error) {
console.error(`Error cleaning up directory ${directory}:`, error);
}
}
/**
* Spawns a child process and returns its output as a promise.
*
* @param command - The command to execute
* @param args - Array of command arguments
* @returns Promise resolving to the command output
* @throws {Error} When command execution fails
*
* @example
* ```typescript
* try {
* const output = await _spawnPromise('yt-dlp', ['--version']);
* console.log('yt-dlp version:', output);
* } catch (error) {
* console.error('Command failed:', error);
* }
* ```
*/
export function _spawnPromise(command: string, args: string[]): Promise<string> {
return new Promise((resolve, reject) => {
// Name the handle "child" so it does not shadow the Node.js global "process"
const child = spawn(command, args);
let stdout = '';
let stderr = '';
child.on('error', (err) => {
reject(new Error(`Failed to spawn ${command}: ${err.message}`));
});
child.stdout.on('data', (data) => {
stdout += data.toString();
});
child.stderr.on('data', (data) => {
stderr += data.toString();
});
child.on('close', (code) => {
if (code === 0) {
resolve(stdout);
} else {
reject(new Error(`Failed with exit code: ${code}\n${stderr}\n${stdout}`));
}
});
});
}
/**
* Generates a formatted timestamp string for file naming.
*
* @returns Formatted timestamp string in the format 'YYYY-MM-DD_HH-mm-ss'
*
* @example
* ```typescript
* const timestamp = getFormattedTimestamp();
* console.log(timestamp); // '2024-03-20_12-30-00'
* ```
*/
export function getFormattedTimestamp(): string {
// Strip the millisecond fraction first; otherwise the '.' before it is
// replaced by '-' and the later split('.') has nothing left to remove.
return new Date().toISOString()
.split('.')[0]
.replace(/:/g, '-')
.replace('T', '_');
}
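For a fixed instant, the documented 'YYYY-MM-DD_HH-mm-ss' shape can be checked directly; note the millisecond fraction has to be stripped before replacing separators, or it survives into the result (sketch with an injected Date):

```typescript
// Produce the 'YYYY-MM-DD_HH-mm-ss' timestamp for a given Date:
// drop the fraction, then swap ':' for '-' and 'T' for '_'.
function formatTimestamp(date: Date): string {
  return date.toISOString()
    .split('.')[0]
    .replace(/:/g, '-')
    .replace('T', '_');
}
```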
/**
* Generates a random filename with timestamp prefix.
*
* @param extension - Optional file extension (default: 'mp4')
* @returns A random filename with timestamp
*
* @example
* ```typescript
* const filename = generateRandomFilename('mp3');
* console.log(filename); // '2024-03-20_12-30-00_a1b2c3d4.mp3'
* ```
*/
export function generateRandomFilename(extension: string = 'mp4'): string {
const timestamp = getFormattedTimestamp();
const randomId = randomBytes(4).toString('hex');
return `${timestamp}_${randomId}.${extension}`;
}
/**
* Cleans SRT subtitle content to produce a plain text transcript.
* Removes timestamps, sequence numbers, and HTML tags.
*
* @param srtContent - Raw SRT subtitle content
* @returns Cleaned transcript text
*
* @example
* ```typescript
* const cleanedText = cleanSubtitleToTranscript(srtContent);
* console.log(cleanedText); // 'Hello world this is a transcript...'
* ```
*/
export function cleanSubtitleToTranscript(srtContent: string): string {
return srtContent
.split('\n')
.filter(line => {
const trimmed = line.trim();
// Remove empty lines
if (!trimmed) return false;
// Remove sequence numbers (lines that are just digits)
if (/^\d+$/.test(trimmed)) return false;
// Remove timestamp lines
if (/^\d{2}:\d{2}:\d{2}[.,]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[.,]\d{3}$/.test(trimmed)) return false;
return true;
})
.map(line => {
// Remove HTML tags
return line.replace(/<[^>]*>/g, '');
})
.join(' ')
.replace(/\s+/g, ' ')
.trim();
}
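A quick end-to-end check of the cleaning rules, with the logic restated inline so the snippet runs on its own (the SRT cue is illustrative):

```typescript
// Turn SRT content into a single-line transcript: drop blank lines,
// cue sequence numbers, and timing lines, then strip markup tags.
function toTranscript(srt: string): string {
  return srt
    .split('\n')
    .filter(line => {
      const t = line.trim();
      if (!t) return false;              // blank separators
      if (/^\d+$/.test(t)) return false; // cue sequence numbers
      if (/^\d{2}:\d{2}:\d{2}[.,]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[.,]\d{3}$/.test(t)) return false; // timings
      return true;
    })
    .map(line => line.replace(/<[^>]*>/g, '')) // strip tags like <i>
    .join(' ')
    .replace(/\s+/g, ' ')
    .trim();
}
```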

src/modules/video.ts (new file, 176 lines)
@@ -0,0 +1,176 @@
import * as path from "path";
import type { Config } from "../config.js";
import { sanitizeFilename, getCookieArgs } from "../config.js";
import {
_spawnPromise,
validateUrl,
getFormattedTimestamp,
isYouTubeUrl,
generateRandomFilename
} from "./utils.js";
/**
* Downloads a video from the specified URL.
*
* @param url - The URL of the video to download
* @param config - Configuration object for download settings
* @param resolution - Preferred video resolution ('480p', '720p', '1080p', 'best')
* @param startTime - Optional start time for trimming (format: HH:MM:SS[.ms])
* @param endTime - Optional end time for trimming (format: HH:MM:SS[.ms])
* @returns Promise resolving to a success message with the downloaded file path
* @throws {Error} When URL is invalid or download fails
*
* @example
* ```typescript
* // Download with default settings
* const result = await downloadVideo('https://youtube.com/watch?v=...');
* console.log(result);
*
* // Download with specific resolution
* const hdResult = await downloadVideo(
* 'https://youtube.com/watch?v=...',
* undefined,
* '1080p'
* );
* console.log(hdResult);
*
* // Download with trimming
* const trimmedResult = await downloadVideo(
* 'https://youtube.com/watch?v=...',
* undefined,
* '720p',
* '00:01:30',
* '00:02:45'
* );
* console.log(trimmedResult);
* ```
*/
export async function downloadVideo(
url: string,
config: Config,
resolution: "480p" | "720p" | "1080p" | "best" = "720p",
startTime?: string,
endTime?: string
): Promise<string> {
const userDownloadsDir = config.file.downloadsDir;
if (!validateUrl(url)) {
throw new Error("Invalid or unsupported URL format");
}
try {
const timestamp = getFormattedTimestamp();
let format: string;
if (isYouTubeUrl(url)) {
// YouTube-specific format selection
switch (resolution) {
case "480p":
format = "bestvideo[height<=480]+bestaudio/best[height<=480]/best";
break;
case "720p":
format = "bestvideo[height<=720]+bestaudio/best[height<=720]/best";
break;
case "1080p":
format = "bestvideo[height<=1080]+bestaudio/best[height<=1080]/best";
break;
case "best":
format = "bestvideo+bestaudio/best";
break;
default:
format = "bestvideo[height<=720]+bestaudio/best[height<=720]/best";
}
} else {
// For other platforms, use quality labels that are more generic
switch (resolution) {
case "480p":
format = "worst[height>=480]/best[height<=480]/worst";
break;
case "best":
format = "bestvideo+bestaudio/best";
break;
default: // Including 720p and 1080p cases
// Prefer HD quality but fallback to best available
format = "bestvideo[height>=720]+bestaudio/best[height>=720]/best";
}
}
let outputTemplate: string;
let expectedFilename: string;
try {
      // Try to resolve the final filename up front
outputTemplate = path.join(
userDownloadsDir,
sanitizeFilename(`%(title)s [%(id)s] ${timestamp}`, config.file) + '.%(ext)s'
);
const getFilenameArgs = [
"--ignore-config",
"--get-filename",
"-f", format,
"--output", outputTemplate,
...getCookieArgs(config),
url
];
expectedFilename = await _spawnPromise("yt-dlp", getFilenameArgs);
expectedFilename = expectedFilename.trim();
} catch (error) {
      // If the filename cannot be resolved, fall back to a random one
const randomFilename = generateRandomFilename('mp4');
outputTemplate = path.join(userDownloadsDir, randomFilename);
expectedFilename = randomFilename;
}
// Build download arguments
const downloadArgs = [
"--ignore-config",
"--progress",
"--newline",
"--no-mtime",
"-f", format,
"--output", outputTemplate,
...getCookieArgs(config)
];
// Add trimming parameters if provided
if (startTime || endTime) {
let downloadSection = "*";
if (startTime && endTime) {
downloadSection = `*${startTime}-${endTime}`;
} else if (startTime) {
downloadSection = `*${startTime}-`;
} else if (endTime) {
downloadSection = `*-${endTime}`;
}
downloadArgs.push("--download-sections", downloadSection, "--force-keyframes-at-cuts");
}
downloadArgs.push(url);
// Download with progress info
try {
await _spawnPromise("yt-dlp", downloadArgs);
} catch (error) {
if (error instanceof Error) {
if (error.message.includes("Unsupported URL") || error.message.includes("extractor")) {
throw new Error(`Unsupported platform or video URL: ${url}. Ensure the URL is from a supported platform.`);
}
if (error.message.includes("Video unavailable") || error.message.includes("private")) {
throw new Error(`Video is unavailable or private: ${url}. Check the URL and video privacy settings.`);
}
if (error.message.includes("network") || error.message.includes("Connection")) {
throw new Error("Network error during download. Check your internet connection and retry.");
}
throw new Error(`Download failed: ${error.message}. Check URL and try again.`);
}
throw new Error(`Download failed: ${String(error)}`);
}
return `Video successfully downloaded as "${path.basename(expectedFilename)}" to ${userDownloadsDir}`;
} catch (error) {
throw error;
}
}
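The trimming branch above maps optional start/end times onto yt-dlp's `--download-sections` syntax (`*START-END`, where either side may be omitted). A minimal standalone sketch of that mapping — the helper name `buildDownloadSection` is illustrative, not part of the source:

```typescript
// Illustrative helper mirroring the --download-sections logic above.
// yt-dlp expects "*START-END"; either endpoint may be left open.
function buildDownloadSection(startTime?: string, endTime?: string): string | null {
  if (!startTime && !endTime) return null; // no trimming requested
  if (startTime && endTime) return `*${startTime}-${endTime}`;
  if (startTime) return `*${startTime}-`;
  return `*-${endTime}`;
}
```

For example, `buildDownloadSection("00:01:30", "00:02:45")` yields `*00:01:30-00:02:45`, matching the section string the download arguments build above.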

224
tests/test-bilibili.mjs Executable file

@@ -0,0 +1,224 @@
#!/usr/bin/env node
/**
* Test MCP server with Bilibili video
* Tests cross-platform support with https://www.bilibili.com/video/BV17YdXY4Ewj/
*/
import { spawn } from 'child_process';
import { fileURLToPath } from 'url';
import { dirname, join } from 'path';
const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);
const serverPath = join(__dirname, 'lib', 'index.mjs');
const TEST_VIDEO = 'https://www.bilibili.com/video/BV17YdXY4Ewj/?spm_id_from=333.1387.homepage.video_card.click&vd_source=bc7bf10259efd682c452b5ce8426b945';
console.log('🎬 Testing yt-dlp MCP Server with Bilibili Video\n');
console.log('Video:', TEST_VIDEO);
console.log('Platform: Bilibili (哔哩哔哩)\n');
const server = spawn('node', [serverPath]);
let testsPassed = 0;
let testsFailed = 0;
let responseBuffer = '';
let requestId = 0;
let currentTest = '';
const timeout = setTimeout(() => {
console.log('\n⏱ Test timeout - killing server');
server.kill();
printResults();
}, 60000);
function printResults() {
clearTimeout(timeout);
console.log(`\n${'='.repeat(60)}`);
console.log(`📊 Bilibili Test Results:`);
console.log(` ✅ Passed: ${testsPassed}`);
console.log(` ❌ Failed: ${testsFailed}`);
console.log(`${'='.repeat(60)}`);
if (testsPassed > 0) {
console.log('\n✨ Bilibili platform is supported!');
} else {
console.log('\n⚠ Bilibili support may be limited');
}
process.exit(testsFailed > 0 ? 1 : 0);
}
server.stdout.on('data', (data) => {
responseBuffer += data.toString();
const lines = responseBuffer.split('\n');
responseBuffer = lines.pop() || '';
lines.forEach(line => {
if (line.trim()) {
try {
const response = JSON.parse(line);
if (response.error) {
console.log(`${currentTest} - ERROR`);
console.log(' Error:', response.error.message);
console.log(' This may indicate limited Bilibili support\n');
testsFailed++;
} else if (response.result) {
handleTestResult(response);
}
} catch (e) {
// Not JSON
}
}
});
});
server.stderr.on('data', (data) => {
const output = data.toString().trim();
if (output && !output.includes('ExperimentalWarning')) {
console.log('🔧 Server:', output);
}
});
server.on('close', (code) => {
printResults();
});
function handleTestResult(response) {
const content = response.result.content?.[0]?.text || JSON.stringify(response.result);
if (currentTest === 'Initialize') {
console.log('✅ Initialize - PASSED\n');
testsPassed++;
}
else if (currentTest === 'Get Bilibili Metadata Summary') {
// Check if we got any content
if (content && content.length > 50 && !content.includes('Error')) {
console.log('✅ Get Bilibili Metadata Summary - PASSED');
console.log(' Response preview:');
const lines = content.split('\n').slice(0, 8);
lines.forEach(line => console.log(` ${line}`));
if (content.split('\n').length > 8) {
console.log(' ...');
}
console.log();
testsPassed++;
} else if (content.includes('Error') || content.includes('Unsupported')) {
console.log('⚠️ Get Bilibili Metadata Summary - PARTIAL');
console.log(' Platform may have limited support');
console.log(' Response:', content.substring(0, 150));
console.log();
testsFailed++;
} else {
console.log('❌ Get Bilibili Metadata Summary - FAILED');
console.log(' Response too short or invalid');
console.log();
testsFailed++;
}
}
else if (currentTest === 'List Bilibili Subtitle Languages') {
if (content.length > 50 && !content.includes('Error')) {
console.log('✅ List Bilibili Subtitle Languages - PASSED');
console.log(' Subtitle info retrieved\n');
testsPassed++;
} else if (content.includes('No subtitle') || content.includes('not found')) {
console.log('⚠️ List Bilibili Subtitle Languages - NO SUBTITLES');
console.log(' Video may not have subtitles available\n');
testsPassed++; // Not an error, just no subs
} else {
console.log('❌ List Bilibili Subtitle Languages - FAILED');
console.log(' Response:', content.substring(0, 200));
console.log();
testsFailed++;
}
}
else if (currentTest === 'Get Bilibili Metadata (Filtered)') {
try {
const metadata = JSON.parse(content);
if (metadata.id || metadata.title) {
console.log('✅ Get Bilibili Metadata (Filtered) - PASSED');
if (metadata.title) console.log(` Title: ${metadata.title}`);
if (metadata.uploader) console.log(` Uploader: ${metadata.uploader}`);
if (metadata.duration) console.log(` Duration: ${metadata.duration}s`);
console.log();
testsPassed++;
} else {
console.log('❌ Get Bilibili Metadata (Filtered) - FAILED');
console.log(' Missing expected fields');
console.log();
testsFailed++;
}
} catch (e) {
// Maybe it's an error message
if (content.includes('Error') || content.includes('Unsupported')) {
console.log('⚠️ Get Bilibili Metadata (Filtered) - PLATFORM ISSUE');
console.log(' Response:', content.substring(0, 200));
console.log();
testsFailed++;
} else {
console.log('❌ Get Bilibili Metadata (Filtered) - FAILED');
console.log(' Invalid response format');
console.log();
testsFailed++;
}
}
}
}
function sendRequest(method, params, testName) {
requestId++;
currentTest = testName;
console.log(`🔍 Test ${requestId}: ${testName}`);
if (testName.includes('Metadata') || testName.includes('Subtitle')) {
console.log(' (Testing Bilibili platform support...)\n');
}
const request = {
jsonrpc: '2.0',
id: requestId,
method: method,
params: params
};
server.stdin.write(JSON.stringify(request) + '\n');
}
// Run tests
setTimeout(() => {
sendRequest('initialize', {
protocolVersion: '2024-11-05',
capabilities: {},
clientInfo: { name: 'bilibili-test', version: '1.0.0' }
}, 'Initialize');
setTimeout(() => {
sendRequest('tools/call', {
name: 'ytdlp_get_video_metadata_summary',
arguments: { url: TEST_VIDEO }
}, 'Get Bilibili Metadata Summary');
setTimeout(() => {
sendRequest('tools/call', {
name: 'ytdlp_list_subtitle_languages',
arguments: { url: TEST_VIDEO }
}, 'List Bilibili Subtitle Languages');
setTimeout(() => {
sendRequest('tools/call', {
name: 'ytdlp_get_video_metadata',
arguments: {
url: TEST_VIDEO,
fields: ['id', 'title', 'uploader', 'duration', 'description']
}
}, 'Get Bilibili Metadata (Filtered)');
setTimeout(() => {
console.log('\n✅ All Bilibili tests completed!');
server.kill();
}, 8000);
}, 5000);
}, 5000);
}, 2000);
}, 1000);
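The stdout handlers in these test scripts all share one framing pattern: accumulate chunks, split on newlines, keep the trailing partial line in the buffer, and JSON-parse each complete line. A self-contained sketch of that pattern — the `splitFrames` name is illustrative:

```typescript
// Illustrative sketch of the newline-delimited JSON framing used by the tests:
// complete lines become parsed messages, the trailing fragment is carried over.
function splitFrames(buffer: string, chunk: string): { messages: unknown[]; rest: string } {
  const lines = (buffer + chunk).split("\n");
  const rest = lines.pop() ?? ""; // incomplete line stays buffered for the next chunk
  const messages: unknown[] = [];
  for (const line of lines) {
    if (!line.trim()) continue;
    try {
      messages.push(JSON.parse(line));
    } catch {
      // Not JSON - ignore, as the test scripts do
    }
  }
  return { messages, rest };
}
```

Carrying the remainder across calls is what keeps a JSON-RPC response split across two `data` events from being lost or corrupted.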

105
tests/test-mcp.mjs Executable file

@@ -0,0 +1,105 @@
#!/usr/bin/env node
/**
* Simple MCP protocol test
* This script tests if the MCP server responds correctly to basic protocol messages
*/
import { spawn } from 'child_process';
import { fileURLToPath } from 'url';
import { dirname, join } from 'path';
const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);
const serverPath = join(__dirname, 'lib', 'index.mjs');
console.log('🧪 Testing yt-dlp MCP Server\n');
console.log('Starting server from:', serverPath);
const server = spawn('node', [serverPath]);
let testsPassed = 0;
let testsFailed = 0;
let responseBuffer = '';
// Timeout to ensure tests complete
const timeout = setTimeout(() => {
console.log('\n⏱ Test timeout - killing server');
server.kill();
process.exit(testsFailed > 0 ? 1 : 0);
}, 10000);
server.stdout.on('data', (data) => {
responseBuffer += data.toString();
// Try to parse JSON-RPC responses
const lines = responseBuffer.split('\n');
responseBuffer = lines.pop() || ''; // Keep incomplete line in buffer
lines.forEach(line => {
if (line.trim()) {
try {
const response = JSON.parse(line);
console.log('📨 Received:', JSON.stringify(response, null, 2));
if (response.result) {
testsPassed++;
console.log('✅ Test passed\n');
}
} catch (e) {
// Not JSON, might be regular output
console.log('📝 Output:', line);
}
}
});
});
server.stderr.on('data', (data) => {
console.log('🔧 Server log:', data.toString().trim());
});
server.on('close', (code) => {
clearTimeout(timeout);
console.log(`\n📊 Test Results:`);
console.log(` ✅ Passed: ${testsPassed}`);
console.log(` ❌ Failed: ${testsFailed}`);
console.log(` Server exit code: ${code}`);
process.exit(testsFailed > 0 ? 1 : 0);
});
// Wait a bit for server to start
setTimeout(() => {
console.log('\n🔍 Test 1: Initialize');
const initRequest = {
jsonrpc: '2.0',
id: 1,
method: 'initialize',
params: {
protocolVersion: '2024-11-05',
capabilities: {},
clientInfo: {
name: 'test-client',
version: '1.0.0'
}
}
};
server.stdin.write(JSON.stringify(initRequest) + '\n');
setTimeout(() => {
console.log('\n🔍 Test 2: List Tools');
const listToolsRequest = {
jsonrpc: '2.0',
id: 2,
method: 'tools/list',
params: {}
};
server.stdin.write(JSON.stringify(listToolsRequest) + '\n');
setTimeout(() => {
console.log('\n✅ Basic protocol tests completed');
server.kill();
}, 2000);
}, 2000);
}, 1000);
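Each request these scripts write is a JSON-RPC 2.0 object serialized as one newline-terminated line on stdin. A minimal sketch of that framing — the `frameRequest` helper is illustrative, not part of the repo:

```typescript
// Illustrative sketch: serialize a JSON-RPC 2.0 request as the single
// newline-terminated frame the stdio transport expects.
function frameRequest(id: number, method: string, params: unknown): string {
  return JSON.stringify({ jsonrpc: "2.0", id, method, params }) + "\n";
}
```

For example, `frameRequest(2, "tools/list", {})` produces the same bytes the `listToolsRequest` above writes to `server.stdin`.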

215
tests/test-real-video.mjs Executable file

@@ -0,0 +1,215 @@
#!/usr/bin/env node
/**
* Real-world MCP server test with actual YouTube video
* Tests multiple tools with https://www.youtube.com/watch?v=dQw4w9WgXcQ
*/
import { spawn } from 'child_process';
import { fileURLToPath } from 'url';
import { dirname, join } from 'path';
const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);
const serverPath = join(__dirname, 'lib', 'index.mjs');
const TEST_VIDEO = 'https://www.youtube.com/watch?v=dQw4w9WgXcQ';
console.log('🎬 Testing yt-dlp MCP Server with Real Video\n');
console.log('Video:', TEST_VIDEO);
console.log('Starting server from:', serverPath, '\n');
const server = spawn('node', [serverPath]);
let testsPassed = 0;
let testsFailed = 0;
let responseBuffer = '';
let requestId = 0;
let currentTest = '';
// Timeout to ensure tests complete
const timeout = setTimeout(() => {
console.log('\n⏱ Test timeout - killing server');
server.kill();
printResults();
}, 60000); // 60 seconds for real API calls
function printResults() {
clearTimeout(timeout);
console.log(`\n${'='.repeat(60)}`);
console.log(`📊 Final Test Results:`);
console.log(` ✅ Passed: ${testsPassed}`);
console.log(` ❌ Failed: ${testsFailed}`);
console.log(`${'='.repeat(60)}`);
process.exit(testsFailed > 0 ? 1 : 0);
}
server.stdout.on('data', (data) => {
responseBuffer += data.toString();
// Try to parse JSON-RPC responses
const lines = responseBuffer.split('\n');
responseBuffer = lines.pop() || '';
lines.forEach(line => {
if (line.trim()) {
try {
const response = JSON.parse(line);
if (response.error) {
console.log(`${currentTest} - ERROR`);
console.log(' Error:', response.error.message);
testsFailed++;
} else if (response.result) {
handleTestResult(response);
}
} catch (e) {
// Not JSON, might be regular output
}
}
});
});
server.stderr.on('data', (data) => {
const output = data.toString().trim();
if (output && !output.includes('ExperimentalWarning')) {
console.log('🔧 Server:', output);
}
});
server.on('close', (code) => {
printResults();
});
function handleTestResult(response) {
const content = response.result.content?.[0]?.text || JSON.stringify(response.result);
if (currentTest === 'Initialize') {
console.log('✅ Initialize - PASSED');
console.log(` Protocol: ${response.result.protocolVersion}`);
console.log(` Server: ${response.result.serverInfo.name} v${response.result.serverInfo.version}\n`);
testsPassed++;
}
else if (currentTest === 'Get Metadata Summary') {
if (content.includes('Rick Astley') || content.includes('Never Gonna Give You Up')) {
console.log('✅ Get Metadata Summary - PASSED');
console.log(' Response preview:');
const lines = content.split('\n').slice(0, 5);
lines.forEach(line => console.log(` ${line}`));
console.log(' ...\n');
testsPassed++;
} else {
console.log('❌ Get Metadata Summary - FAILED');
console.log(' Expected Rick Astley content, got:', content.substring(0, 100));
testsFailed++;
}
}
else if (currentTest === 'List Subtitle Languages') {
if (content.includes('en') || content.includes('English')) {
console.log('✅ List Subtitle Languages - PASSED');
console.log(' Found subtitle languages\n');
testsPassed++;
} else {
console.log('❌ List Subtitle Languages - FAILED');
console.log(' Response:', content.substring(0, 200));
testsFailed++;
}
}
else if (currentTest === 'Get Metadata (Filtered)') {
try {
const metadata = JSON.parse(content);
if (metadata.title && metadata.channel) {
console.log('✅ Get Metadata (Filtered) - PASSED');
console.log(` Title: ${metadata.title}`);
console.log(` Channel: ${metadata.channel}`);
console.log(` Duration: ${metadata.duration || 'N/A'}\n`);
testsPassed++;
} else {
console.log('❌ Get Metadata (Filtered) - FAILED');
console.log(' Missing expected fields');
testsFailed++;
}
} catch (e) {
console.log('❌ Get Metadata (Filtered) - FAILED');
console.log(' Invalid JSON response');
testsFailed++;
}
}
else if (currentTest === 'Download Transcript (first 500 chars)') {
if (content.length > 100) {
console.log('✅ Download Transcript - PASSED');
console.log(' Transcript length:', content.length, 'characters');
console.log(' Preview:', content.substring(0, 150).replace(/\n/g, ' ') + '...\n');
testsPassed++;
} else {
console.log('❌ Download Transcript - FAILED');
console.log(' Response too short:', content.substring(0, 100));
testsFailed++;
}
}
}
function sendRequest(method, params, testName) {
requestId++;
currentTest = testName;
console.log(`🔍 Test ${requestId}: ${testName}`);
const request = {
jsonrpc: '2.0',
id: requestId,
method: method,
params: params
};
server.stdin.write(JSON.stringify(request) + '\n');
}
// Run tests sequentially with delays
setTimeout(() => {
// Test 1: Initialize
sendRequest('initialize', {
protocolVersion: '2024-11-05',
capabilities: {},
clientInfo: { name: 'test-client', version: '1.0.0' }
}, 'Initialize');
setTimeout(() => {
// Test 2: Get video metadata summary
sendRequest('tools/call', {
name: 'ytdlp_get_video_metadata_summary',
arguments: { url: TEST_VIDEO }
}, 'Get Metadata Summary');
setTimeout(() => {
// Test 3: List subtitle languages
sendRequest('tools/call', {
name: 'ytdlp_list_subtitle_languages',
arguments: { url: TEST_VIDEO }
}, 'List Subtitle Languages');
setTimeout(() => {
// Test 4: Get specific metadata fields
sendRequest('tools/call', {
name: 'ytdlp_get_video_metadata',
arguments: {
url: TEST_VIDEO,
fields: ['id', 'title', 'channel', 'duration', 'view_count']
}
}, 'Get Metadata (Filtered)');
setTimeout(() => {
// Test 5: Download transcript (might take longer)
console.log(' (This may take 10-20 seconds...)\n');
sendRequest('tools/call', {
name: 'ytdlp_download_transcript',
arguments: { url: TEST_VIDEO, language: 'en' }
}, 'Download Transcript (first 500 chars)');
setTimeout(() => {
console.log('\n✅ All tests completed!');
server.kill();
}, 25000); // Wait 25 seconds for transcript
}, 3000);
}, 5000);
}, 5000);
}, 2000);
}, 1000);
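The nested `setTimeout` chain above paces requests with fixed gaps. The same sequencing can be sketched with async/await — a hypothetical restructuring, not code from the repo; `delay` and `runSequentially` are illustrative names:

```typescript
// Illustrative sketch: pace steps with fixed gaps using async/await
// instead of nesting setTimeout callbacks five levels deep.
const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runSequentially(steps: Array<() => void>, gapMs: number): Promise<void> {
  for (const step of steps) {
    step();             // fire the next request
    await delay(gapMs); // wait before sending the following one
  }
}
```

Flattening the chain this way keeps the per-test gaps explicit while avoiding the rightward drift of nested callbacks.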