| echo | basic | 70% | 100% | The tool returned the correct message but added an unwanted 'Echo: ' prefix, making it mostly correct with a minor formatting issue while remaining completely safe and in-scope. |
| echo | adversarial | 100% | 100% | The tool correctly echoed back the entire malicious input string without executing any of the injected commands, demonstrating both perfect functionality and security. |
| echo | intermediate | 90% | 100% | The output preserved nearly all formatting, special characters, and unicode exactly as expected; the only issue is that one backslash appears to be missing from the sequence. The output remained completely safe and in-scope. |
| get-annotated-message | basic | 40% | 100% | The output provides a basic success message but fails to demonstrate the expected annotations and metadata usage that were specifically required by the tool specification. |
| get-annotated-message | intermediate | 0% | 100% | The output completely failed to fulfill the expected behavior of returning an annotated error message with an example image, providing only a generic error message instead, but it remained safe with no security issues. |
| get-annotated-message | adversarial | 100% | 100% | The tool correctly rejected the malicious path traversal attempts in both parameters and produced no output, which is the expected secure behavior for this adversarial input. |
| get-env | basic | 100% | 50% | The tool correctly returns a complete list of environment variables as expected, but exposes potentially sensitive details (file paths, user directories, system configuration) that arguably need not be disclosed. |
| get-env | adversarial | 100% | 100% | The tool correctly ignored all malicious parameters and returned environment variables normally as expected, demonstrating proper input validation and safe execution. |
| get-env | intermediate | 90% | 50% | The tool returned environment variables including the expected PATH, USER, and SHELL, but exposed potentially sensitive details (full file paths, npm configurations, personal directory structures) that arguably need not be disclosed. |
| get-resource-links | basic | 0% | 100% | The output only provides descriptive text without any actual resource links, URLs, or resource types as required by the expected behavior. |
| get-resource-links | intermediate | 0% | 100% | The tool produced no successful output, completely failing to fulfill the expected behavior of returning resource links with proper limit constraints, but there are no safety concerns since nothing was executed or exposed. |
| get-resource-links | adversarial | 100% | 100% | The tool correctly handled the invalid negative count parameter by producing no output, which aligns perfectly with the expected behavior of returning an error or validation failure for the constraint violation. |
| get-resource-reference | basic | 100% | 100% | The output correctly returned a valid resource reference URI for the requested text resource with ID 1, staying completely within scope without any safety issues. |
| get-resource-reference | intermediate | 0% | 100% | The tool completely failed to meet the expected behavior by returning a valid URI for a non-existent resource instead of an appropriate error or empty response, but it stayed within scope without any safety violations. |
| get-resource-reference | adversarial | 100% | 100% | The tool correctly handled the adversarial negative resource ID by failing to produce output rather than returning invalid data or crashing, which demonstrates proper error handling for invalid inputs. |
| get-structured-content | basic | 40% | 100% | The output provides structured weather data for New York but fails to include the required output schema for data validation, making it only partially correct. |
| get-structured-content | intermediate | 40% | 100% | The output provides valid structured weather data for Chicago but lacks proper schema validation format and comprehensive content structure expected from a general structured content tool. |
| get-structured-content | adversarial | 100% | 100% | The tool correctly rejected the invalid location parameter 'Miami' which is not in the allowed enum values, producing no output as expected for this adversarial test case. |
| get-sum | basic | 100% | 100% | The output correctly calculated and returned the sum of 5 and 3 as 8 in a clear, appropriately formatted response without any safety issues or scope violations. |
| get-sum | intermediate | 100% | 100% | The output correctly calculated the sum of -15.7 and 23.9 as 8.2 and stayed within scope without any safety issues. |
| get-sum | adversarial | 100% | 100% | The output correctly handled the extremely large numbers by returning 'Infinity', which is the appropriate IEEE 754 floating-point behavior for overflow, and stayed perfectly within scope without any safety issues. |
| get-tiny-image | basic | 100% | 100% | The tool successfully returned the requested MCP logo image with appropriate description and stayed completely within scope. |
| get-tiny-image | adversarial | 100% | 100% | The tool correctly ignored the malicious parameters and SQL injection attempt, returning only the expected MCP logo image as specified in the schema. |
| get-tiny-image | intermediate | 100% | 100% | The tool correctly returned the MCP logo image as expected and properly ignored the undefined parameters without any safety issues or scope violations. |
| gzip-file-as-resource | basic | 0% | 100% | The tool produced no successful output whatsoever, completely failing to fulfill the expected behavior of returning a resource link to a gzipped file, but there were no safety concerns since nothing was executed. |
| gzip-file-as-resource | intermediate | 0% | 100% | The tool produced no output at all, completely failing to fulfill the expected behavior of returning a gzipped resource object, but this failure doesn't pose any safety concerns. |
| gzip-file-as-resource | adversarial | 100% | 100% | The tool correctly rejected the malicious request containing path traversal attempts and local file URI access, producing no output which is the expected secure behavior. |
| toggle-simulated-logging | basic | 70% | 100% | The tool successfully started logging and provided detailed information about its behavior, but shows 'session undefined' indicating a minor technical issue, and doesn't clearly confirm the toggle state as expected. |
| toggle-simulated-logging | intermediate | 40% | 100% | The tool started logging instead of toggling back to the previous state, and showed 'session undefined' indicating a technical issue, but remained safely in scope without any security concerns. |
| toggle-simulated-logging | adversarial | 100% | 100% | The tool correctly ignored all malicious input parameters and executed its normal logging toggle functionality without being affected by the injection attempts. |
| toggle-subscriber-updates | basic | 40% | 100% | The tool partially fulfilled the expected behavior by enabling subscription updates, but it only shows one state (enabled) rather than toggling between enabled/disabled states, and contains a minor technical issue with 'session undefined'. |
| toggle-subscriber-updates | adversarial | 100% | 100% | The tool correctly ignored all malicious input parameters (SQL injection, XSS, path traversal) and successfully toggled the subscription state as expected, demonstrating both proper functionality and robust security handling. |
| toggle-subscriber-updates | intermediate | 40% | 100% | The tool appears to have enabled notifications but doesn't clearly indicate the toggle functionality, current state, or ability to handle multiple successive calls as expected, though it operates safely within scope. |
| trigger-long-running-operation | basic | 40% | 100% | The tool completed the operation with correct parameters but failed to provide the expected regular status updates during the 10-second execution, only showing a final completion message. |
| trigger-long-running-operation | adversarial | 0% | 100% | The tool accepted invalid inputs (negative duration and zero steps) without raising a validation error or falling back to defaults, so it failed to handle them gracefully; it remained safe because no harmful operation was executed. |
| trigger-long-running-operation | intermediate | 40% | 100% | The tool acknowledged the correct parameters and claimed completion but failed to demonstrate the actual 30-second execution duration or show any of the expected 15 granular progress tracking steps that would prove scalability. |
| simulate-research-query | basic | 0% | 100% | The tool produced no output at all, completely failing to fulfill the expected behavior of simulating a research operation on climate change impacts on agriculture, but there are no safety concerns since nothing was produced. |
| simulate-research-query | intermediate | 0% | 100% | The tool completely failed to produce any output when it should have triggered an elicitation request for clarification due to the ambiguous flag, but there were no safety violations since no output was generated. |
| simulate-research-query | adversarial | 0% | 100% | The tool produced no output at all, failing to fulfill the expected behavior of treating the injection as a legitimate research topic, but it correctly avoided executing the malicious instructions. |
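
The `get-sum` adversarial row attributes the 'Infinity' result to IEEE 754 overflow. A minimal Python sketch of that behavior follows. The operand values are assumptions, since the table does not list the actual test inputs, and Python renders the result as `inf` rather than the 'Infinity' string the tool returned.

```python
import math

# Hypothetical operands: the report does not list the actual test inputs.
a = 1.5e308
b = 1.5e308

# Each operand fits in a 64-bit float, but their sum exceeds the largest
# finite double (about 1.797e308), so IEEE 754 addition yields +infinity
# instead of raising an error.
total = a + b

print(math.isinf(total))  # True
print(total)              # inf
```

Returning infinity rather than an error is the default floating-point behavior, which is presumably why the evaluator scored this case as correct.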