January 23, 2025

Galileo unveiled Agentic Evaluations, a solution for evaluating the performance of AI agents powered by large language models (LLMs).

With Agentic Evaluations, developers gain the tools and insights needed to optimize agent performance and reliability at every step—ensuring readiness for real-world deployment.

"AI agents are unlocking a new era of innovation, but their complexity has made it difficult for developers to understand where failures occur and why," said Vikram Chatterji, CEO and co-founder of Galileo. "With LLMs driving decision-making, teams need tools to pinpoint and understand an agent's failure modes. Agentic Evaluations delivers unprecedented visibility into every action, across entire workflows, empowering developers to build, ship, and scale reliable, trustworthy AI solutions."

Galileo's Agentic Evaluations is an end-to-end framework that provides both system-level and step-by-step evaluation, enabling developers to build reliable, resilient, and high-performing AI agents.

Key capabilities include:

Complete Visibility into Agent Workflows: Gain a clear view of entire multi-step agent completions, from input to final action, with comprehensive tracing and simple visualizations that help developers quickly pinpoint inefficiencies and errors in agent sessions.
Agent-Specific Metrics: Measure agent performance with proprietary, research-backed metrics built to evaluate agents at multiple levels.
- LLM Planner: Assess the quality of tool selection and whether the right instructions are passed on.
- Tool Calls: Assess errors in individual tool completions.
- Overall session success: Measure overall task completion and successful agentic interactions.
Granular Cost and Latency Tracking: Optimize the cost-effectiveness of agents with aggregate tracking for cost, latency, and errors across sessions and spans.
Seamless Integrations: Support for popular AI frameworks like LangGraph and CrewAI.
Proactive Insights: Alerts and dashboards help developers identify systemic issues, such as failed tool calls or misalignment between the final action and initial instructions, and uncover actionable insights for continuous improvement.
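
To make the idea of aggregate tracking across sessions and spans concrete, here is a minimal Python sketch. It is not Galileo's API or data model: the `Span` schema, the `summarize_sessions` helper, and the sample values are all hypothetical, chosen only to show how per-step cost, latency, and error records might roll up into per-session totals.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    """One step in an agent session (hypothetical schema, not Galileo's)."""
    session_id: str
    kind: str            # e.g. "planner" or "tool_call"
    latency_ms: float
    cost_usd: float
    error: bool = False


def summarize_sessions(spans):
    """Roll up cost, latency, error count, and span count per session."""
    totals = defaultdict(
        lambda: {"cost_usd": 0.0, "latency_ms": 0.0, "errors": 0, "spans": 0}
    )
    for span in spans:
        row = totals[span.session_id]
        row["cost_usd"] += span.cost_usd
        row["latency_ms"] += span.latency_ms
        row["errors"] += int(span.error)
        row["spans"] += 1
    return dict(totals)


# Made-up sample data, for illustration only.
SAMPLE_SPANS = [
    Span("s1", "planner", 400.0, 0.25),
    Span("s1", "tool_call", 100.0, 0.0, error=True),
    Span("s2", "planner", 300.0, 0.5),
]

if __name__ == "__main__":
    for session_id, row in summarize_sessions(SAMPLE_SPANS).items():
        print(session_id, row)
```

In this sketch a failed tool call is simply a span flagged with `error=True`, so a session-level error count falls out of the same pass that sums cost and latency.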

Agentic Evaluations is now available to all Galileo users.
