Teammately is the "AI AI-Engineer" that autonomously handles the DevOps cycle of LLM-based AI as your "teammate," including prompt and RAG engineering, model selection and tuning, test case preparation, quality evaluation, and post-production monitoring.
Let AI Draft Architecture
You provide your project's objective, and the AI drafts the initial prompt text and the LLM architecture.
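To make this concrete, the draft produced at this step can be pictured as a small configuration object. This is only an illustrative sketch; the field names (system_prompt, use_rag, retrieval_top_k, and so on) are assumptions, not Teammately's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class ArchitectureDraft:
    """Hypothetical shape of an initial LLM architecture draft."""
    objective: str                      # the project objective you provide
    system_prompt: str                  # AI-drafted initial prompt text
    model: str = "gpt-4o-mini"          # suggested base model (assumption)
    use_rag: bool = False               # whether a retrieval step is included
    retrieval_top_k: int = 5            # RAG parameter, used only if use_rag is True
    notes: list[str] = field(default_factory=list)  # rationale from the drafting step

draft = ArchitectureDraft(
    objective="Answer customer billing questions from our help-center docs",
    system_prompt="You are a billing support assistant. Answer only from the provided context.",
    use_rag=True,
)
```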
Let AI Create Test Cases
We first need test cases so that multiple development options for the LLM architecture can be evaluated consistently.
Our AI generates an input text dataset based on your prompt and generation logic.
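As a rough sketch of how synthetic input generation can work (assuming the official OpenAI Python SDK; the model name, prompt wording, and JSON convention are illustrative assumptions, not Teammately's implementation):

```python
import json
from openai import OpenAI  # assumes the official OpenAI Python SDK is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_test_inputs(task_description: str, n: int = 20) -> list[str]:
    """Ask an LLM to draft diverse input texts for a given task (illustrative only)."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "You generate realistic, diverse test inputs for an LLM application. "
                        'Return a JSON object: {"inputs": ["...", ...]}'},
            {"role": "user",
             "content": f"Task: {task_description}\nGenerate {n} varied user inputs."},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)["inputs"]

test_inputs = generate_test_inputs("Answer customer billing questions from help-center docs")
```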
Let AI Create Eval Metric
LLM-as-a-judge is a method for evaluating the quality of LLM responses at scale, typically far more efficient than traditional human evaluation.
Open-source metrics may not suit every case; metrics tailored to your specific use case evaluate its quality more accurately.
Our AI creates custom metrics.
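A minimal LLM-as-a-judge sketch, assuming the OpenAI Python SDK and a 1-to-5 rubric; the rubric text and scoring scale are illustrative placeholders rather than the metrics Teammately generates:

```python
from openai import OpenAI

client = OpenAI()

RUBRIC = """Score the answer from 1 (unusable) to 5 (excellent) on:
- factual grounding in the provided context
- completeness with respect to the user's question
Reply with the score digit only."""

def judge(question: str, answer: str, context: str = "") -> int:
    """Use an LLM as a judge to score one response against a custom rubric (sketch)."""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user",
             "content": f"Question:\n{question}\n\nContext:\n{context}\n\nAnswer:\n{answer}"},
        ],
    )
    return int(response.choices[0].message.content.strip()[0])
```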
AI Evaluates Original Plan
Simulates generation outcomes from the generated input test cases
Evaluates simulated logs based on the generated metric instructions
Analyzes the evaluation results quantitatively and qualitatively (Results are visualized for human reviewers)
Summarizes the analysis results into a problem narrative that forms the basis for alternative planning strategies (sketched below)
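Taken together, the simulate-and-evaluate loop behind these steps might look roughly like the sketch below; generate_answer and judge are hypothetical stand-ins for the plan under test and the generated metric, not Teammately's internals.

```python
from statistics import mean

def generate_answer(plan: dict, test_input: str) -> str:
    """Placeholder for running one plan (prompt + model + RAG settings) on one input."""
    raise NotImplementedError("call your LLM pipeline here")

def judge(test_input: str, answer: str) -> int:
    """Placeholder for an LLM-as-a-judge metric returning a 1-5 score."""
    raise NotImplementedError("call your judge model here")

def evaluate_plan(plan: dict, test_inputs: list[str]) -> dict:
    """Simulate outputs for every test case, score them, and summarize."""
    logs = []
    for text in test_inputs:
        answer = generate_answer(plan, text)
        logs.append({"input": text, "output": answer, "score": judge(text, answer)})
    scores = [log["score"] for log in logs]
    return {
        "plan": plan.get("name", "original"),
        "mean_score": mean(scores),
        "worst_cases": sorted(logs, key=lambda log: log["score"])[:3],  # feeds the problem narrative
    }
```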
Review Alternative Plans
LLM architecture improvement suggestions
Identification of missing knowledge that needs to be filled in
AI Evaluates Alternative Plans
The AI evaluates the alternative plans again after human review.
After the evaluations, our Agentic AI aggregates the scores and compares which plan performs best on each metric.
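The aggregation can be pictured as per-plan, per-metric mean scores with a simple winner per metric; the log layout below is an assumption for illustration, not Teammately's data model.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical evaluation logs: one row per (plan, metric, test case).
logs = [
    {"plan": "original", "metric": "groundedness", "score": 3},
    {"plan": "alt-1",    "metric": "groundedness", "score": 4},
    {"plan": "original", "metric": "completeness", "score": 4},
    {"plan": "alt-1",    "metric": "completeness", "score": 4},
]

def aggregate(rows):
    """Mean score per (metric, plan), then the best-scoring plan for each metric."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[(row["metric"], row["plan"])].append(row["score"])
    means = {key: mean(scores) for key, scores in buckets.items()}
    best = {}
    for (metric, plan), score in means.items():
        if metric not in best or score > best[metric][1]:
            best[metric] = (plan, score)
    return means, best

means, best_per_metric = aggregate(logs)
print(best_per_metric)  # e.g. {'groundedness': ('alt-1', 4), 'completeness': ('original', 4)}
```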
AI Judges Final Rankings
As the final step, AI creates an overall narrative from the data report and judges the final ranking of each plan, helping you choose the best-balanced candidate.
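One simple way to picture the final ranking (setting the narrative itself aside) is a weighted overall score per plan; the scores and weights below are illustrative assumptions, not how Teammately balances metrics.

```python
# Hypothetical per-metric mean scores for each candidate plan.
plan_scores = {
    "original": {"groundedness": 3.2, "completeness": 4.1, "latency": 4.5},
    "alt-1":    {"groundedness": 4.4, "completeness": 4.0, "latency": 3.8},
}

# Illustrative weights expressing which qualities matter most for this project.
weights = {"groundedness": 0.5, "completeness": 0.3, "latency": 0.2}

ranking = sorted(
    plan_scores.items(),
    key=lambda item: sum(weights[m] * s for m, s in item[1].items()),
    reverse=True,
)
for rank, (plan, _) in enumerate(ranking, start=1):
    print(rank, plan)  # alt-1 ranks first under these example weights
```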