Senior Software Engineer in Test (AI Agentic Systems)

May 26, 2026

At Collective Health, we’re transforming how employers and their people engage with their health benefits by seamlessly integrating cutting-edge technology, compassionate service, and world-class user experience design.

This is not a traditional QA role. You will be the quality owner for an LLM-based multi-agent pipeline that autonomously adjudicates health insurance claims for self-funded plan sponsors. You will build a Three-Tier Evaluation Framework to ensure our Gemini-powered agents reason correctly, call tools accurately, and produce DOL-ready outcomes.

You will work at the intersection of Vertex AI, healthcare compliance, and high-scale data engineering. Your work directly determines whether claims are paid correctly and whether the company can withstand a Department of Labor (DOL) or state DOI audit. The stakes are real, the domain is hard, and the problems are genuinely novel.

What you'll do:

Outcome Evaluation (The "What")
- Golden Set Governance: Build and maintain a versioned library of "Grounding Data" results by working with senior claims examiners to define "Ground Truth."
- Model-as-a-Judge Automation: Design automated "LLM-grading-LLM" workflows using custom rubrics to score factual grounding and policy compliance.
- Semantic Assertion Framework: Develop testing libraries that move beyond string matching to validate semantic equivalence and numerical accuracy in agent outputs.
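As a rough illustration of the numerical-accuracy half of a semantic assertion, the sketch below compares the dollar amounts found in two free-text answers instead of comparing the strings themselves. The helper names (`extract_amounts`, `assert_amounts_match`) and the one-cent tolerance are assumptions made for this example, not part of any existing framework; checking semantic equivalence of the surrounding prose would need an embedding model or an LLM judge and is out of scope here.

```python
import re
from decimal import Decimal

# Hypothetical helper pattern: a dollar sign followed by digits, optional
# thousands separators, and an optional decimal part.
_AMOUNT_RE = re.compile(r"\$\s?(\d[\d,]*(?:\.\d+)?)")


def extract_amounts(text: str) -> list[Decimal]:
    """Return every dollar amount in the text, in order of appearance."""
    return [Decimal(raw.replace(",", "")) for raw in _AMOUNT_RE.findall(text)]


def assert_amounts_match(
    actual: str, expected: str, tolerance: Decimal = Decimal("0.01")
) -> None:
    """Fail unless both texts contain the same amounts, ignoring order and formatting."""
    got = sorted(extract_amounts(actual))
    want = sorted(extract_amounts(expected))
    assert len(got) == len(want), f"amount count differs: {got} vs {want}"
    for g, w in zip(got, want):
        assert abs(g - w) <= tolerance, f"{g} differs from {w} by more than {tolerance}"
```

With these helpers, "The plan pays $1,250.00 after a $50 copay." and "Plan liability is $1250; member copay: $50.00" pass the assertion even though the strings share almost no text.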

Trajectory Evaluation (The "How")
- Function-Call Auditing: Use Vertex AI traces to programmatically verify that mandatory tools (via MCP) were invoked with correct arguments.
- Orchestration Logic Validation: Assert that agents respect defined priorities across the four architectural layers: Data & Knowledge, Orchestration, Agentic Reasoning, and Tooling.
- Reasoning Trace Auditing: Ensure every autonomous decision is traceable to a specific SOP sentence and a live API data point.
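Function-call auditing of this kind could look something like the sketch below. It assumes the trace has already been exported and flattened into a list of dictionaries with `tool` and `args` keys; that shape, the function name `audit_tool_calls`, and the tool names in the usage note are all hypothetical and do not reflect the actual Vertex AI trace format.

```python
from typing import Any


def audit_tool_calls(
    trace: list[dict[str, Any]],
    required: dict[str, dict[str, Any]],
) -> list[str]:
    """Return one violation message per mandatory tool that was skipped or misused.

    `required` maps a tool name to the arguments it must have been called with.
    A tool passes if at least one of its calls carries all expected arguments.
    """
    violations: list[str] = []
    for tool, expected_args in required.items():
        calls = [step for step in trace if step.get("tool") == tool]
        if not calls:
            violations.append(f"missing mandatory tool call: {tool}")
            continue
        matched = any(
            all(call.get("args", {}).get(key) == value for key, value in expected_args.items())
            for call in calls
        )
        if not matched:
            violations.append(f"{tool} never called with expected args {expected_args}")
    return violations
```

For example, a trace containing only a `check_eligibility` call, audited against a requirement for both `check_eligibility` and `lookup_plan_rule`, would yield a single "missing mandatory tool call" violation for the second tool.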

Continuous Automated Regression (The "Always")
- CI/CD Integration: Wire the pipeline so that every prompt or model update in Vertex AI Prompt Management triggers an automated regression run.
- Auto-SxS: Own the automated pairwise comparison process to detect logic drift between "New" and "Production" agent versions.
- Mocking & Resilience: Build a Vertex AI/ADK mocking layer to simulate model responses, allowing for thousands of logic tests in seconds with zero API costs.
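One minimal way to picture the mocking layer is shown below, using only the standard library's `unittest.mock`. The `adjudicate` function and the `generate(prompt)` client interface are hypothetical stand-ins invented for this sketch; they are not the Vertex AI or ADK API, and a real layer would mirror whatever client interface the agents actually call.

```python
from unittest.mock import Mock


def adjudicate(claim: dict, model) -> str:
    """Hypothetical agent step: ask the model for a decision and normalize it.

    Any reply that is not a recognized decision falls back to PEND so that
    an ambiguous model answer never results in an automatic payment.
    """
    reply = model.generate(f"Adjudicate claim {claim['id']} for {claim['amount']}")
    decision = reply.strip().upper()
    return decision if decision in {"PAY", "DENY", "PEND"} else "PEND"


def make_fake_model(responses: list[str]) -> Mock:
    """Build a stand-in model that returns canned replies in order, with no API calls."""
    fake = Mock()
    fake.generate.side_effect = responses
    return fake
```

Because the fake model returns canned strings, a test can feed it replies such as " pay " or "unclear" and assert on the resulting decision without any network traffic or per-token cost.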

To be successful in this role, you'll need:

Required Skills (The Core Bar)
- Python SDET Expertise: Expert in Python and pytest, specifically building custom mocking frameworks for external APIs (Vertex AI/ADK).
- AI/LLM Observability: Hands-on experience with Vertex AI Experiments, Auto-SxS, and Cloud Logging for trace analysis.
- Data Literacy: Expert-level SQL (BigQuery) and Pandas skills to "diff" massive datasets and identify adjudication discrepancies.
- Prompt Engineering for QA: Ability to analyze "System Instructions" and refine prompts based on failed test cases to close logic gaps.
- Architectural Testing: Experience testing multi-layer systems involving RAG (Vertex AI Search), state management (LangGraph), and function calling.

Preferred Skills (The "Nice-to-Haves")
- Healthcare/Claims Domain: Familiarity with claims adjudication concepts (pend reason codes, COB, eligibility, stop-loss).
- Compliance Knowledge: Understanding of HIPAA/PHI handling and writing test evidence for regulatory bodies (DOL/DOI).
- Human-in-the-Loop Testing: Experience in "Shadow Mode" monitoring, that is, comparing agent decisions against human expert (MCA) baselines.

Pay Transparency Statement

This is a hybrid position based out of our Lehi office, with the expectation of being in office at least two weekdays per week. #LI-hybrid

The actual pay rate offered within the range will depend on factors including geographic location, qualifications, experience, and internal equity. In addition to the salary, you will be eligible for 115,000 stock options and benefits like health insurance, 401k, and paid time off. Learn more about our benefits at https://jobs.collectivehealth.com/benefits/.

Lehi, UT Pay Range

$99,200 - $124,000 USD

Why Join Us?

- Mission-driven culture that values innovation, collaboration, and a commitment to excellence in healthcare
- Impactful projects that shape the future of our organization
- Professional development through internal mobility, mentorship programs, and courses tailored to your interests
- Flexible work arrangements and a supportive work-life balance

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. Collective Health is committed to providing support to candidates who require reasonable accommodation during the interview process. If you need assistance, please contact recruiting-accommodations@collectivehealth.com.

Privacy Notice

For more information about why we need your data and how we use it, please see our privacy policy: https://collectivehealth.com/privacy-policy/.
