Senior AI Engineer

ScultureAI

Department: Data Analysis

Type: Remote

Region: UK

Location: Greater London, England, United Kingdom

Experience: Mid-Senior level

Estimated Salary: £70,000 - £100,000

Skills:

LLM Prompt Engineering
Multi-Agent Orchestration
LLM Evaluation Systems
OpenAI Evals
DeepEval
Promptfoo
TruLens
LangChain Eval
Few-Shot Evaluator Prompts
LLM-as-a-Judge Pipelines

Job Description

Posted on: July 15, 2025

About ScultureAI

ScultureAI is a B2B SaaS startup developing groundbreaking coaching solutions for shaping organisational culture using large language models and other cutting-edge AI technologies. An organisation’s culture is a key driver of employee wellbeing and company success, and we are driven by the mission to improve the lives and performance of employees and companies all over the world. ScultureAI has been named one of Europe’s hottest startups at The Europas and one of the leading UK AI startups by Generative Group. Having raised over £1.5m to date, we are currently onboarding our first major enterprise clients, and this is just the beginning!

Our team operates in a dynamic, supportive atmosphere where everyone’s voice is heard and every good idea is valued. We want to build a workplace community where passionate individuals can thrive, grow, and contribute to groundbreaking AI coaching solutions that transform organisational culture and have a positive impact on the world. Join us on this exciting journey and shape the future of coaching and workplace culture.

About The Role

We are looking for a talented Senior Prompt Engineer to help us push the boundaries of what’s possible with LLMs. This role has two core aspects. First, you’ll design, build and optimise complex multi-agent prompt pipelines that directly power our product and customer outcomes. Second, you’ll build and scale rigorous evaluation systems, both automated and human-in-the-loop, for these constantly changing pipelines, in order to continuously assess and improve the quality, performance, reliability, and cost-efficiency of our prompt architectures.

You’ll need to be deeply immersed in the latest techniques across prompting, LLM behaviour, multi-agent orchestration and agent design, and be excited to apply that knowledge in a fast-paced, high-impact startup environment.

Requirements

Architect, design, and implement robust multi-agent pipelines leveraging a diverse range of LLMs
Systematically decompose complex problems into structured, scalable prompt-driven solutions
Use advanced prompt engineering techniques to drive desired results
Build and maintain a library of atomic evaluation prompts to measure coaching output quality across several dimensions
Develop and automate evaluation systems that can benchmark, identify regressions, and measure the stability of new prompts, models, and model versions
Use statistical techniques and domain knowledge to define quality thresholds, analyse variance, and surface outliers or black swan failures
Participate in designing and running human-in-the-loop processes to validate and improve evaluator prompts
Contribute to internal tooling for prompt observability, output sampling, evaluator scoring, and user feedback
Optimise the trade-offs between latency, quality and operational cost
Implement safety and security best practices
Build and manage modular, version-controlled prompt libraries with support for templating and reuse
Collaborate with full-stack engineers to design and implement complex automated systems that evaluate the quality and consistency of outputs across pipeline stages and agents
Collaborate with other technical colleagues to develop tools and systems for prompt observability, such as usage tracking, output variance monitoring, and feedback loop integration

Necessary to have

18+ months of hands-on experience in LLM prompt engineering, ideally with multi-prompt or agentic architectures
Demonstrated experience designing or contributing to LLM evaluation systems, either for QA, R&D, or production monitoring
Strong understanding of LLM behaviour, capabilities, and failure modes
Comfortable working with prompt evaluation tools or libraries (e.g., OpenAI Evals, DeepEval, Promptfoo, TruLens, LangChain eval, etc.)
Familiarity with advanced evaluation metrics and an ability to interpret results
A Master’s degree with distinction in a relevant subject
Appreciation of coaching, behaviour change and organisational culture principles
Experience designing few-shot evaluator prompts and LLM-as-a-judge pipelines

Personal Characteristics

Great communicator who builds strong relationships with colleagues
Self-starter and fast learner, able to operate in a fast-paced environment
Creative problem solver with a can-do attitude
Accountable and reliable, with high attention to detail
Passionate about our vision to reimagine coaching and corporate culture

Benefits

Competitive salary and equity options.
Flexible working hours and a remote-first environment.
Opportunity to work on groundbreaking AI technology.
Learning and development budget to support your career growth.
A supportive, inclusive team culture where your contributions make a real impact.

Originally posted on LinkedIn
