Applied Scientist, Observability and Triage, Prime Video

Amazon

Applied Scientist, Observability and Triage, Prime Video Overview

Company Name: Amazon
Job Role: Applied Scientist, Observability and Triage, Prime Video
Qualifications: Bachelor’s
Category: IT Jobs
Job Type: Full Time
Location: London

Prime Video is looking for an Applied Scientist to join its Observability and Triage team in London. This is a high-impact opportunity to help shape how Amazon improves the reliability and operational efficiency of Prime Video systems by applying machine learning and generative AI to incident management. The work sits within a fast-moving streaming business that serves customers in more than 200 countries and territories, offering a broad catalogue of films, series, Amazon Originals, exclusive licensed titles, live sports, add-on channels, and the ability to rent or buy new releases through the Prime Video Store.

The team is building a new initiative from the ground up to reduce the operational workload for Prime Video engineering teams. The focus is on creating AI-based capabilities that can automatically spot anomalies, infer likely root causes, recommend next steps, and even take action during operational incidents. The role is wide-reaching and involves collaboration with development teams in the UK, India, and the US. The environment is data-rich and technically demanding, with petabytes of metrics, logs, and event data processed every day. The ideal candidate will be excited by the chance to experiment, learn quickly, and help define how AI can improve observability and triage at scale.

What you will do

  • Develop machine learning and generative AI systems that support automated incident triage, root-cause analysis, and resolution recommendations for large-scale operational environments.
  • Prototype ideas quickly and evaluate them in a setting where the problem space is still evolving, using both quantitative testing and practical understanding of operational workflows.
  • Create evaluation methods to measure how accurately the models triage incidents and predict root causes, including approaches that use large language models as judges.
  • Work with software engineering teams to move models from research into production observability systems that support hundreds of development teams.
  • Study patterns in large volumes of incident data to improve automated triage models and to understand how complex failures unfold across services.
  • Design and run experiments to compare new generative AI approaches with existing methods for identifying root causes in multi-service incidents.
  • Communicate results clearly to technical teams, non-technical stakeholders, and wider internal or external audiences through written reports, presentations, and, where appropriate, publications.
  • Collaborate closely with engineers and operational partners around the world so that research outcomes translate into systems that reduce customer-facing impact.

What the team does

The Observability and Triage team builds AI-powered tools for Prime Video development teams. These systems analyse enormous volumes of operational data each day in order to detect incidents, diagnose problems, and recommend resolutions automatically. The goal is to make it faster and easier for internal teams to respond to operational events and to minimise disruption for customers.

What the role is like day to day

On a typical day, you may be reviewing patterns across thousands of incidents to improve an automated triage model. You might then design an experiment to test whether a new generative AI method can better identify the root cause of a complicated incident involving several services. Your internal users are Prime Video development teams, and your work is intended to reduce the time and effort they spend dealing with operational issues. You will work closely with software engineers and operational stakeholders across different regions to make sure the research is turned into production systems that deliver measurable improvements.

What you need

  • Programming experience in Java, C++, Python, or a similar language.
  • Experience in at least one relevant technical area such as algorithms and data structures, parsing, numerical optimisation, data mining, parallel and distributed computing, or high-performance computing.
  • Experience building machine learning models for business applications.
  • A PhD or master’s degree in computer science, computer engineering, machine learning, or a related discipline, or equivalent relevant work experience.
  • Strong technical ability, along with excellent communication and collaboration skills.
  • The ability to work effectively in a fast-paced, ambiguous environment and to move from idea to experiment quickly.
  • A strong drive to create customer value from research and to contribute to work with global impact.

Nice to have

  • Experience working with Unix or Linux systems.
  • Experience in professional software development.

Additional information

Amazon states that it is an equal opportunities employer and makes hiring decisions based on experience and skills. The company says it values a diverse workforce and an inclusive culture, and it provides information about privacy, candidate data handling, and workplace accommodations for applicants who need support during the hiring process. Applicants can consult Amazon’s accommodation resources and privacy notice for more information.


Degree Requirement: Bachelor’s

Visa Sponsorship: May be

To apply for this job, please visit www.amazon.jobs.
