{"id":1228,"date":"2026-03-29T03:04:49","date_gmt":"2026-03-29T03:04:49","guid":{"rendered":"https:\/\/coderseditor.com\/itjobs\/?post_type=job_listing&#038;p=1228"},"modified":"2026-03-29T03:04:56","modified_gmt":"2026-03-29T03:04:56","slug":"ml-data-engineer","status":"publish","type":"job_listing","link":"https:\/\/coderseditor.com\/itjobs\/job\/ml-data-engineer\/","title":{"rendered":"ML Data Engineer"},"content":{"rendered":"<p>Founded in the US in 2022 and now based in London, UK, Recraft is an AI tool for professional designers, illustrators, and marketers, setting a new standard for excellence in image generation.<\/p>\n<p>We designed a tool that lets creators quickly generate and iterate original images, vector art, illustrations, icons, and 3D graphics with AI. Over 3 million users across 200 countries have produced hundreds of millions of images using Recraft, and we\u2019re just getting started.<\/p>\n<p>Join a universe of professional opportunities, develop and support large-scale projects, and shape the future of creativity. We are committed to making Recraft an essential, daily tool for every designer and setting the industry standard. Our mission is to ensure that creators can fully control their creative process with AI, providing them with innovative tools to turn ideas into reality.<\/p>\n<p>If you\u2019re passionate about pushing the boundaries of AI, we want you on board!<\/p>\n<h2><strong>Job Description<\/strong><\/h2>\n<p>At Recraft, we\u2019re building the next generation of generative models across images and text. We\u2019re looking for an <strong>ML Data Engineer<\/strong> to scale our data pipelines for unstructured data (primarily images) and keep our training flows fast, reliable, and repeatable. 
You\u2019ll design and operate high-throughput ingestion and preprocessing on Kubernetes, evolve our internal data-pipeline framework, and work hand-in-hand with ML engineers to ship datasets that move model quality forward.<\/p>\n<h2><strong>Key Responsibilities<\/strong><\/h2>\n<ul>\n<li>Develop and maintain data-ingestion pipelines to source and prepare large-scale image (and occasional text\/HTML) datasets from open, publicly accessible, and permitted sources.<\/li>\n<li>Own the end-to-end flow: raw data \u2192 quality\/beauty\/relevance filtering \u2192 dedup\/validation \u2192 ready-to-train artifacts.<\/li>\n<li>Operate and improve our <strong>Kubernetes-based<\/strong> data-pipeline framework (distributed jobs, retries, monitoring, automation).<\/li>\n<li>Work with <strong>S3-style object storage<\/strong>: efficient layouts, lifecycle, throughput, and cost awareness.<\/li>\n<li>Add tooling around pipelines (progress\/health visualization, metrics, alerts) for observability and faster iteration.<\/li>\n<li>Collaborate closely with ML engineers to align datasets with training needs and accelerate experimentation.<\/li>\n<\/ul>\n<h2><strong>Requirements<\/strong><\/h2>\n<p><strong>Must-have<\/strong><\/p>\n<ul>\n<li>Strong <strong>Python<\/strong> fundamentals; you write clean, maintainable, production-ready code.<\/li>\n<li>Solid hands-on <strong>Kubernetes<\/strong> experience (containers, jobs, batch\/distributed processing).<\/li>\n<li>Proven track record with <strong>unstructured data<\/strong>, especially <strong>images<\/strong> (loading, filtering, transforming at scale).<\/li>\n<li>Experience developing data-ingestion or parsing tools for publicly accessible sources, including handling real-world reliability and failure cases gracefully.<\/li>\n<li>Comfort with <strong>S3\/object storage<\/strong> and moving lots of data efficiently and safely.<\/li>\n<li>Pragmatic, detail-oriented, ownership mindset; you enjoy making systems reliable and 
fast.<\/li>\n<\/ul>\n<p><strong>Nice-to-have<\/strong><\/p>\n<ul>\n<li>Familiarity with ML workflows (PyTorch) and downstream training considerations.<\/li>\n<li>Experience with image quality scoring, captioning, or image-to-text pipelines.<\/li>\n<li>DAG\/workflow visualizations or pipeline UX tooling.<\/li>\n<li>DevOps fluency: Docker, CI\/CD, infra automation.<\/li>\n<\/ul>\n<h2><strong>What We Offer<\/strong><\/h2>\n<ul>\n<li><strong>Competitive salary and equity.<\/strong><\/li>\n<li>We\u2019re able to offer Skilled Worker visa sponsorship in the UK for qualified candidates.<\/li>\n<li><strong>Real impact on model quality:<\/strong> your pipelines directly power training runs and product improvements.<\/li>\n<li><strong>Ownership with support:<\/strong> autonomy to design and improve systems, alongside experienced ML peers.<\/li>\n<li><strong>Modern stack:<\/strong> Python, Kubernetes, S3, internal pipeline framework built for scale.<\/li>\n<li><strong>Growth:<\/strong> a fast-moving environment where shipping well-engineered systems is the norm.<\/li>\n<\/ul>\n","protected":false},"author":1,"featured_media":0,"template":"","meta":{"_acf_changed":false,"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_promoted":"","_job_location":"London, 
UK","_application":"https:\/\/jobs.ashbyhq.com\/recraft\/50837f89-9410-4da0-8025-8bfae97a9ff4?locationId=ec348909-0c4d-4456-9c13-abd9c86bdab1","_company_name":"Recraft","_company_website":"","_company_tagline":"","_company_twitter":"","_company_video":"","_filled":0,"_featured":0,"_remote_position":0,"_job_salary":"","_job_salary_currency":"","_job_salary_unit":""},"job-types":[38],"class_list":{"0":"post-1228","1":"job_listing","2":"type-job_listing","3":"status-publish","6":"job-type-experienced"},"acf":[],"aioseo_notices":[],"jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/coderseditor.com\/itjobs\/wp-json\/wp\/v2\/job-listings\/1228","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/coderseditor.com\/itjobs\/wp-json\/wp\/v2\/job-listings"}],"about":[{"href":"https:\/\/coderseditor.com\/itjobs\/wp-json\/wp\/v2\/types\/job_listing"}],"author":[{"embeddable":true,"href":"https:\/\/coderseditor.com\/itjobs\/wp-json\/wp\/v2\/users\/1"}],"wp:attachment":[{"href":"https:\/\/coderseditor.com\/itjobs\/wp-json\/wp\/v2\/media?parent=1228"}],"wp:term":[{"taxonomy":"job_listing_type","embeddable":true,"href":"https:\/\/coderseditor.com\/itjobs\/wp-json\/wp\/v2\/job-types?post=1228"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}