Staff Software Engineer — Search Platform, Ingestion & Indexing (Toronto)

19 Apr | Refinitiv | Toronto

This posting is for proactive recruitment purposes and may be used to fill current openings or future vacancies within our organization.

**Overview of the Role**

Advanced Content Engineering (ACE) is seeking a Staff Software Engineer to serve as the technical anchor for the search platform’s ingestion and indexing systems. The platform processes millions of documents across TR’s legal, tax, and professional content corpora — parsing, chunking, enriching, embedding, and indexing them into a hybrid search engine that powers both human-facing search interfaces and autonomous AI agents. Getting this pipeline right, at scale, with zero-downtime operations and increasingly agentic retrieval patterns, is one of the platform’s most consequential engineering challenges.

This role owns the design, implementation, and operational health of the document ingestion pipeline and search index management systems — from the Kafka-based streaming infrastructure that moves documents through processing stages, to the Vespa application architecture that stores and serves them. Staff Engineers on this team define, build, test, deploy, scale, and operate what they ship — full-stack ownership is not a principle we aspire to; it is the daily reality. AI-assisted development is the team norm, not the exception, and constant delivery to production is the expectation.
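The staged pipeline described above (parse, chunk, enrich, embed, index) can be pictured as composable stages that client teams assemble into a topology. The sketch below is purely illustrative: the `Document` shape, stage names, and chunk size are assumptions for this example, not the platform's actual Protobuf schema or self-service APIs.

```python
from dataclasses import dataclass, field
from typing import Callable

# Illustrative stand-in for the platform's document model (which the
# posting says is Protobuf-based); field names here are assumptions.
@dataclass
class Document:
    doc_id: str
    body: str
    chunks: list[str] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

# A pipeline stage is just a Document -> Document transform.
Stage = Callable[[Document], Document]

def parse(doc: Document) -> Document:
    doc.body = doc.body.strip()  # stand-in for real format parsing
    return doc

def chunk(doc: Document) -> Document:
    size = 100  # hypothetical chunk size
    doc.chunks = [doc.body[i:i + size] for i in range(0, len(doc.body), size)]
    return doc

def enrich(doc: Document) -> Document:
    # Stand-in for entity extraction, embedding generation, etc.
    doc.metadata["n_chunks"] = len(doc.chunks)
    return doc

def run_pipeline(doc: Document, stages: list[Stage]) -> Document:
    # A client team assembling a custom topology picks its own stage list.
    for stage in stages:
        doc = stage(doc)
    return doc

doc = run_pipeline(Document("d1", "  some raw text  "), [parse, chunk, enrich])
```

The point of the shape is that each stage is pluggable and independently testable; in the real system the equivalent components run as Kafka stream processors rather than in-process function calls.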
This is a role for someone who sets architectural boundaries, not just executes within them.

**About the Role**

In this position, you will focus on:

**Ingestion Pipeline Architecture & Engineering**

• Plan, design, develop, and own the end-to-end document ingestion pipeline — a Kafka-based stream processing architecture that moves documents through parsing, chunking, enrichment (entity extraction, embedding generation, metadata enrichment), and indexing stages — including all fault tolerance, version ordering, and at-least-once delivery guarantees
• Architect and implement pluggable, configurable pipeline components (parsers, chunkers, enrichers, indexers) that client teams can assemble into custom topologies via the platform’s self-service APIs, while maintaining reliable, observable, and performant execution
• Own the platform’s Protobuf-based document schema and schema registry integration — establishing schema governance standards, enforcing backward-compatible evolution, and ensuring reliable serialization across all pipeline stages
• Design and implement dual-flow ingestion: a high-throughput batch path for full reindexing and a low-latency incremental path for real-time document updates, with robust guarantees around document version ordering and idempotent processing
• Lead the migration of ingestion infrastructure from OpenSearch to Vespa, including design of Vespa document processors, custom Kafka feeders,
and application package architecture — resolving complex technical challenges that have little or no precedent within the team

**Custom Model Operationalization**

• Own the end-to-end lifecycle for custom models integrated into the ingestion pipeline — re-ranking models, embedding models, and enrichment components — including inference serving behind a stable API surface, latency SLO management, hardware and runtime configuration (batching, quantization), and scaling
• Build and operate the model promotion pipeline: the CI/CD workflow that moves a model artifact from the fine-tuning team through staging to production, including versioning, canary rollouts, and rollback mechanisms — ensuring the platform team can operate model updates independently, without depending on the research team for production changes
• Define and maintain integration contracts between custom models and downstream pipeline components — governing input/output schemas, compatibility requirements, and the governance process for model updates, ensuring search pipeline consumers are not broken by upstream changes
• Instrument model serving for production observability: latency distributions, throughput, error rates, and quality signals such as re-ranking score distributions — enabling the team to detect regressions or model drift without requiring the fine-tuning team’s involvement

**Search Engine & Index Management**

• Own the search engine layer end-to-end: design and operate Vespa (and OpenSearch during the transition) index configurations, ranking profiles, schema definitions, and application package lifecycle management — applying architectural principles that scale to the platform’s long-term content and tenancy goals
• Build and operate zero-downtime index management: shadow indexing, blue/green index promotion, and rolling reindex workflows that keep the platform available during major infrastructure changes
• Implement and maintain the Component Registry and Index Registry — the platform’s catalog
of reusable processing components and active index configurations — with a focus on correctness, observability, and safe concurrent modification
• Develop the full-reindex and incremental-update orchestration logic, including change detection, document tracking, Kafka topic management,
and DynamoDB-backed state management

**Agentic Search Infrastructure**

• Design ingestion and indexing infrastructure with agentic retrieval patterns as a first-class concern — including explicit latency budgets per retrieval hop, chunking and result-compression strategies optimized for token economy in context windows, and index boundary definitions that give agents clean, predictable tool contracts
• Build trace-level observability into the retrieval stack that captures which tools were called, in what order, and with what inputs — enabling reliable diagnosis and reproduction of failures in non-deterministic agentic retrieval paths
• Design session state and cache invalidation patterns for multi-turn agentic search: reasoning carefully about cache validity windows, session state scope (per-user, per-session, per-query), and mechanisms to prevent stale context from corrupting downstream agent responses

**Evaluation & Search Quality**

• Build and own the integration between the ingestion pipeline and the platform’s offline evaluation framework — ensuring that experiment runs produce query/result outputs that feed seamlessly into the search grading tool, supporting gold test set maintenance, LLM-as-judge evaluation, and side-by-side ranking comparison across pipeline versions
• Instrument the query and retrieval stack for online analytics: real-time query latency and throughput monitoring, query log collection for session analysis, and the infrastructure to support A/B and interleaved ranking experiments in production — generating the signals that connect low-level search metrics to downstream product KPIs
• Partner with TR Labs and research scientists to ensure that new search components can be evaluated in isolation — with automated offline evaluation on every build and a clear path from evaluation results to production promotion decisions

**Reliability & Operational Ownership**

• Take full operational responsibility for ingestion and indexing infrastructure: define SLOs,
set measurable goals and meet them, build and maintain CloudWatch dashboards and alarms, and participate in on-call rotations — you built it, you own it, you run it
• Treat delivery friction as the enemy: identify and remove obstacles that slow the team’s ability to ship ingestion and indexing changes to production safely and frequently — improving CI/CD pipelines, deployment automation, and local development workflows as a standing priority
• Instrument pipeline components with distributed tracing, structured logging, and rich metrics — establishing documentation standards and knowledge-management practices so that the team and platform consumers can understand system behavior at all times
• Design and implement resilient fault tolerance mechanisms — dead-letter queues, retry strategies with exponential backoff,
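The fault-tolerance pattern named in that last bullet, bounded retries with exponential backoff that fall back to a dead-letter queue, can be sketched in a few lines. This is a minimal illustration under stated assumptions: plain Python lists stand in for Kafka topics, and the function names are invented for this example, not taken from the platform.

```python
import time

# Illustrative only: in a real pipeline these would be Kafka producer/consumer
# clients; here a plain list stands in for the dead-letter topic.
dead_letter_queue: list[dict] = []

def process_with_retries(message: dict, handler, max_attempts: int = 3,
                         base_delay: float = 0.01) -> bool:
    """Retry `handler` with exponential backoff; dead-letter on exhaustion."""
    for attempt in range(max_attempts):
        try:
            handler(message)
            return True
        except Exception as exc:
            if attempt == max_attempts - 1:
                # Retries exhausted: park the message for offline inspection
                # instead of blocking or silently dropping it.
                dead_letter_queue.append({"message": message, "error": str(exc)})
                return False
            # Exponential backoff: base_delay, 2x, 4x, ...
            time.sleep(base_delay * (2 ** attempt))
    return False

def flaky_handler(message: dict) -> None:
    raise ValueError("parse error")  # simulate a permanently failing stage

ok = process_with_retries({"doc_id": "d1"}, flaky_handler)
```

The design choice worth noting is that the dead-letter path preserves both the message and the terminal error, so failed documents can be replayed after a fix without re-running the whole batch.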