Senior Data Architect

Omilia · Poland · Remote-friendly

Company: Omilia
Location: Poland
Employment type: Full-time
Posted: April 15, 2026

About this job

Accountabilities

Own the Training Environment data architecture end-to-end: dataset design and schema for all ML training pipelines, including dialog corpora for LLM training, conversational steps for NLU models, annotated evaluation sets, and whole-call recordings for speech-to-speech model development.

Define and govern data selection and sampling strategy: establish criteria that determine which production conversations have the highest training value, including diversity-optimized sampling, confidence-based filtering, edge-case prioritization, and deduplication strategies.

Build and maintain the data catalog and dataset discovery infrastructure: enable ML engineers across LLM, NLU, Speech, and Agentic teams to find, understand, and use training data without friction.

Define annotation

This is a short summary. The full description is on the employer’s page.
