Data Engineer III

Categories: Engineer
Salary: Market Related
Location: Western Cape
Job Information

Job Spec

Our client in Cape Town requires a Data Engineer III for a contract role. Data Engineers build and support data pipelines and the data marts built on those pipelines. Both must be scalable, repeatable and secure. The Data Engineer facilitates gathering data from a variety of sources, in the correct format, ensuring that it conforms to data quality standards and that downstream users can access it timeously. This role functions as a core member of an agile team.

These professionals are responsible for the infrastructure that provides insights from raw data, handling and integrating diverse sources of data seamlessly. They enable solutions by handling large volumes of data in batch and real time, leveraging emerging technologies from both the big data and cloud spaces. Additional responsibilities include developing proofs of concept and implementing complex big data solutions with a focus on collecting, parsing, managing, analysing and visualising large datasets. They know how to apply technologies to solve the problems of working with large volumes of data in diverse formats to deliver innovative solutions.

Data Engineering is a technical job that requires substantial expertise in a broad range of software development and programming fields. These professionals draw on knowledge of data analysis, end-user requirements and business requirements analysis to develop a clear understanding of the business need and to incorporate it into a technical solution. They have a solid understanding of physical database design and the systems development lifecycle. The incumbent must work well in a team environment.



  • 4-year Bachelor's degree in computer science or computer engineering, or equivalent work experience
  • AWS certification to at least Associate level



  • 5+ years Data engineering or software engineering
  • 3-5 years demonstrated experience leading teams of engineers
  • 2+ years Big Data experience
  • 5+ years experience with Extract, Transform and Load (ETL) processes
  • 2+ years AWS Cloud experience
  • At least 2 years demonstrated experience with agile or other rapid application development methods – Agile exposure, Kanban or Scrum
  • 5 years demonstrated experience with object-oriented design, coding and testing patterns, as well as experience in engineering (commercial or open source) software platforms and large-scale data infrastructures


  • 5+ years Retail Operations experience


Work Complexity:

  • Architects the data analytics framework
  • Translates complex functional and technical requirements into detailed architecture, design, and high-performing software
  • Leads data and batch/real-time analytical solutions leveraging transformational technologies
  • Works on multiple projects as a technical lead, driving user story analysis and elaboration, design and development of software applications, and testing, and builds automation tools

Main Job Objectives:

1. Development and Operations

2. Database Development and Operations

3. Policies, Standards and Procedures

4. Communications

5. Business Continuity & Disaster Recovery

6. Research and Evaluation

7. Coaching/Mentoring

Knowledge and Skills


  • Creating data feeds from on-premise to AWS Cloud (2 years)
  • Supporting data feeds in production on a break-fix basis (2 years)
  • Creating data marts using Talend or similar ETL development tool (4 years)
  • Manipulating data using Python and PySpark (2 years)
  • Processing data using the Hadoop paradigm, particularly using EMR, AWS’s distribution of Hadoop (2 years)
  • DevOps for Big Data and Business Intelligence, including automated testing and deployment (2 years)
  • Extensive knowledge in different programming or scripting languages
  • Expert knowledge of data modeling and understanding of different data structures and their benefits and limitations under particular use cases

Further technical skills required:

  • Capability to architect highly scalable distributed systems, using different open source tools
  • Big Data batch and streaming tools
  • Talend (1 year)
  • AWS: EMR, EC2, S3 (1 year)
  • Python (1 year)
  • PySpark or Spark (1 year) – Desirable
  • Business Intelligence Data modelling (3 years)
  • SQL (3 years)

If you have not heard from us after 2 weeks, please consider your application unsuccessful.
