Intern - Record Linkage Data Science
C1
D005 - Human Resources
Head Office - Lusaka District, Lusaka Province
12 May 2025 00:00
21 May 2025 00:00

Reporting to the Senior Data Technical Support Officer, the Intern will be trained in processing, linking, and analysing data using Python and specialized record linkage techniques. This involves cleaning and standardizing data, implementing quality checks, using string and numeric matching algorithms to accurately link records across datasets, and documenting the linkage process. The Intern will also perform exploratory data analysis to derive insights and contribute to data-driven decision-making. Strong collaboration, communication, and documentation skills are essential to work effectively with the team and stakeholders while adhering to data sensitivity and confidentiality requirements.

Key activities:

  • Conduct data cleaning, standardization, and formatting using Python libraries such as Pandas and NumPy.
  • Implement data quality checks and validation procedures to identify and rectify errors or inconsistencies.
  • Apply advanced record linkage techniques with Python libraries including Scikit-Learn, Splink, recordlinkage, and Fuzzy to identify and match records across multiple datasets.
  • Configure and optimize linkage models to improve accuracy and efficiency.
  • Document and maintain detailed records of linkage processes, parameters, and outcomes.
  • Perform exploratory data analysis (EDA) using Python data analysis libraries to understand dataset characteristics and patterns.
  • Collaborate with team members to extract actionable insights from linked datasets and support data-driven decision-making.
  • Prepare comprehensive documentation outlining the record linkage methodology, including the implementation of Splink and fuzzy matching algorithms.
  • Generate regular reports summarizing progress, challenges, and outcomes of record linkage activities.
  • Collaborate with cross-functional teams, including software developers, analysts, and subject matter experts, to understand data requirements and project objectives.
  • Communicate effectively with stakeholders to gather feedback, address concerns, and ensure alignment with project goals.
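
For orientation, the cleaning, standardization, and string-matching activities listed above can be sketched in a few lines of Python. This is a minimal illustration only, not the organisation's method: it uses the standard library's difflib instead of Splink or the other libraries named above, and the helper names (standardize, similarity, link_records), the 0.85 threshold, and the sample names in the usage note are all hypothetical.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def standardize(name: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names after standardization."""
    return SequenceMatcher(None, standardize(a), standardize(b)).ratio()


def link_records(left, right, threshold=0.85):
    """Pair each left record with its best-scoring right record.

    `left` and `right` are lists of (record_id, name) tuples.
    The threshold is an assumed value; a real project would tune it.
    """
    links = []
    for left_id, left_name in left:
        best = None
        for right_id, right_name in right:
            score = similarity(left_name, right_name)
            if score >= threshold and (best is None or score > best[2]):
                best = (left_id, right_id, score)
        if best is not None:
            links.append(best)
    return links
```

With made-up inputs such as link_records([(1, "Chanda  Mwale")], [("a", "chanda mwale."), ("b", "John Phiri")]), record 1 would be linked to record "a", because both names standardize to the same string. Probabilistic tools such as Splink go further by weighting several fields at once, which this sketch does not attempt.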

Requirements:

  • Full Grade 12 certificate.
  • A Bachelor’s degree in Computer Science, Statistics, Artificial Intelligence, Data Science, or a related field.
  • Proficiency in the Python programming language, with experience in libraries such as Pandas, NumPy, Scikit-Learn, Statsmodels, and Splink.
  • Familiarity with the application of mathematical/statistical modelling, and deterministic and probabilistic string and numeric comparator/match algorithms, is advantageous.
  • Strong analytical and critical thinking skills, with meticulous attention to detail.
  • Excellent communication and interpersonal skills, with the ability to work both independently and collaboratively in a team environment.

Suitably qualified candidates are invited to apply. However, only shortlisted candidates will be contacted.