Open Position: Data scientist for generative AI research
SolidLines Tech Services is a company focused on developing appropriate technology to improve living conditions in developing countries. Our clients include NGOs, higher education institutions, and Ministries of Health from various countries in Africa, Asia, and Latin America. We specialize in the implementation of large-scale information systems, data analytics platforms, generative AI, chatbots, and machine learning. This position comes from the collaboration between SolidLines and Girl Effect, an international NGO that relies on technology to connect girls to the resources and support they need to overcome barriers and unleash their full potential.
SolidLines and Girl Effect are collaborating on the research and development of a conversational chatbot based on generative AI. This chatbot will build on the current Girl Effect (GE) chatbots tailored for young girls in developing countries, embodying the persona of a caring big sister (like Big Sis). The first prototype of this chatbot is based on Retrieval Augmented Generation (RAG). Later iterations will add more functionality around Social Behaviour Change Communication (SBCC) to enhance the user experience and maximize the chatbot’s positive impact on young users’ health. These functionalities will require the research, implementation, and testing of sophisticated techniques in Natural Language Processing (NLP) and Large Language Models (LLMs).
Our goal is to explore and apply state-of-the-art genAI models to create a safe, informative, and engaging solution for girls in South Africa, India, and Kenya. To do so, we will combine the technical expertise of the SolidLines team with the applied knowledge of Girl Effect. The team working on this project includes several profiles, such as data scientists, data analysts, ML engineers, content writers, and software engineers. In this context, SolidLines is looking for an additional data scientist with a strong background in Python programming and software development to support the research, testing, and implementation of new upgrades to the chatbot.
Responsibilities
In collaboration with the ML and QA teams, the data scientist will have the following responsibilities:
- Supporting the ML team in evaluating and testing genAI models and LLM application configurations for the proposed research objectives. This work will include RAG parameter tuning, embeddings, and fine-tuning of state-of-the-art LLMs, among other generative AI techniques.
- Conducting comprehensive data exploration and analysis of textual data, applying statistical methods to summarize these datasets, and defining data-driven workplans for the next stages of the project.
- Supporting the QA team in performing detailed and replicable tests that guarantee the relevance of experimental findings.
- Providing technical advice and participating in discussions about the AI software architecture when needed.
- Documenting the experiments and results according to the team guidelines. The approaches, tuning, and results should all be thoroughly documented for clear communication with the GE team to support decision-making.
- Communicating experiments, results, key findings, and decisions to stakeholders and other members of both the SolidLines and Girl Effect teams, in both technical and strategic discussions.
- Collaborating in the implementation of these solutions together with the dev team, to release new versions of the chatbot according to the client’s requirements.
Requirements
- Master’s degree or higher in a technical field like Computer Science, Data Science, Mathematics, Statistics, Machine Learning or Engineering.
- 5+ years of experience developing ML and data science models with Python or related programming languages: data exploration, supervised learning, and clustering.
- Software engineering experience in ML projects.
- Deep knowledge of the Python ecosystem for AI/DS including packages like pandas, numpy, scipy, scikit-learn, pytorch, tensorflow, huggingface transformers, etc.
- Understanding and communicating in English is essential for this role.
- The ideal candidate should be proactive, detail-oriented, and demonstrate the ability to work independently while also excelling as a collaborative team member.
- Strong analytical and problem-solving skills, with the ability to analyze large datasets, extract insights, and translate these into actionable strategies.
- Excellent communication and collaboration skills, with the ability to work effectively in a multidisciplinary team and communicate complex data insights to non-technical stakeholders.
- Excellent organisation and project management skills; able to develop and drive research processes and timelines.
Desirable
- R&D experience.
- Prior experience in NLP, LLMs, deep learning and GenAI packages (Langchain, LlamaIndex, etc.).
- Experience in the social sector and a strong interest in the use of technology for development and social change.
Conditions
- Starting date: January/February 2025
- 100% remote work
- Open to discussing part-time positions as well.