Hi, my name’s Pablo, and I’ve been working as a data scientist at DWP Digital since October last year. I’ve really enjoyed my first few months as part of the data science team in the Newcastle hub; being able to use my skills to improve people’s lives is something I’m really excited about.
My educational background is in computer science. I was always fascinated by the possibilities of machine learning, which is why I pursued a PhD in the topic. After spending several years in academia, I decided to move to greener pastures and start my career as a data scientist. I was driven by the desire to solve practical problems with data and to make a short-term impact.
I spent my first years as a data scientist in the private sector, where I learned most of the principles and processes aimed at improving the way data science is implemented in organisations. This side of my job soon became my passion.
So far, so good
At DWP Digital, I work as part of a multi-disciplinary group of people, with a varied set of skills, experiences and backgrounds, which is very enriching. The nature of the projects we’re working on is stimulating in terms of scope, potential and complexity. Churchill, the most mature of these projects, is a great example of how open data can be served by means of an interactive application. It helps users to take informed decisions based on evidence.
From the very beginning I was given the opportunity to leverage all the knowledge I’ve acquired in the past about what makes data science teams more effective and efficient. I think that part of the reason for this is that I joined the data science team at the right time. More and more teams within DWP are adopting agile ways of working, and several departments across government are introducing concepts like pipelines for analysis, automation and reproducibility with the support of the Government Digital Service. These are exciting times!
Agile data science teams
Coming from a computer science background, I’m a strong advocate of agile methodologies for software projects. I particularly enjoy the flexibility and the interaction with other team members, stakeholders and users. I find the concept of the Minimum Viable Product (MVP) very appealing and I’m delighted to be part of a team that puts agile principles into practice every day.
However, I’m also very aware that it’s hard to translate ‘vanilla’ agile to data science teams, especially during the first stages in the development of a new data product. How do we deal with the uncertain process of doing research? How do we continuously evaluate the value of potentially interesting research directions? How do we estimate effort? What is an MVP in this context?
It’s always difficult to introduce changes in a long-established team that is already following some form of agile process. But agile is about adapting – not only adapting to changes in the product use cases and user specifications, but also adapting agile itself to the way the team performs best. In this sense, the role of retrospective meetings is key.
Every two weeks the team has an open and sincere discussion about how we can improve the way we work. Thanks to these conversations we’re able to iteratively try and evaluate new approaches. We can agree on our definitions of different concepts, like when we can consider a task to be completed, or what is a useful acceptance criterion. I’m very lucky to be part of a team in which not only can I suggest new ideas, but I can learn – and keep learning every day.
Experiment reproducibility and pipelines
Another aspect which I’ve been focusing on during the last few weeks is improving research reproducibility and analysis automation. Reproducible research is always a challenge, but it can also be a source of many different benefits for a data science team. Making sure that people in the team have access not only to the data, but also to the code and the environments required to repeat any analysis, lets us focus on producing results rather than on superficial details. This includes using concepts widely adopted by software engineering teams around the world, like version control systems, code reviews and continuous integration.
One fundamental component of continuous integration is the deployment of automated data analysis pipelines. A data analysis pipeline receives some data as an input and processes it through a series of stages chained together in order to produce a given output. These pipelines should be the building blocks of our prototypes; they ensure that our data-based applications are regularly updated. And since our objective is to make these pipelines as flexible and general-purpose as possible, they also reduce the time-investment required to analyse other sources of data.
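To make the idea of chained stages a bit more concrete, here is a minimal sketch in Python. It is only an illustration and not the code we use for Churchill: the stage names (drop_missing, to_float, mean_by_region), the make_pipeline helper and the sample records are all made up for this example.

```python
from collections import defaultdict
from functools import reduce


def make_pipeline(*stages):
    """Chain stages so the output of each one feeds the next."""
    def run(data):
        return reduce(lambda acc, stage: stage(acc), stages, data)
    return run


# Hypothetical stages: each takes the previous stage's output as its input.
def drop_missing(rows):
    """Discard records that have no value."""
    return [r for r in rows if r.get("value") is not None]


def to_float(rows):
    """Convert the value field from text to a number."""
    return [{**r, "value": float(r["value"])} for r in rows]


def mean_by_region(rows):
    """Average the values within each region."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["region"]].append(r["value"])
    return {region: sum(vals) / len(vals) for region, vals in groups.items()}


# Made-up sample input, for illustration only.
sample = [
    {"region": "North East", "value": "10"},
    {"region": "North East", "value": "20"},
    {"region": "London", "value": None},
    {"region": "London", "value": "5"},
]

summarise = make_pipeline(drop_missing, to_float, mean_by_region)
print(summarise(sample))
```

Because each stage is a plain function, pointing the same pipeline at another source of data should mostly be a matter of swapping or adding a stage rather than rewriting the whole analysis.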
We’re currently designing and implementing a new pipeline with the aim of building a machine learning-powered analytical layer over Churchill. This will enable Churchill users to extract even more value from data and take better-informed decisions.
A long road ahead
Being part of the data science team in Newcastle is a very rewarding experience. I’m doing what I love and I’m learning lots in the process. Adopting good practices and improving processes is not a finish line, but a journey which will result in improved ways of working where we can extract value from data and have an impact on people’s lives. I am ready to keep embracing continuous change!
Come join us!