To not miss out on any new articles, consider subscribing.
It’s been over four years since I took on my first role in the Data Science industry. During this time, I have used different platforms, including social media, to talk about my journey, my work, and what I have learned over time. In these interactions, one of the most frequent questions I have received is, “How do I start a career in data science?”
In this article, I will attempt to answer this question broadly. I will begin by sharing my journey and then I’ll share recommended paths from two of my friends who started out without an initial formal background in computer science or tech. The first, who has a Marine Science background, is currently building a career in NLP research, and the other is a data analyst who started out in Economics.
Finally, I will share a general roadmap you can tweak to fit your personal goals, timelines, etc.
How I started my data science career
It all began in late 2018 when I stumbled on the Data Science team at my then workplace. My interest in their work was piqued and I found it very exciting. That led to me switching to interning under the team, taking online courses, and, subsequently, picking up little tasks in the sprint.
Actually, it began before then. As far back as 2010, I had been using Microsoft Excel for different mathematical and data tasks. Then I got my first introduction to Python in 2014 and actually studied it as a course during my undergraduate studies in 2017. This gave me background knowledge of the fundamentals, different capabilities, and a basic understanding of these tools.
Fast forward to 2019, I was now a data science intern at this company and I was leaning towards Recommendation Systems (RecSys) and Natural Language Processing (NLP) – simply because that was the area the company’s product focused on. On the side, I picked up data visualization with Tableau.
Then, in 2020, I switched roles to a new company and started working with more numerical data and experimentation. I worked with this company till 2021. At this job, I learned to carry out statistical analyses such as time series forecasting, correlation analysis, and A/B testing, and to work with tools like Google Analytics, SQL (PostgreSQL), and Metabase, among others. On the side, I worked on a personal NLP project using different supervised ML algorithms for classification and a Recurrent Neural Network.
I left the role in 2021 to begin grad school, where I am learning more about unsupervised ML, reinforcement learning, computer vision, AI robotics, and active learning. I had my summer internship the next year and went back to working with text data, but on a larger scale. I got more into Python scripting and PySpark. I’m still in grad school and eagerly looking forward to what more I’ll learn.
My data science path
My journey is unique in that most of my learning has happened on the job. To summarize it:
Microsoft Excel -> Python OOP -> Data analysis with Python -> NLP/RecSys -> Tableau -> Statistical analysis -> Google Analytics -> SQL -> A/B testing -> Supervised ML -> Deep Learning -> Unsupervised ML -> Reinforcement learning -> Computer Vision -> AI Robotics.
Also, I don’t think you can ever stop learning. At different points, I have had to relearn certain skills or tools, take advanced courses on certain topics, or just refresh my knowledge.
Your path does not have to look exactly like mine. Everyone’s journey is different and depends on their circumstances. One way to craft your path is to check out public roadmaps (like the one in this article), read job descriptions at your desired companies or locations, and draw inspiration from other people’s journeys. This is why I reached out to two friends to share their recommended paths for their respective tracks.
From Marine Science to Computer Science PhD in NLP research
I’d like you to meet Orevaoghene. She is a former machine learning research engineer who, although she does not come from a typical tech/CS background as I do, has gone on to build a great career in Data Science/Machine Learning. She is currently doing her PhD in Computer Science in the United States, where her research in Natural Language Processing focuses on model interpretability and multilingual NLP. Here is her recommended path to get started in a machine learning research career:
- Take an Intro to Python programming course / Python for Data Science
- Get some familiarity with math basics (calculus, linear algebra, probability, and statistics)
- Go through the Machine learning foundational course by Andrew Ng / Deep learning specialization on Coursera.
- Join a research community to gain mentorship and collaborate on research projects. Some research communities you can check out are: AI Saturdays Lagos, Masakhane, ML Collective, and Cohere For AI.
- Work on projects.
- Read survey papers in the area you’re interested in to find future directions to pursue.
- Reproduce experiments from published papers.
From Economics to Data Analyst
I also spoke to Esther, an economist turned data analyst, and she shared the skills she recommends every data analyst acquire and the tools to master. Ideally, a data analyst should have the following skills and be comfortable with at least one option in each toolset category.
Skills: Problem-solving, Analytical thinking, Communication, Leadership, Attention to detail, Adaptation, and Presentation skills.
Toolset:
- Spreadsheets: Microsoft Excel or Google Sheets
- SQL: BigQuery, PostgreSQL, or MySQL
- Programming: R or Python
- Data visualization: Tableau or Power BI
- Presentations: Microsoft PowerPoint
Data Science Roadmap
As I mentioned, here is the proposed roadmap for getting into data science for a beginner in 2023. Remember, you don’t have to learn everything at once. For each category, it is okay to pick one option and focus on gaining mastery of it before advancing to the next tool.
I also like to split data science skills into data analysis and machine learning aspects. This is because, as a data scientist, you first have to be able to analyze data to understand it better, spot any irregularities, and draw insights before you begin to train any model on it.
Data analysis skills
- Exploratory Data Analysis (EDA): Microsoft Excel, Python data analysis libraries (numpy, pandas), R data analysis libraries (dplyr, tidyr)
- Data visualization: Tableau or Microsoft Power BI, Python visualization libraries (matplotlib, seaborn), R data visualization libraries (ggplot2, plotly)
- Data collection: SQL, Web scraping (beautifulsoup4 or scrapy Python libraries)
- Storytelling, presentation, and communication skills
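To make the "understand it, spot irregularities, draw insights" idea concrete, here is a minimal EDA sketch with pandas. The dataset is a tiny made-up table (the column names and values are hypothetical, purely for illustration), and filling the missing value with the median is just one of several reasonable choices.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset, made up for illustration only.
df = pd.DataFrame({
    "city": ["Lagos", "Abuja", "Lagos", "Accra", "Abuja", "Accra"],
    "orders": [120, 85, np.nan, 60, 90, 75],
    "revenue": [2400.0, 1700.0, 2600.0, 1200.0, 1800.0, 1500.0],
})

# 1. Understand the data: shape, column types, and summary statistics.
print(df.shape)
print(df.dtypes)
print(df.describe())

# 2. Spot irregularities: count missing values per column.
missing_counts = df.isna().sum()
print(missing_counts)

# 3. Handle them: here, fill the missing order count with the column median.
clean = df.assign(orders=df["orders"].fillna(df["orders"].median()))

# 4. Draw insights: average revenue per city, and how orders relate to revenue.
revenue_by_city = (
    clean.groupby("city")["revenue"].mean().sort_values(ascending=False)
)
correlation = clean["orders"].corr(clean["revenue"])
print(revenue_by_city)
print(round(correlation, 3))
```

On a real project you would load the data from a file or a SQL query instead of typing it in, but the same four steps usually apply.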
Machine learning skills
- Data analysis skills (listed above)
- Programming: Python scripting and object-oriented programming paradigm
- Math fundamentals: Understanding the basics of Calculus, Linear algebra, Probability, and Statistics
- Building ML models: scikit-learn
- Feature selection and engineering
- Training and testing ML models
- Hyperparameter tuning
- Deploying and maintaining models in production: Flask, FastAPI, Streamlit, and cloud services (AWS, Azure, GCP, etc.)
- Deep learning: TensorFlow, Keras
- Retraining models in production, more advanced tuning, and experimentation: MLflow
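As a rough illustration of the middle of this list (building, training, testing, and tuning a model with scikit-learn), here is a small sketch. The choice of dataset (scikit-learn's built-in breast cancer data), the logistic regression model, and the grid of `C` values are my own assumptions for the example, not a recommendation for any particular problem.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Load a small built-in classification dataset (chosen only for convenience).
X, y = load_breast_cancer(return_X_y=True)

# Hold out 20% of the rows as a test set the model never sees during tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Chain feature scaling and the model so both are fit on training folds only.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Hyperparameter tuning: try a few regularization strengths with 5-fold CV.
param_grid = {"model__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Final check on the held-out test set, using the best pipeline found.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_)
print(f"test accuracy: {test_accuracy:.3f}")
```

Keeping the scaler inside the pipeline matters: it prevents information from the validation folds from leaking into preprocessing during cross-validation.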
I hope this is helpful to you or someone you know as a guide along the journey. I understand that it can sometimes be confusing, challenging, or even discouraging, and I hope this gives you some clarity. This is a comprehensive roadmap, with multiple options, to get you started. As you progress, you will pick up different libraries or packages specific to your area of interest, e.g. computer vision, natural language processing, robotics, etc.
Please let me know in the comments, or via email (contact aniekan at gmail dot com), which other topics or areas you would like me to cover in subsequent articles. And please share this with anyone you know who is looking to get started in data science or any related field.
Thank you for reading.
Aniekan
Related readings:
- https://aniekan.blog/2019/02/13/why-data-science/
- https://aniekan.blog/2019/02/06/my-tech-journey-part-1/
- https://aniekan.blog/2022/06/16/getting-a-summer-internship-at-a-big-tech-company/