Blog #2: Data Analysis Script (Informal)

Hello all! This post will focus solely on the data analysis script I’ve been working on under the Data Lead for the testing team. 

At the beginning of my internship I expressed interest in sharpening my data analysis and data science skills. My research mentors and the Data Lead happily obliged, and I have spent the past month and a half or so building a data analysis script. It uses SQL queries and joins to pull data from a PostgreSQL database, and Python (pandas, NumPy, and Matplotlib) to clean, analyze, and visualize it. The main goal was to take commonly used metrics and build a sort of one-stop-shop script: the test scientists plug in a key for the data they want to analyze, and the script spits out plots and tables with useful statistical analyses.
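The post doesn’t include the script itself, so here is only a rough sketch of the “plug in a key, get a table” idea, under several assumptions. The table names (`tests`, `measurements`), their columns, and the helpers `load_run`, `summarize`, and `build_demo_db` are all hypothetical, and an in-memory SQLite database stands in for PostgreSQL so the example runs without a server. With a real PostgreSQL driver such as psycopg2, the query placeholder would be `%s` instead of `?`. The Matplotlib plotting step is left out to keep it short.

```python
import sqlite3

import pandas as pd

# Hypothetical schema: one row per test in `tests`, many readings per test in
# `measurements`. SQLite stands in for PostgreSQL so the sketch runs anywhere.
QUERY = """
SELECT t.test_key, t.condition, m.value
FROM tests AS t
JOIN measurements AS m ON m.test_id = t.test_id
WHERE t.test_key = ?
"""


def load_run(conn, test_key):
    """Pull every measurement for one test key, joined to its test metadata."""
    return pd.read_sql_query(QUERY, conn, params=(test_key,))


def summarize(df):
    """Return a per-condition table of a few commonly used statistics."""
    return df.groupby("condition")["value"].agg(
        count="count", mean="mean", std="std", median="median"
    )


def build_demo_db():
    """Create a tiny in-memory database with made-up rows for two test keys."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tests (test_id INTEGER PRIMARY KEY, test_key TEXT, condition TEXT);
        CREATE TABLE measurements (test_id INTEGER, value REAL);
        INSERT INTO tests VALUES (1, 'run-A', 'baseline'), (2, 'run-A', 'stress'),
                                 (3, 'run-B', 'baseline');
        INSERT INTO measurements VALUES (1, 1.0), (1, 3.0), (2, 5.0), (2, 7.0), (3, 100.0);
        """
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    # The "plug in a key" step: everything downstream depends only on this key.
    print(summarize(load_run(conn, "run-A")))
```

Keeping the query parameterized (rather than pasting the key into the SQL string) means the same function works for any key and avoids SQL injection.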

The most time-consuming part of this project was writing the code to clean the data. I don’t love the phrase “time-consuming” here, since it has a negative connotation and I really enjoyed this process. The most fun part of programming for me is solving intricate, interesting puzzles. Anybody who has taken an introductory programming course (and enjoyed it) understands this. After those introductory courses, though, the fun puzzle questions tend to come up less often. Creating the data-cleaning script kept that enjoyment going for me.

The first step was writing test cases that anticipated the errors that could show up in the data. This is where my time doing the actual testing with the testing team came in really handy: it taught me what the data represented and what kinds of errors it could contain. With the Data Lead’s help, I wrote test cases based on these possible errors and then set about handling them in a way that was comprehensive but computationally simple.
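To give a flavor of the test-first approach, here is a minimal sketch, not the actual cleaning script. It assumes a hypothetical table with just `test_id` and `value` columns and four made-up error types (numbers stored as text, impossible negative readings, missing IDs, and duplicated rows); the function `clean_measurements` and the test names are illustrative only. Each rule is a vectorized pandas operation rather than a row-by-row loop, which is one way to keep the cleaning computationally simple.

```python
import pandas as pd


def clean_measurements(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply vectorized cleaning rules to a hypothetical test_id/value table."""
    df = raw.copy()
    # Values typed in as text ("3.5") become numbers; junk ("n/a") becomes NaN.
    df["value"] = pd.to_numeric(df["value"], errors="coerce")
    # Assumed rule: a negative reading is physically impossible, so blank it out.
    df["value"] = df["value"].mask(df["value"] < 0)
    # Rows with no test ID or no usable value cannot be analyzed.
    df = df.dropna(subset=["test_id", "value"])
    # A result logged twice should only be counted once.
    df = df.drop_duplicates()
    return df.reset_index(drop=True)


# One test per anticipated error, written before the cleaning rules.
def test_text_values_are_parsed():
    raw = pd.DataFrame({"test_id": [1, 2], "value": ["3.5", "n/a"]})
    assert clean_measurements(raw)["value"].tolist() == [3.5]


def test_negative_values_are_dropped():
    raw = pd.DataFrame({"test_id": [1, 2], "value": [4.0, -1.0]})
    assert clean_measurements(raw)["test_id"].tolist() == [1]


def test_missing_ids_are_dropped():
    raw = pd.DataFrame({"test_id": [1, None], "value": [4.0, 5.0]})
    assert len(clean_measurements(raw)) == 1


def test_duplicates_are_removed():
    raw = pd.DataFrame({"test_id": [1, 1, 2], "value": [2.0, 2.0, 2.0]})
    assert len(clean_measurements(raw)) == 2


if __name__ == "__main__":
    for test in (test_text_values_are_parsed, test_negative_values_are_dropped,
                 test_missing_ids_are_dropped, test_duplicates_are_removed):
        test()
    print("all cleaning tests passed")
```

The order of the rules matters: parsing text to numbers before dropping duplicates means `"2.0"` and `2.0` are recognized as the same reading.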

Making the data analysis script was a wonderful opportunity. It was totally separate from my other projects. I told my supervisors at Imprint that I wanted more experience applying my data skills, so they gave me a project to do just that. With guidance from the Imprint team, I learned a lot about data cleaning, querying, and analysis. Big thanks to them for the experience, and to you for reading!