[Blog #2] Deep Dive into DISTIL

These past two weeks, I’ve started working with a system called DISTIL, which makes it possible to quickly run analytics on a set of input data streams from sensors on the power grid and output results in near real time. Not many people at the company currently know how to use it, but we are aiming to get the data science team more involved with it, so I’ve been learning DISTIL together with Michael, another member of the data science team.

To use DISTIL, you have to program in Go. I had no prior experience with Go, so I spent the first couple of days becoming familiar with the language and reading some distillers that had already been written, using them as references. My supervisor, Laurel, and I decided that a good place to start would be calculating the effective impedance between two sensors on the grid. This impedance is usually calculated from physical models of the system and the materials connecting everything together, and those calculations are often not verified against actual data. By estimating the impedance from sensor data, we could both validate the physical models and detect when the impedance changes, which can happen because of ordinary environmental changes, such as fluctuations in temperature or humidity, or because of other events, such as a branch falling on a power line.

For the rest of that first week, I wrote most of the body of the distiller and worked on setting up a testing environment where I could run it. In the second week, I was finally able to start running distillers. Getting everything running took a while because the person who knew how to run DISTIL was on vacation, but fortunately everything was well documented, and with a good amount of help from the rest of the team I was able to figure it out.

This past week I also started working more closely with Michael, who wanted to apply DISTIL to finding data quality issues. Together we wrote distillers that check whether a stream is outputting zeros or outputting the same value over and over, both of which can indicate bad data. Once we had written the first distiller, writing more went much faster. Even though we only tested the distillers on randomly generated test data, it was pretty cool to see them output results in real time and respond very quickly even when I edited input data that had already been processed.
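The two data quality checks can be illustrated outside of DISTIL with a small Go sketch. The functions below (`flagZeros` and `flagRepeats`, both hypothetical names) operate on a plain slice of readings rather than a live DISTIL stream, and the run-length threshold `minRun` is an assumed parameter, not something taken from our actual distillers.

```go
package main

import "fmt"

// flagZeros marks every reading that is exactly zero.
func flagZeros(values []float64) []bool {
	flags := make([]bool, len(values))
	for i, v := range values {
		flags[i] = v == 0
	}
	return flags
}

// flagRepeats marks every reading that belongs to a run of at least minRun
// consecutive identical values, which can indicate a "stuck" sensor.
func flagRepeats(values []float64, minRun int) []bool {
	flags := make([]bool, len(values))
	start := 0 // index where the current run of identical values began
	for i := 1; i <= len(values); i++ {
		// A run ends at the end of the slice or when the value changes.
		if i == len(values) || values[i] != values[start] {
			if i-start >= minRun {
				for j := start; j < i; j++ {
					flags[j] = true
				}
			}
			start = i
		}
	}
	return flags
}

func main() {
	// Made-up test stream: two zeros, then a value stuck for four readings.
	stream := []float64{1.2, 0, 0, 3.4, 3.4, 3.4, 3.4, 5.0}
	fmt.Println(flagZeros(stream))      // [false true true false false false false false]
	fmt.Println(flagRepeats(stream, 3)) // [false false false true true true true false]
}
```

The repeat check compares values with exact equality on purpose, since a stuck sensor tends to report the identical number; a real check might instead use a small tolerance if the stream is known to carry tiny amounts of noise.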

I still have a good amount to learn about DISTIL, but I’ve learned a lot over these past two weeks. I’ve learned how to program in Go and how to use it to write (relatively simple) distillers. I’ve also learned a lot about how to use the testing environments and a bit about how to set them up and take them down. Everyone I’ve worked with has been extremely helpful and kind. Even though I know they are all busy with their own tasks, whenever I have a question, someone will often offer to hop on a call with me on the spot to explain something or help debug an error I’m stuck on. I really appreciate this collaborative environment; it has helped me get the most out of my experience so far.