Blog 5: fin.

Time really flies—it still feels like yesterday when I first began my internship, bubbling with nervous excitement.

I can still remember my first Zoom meeting with my research mentors, before the internship had even begun. That day, I set a 5 a.m. alarm (it was an early-morning meeting), changed into my best clothes, and nervously reviewed everything I had learned about machine learning and chemistry, barely able to sit still with anticipation. At the time, I knew only that the project had something to do with generating solar molecules, not how we planned to approach it.

I remember excitedly describing every method I could think of for the task during the meeting and wondering what we would ultimately decide to do. At the end of the meeting, I asked for a reading list to strengthen my understanding, and I devoured it as soon as I received it.

I remember setting up my remote account on the first day of work and learning new UNIX commands along the way (nohup in particular, which lets me keep scripts running in the background even after I log out of my remote session, has been a complete lifesaver). I remember the first visualization graphs I made with seaborn (which I had not used before) and matplotlib, and eventually getting a lot more familiar with the two libraries. I remember learning to consider computational efficiency when coding, from evaluating my choices of NumPy operations to learning how each Graph object is stored in memory. I remember first reading about Gaussian process regression (GPR) and going from complete confusion in the first few weeks to finally comprehending how the algorithm worked. I’ve not only done so much over the course of this internship; I’ve learned an incredible amount as well.

It wasn’t merely algorithms, libraries, and commands that I learned in the last two months, either. Through training and testing different models, I developed a lot of good coding habits. Since I needed to show my code to other people, I got into the habit of documenting each class and function and took the time to make sure the code was clean and legible. I also learned to think carefully about different graphical ways of presenting my results so that they better emphasize the trends in the data. In addition, after running out of time to train a model on several occasions, I learned to take running time into account: I now start with smaller subsets of the data to get an idea of how training time grows with dataset size before training on the full dataset. All of these are habits that will help me complete projects in the future.
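For anyone curious, that last habit can be sketched roughly as follows. This is an illustration rather than the project's actual code: `time_training`, `extrapolate_runtime`, and `toy_train` are hypothetical names, the toy "model" is just a dense linear solve standing in for a real training routine, and the power-law fit is an assumption about how runtime scales.

```python
import time

import numpy as np


def time_training(train_fn, X, y, sizes):
    """Time train_fn on the first n rows of the data for each n in sizes."""
    timings = []
    for n in sizes:
        start = time.perf_counter()
        train_fn(X[:n], y[:n])
        timings.append(time.perf_counter() - start)
    return timings


def extrapolate_runtime(sizes, timings, full_size):
    """Fit time ~ a * n**b on a log-log scale and predict the time at full_size.

    Returns (predicted_seconds, fitted_exponent_b).
    """
    b, log_a = np.polyfit(np.log(sizes), np.log(timings), 1)
    return float(np.exp(log_a) * full_size ** b), float(b)


def toy_train(X, y):
    """Hypothetical stand-in for model training: solves an n x n linear system."""
    n = len(X)
    K = X @ X.T + np.eye(n)  # symmetric positive-definite matrix
    np.linalg.solve(K, y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 16))
    y = rng.normal(size=2000)

    sizes = [200, 400, 800]  # small subsets first
    timings = time_training(toy_train, X, y, sizes)
    predicted, exponent = extrapolate_runtime(sizes, timings, len(X))
    print(f"fitted exponent: {exponent:.2f}")
    print(f"estimated time on all {len(X)} rows: {predicted:.3f} s")
```

Timings on very small subsets are noisy, so the estimate is only a rough guide for deciding whether a full run will fit in the time available.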

None of this would have been possible without my research mentors; without them, I would never have learned so much or had so much fun. From providing me with extra resources to read and explaining harder concepts in our meetings, to always being willing to answer my questions over Slack and email, they have given me one of the best learning experiences I’ve ever had.

Now, the internship is drawing to a close. This is the final week of my summer internship with the Computational Chemistry, Materials, and Climate group at Lawrence Berkeley Lab. In the last two weeks, I’ve adapted my code to a larger dataset. I plotted the distributions of this new data (along with its t-SNE graphs), which I’ve included above, and transferred my trained GPR model to the new dataset. I’ve also cleaned up my MCTS code so that it is more efficient and accurate and supports deletion rewrites to different molecules. In this final week, I’ll be putting everything together into a model that generates solar molecules.

Thanks for following me on this wonderful journey!