Showing posts with label data science. Show all posts

Friday, August 7, 2020

On Being Asian and in a STEM Career

I want to first say that my new website, which shines light on different topics in data science, can be accessed at morningmodule.com

The second thing I want to say is that I will be pivoting this blog to be more of a personal reflection type of blog where I will be writing my thoughts on matters that may or may not be directly related to data science or being an aspiring data scientist. So, instead of this blog having a focus on my projects, it will be less technical in nature and more of my musings. 

Lately, I have had some thoughts about being in a STEM career and being Asian American. These thoughts were prompted by my sister, who has always had a big influence on me whether I acknowledge it or not. My sister is the complete opposite of me in terms of her personality, interests, and of course, career aspirations. She is outspoken and stubborn and I am shy and more of a push-over. She majored in Chinese and English and I majored in Math. She went to college in NYC and I went to school upstate. She is interested in Asian American politics, civic engagement, and immigration law, and I am interested in figuring out how to connect backend Python code to a frontend Angular framework.

My sister also started a newsletter which focuses on all things Asian American and collaborated with others to start a podcast called "Fresh off the Vote", which focuses on voting, mainly in the Asian American Pacific Islander community. That's pretty amazing, right? Considering there aren't that many Asian Americans in politics, she may be doing things at the frontier of civic engagement among Asian Americans.

As for me, I feel like I have followed a stereotypical route of being on a STEM track almost my whole life. Although I was technically a "Social Science major" in high school, I still went to a technical high school (Brooklyn Technical High School) and took the hardest math classes they had to offer. I also contemplated being a Biology major in college. I can read my old journal entries from elementary school and see how focused I was on grades as well. I just feel I have checked all the boxes of being a stereotypical Asian American.

Now that I am an aspiring data scientist, I sometimes wonder why I chose this field. Of course it is interesting--there is no doubt. But, I sometimes wonder how I can bring more of the humanities to data science. I remember really enjoying my "Literature and Society" class and studying abroad in Spain to learn the language, culture, and history. I also remember enjoying getting my Masters in Teaching, although teaching itself was not as enjoyable. Although I enjoyed learning about the humanities and teaching, I ended up in a more technical field than ever. I think it isn't because I inherently liked STEM more, but because I was raised in a way that put STEM on a higher pedestal.

I simply have to remember that I am a multi-faceted person and although I have chosen data science as a career track, I am not just a data scientist. I am a human with other interests and perhaps other obligations as a person who has the ability to vote. 

I suppose my sister's newsletter and the podcast she is a part of are awakening the other side of me, the side that wants to make a difference not just in data science but in other parts of society. Her existence is a call to action. It is time to remember to read the news sometimes, understand some politics, and take action when needed.

"Fresh off the Vote" is available everywhere. My sister's newsletter can be accessed through nonnative.substack.com




Saturday, March 7, 2020

Imposter Syndrome

What has stayed with me in my transition from a teaching career to a data science career, among other things, is imposter syndrome. Imposter syndrome is the feeling of not having the "expertise" to do your job well or a feeling of not "fitting in" with others who are doing the same work you do. There are probably better definitions, but I'm just giving you mine.

I'm not a "real" teacher

Imposter syndrome is real and at first, I didn't recognize it for what it was. It was subtle, but as with all little things, it built up and manifested itself in troublesome ways. As a teacher, I felt imposter syndrome almost all the time. I felt that I was not really a teacher, so that led me to believe that I couldn't teach well. I felt I had no classroom management experience, and although that was true, it is true of many new teachers. I was comparing myself to the veteran teachers and I felt that I was nothing like them, so therefore, I was not a good teacher.

The feeling that I was insufficiently prepared (maybe it's true) just exacerbated my imposter syndrome. It was like a self-fulfilling prophecy where I thought I wasn't good enough and then it really turned out that I was not good enough. Although, I have to say in retrospect that those feelings were normal and the outcome is also normal for many new teachers. It doesn't mean I had to quit. But I did. And I don't blame it entirely on imposter syndrome, but that is part of the root of the problems I had. No matter how pretty the leaves on a plant look, if it has a rotting root, the leaves will wilt one day. That was how I was - I had the credentials to be a teacher, but I had an unstable foundation.

I'm not a "real" data scientist

Now that I am transitioning careers into data science, imposter syndrome has tagged along. It usually starts out small, but it gradually grows for me. I felt confident in the first week of my internship because I didn't know anything and also because as an intern, I felt that there were fewer expectations of me. The second week was when imposter syndrome crept in a little. Our company hosted a hackathon and I felt like I was the weakest link on the team. I felt like a fraud--that I didn't belong in a team of data scientists and data analysts because I couldn't do the things they were doing. Although in the end I was able to help out a little, it felt "fake" since I just copied and pasted code from the Internet to run a simple algorithm.

In the following weeks of my internship, I rejected an offer to present to management the dashboard I had created because I felt that I didn't have the expertise to talk about the subject of the dashboard, even though I had been working with the data for about a month. When a new intern joined us, his skills in coding and graph databases (which I know nothing about) made me feel a little more inferior. But, I happened to come across a wholesome meme on the Internet recently which was a little reminder of how silly comparing myself to other people can be.

























Acceptance

Recognizing that I have imposter syndrome and have had it for a long time is a huge relief. I remember during my bootcamp that Vinny, one of my instructors, told us that we would feel imposter syndrome and that this is normal. In a field as vast and ever-changing as data science, imposter syndrome is real for many people. My manager gave me some feedback recently and he said that I should take chances to present to management and not be afraid to make mistakes.

Now comes the topic of how to stop feeling like an imposter. My manager is right. I should not be afraid to make mistakes. I should not be afraid to "look stupid". That is easier said than done, though. But, fear is at the core of imposter syndrome, I believe. I'm still just a fearful person who worries a lot. I'm a scaredy cat. But, I am accepting that. Being aware that I have imposter syndrome is one thing. It will take courage to take action even though I am afraid that what I bring to the table is "wrong". (Originally, this blog post was supposed to be my thoughts on how data engineering is under-rated, but I didn't finish writing that post because I felt that I didn't have the expertise to write about such a topic.)

I believe what would help with imposter syndrome are the little things done consistently, like going to work every day with a positive attitude and outlook, talking with other people about how you are feeling, writing about it, and taking chances when they are given to you. Take a leap...and believe that you are stronger, smarter, and braver than you think. (Winnie the Pooh reference)


Friday, December 13, 2019

2019 Year In Reflection

As 2019 ends, I find myself very excited for all the things I will be doing in 2020. But, first, it is important to reflect on the past year, express my gratitude, and learn from my mistakes.

I knew the first thing I would be doing in 2019 was ending (perhaps) my career in teaching. I left a teaching position in the Bronx and decided to study for an actuarial exam (FM) full time. But, I always had the feeling that an actuarial career path was a wrong puzzle piece in the picture of my life. I still tried hard to see if that puzzle piece would fit though. But, I tried other puzzle pieces too, through networking with people from different walks of life and starting to dabble in some data projects. I connected with a data scientist friend and he showed me the work he was doing for his company at the time: SQL queries, A/B testing, coding with Python, influencing decisions using insights gathered from data. Is this the future...(my future?) I thought.

When I passed FM in April, I made another decision. I would not continue to take actuarial exams. Instead, I would pursue this data science thing full-time. Was I fickle? Do I get swept away by interesting ideas and new things too easily? In retrospect, it seems like I was simply following my instincts and my instincts told me that data science could be right for me. But, instincts don't really give you all the technical skills you need to be a leader in data science. Although I had already studied some Python and done some data projects on my own, I was lacking in everything a modern-day data scientist needed.

Therefore, I joined Metis, a 12-week data science bootcamp, to skill up. Really, it was the start of a career change and a new chapter in my life where I met new friends and learned the basics and the latest of machine learning and data science. Although I say that Metis was the start of a career change, I believe my mindset had been set from the beginning: I would change my career successfully.

Nine months into 2019, I had left my teaching job, passed an actuarial exam, and graduated from a data science bootcamp. The next three months were spent in coffee shops, yoga studios, CrossFit boxes, libraries, networking events, and at home, but I had one main goal. I was going to find my first full-time position as a data scientist...or analyst.

I can say that I have been scouted by a company and although it is for an internship, I am really happy that a door opened up for me in data science.

I know that 2019 has been really focused on my career change. My career change, although scary, has been the one thing that remained constant throughout the year.

As for all the people who made this year possible, I would have to thank my parents, sister, boyfriend, and all my friends. I'm grateful for the life I have lived.

Coming into the new year, I think there is one thing among many things that I would like to work on. It is being aware. I want to be more aware of myself, body and mind, and of the filters I put on the world. I believe being aware has been an unconscious goal of mine and I am really happy that I became aware of this goal this time around and am able to articulate it in words.

Thank you 2019.

Saturday, September 7, 2019

Once Upon A Time- NLP on Disney Movie Scripts & Their Original Stories

Introduction


Practically everyone knows some Disney movies, but not everyone knows about their original stories. The movies' original stories come from many places. For example, The Little Mermaid comes from a short story by Hans Christian Andersen and The Hunchback of Notre Dame comes from a gothic novel written by Victor Hugo. It would be interesting to see the parallels between the Disney movie scripts and their original writings, which is exactly what my project seeks to do. In addition to finding interesting relationships between main characters in the movie scripts and original stories, my project also culminates in a rudimentary Flask app that serves as a book/movie recommendation system.

Methodology


I first gathered 17 Disney movie scripts and their corresponding original stories. Then, I tokenized my data in two different ways: on the word level and on the sentence level. On the word level, I removed punctuation and stop-words, lemmatized, and used part-of-speech tagging to keep only nouns and adjectives, then weighted the remaining words with TF-IDF. I did this pre-processing in order to feed my data into an NMF model for topic modeling, and I also used Word2Vec and PCA to find relationships between main characters in the original stories and in the movie scripts. On the sentence/line level, I kept punctuation in order to do sentiment analysis with VADER, and applied dynamic time warping to compare sentiment change over time in the books and movies. With my NMF model and sentiment analysis, I was able to come up with a rudimentary book/movie recommendation system. The user picks a story or movie they liked from the list of 34, sets a sentiment similarity weight and a topic similarity weight, and the system returns a book or a movie that matches the criteria.
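To make the word-level step more concrete, here is a minimal sketch of TF-IDF weighting using only the Python standard library. It is not the code from my project: the stop-word list, the three sample sentences, and the function names `tokenize` and `tf_idf` are made up for illustration, and it skips lemmatization and part-of-speech tagging entirely.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stop-word list; a real project would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "for"}


def tokenize(text):
    """Lowercase the text, drop punctuation, and remove stop-words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def tf_idf(documents):
    """Return one {word: weight} dict per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero on empty documents
        weights.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


# Made-up sample "documents", not lines from the actual scripts.
docs = [
    "The mermaid longs for the prince and the sea.",
    "The hunchback rings the bells of the cathedral.",
    "The prince dances in the cathedral.",
]
scores = tf_idf(docs)
```

Words that appear in only one document (like "mermaid") end up with a higher weight than words shared across documents (like "prince"), which is the property that makes TF-IDF a useful input for topic modeling.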

My methodology is summed up in this picture:
















Findings


Word2Vec & PCA

Using Word2Vec and reducing dimensions to 2 with PCA, I was able to visualize the spread of main characters in the stories and in the movies. In the original stories, there is a greater spread of main characters, with some characters closer to the words "love" and "happy" and some characters rather far away from them. However, in the movies, the main characters are bunched together and everyone seems to be relatively close to the words "love" and "happy". Does this mean that the Disney movies are just more similar to each other and sugar-coat the original stories? This idea makes sense since a lot of the original stories are darker and do not always have happy endings. For example, in Victor Hugo's novel The Hunchback of Notre Dame, a lot of the main characters die, but the Disney version is a lot happier.


















Sentiment Analysis 

Using VADER, I was able to assign a sentiment score to each line or sentence of the movie script or story. This created a time series of sentiment change over the plot of the movie or story. I then used dynamic time warping to measure how similarly each movie and its corresponding story 'feel' over time. My findings show that the most similar book and movie pair is Winnie the Pooh by A.A. Milne and The Adventures of Winnie the Pooh. The Winnie the Pooh book/movie pair received a dynamic time warping distance of 7, the lowest of all the pairs of books and movies. The most dissimilar book and movie according to sentiment is The Hunchback of Notre Dame by Victor Hugo and The Hunchback of Notre Dame. This book/movie pair received a dynamic time warping distance of 33. You can see the change in sentiment over time in the following graphs. (I used a rolling mean of 5% of the lines/sentences to smooth out the graphs.)
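For readers who have not seen dynamic time warping before, here is a rough standard-library sketch of the idea, paired with a simple trailing rolling mean. This is an illustration rather than my project code: the sentiment numbers are invented, the function names `rolling_mean` and `dtw_distance` are my own, and the absolute-difference cost is an assumption about how two scores are compared.

```python
def rolling_mean(values, window):
    """Smooth a series with a trailing moving average of the given window size."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def dtw_distance(a, b):
    """Dynamic time warping distance using absolute difference as the step cost."""
    inf = float("inf")
    # cost[i][j] holds the cheapest cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b stays on the same point
                cost[i][j - 1],      # b advances, a stays on the same point
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[len(a)][len(b)]


# Invented sentiment scores per line; the two series have different lengths on purpose.
book = [0.1, 0.4, -0.2, -0.6, 0.3, 0.8]
movie = [0.2, 0.5, 0.5, -0.1, 0.4, 0.7, 0.9]

distance = dtw_distance(rolling_mean(book, 2), rolling_mean(movie, 2))
```

Because the alignment can stretch or compress either series, a book and a movie of different lengths can still be compared, and two series with the same shape at different speeds come out with a small distance.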
















Recommendation System

Finally, my recommendation system was created by combining topic modeling over the 34 books and movies with the dynamic time warping scores. NMF was able to create 34 topics, each of which corresponded very well to a movie or a book. I also wanted the recommendation to reflect how similar the user wanted the recommended book or movie to feel to their chosen one, and how similar in topic they wanted it to be. To give users those options, I used cosine similarity to compare each book or movie against every other one using the topic weight vectors I received from NMF. I also used the dynamic time warping distances to compare how similar each book/movie was to another book/movie.
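Here is a small sketch of how the two similarity measures could be blended into one score. The exact formula is an assumption made for illustration (a weighted sum, with DTW distance rescaled so that a smaller distance means a higher similarity), and the titles, topic vectors, distances, and function names are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(chosen, topic_vectors, dtw_distances, topic_weight, sentiment_weight):
    """Return the title with the highest blended similarity to `chosen`."""
    max_dist = max(dtw_distances.values())
    best_title, best_score = None, float("-inf")
    for title, vector in topic_vectors.items():
        if title == chosen:
            continue
        topic_sim = cosine_similarity(topic_vectors[chosen], vector)
        dist = dtw_distances[frozenset((chosen, title))]
        # Assumed rescaling: the largest distance maps to 0, a zero distance maps to 1.
        sentiment_sim = 1.0 if max_dist == 0 else 1.0 - dist / max_dist
        score = topic_weight * topic_sim + sentiment_weight * sentiment_sim
        if score > best_score:
            best_title, best_score = title, score
    return best_title


# Hypothetical topic weight vectors (as if they came out of a 2-topic model).
topic_vectors = {
    "Story A": [0.9, 0.1],
    "Movie A": [0.8, 0.2],
    "Movie B": [0.1, 0.9],
}
# Hypothetical pairwise DTW distances, keyed by unordered pair of titles.
dtw_distances = {
    frozenset(("Story A", "Movie A")): 30.0,
    frozenset(("Story A", "Movie B")): 10.0,
    frozenset(("Movie A", "Movie B")): 20.0,
}

by_topic = recommend("Story A", topic_vectors, dtw_distances, 1.0, 0.0)
by_feel = recommend("Story A", topic_vectors, dtw_distances, 0.0, 1.0)
```

With all the weight on topic, the sketch picks the title whose topic vector points the same way as the chosen one; with all the weight on sentiment, it picks the title with the smallest DTW distance instead.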

Here is a demonstration of the Flask app I built:



I will end with a quote: 
"Begin at the beginning," the King said, very gravely, "and go on till you come to the end: then stop."
-Lewis Carroll, Alice in Wonderland


Monday, May 27, 2019

Incoming Student at Metis

I am accelerating my learning of data science by joining Metis. Metis has an accredited 12-week intensive data science program where qualified candidates can gain the skills and connections needed to launch a career in data science. The summer cohort classes begin on July 1st.

I am excited to embark on this journey and plan to post regular updates about my experiences during the program. But, I wanted to put into words the reason I decided to pursue data science. After all, I came from a teaching background. I will start with the reason I decided to pursue teaching first.

I first wanted to become a math teacher because I loved math and because I was fascinated by the learning process of a new language, specifically Spanish, when I studied abroad in Spain. I also liked children. But, I found out quickly that loving math and the learning process was not enough if I did not enjoy working with children, among the many frustrations of being a teacher. Maybe it was the age group--I worked with middle school and high school students but only had prior experience with kindergarteners. Teaching, for me, was 90% classroom management, which was just horrific. A fellow teacher told me that it took her 6 years to be comfortable in her own classroom and to feel that she had things together. That was fair, but I didn't think I could deal with six years of pain. So, I left teaching.

Why did I decide to go for data science? I actually did not decide to go for data science at first and instead picked up studying for the actuarial exams again. I had taken P (Probability), one of the preliminary exams, while in college. I thought being an actuary was the logical thing to do, given that I had the math skills, had already taken an exam, and had heard good things about being an actuary. So, I studied for FM (Financial Mathematics), another preliminary exam, and passed it on the first try. But, while I was studying for FM, I also explored other career options. Data science was one of them.

It was my search to learn more about different careers that caused me to shadow a friend of mine, who is a data scientist. I learned more about what he does day to day and what it took to become a data scientist and thought it was a viable option for me. My interest in data science was still fairly new, but I felt there was a lot of potential in the field to answer important questions that can benefit humanity.

I talked to a Metis alum and heard really great things about the program. Although I am still not 100% sure if data science is going to work out for me, I decided to join Metis to find out. Through the admissions process and pre-work, I found that learning to code in Python is difficult, but rewarding. I learned that finding a data science project was also hard, but interesting. I have also talked to more data scientists to learn more about what they are doing.

Overall, I am pumped to learn more about data science and to start the bootcamp. I hope it works out and that in the near future, I will be working in a data analyst or maybe even data scientist role.