The History
I have kept a diary for 18 years. Although I don't write in it every day and some entries are missing, I see my diary collection as a treasure trove of memories and events from my life. It is also a great source of data for an NLP project.
I thought, what if I could program a computer to write like me? That was the start of my diary generator app.
A Moment
Here is an excerpt from a "page" of my generated diary:
"I am able to listen to my parents talk about things and my own needs. Honestly, I really want to be able to listen to them, but I don't listen very much. I think I really need to learn how to speak to my own needs. I think I know what I really want to know, but I haven't really communicated it to my parents very well. My dad doesn't like to talk to me so much, but he does talk to me occasionally. I think I am getting better at communicating with dad because I am reflecting back what I say to him. Honestly, I think I need to do more of the following: 1. Be more present. Be passionate about something. Be passionate about math. Talk to more people. Try to keep up with habits. 2. Respect my parents. Be authentic to people. 3. Give accurate and detailed descriptions of what you are doing. 4. Be up to speed with what you are doing. Learning is not just about finding what you are doing. It is also about being noticed."
I can say that there is an essence of me in this generated text. I truly want to be able to communicate with my parents better, and I vaguely remember writing something on that topic. The numbered list of things "I think I need to do" is mostly true, and the being "more present" part could not be more true. Beyond what the generated text says, the writing style does feel like one of my diary entries.
The Details
The heart of this text generator is a GPT-2 model. I used Max Woolf's gpt-2-simple (https://github.com/minimaxir/gpt-2-simple) to fine-tune a GPT-2 model on my diary dataset.
According to Wikipedia, GPT-2 is a text-generating model developed by OpenAI. It was trained on a corpus called WebText, which has over 8 million documents scraped from web pages linked on Reddit. It is great at generating fake news, and was considered enough of a threat that OpenAI initially withheld the full model over concerns about misuse. (This project was just for fun and I don't intend to threaten anybody with my diary entries.)
I was able to find and type up 393 diary entries of varying lengths, and I concatenated them into one .txt file. The GPT-2 model can be fine-tuned on a custom corpus supplied as a .txt file, with each sentence on its own line. I chose the "small" 124M parameter model to fine-tune on my diary entries.
After fine-tuning the model for about 45 minutes, I was able to start generating text with it. My app has only one button, which runs the model with the default length and temperature settings. It is possible to tweak those settings so that the model generates more or less text with more or less "creativity" (that is, "letting the model pick suboptimal predictions"). You can even enter a prefix to tell the generator to start from a certain letter, word, or phrase.
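To make the temperature idea a bit more concrete, here is a minimal sketch in plain Python. It is not the gpt-2-simple code and it does not load my model. It only shows the general technique of dividing the model's raw scores by a temperature before sampling. The words and scores are made up for illustration, and the function names are my own.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(score - peak) for score in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def sample_token(tokens, logits, temperature=1.0, rng=random):
    """Pick one token at random, weighted by the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical next-word candidates and scores (not taken from the real model)
tokens = ["parents", "math", "plants"]
logits = [2.0, 1.0, 0.1]

cool = softmax_with_temperature(logits, temperature=0.5)  # favors the top word
warm = softmax_with_temperature(logits, temperature=1.5)  # spreads the odds out

print("low temperature :", [round(p, 3) for p in cool])
print("high temperature:", [round(p, 3) for p in warm])
print("sampled word    :", sample_token(tokens, logits, temperature=0.7))
```

A low temperature puts most of the probability on the highest-scoring word, so the text is safer and more repetitive. A high temperature gives lower-scoring words a better chance, which is where the "suboptimal predictions" come from.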
The fine-tuning was run in a Google Colab notebook, specifically with this notebook - https://colab.research.google.com/drive/1VLG8e7YSEwypxU-noRNhsv5dW4NfTGce
After I downloaded the fine-tuned model to my computer, I created a Flask app to run it. I then used Docker and Google Cloud to deploy my app.
This was my first time using Docker and Google Cloud.
You can access my diary generator app here - https://diary-khmwtmm5lq-uc.a.run.app/
And the code for it here - https://github.com/morningkaren/diary-generator
Stay tuned for Diary Project Part 2, where I delve into the topic modeling portion of my project.
Thursday, May 7, 2020
Sunday, January 19, 2020
Discover a Houseplant App
Did you know that January 10th was National Houseplant Day? It was a great coincidence that around that time, I was working on an app that is supposed to recommend similar houseplants to people so that they can discover different plants they might like.
Motivation
I got the inspiration to create an app revolving around houseplants because my mom, cousin, and best friend are plant lovers. Their houses are basically mini-forests. I scraped all my data from houseplant411.com, built a crude recommendation system, dressed it up in a Flask app, and finally deployed it on Heroku.
Data
The data I scraped included images of 136 popular houseplants, their descriptions, and other information related to their care, such as light, water, fertilizer, temperature, humidity, flowering, pests, diseases, soil, pot size, pruning, propagation, special occasion, and poisonous plant information. I used Beautiful Soup to do all of my web scraping.
The Recommendation System
At the core of my app is a recommendation system that uses cosine similarity on vectors created with TfidfVectorizer. I built two sets of vectors: one from the text descriptions of the plants and one from their care information. I compared the description vectors with cosine similarity to create one similarity matrix, and did the same with the care vectors to create a second matrix. I then averaged the two matrices to get a final matrix, which I used to recommend 5 different, yet similar, plants for a plant of the user's choice.
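The steps above can be sketched in a few lines of plain Python. This is a hand-rolled stand-in, not the code from my app: the real project uses scikit-learn's TfidfVectorizer, while this sketch splits on whitespace and uses a simple smoothed TF-IDF weight that I chose for illustration. The function names and the sample plants are also made up.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (word -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Number of documents each word appears in
    doc_freq = Counter(word for words in tokenized for word in set(words))
    vectors = []
    for words in tokenized:
        counts = Counter(words)
        vectors.append({
            word: (count / len(words))
            * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1)
            for word, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(docs):
    """Pairwise cosine similarity between every pair of documents."""
    vecs = tfidf_vectors(docs)
    return [[cosine(u, v) for v in vecs] for u in vecs]


def recommend(index, descriptions, care_notes, top_n=5):
    """Average the two similarity matrices and return the top_n closest plants."""
    desc_sim = similarity_matrix(descriptions)
    care_sim = similarity_matrix(care_notes)
    averaged = [(d + c) / 2 for d, c in zip(desc_sim[index], care_sim[index])]
    # Skip the chosen plant itself, then sort by averaged similarity
    others = [i for i in range(len(averaged)) if i != index]
    return sorted(others, key=lambda i: averaged[i], reverse=True)[:top_n]


# Made-up sample data, only to show how the pieces fit together
names = ["Fern", "Pothos", "Cactus", "Ivy"]
descriptions = [
    "leafy green fronds",
    "trailing green vine",
    "spiny desert succulent",
    "trailing climbing vine",
]
care_notes = [
    "low light moist soil",
    "low light dry soil",
    "bright light dry soil",
    "medium light moist soil",
]

picks = recommend(names.index("Pothos"), descriptions, care_notes, top_n=2)
print([names[i] for i in picks])
```

Averaging the two matrices means a plant has to look similar in both its description and its care needs to rank highly, so neither text source dominates the recommendation.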
The App
The application can be found here - https://discover-a-houseplant.herokuapp.com/
The idea of the app is that three random plants are picked from my list of 136 plants. The user chooses one of the plants they like and is then brought to a new page showing 5 similar, yet different, plants. There is a photo, a name, and a description of each plant on each page of the application.
Results
You can decide whether or not you like the recommendations put forth by this app! Either way, I think this is a fun way to discover new plants.
Code
The code for this project can be found on my GitHub at https://github.com/morningkaren/discover-a-houseplant