Since I recently got interested in data science and machine learning, I wanted to create some real-world datasets that are relevant to me.
After noticing how often meat cut into strips/ragout (“Geschnetzeltes”) was on the menu, I decided to quickly write a Python script that logs the online menu of the Mensa (my university's canteen) every week so I could analyse the data later.
After one or two evenings of inspecting the menu website and translating its structure into bs4 parsing code, I was able to parse the pages into an easily usable format and save the results to MongoDB.
How it works
- Reads your config file for the URLs of the plans to scrape, the database location and more
- Gets the plan(s) from the web
- Parses the HTML and transforms the data into a usable format
- Saves the dishes with their attributes such as prices, date and ingredients as documents in a database collection named after the Mensa serving them
- Posts a message to a specific Slack channel to confirm that it has run
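The steps above can be sketched roughly like this. Note that the config keys, collection layout and CSS selectors are made up for illustration; the real menu page uses different markup that has to be inspected first:

```python
import json

import requests
from bs4 import BeautifulSoup


def load_config(path="config.json"):
    # Hypothetical config layout: Mensa names mapped to plan URLs,
    # plus the database location and a Slack webhook URL.
    with open(path) as f:
        return json.load(f)


def parse_plan(html):
    # The selectors below are placeholders -- adapt them to the
    # actual structure of the menu page.
    soup = BeautifulSoup(html, "html.parser")
    dishes = []
    for row in soup.select("tr.dish"):
        dishes.append({
            "name": row.select_one("td.name").get_text(strip=True),
            "price": row.select_one("td.price").get_text(strip=True),
            "date": row.select_one("td.date").get_text(strip=True),
        })
    return dishes


def run():
    # pymongo is only needed when actually writing to the database,
    # so it is imported here rather than at module level.
    from pymongo import MongoClient

    config = load_config()
    db = MongoClient(config["mongo_uri"])[config["db_name"]]
    for mensa, url in config["plans"].items():
        dishes = parse_plan(requests.get(url).text)
        if dishes:
            # One collection per Mensa, one document per dish.
            db[mensa].insert_many(dishes)
    # Notify a Slack channel via an incoming webhook on success.
    requests.post(config["slack_webhook"], json={"text": "Mensa scraper ran."})
```

Keeping the parsing step a pure function of the HTML makes it easy to test against saved copies of the page without hitting the network or the database.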
You can find the code of the project here.
Future plans
- Maybe I will implement proper error management with a service like Datadog and use Slack to notify me when an error occurs instead of only on success
- Another plan is to deploy this and my other data mining scripts to AWS Lambda and use scheduled execution there instead of running them via cron on a Raspberry Pi, for better reliability and practically no cost at such short execution times
The whole idea of collecting this information is to later use real-world data in some data science experiments and maybe try some machine learning models on it.
There probably won't be any real use for this data, but it's fun to use technology to find out a bit more about the real world and maybe even predict or enhance some things, e.g. via machine learning.
And even if it turns out not to be useful at all, the script only took a few hours to write, I had fun doing it, and I learned a few things about web scraping, Slack bots and more.