Project author: Santhin

Project description:
Real estate crawler with ML on scraped data
Primary language: Jupyter Notebook
Repository: git://github.com/Santhin/real-estate.git
Created: 2021-05-01T09:58:09Z
Project page: https://github.com/Santhin/real-estate

License: MIT License

🧐 About

The project was created for “SKNS Warsztaty z Pythona”.

It consists of a crawler that scrapes real estate data from Gumtree and Jupyter notebooks that apply ML to the scraped data.

🏁 Getting Started

To clone the repository, run:

    git clone https://github.com/Santhin/real-estate

To run the crawler locally, from the crawler directory:

    pip install -r requirements.txt
    python app.py

Project structure

    .
    ├── crawler
    │   ├── app.py
    │   ├── aps_asyncio.py
    │   ├── gumtree
    │   │   ├── __init__.py
    │   │   ├── items.py
    │   │   ├── middlewares.py
    │   │   ├── pipelines.py
    │   │   ├── settings.py
    │   │   └── spiders
    │   │       ├── gumtree_crawler.py
    │   │       ├── __init__.py
    │   │       └── stack.py
    │   ├── install_asyncio.py
    │   ├── Procfile
    │   ├── requirements.txt
    │   └── scrapy.cfg
    ├── LICENSE
    ├── ml
    │   ├── features
    │   │   ├── rankingcen.xlsx
    │   │   ├── Ranking\ Dzielnic\ 2020\ Warszawa.pdf
    │   │   ├── ranking_dzielnic_warszawy_pod_wzgledem_atrakcyjnosci_warunkow_zycia_2017.pdf
    │   │   ├── ranking_otodom.csv
    │   │   ├── ranking.txt
    │   │   └── ranking.xlsx
    │   ├── notebooks
    │   │   ├── ML\ endgame\ floydhub.ipynb
    │   │   ├── ML\ endgame.ipynb
    │   │   ├── NLP\ eda\ etc.ipynb
    │   │   ├── Pipeline\ mongoRaw\ to\ clean\ before\ EDA.ipynb
    │   │   └── real\ EDA.ipynb
    │   └── pictures
    │       ├── images.png
    │       ├── ml_map.png
    │       ├── simple-house-exterior-white-background_1308-50195.jpg
    │       ├── unnamed.jpg
    │       └── white-house-background-check-democratic-party-republican-party-house-png.jpg
    └── README.md

    6 directories, 32 files
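
The `scrapy.cfg`, `items.py`, `pipelines.py`, and `spiders` entries match the layout Scrapy generates for a project, where `pipelines.py` holds item pipelines: plain classes exposing a `process_item(item, spider)` method. As an illustration only, the sketch below shows what such a pipeline could look like. The class name, the `price` field, and the "350 000 zł" price format are assumptions and are not taken from the repository.

```python
import re


class PriceNormalizationPipeline:
    """Hypothetical item pipeline: turns a scraped price string into an int."""

    def process_item(self, item, spider):
        raw = item.get("price")  # assumed field name, not from the repository
        if isinstance(raw, str):
            whole_part = raw.split(",")[0]  # drop a decimal part such as ",50"
            digits = re.sub(r"\D", "", whole_part)  # strip spaces and currency text
            # Listings without a numeric price end up with None.
            item["price"] = int(digits) if digits else None
        return item
```

In a Scrapy project, a class like this would be enabled through the `ITEM_PIPELINES` setting in `settings.py`.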

🚀 Deployment

The crawler was deployed on Heroku and was triggered at 15-minute intervals by Advanced Python Scheduler (APScheduler).
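
The 15-minute cadence can be illustrated with a minimal sketch. It deliberately uses only the standard library's `sched` module instead of APScheduler, so it is not the project's scheduler code. The names `run_periodically` and `crawl_once` and the `max_runs` cap are hypothetical and exist only for this example.

```python
import sched
import time

FIFTEEN_MINUTES = 15 * 60  # interval in seconds, matching the cadence described above


def crawl_once():
    """Hypothetical stand-in for one crawler run (e.g. launching the spider)."""
    return "crawl finished"


def run_periodically(job, interval_seconds, max_runs,
                     timefunc=time.monotonic, delayfunc=time.sleep):
    """Call job() max_runs times, waiting interval_seconds between runs.

    timefunc and delayfunc are injectable so the loop can be exercised
    without actually sleeping.
    """
    results = []
    if max_runs < 1:
        return results

    scheduler = sched.scheduler(timefunc, delayfunc)

    def tick(run_number):
        results.append(job())
        if run_number < max_runs:
            # Re-arm the timer for the next run.
            scheduler.enter(interval_seconds, 1, tick, argument=(run_number + 1,))

    scheduler.enter(0, 1, tick, argument=(1,))  # first run starts immediately
    scheduler.run()  # blocks until no scheduled events remain
    return results
```

Calling `run_periodically(crawl_once, FIFTEEN_MINUTES, max_runs=4)` would run the job four times over roughly 45 minutes. A long-running deployment would more likely rely on APScheduler's interval trigger, which runs indefinitely and adds features such as misfire handling.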

⛏️ Built Using

🛠️ Todo

  • add requirements.txt to ML folder