Scalable Data Pipeline for Stock Market Analysis with Reddit API and Yahoo Finance
Project Presentation
Web Application
Test Cases
Docker image
Final_Project/
├── architecture_diagram/
│ ├── Final_Project_architecture_diagram.png
│ └── Project_Proposal.png
├── aws_config/
│ ├── API_Gateway_role_with_policies.png
│ ├── Cognito_auth_role_with_policies.png
│ ├── Cognito_unauth_role_with_policies.png
│ ├── Lambda_role_with_policies.png
│ └── Lex_role_with_policies.png
├── config/
│ └── config.ini
├── lambda_batch_process/
│ ├── config.yaml
│ ├── pushing-files-s3.py
│ └── requirements.txt
├── lambda_files_for_dashboard/
│ ├── config.yaml
│ ├── requirements.txt
│ └── s3-trigger.py
├── lambda_reddit_news/
│ ├── config.yaml
│ ├── lambda_function.py
│ └── requirements.txt
├── lambda_reddit_sentiment/
│ ├── config.yaml
│ ├── lambda_function.py
│ └── requirements.txt
├── lambda_reddit_stock/
│ ├── config.yaml
│ ├── lambda_function.py
│ └── requirements.txt
├── lambda_stock_current/
│ ├── config.yaml
│ ├── lambda_function.py
│ └── requirements.txt
├── lambda_stock_prediction/
│ ├── config.yaml
│ ├── lambda_function.py
│ └── requirements.txt
├── LSTM_model/
│ ├── Dockerfile
│ ├── main.py
│ └── requirements.txt
├── README.md
└── Static-Website-Deploy/
├── about.html
├── architecture.html
├── contact.html
├── css/
│ ├── clean-blog.css
│ └── clean-blog.min.css
├── dashboard.html
├── home.html
├── img/
│ ├── img.jpeg
│ └── project.png
├── index.html
├── register.html
├── script/
│ ├── amazon-cognito-identity.min.js
│ ├── amazon-cognito-identity.min.js.map
│ ├── aws-cognito-sdk.min.js
│ ├── aws-cognito-sdk.min.js.map
│ ├── aws-sdk-2.487.0.min.js
│ ├── jquery.min.js
│ └── jquery.min.map
└── vendor/
├── bootstrap/
├── fontawesome-free/
└── jquery/
Scalable data pipeline for collecting stock data from the Reddit API and Yahoo Finance; generating text entities (stock names), sentiment analysis, current stock prices, historical stock performance trends, and stock forecasts; and deploying them on the cloud to run on-demand on a completely serverless infrastructure. The service is surfaced through an interactive chatbot built with Amazon Lex and integrated into a static website hosted on S3. Stock forecasting is done using an LSTM model.
The pipeline requires an Amazon Web Services account to deploy and run. Sign up for an AWS account here. The pipeline uses the following AWS services: Lambda, S3, Lex, Cognito, API Gateway, IAM, and CloudWatch.
Create new roles on the AWS IAM Console, using the images in aws_config/ in this repository as a reference, to allow access to all required AWS services.
Clone this repo to your local machine using https://github.com/goyal07nidhi/Team6_CSYE7245_Spring2021.git
config.ini Setup
All scripts use the configparser Python library to pass configuration data to running scripts and deployed packages. This allows for easy replication of the code with zero modifications to the Python scripts. The configuration file can be found in the config/ directory of this repository at config/config.ini. Modify the file with your environment variables and place it in your S3 bucket under the config directory, like so: YourS3BucketName/config/config.ini
; All packages and scripts are designed to read the configuration values from this path.
[aws]
ACCESS_KEY = <Enter your access key>
SECRET_KEY = <Enter your secret key>
BUCKET = <Enter your bucket name>
REGION_NAME = <Enter your region name>
[reddit]
CLIENT_ID = <Enter your reddit api client id>
CLIENT_SECRET_ID = <Enter your reddit api client secret id>
USERNAME = <Enter your reddit api username>
PASSWORD = <Enter your reddit api password>
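For illustration, a deployed script could load this file from S3 at runtime. The sketch below assumes boto3 and a placeholder bucket name; it is not the repo's exact code.

import configparser

import boto3

BUCKET = "YourS3BucketName"  # assumption: replace with your bucket

# Fetch config/config.ini from S3 and parse it in memory.
s3 = boto3.client("s3")
obj = s3.get_object(Bucket=BUCKET, Key="config/config.ini")

config = configparser.ConfigParser()
config.read_string(obj["Body"].read().decode("utf-8"))

region = config["aws"]["REGION_NAME"]
reddit_client_id = config["reddit"]["CLIENT_ID"]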
The pipeline uses AWS Lambda functions for serverless computing. All directories in this repo marked with the prefix lambda_ are Lambda functions that have to be deployed on AWS. All functions follow a common deployment process.
Python Lambda is a toolkit to easily package and deploy serverless code to AWS Lambda. Packaging is required since AWS Lambda functions ship with only basic Python libraries and do not contain external libraries. Any external libraries to be used have to be packaged into a .zip and deployed to AWS Lambda. More information about Python Lambda can be found here.
config.yaml
All lambda_ directories contain a config.yaml file with the configuration information required to deploy the Lambda package to AWS. Configure the file with your access key, secret access key, and function name before packaging and deploying the Python code. An example follows:
region: us-east-1
function_name: lambda_function
handler: service.handler
description: Deployed lambda Function
runtime: python3.7
role: <Enter the role name created earlier>
# if access key and secret are left blank, boto will use the credentials
# defined in the [default] section of ~/.aws/credentials.
aws_access_key_id: <Enter your Access Keys>
aws_secret_access_key: <Enter your Secret Access Keys>
timeout: 15
memory_size: 1024
environment_variables:
    ip_bucket: <enter_your_S3_Bucket>
    s3_key: <s3_file_key>

# Build options
build:
    source_directories: lib
Create a Virtual Environment and install all Python dependencies mentioned in requirements.txt.
Create a new Lambda function on your AWS Lambda Console.
To deploy a Lambda, zip the contents of the directory and upload the file to the Lambda console, then make the necessary changes, such as the Lambda handler name and function, and set the environment variables from the Lambda console:
Key: Value
ip_bucket: <enter_your_S3_Bucket>
s3_key: <s3_file_key>
Trigger the lambda_batch_process and lambda_files_for_dashboard functions with a CloudWatch scheduler.
Although you can do this with python-lambda as well, we are using the above method.
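For orientation, the lambda_ functions share roughly the handler shape sketched below. The body is illustrative only (the processing step is an assumption, not the repo's actual code); it shows how the ip_bucket and s3_key environment variables set above are consumed.

import os

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # Environment variables configured in config.yaml / the Lambda console.
    bucket = os.environ["ip_bucket"]
    key = os.environ["s3_key"]

    # Read the input object; a real function would process it and write
    # results back to S3 for the dashboard.
    obj = s3.get_object(Bucket=bucket, Key=key)
    payload = obj["Body"].read()

    return {"statusCode": 200, "body": f"processed {key} ({len(payload)} bytes)"}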
Amazon Cognito setup:
1. Create a new User Pool and note the pool-id. Under Add an app client, enter an App client name and uncheck Generate client secret. Note the App client id.
2. Create a new Identity Pool: enter an Identity Pool name, expand Authentication providers, select the Cognito tab, and provide the User Pool ID and App client id generated in step 1. On the "Your Cognito identities require access to your resources" page, note the Identity pool ID.
3. Attach the AmazonS3FullAccess policy to the IAM roles created in step 2.
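The user pool and app client from step 1 can also be created programmatically; here is a hedged boto3 sketch (the pool and client names are hypothetical):

import boto3

# Hedged sketch of Cognito step 1 via boto3 (names are hypothetical).
cognito = boto3.client("cognito-idp")

pool = cognito.create_user_pool(PoolName="stock-analysis-users")
pool_id = pool["UserPool"]["Id"]

client = cognito.create_user_pool_client(
    UserPoolId=pool_id,
    ClientName="stock-analysis-web",
    GenerateSecret=False,  # mirrors unchecking "Generate client secret"
)

print(pool_id, client["UserPoolClient"]["ClientId"])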
To create the Amazon Lex bot, choose Create, provide the necessary information such as the IAM service role for Lex bots, and then choose Create.
We also implemented error handling for the chatbot.
Clarification prompt: if the chatbot cannot understand the user's utterance, it throws an error.
Error message: Sorry, can you please repeat that?
Hang-up phrase: if the chatbot still cannot match the user's utterances, it throws an error and ends the conversation.
Error message: Sorry, I could not understand. Goodbye.
After building the bot, we created an alias for it and published it. The bot is integrated into our static website for interactive communication.
The pipeline requires six Lambda functions running on AWS to fulfill the Amazon Lex intents.
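As a concrete example of the shape such a fulfillment function takes, here is a minimal sketch following the Lex (V1) input/response format linked in the references; the intent and slot names are hypothetical, not the bot's actual configuration.

def lambda_handler(event, context):
    # Lex V1 passes the matched intent and its slots in currentIntent.
    intent = event["currentIntent"]["name"]
    slots = event["currentIntent"]["slots"]

    if intent == "GetStockPrice":          # hypothetical intent name
        ticker = slots.get("StockName")    # hypothetical slot name
        message = f"Fetching the latest price for {ticker}..."
    else:
        message = "Sorry, I could not understand. Goodbye."

    # Close the dialog with a fulfilled response, per the Lex V1 format.
    return {
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled",
            "message": {"contentType": "PlainText", "content": message},
        }
    }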
To host the static website on S3:
1. In the Permissions / Block all public access section, unselect the first two permissions.
2. In the Properties section, enable Static website hosting and enter index.html as the Index document.
3. In the Permissions tab, go to CORS configuration and paste the code below:
[
    {
        "AllowedHeaders": [
            "*"
        ],
        "AllowedMethods": [
            "PUT",
            "POST",
            "DELETE",
            "GET"
        ],
        "AllowedOrigins": [
            "*"
        ],
        "ExposeHeaders": []
    }
]
The website endpoint can then be found under Properties/Static website hosting.
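Alternatively, the same CORS rules can be applied programmatically; the following hedged boto3 sketch (the bucket name is a placeholder) sets the configuration shown above:

import boto3

# Apply the CORS rules from the JSON above to the website bucket.
s3 = boto3.client("s3")
s3.put_bucket_cors(
    Bucket="YourS3BucketName",  # assumption: replace with your bucket
    CORSConfiguration={
        "CORSRules": [
            {
                "AllowedHeaders": ["*"],
                "AllowedMethods": ["PUT", "POST", "DELETE", "GET"],
                "AllowedOrigins": ["*"],
                "ExposeHeaders": [],
            }
        ]
    },
)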
Time-series forecasting models predict future values based on previously observed values. Time-series forecasting is widely used for non-stationary data: data whose statistical properties, e.g. the mean and standard deviation, are not constant over time but vary.
Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture used in the field of deep learning. LSTM models can store information over a period of time, which is extremely useful when dealing with time-series or sequential data, and they allow us to decide what information is stored and what is discarded.
We take 5 years of data for multiple companies, apply the LSTM model to the data, and predict the stock price for the next 90 days. The results are stored in S3.
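To make the approach concrete, here is a minimal sketch of such an LSTM forecaster, assuming a Keras/TensorFlow stack and yfinance for data access; the actual main.py may differ in architecture, hyperparameters, and how it fetches Yahoo Finance data.

import numpy as np
import yfinance as yf
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras import layers, models

LOOKBACK = 60   # days of history per training sample (assumed)
HORIZON = 90    # days to forecast, as described above

# Pull 5 years of closing prices for one ticker (yfinance is an assumption).
close = yf.download("AAPL", period="5y")["Close"].values.reshape(-1, 1)

scaler = MinMaxScaler()
scaled = scaler.fit_transform(close)

# Build supervised samples: each LOOKBACK-day window predicts the next day.
X = np.array([scaled[i - LOOKBACK:i, 0] for i in range(LOOKBACK, len(scaled))])
y = scaled[LOOKBACK:, 0]
X = X.reshape(-1, LOOKBACK, 1)

model = models.Sequential([
    layers.Input(shape=(LOOKBACK, 1)),
    layers.LSTM(50, return_sequences=True),
    layers.LSTM(50),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mean_squared_error")
model.fit(X, y, epochs=5, batch_size=32, verbose=1)

# Roll the forecast forward one day at a time for the 90-day horizon.
window = scaled[-LOOKBACK:, 0].tolist()
forecast = []
for _ in range(HORIZON):
    pred = model.predict(np.array(window[-LOOKBACK:]).reshape(1, LOOKBACK, 1),
                         verbose=0)[0, 0]
    forecast.append(pred)
    window.append(pred)

# Map the scaled predictions back to dollar prices.
forecast = scaler.inverse_transform(np.array(forecast).reshape(-1, 1))
print(forecast[:5])  # first 5 of the 90 predicted closing prices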
The LSTM_model directory contains a Dockerfile, a blueprint for our development environment, and a requirements.txt that lists the Python dependencies.
To use the model, follow these steps:
1. git clone this repo
2. cd Final_Project/LSTM_model/
3. docker build -t nidhi2019/stock-analysis-lstm-model:latest .
4. docker pull nidhi2019/stock-analysis-lstm-model
5. docker run -it --rm -p 5000:5000 nidhi2019/stock-analysis-lstm-model
All test cases have been documented here.
The chatbot can be accessed using this web application: Web Application.
Team Members
Nidhi Goyal
Kanika Damodarsingh Negi
Rishvita Reddy Bhumireddy
https://docs.aws.amazon.com/lexv2/latest/dg/lambda.html
https://medium.com/@sumindaniro/user-authentication-and-authorization-with-aws-cognito-d204492dd1d0
https://docs.aws.amazon.com/lex/latest/dg/lambda-input-response-format.html
https://docs.aws.amazon.com/comprehend/latest/dg/functionality.html
https://github.com/nficano/python-lambda