Introduction:
Steps:
Step 1: Import Statements.
This code imports the Python modules used for web scraping, data handling, and progress display: requests for making HTTP requests to websites, BeautifulSoup for parsing HTML and XML documents, csv for reading and writing CSV files, datetime for handling dates and times, tqdm for displaying progress bars, os for handling file paths and directories, and time for adding pauses between web requests.
import requests
from bs4 import BeautifulSoup
import csv
from datetime import datetime, timedelta
from tqdm import tqdm
import os
import time
Step 2: Function Definitions.
The script defines three functions: date_to_unix_timestamp, scrape_yahoo_finance_data, and save_data_to_csv.
date_to_unix_timestamp takes a date string in the format “YYYY-MM-DD” and converts it to a Unix timestamp.
scrape_yahoo_finance_data takes a stock ticker, a start date, and an end date and scrapes historical stock data from Yahoo Finance. It returns a list of lists containing the scraped data.
save_data_to_csv takes the data and a file path and saves the data to a CSV file.
def date_to_unix_timestamp(date_str):
    date = datetime.strptime(date_str, "%Y-%m-%d")
    return int(date.timestamp())

def scrape_yahoo_finance_data(ticker, start_date, end_date):
    ...
    return data

def save_data_to_csv(data, filepath):
    ...
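The bodies of scrape_yahoo_finance_data and save_data_to_csv are elided in the listing above. As a rough sketch only, assuming data is a list of rows as described, the CSV helper could look something like the following. The optional header parameter and the date_to_unix_timestamp_utc variant are hypothetical additions, not part of the original script. The UTC variant is included because calling timestamp() on a naive datetime interprets it in the machine's local time zone, so the original function's result depends on where the script runs.

```python
import csv
import os
import tempfile
from datetime import datetime, timezone


def date_to_unix_timestamp_utc(date_str):
    """Hypothetical variant: treat "YYYY-MM-DD" as midnight UTC."""
    date = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(date.timestamp())


def save_data_to_csv(data, filepath, header=None):
    """Write a list of rows to filepath; `header` is an assumed optional extra."""
    # newline="" lets the csv module control line endings itself
    with open(filepath, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if header is not None:
            writer.writerow(header)
        writer.writerows(data)


# Usage with made-up rows (not real market data)
rows = [["2024-01-02", "10.0", "11.0"], ["2024-01-03", "11.0", "12.5"]]
demo_path = os.path.join(tempfile.mkdtemp(), "DEMO.csv")
save_data_to_csv(rows, demo_path, header=["Date", "Open", "Close"])
print(date_to_unix_timestamp_utc("1970-01-02"))  # 86400
```

Opening the file with newline="" is what the csv module documentation recommends, since it avoids doubled line endings on some platforms.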
Step 3: Main Function.
The main block first defines variables for the ticker file path, start date, end date, and output folder path. The start date is set to five days before the current date and the end date to the current date. It then reads a list of tickers from a text file and loops through each ticker.
For each ticker, it calls the scrape_yahoo_finance_data function to scrape data, saves the data to a CSV file using save_data_to_csv if any data was returned, and then waits for 2 seconds using time.sleep(2) before moving on to the next ticker.
If the ticker is the last one in the list, the loop is exited using a break statement. This check is redundant, because a for loop ends on its own after the last item.
if __name__ == "__main__":
    ticker_file = r"/root/Server/data/OUT/nyse_tickers.txt"
    start_date = (datetime.now() - timedelta(days=5)).strftime("%Y-%m-%d")
    end_date = datetime.now().strftime("%Y-%m-%d")
    folder_path = r"/var/lib/postgresql/data/nyse/"
    with open(ticker_file, "r") as f:
        tickers = [line.strip() for line in f]
    for ticker in tqdm(tickers):
        data = scrape_yahoo_finance_data(ticker, start_date, end_date)
        if data is not None:
            filepath = os.path.join(folder_path, f"{ticker}.csv")
            save_data_to_csv(data, filepath)
        time.sleep(2)
        if ticker == tickers[-1]:
            break
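The loop above can be tried without network access if the scraping and saving steps are passed in as arguments. The sketch below is one possible arrangement, not the original script's code: read_tickers, run_all, fake_scrape, and fake_save are hypothetical names, and tqdm is left out to keep the example free of third-party dependencies. It also skips blank lines in the ticker file, creates the output folder if it is missing, and pauses only between tickers rather than after the last one.

```python
import os
import tempfile
import time


def read_tickers(path):
    """Return the non-empty, stripped lines of a ticker file."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def run_all(tickers, scrape, save, folder_path, pause=2.0):
    """Scrape and save each ticker, pausing only between requests.

    `scrape` and `save` are injected so the loop can be exercised offline.
    Returns the list of file paths that were written.
    """
    os.makedirs(folder_path, exist_ok=True)
    written = []
    for i, ticker in enumerate(tickers):
        data = scrape(ticker)
        if data is not None:
            filepath = os.path.join(folder_path, f"{ticker}.csv")
            save(data, filepath)
            written.append(filepath)
        # No need to wait after the final ticker
        if i < len(tickers) - 1:
            time.sleep(pause)
    return written


# Dry run with stand-in functions and made-up tickers
workdir = tempfile.mkdtemp()
ticker_file = os.path.join(workdir, "tickers.txt")
with open(ticker_file, "w", encoding="utf-8") as f:
    f.write("AAA\n\nBBB\nCCC\n")


def fake_scrape(ticker):
    # Pretend that one ticker returns no data
    return None if ticker == "BBB" else [[ticker, "1.0"]]


def fake_save(data, filepath):
    with open(filepath, "w", encoding="utf-8") as f:
        f.write(",".join(data[0]) + "\n")


tickers = read_tickers(ticker_file)
written = run_all(tickers, fake_scrape, fake_save,
                  os.path.join(workdir, "out"), pause=0)
print([os.path.basename(p) for p in written])  # ['AAA.csv', 'CCC.csv']
```

In the real script, scrape would wrap scrape_yahoo_finance_data with the chosen start and end dates, and save would be save_data_to_csv.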
Step 4: End Statement.
The script ends with a raise SystemExit statement, which stops the execution of the script.
raise SystemExit
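As a small illustration of what this statement does, the sketch below catches SystemExit to show the exit status each form would produce. The helper exit_code_of and the two example functions are hypothetical names used only for this demonstration. Note that a script also exits on its own when it reaches the end of the file, so the final raise SystemExit is mainly useful for making the stopping point explicit.

```python
import sys


def exit_code_of(action):
    """Run `action` and return the process exit status it would produce."""
    try:
        action()
    except SystemExit as exc:
        code = exc.code
        if code is None:
            return 0  # a bare SystemExit means a successful exit
        if isinstance(code, int):
            return code
        return 1  # any other value is printed to stderr and exits with status 1
    return 0


def plain_exit():
    raise SystemExit


def failing_exit():
    sys.exit(1)  # sys.exit raises SystemExit with the given code


print(exit_code_of(plain_exit))    # 0
print(exit_code_of(failing_exit))  # 1
```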