
Market Data with Python and Beautiful Soup.

Introduction:

Steps:

Step 1: Import Statements.

This code imports the Python modules used for web scraping, data handling, and progress reporting. Three are third-party packages: requests for making HTTP requests to websites, BeautifulSoup (from the bs4 package) for parsing HTML and XML documents, and tqdm for displaying progress bars. The rest come from the standard library: csv for reading and writing CSV files, datetime and timedelta for handling dates and time intervals, os for handling file paths and directories, and time for pausing between web requests.

import requests
from bs4 import BeautifulSoup
import csv
from datetime import datetime, timedelta
from tqdm import tqdm
import os
import time

Step 2: Function Definitions.

The script defines three functions: date_to_unix_timestamp, scrape_yahoo_finance_data, and save_data_to_csv.

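date_to_unix_timestamp takes a date string in the format “YYYY-MM-DD” and converts it to a Unix timestamp, returned as a whole number of seconds. Because the parsed datetime carries no timezone information, Python interprets it as midnight in the local time of the machine running the script, so the same date string can yield different timestamps on machines in different timezones.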

scrape_yahoo_finance_data takes in a stock ticker, start date, and end date and scrapes historical stock data from Yahoo Finance. It returns a list of lists containing the scraped data.
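The body of scrape_yahoo_finance_data is not shown in this post, so the following is only a minimal sketch of the parsing half of such a function, run against a made-up HTML snippet rather than a live page. The helper name parse_history_table, the sample markup, and the placeholder values are all assumptions for illustration; the real Yahoo Finance markup may differ and can change without notice.

```python
from bs4 import BeautifulSoup

# Made-up sample markup with placeholder values (assumption, not real data).
SAMPLE_HTML = """
<table>
  <thead><tr><th>Date</th><th>Open</th><th>Close</th></tr></thead>
  <tbody>
    <tr><td>2024-01-02</td><td>100.00</td><td>101.50</td></tr>
    <tr><td>2024-01-03</td><td>101.50</td><td>99.75</td></tr>
    <tr><td colspan="3">Dividend</td></tr>
  </tbody>
</table>
"""

def parse_history_table(html):
    """Hypothetical helper: turn the first HTML table into a list of lists."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        # No table found; the caller can skip this ticker.
        return None
    header = [th.get_text(strip=True) for th in table.find_all("th")]
    data = [header]
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # Keep only rows with one cell per header column; this skips the
        # header row itself (no <td> cells) and notes such as "Dividend".
        if len(cells) == len(header):
            data.append(cells)
    return data

rows = parse_history_table(SAMPLE_HTML)
for row in rows:
    print(row)
```

Returning None when no table is found lines up with the `if data is not None` check in the main block, which skips saving for that ticker.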

save_data_to_csv takes in data and a file path and saves the data to a CSV file.
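The body of save_data_to_csv is likewise elided in the post. One plausible way to write it with the csv module is sketched below; this is an assumption about what the function might look like, not the author's actual code, and the sample rows are placeholders.

```python
import csv
import os
import tempfile

def save_data_to_csv(data, filepath):
    """Sketch: write a list of lists to a CSV file, one inner list per row."""
    # newline="" lets the csv module control line endings itself.
    with open(filepath, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(data)

# Placeholder rows (assumption), written to a temporary folder and read back.
sample = [["Date", "Open", "Close"], ["2024-01-02", "100.00", "101.50"]]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "TEST.csv")
    save_data_to_csv(sample, path)
    with open(path, newline="", encoding="utf-8") as f:
        round_trip = list(csv.reader(f))

print(round_trip == sample)
```

Opening the file in "w" mode overwrites any existing file for that ticker on each run.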

def date_to_unix_timestamp(date_str):
    date = datetime.strptime(date_str, "%Y-%m-%d")
    return int(date.timestamp())

def scrape_yahoo_finance_data(ticker, start_date, end_date):
    ...
    return data

def save_data_to_csv(data, filepath):
    ...
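The stubs above do not show how the two dates reach Yahoo Finance. A common approach, sketched here as an assumption, is to pass them as Unix timestamps in the period1 and period2 query parameters of the history page URL. The helper names date_to_unix_timestamp_utc and build_history_url are hypothetical, and the URL pattern is not confirmed by the post. Unlike the original function, this variant pins the date to UTC so the result is the same on every machine.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

def date_to_unix_timestamp_utc(date_str):
    """Hypothetical UTC variant of date_to_unix_timestamp."""
    date = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(date.timestamp())

def build_history_url(ticker, start_date, end_date):
    """Hypothetical helper: build a history URL (assumed URL pattern)."""
    query = urlencode({
        "period1": date_to_unix_timestamp_utc(start_date),
        "period2": date_to_unix_timestamp_utc(end_date),
        "interval": "1d",
    })
    return f"https://finance.yahoo.com/quote/{ticker}/history?{query}"

url = build_history_url("AAPL", "2024-01-01", "2024-01-06")
print(url)
```

The resulting string could then be passed to requests.get, typically with a timeout, before the response text is handed to BeautifulSoup.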

Step 3: Main Function.

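The main block, guarded by if __name__ == "__main__":, first defines the ticker file path, start date, end date, and output folder path. The start date is five days before the current date and the end date is the current date, both formatted as "YYYY-MM-DD". It then reads a list of tickers from a text file, one per line, and loops through each ticker.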

For each ticker, it calls scrape_yahoo_finance_data and, if the result is not None, saves it with save_data_to_csv to a CSV file named after the ticker. It then waits 2 seconds using time.sleep(2) before moving on to the next ticker.

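If the ticker equals the last entry in the list, the loop is exited using a break statement. This check is not strictly needed, because a for loop ends on its own after the last item. It also compares values rather than positions, so if the ticker file contains the last ticker more than once, the loop stops at its first occurrence and skips everything after it.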

if __name__ == "__main__":
    ticker_file = r"/root/Server/data/OUT/nyse_tickers.txt"
    start_date = (datetime.now() - timedelta(days=5)).strftime("%Y-%m-%d")
    end_date = datetime.now().strftime("%Y-%m-%d")
    folder_path = r"/var/lib/postgresql/data/nyse/"

    with open(ticker_file, "r") as f:
        tickers = [line.strip() for line in f]

    for ticker in tqdm(tickers):
        data = scrape_yahoo_finance_data(ticker, start_date, end_date)
        if data is not None:
            filepath = os.path.join(folder_path, f"{ticker}.csv")
            save_data_to_csv(data, filepath)
        time.sleep(2)

        if ticker == tickers[-1]:
            break
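Two details of the main block can cause trouble in practice: blank lines in the ticker file become empty ticker strings, and writing a CSV fails with FileNotFoundError if the output folder does not exist. The sketch below shows one way to guard against both. The helper names load_tickers and output_path are hypothetical, and the tickers and paths are placeholders in a temporary folder.

```python
import os
import tempfile

def load_tickers(ticker_file):
    """Hypothetical helper: read one ticker per line, skipping blank lines."""
    with open(ticker_file, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

def output_path(folder_path, ticker):
    """Hypothetical helper: ensure the folder exists, then build the CSV path."""
    os.makedirs(folder_path, exist_ok=True)
    return os.path.join(folder_path, f"{ticker}.csv")

with tempfile.TemporaryDirectory() as tmp:
    ticker_file = os.path.join(tmp, "tickers.txt")
    with open(ticker_file, "w", encoding="utf-8") as f:
        f.write("AAA\n\nBBB\n")  # placeholder tickers with a blank line between

    tickers = load_tickers(ticker_file)
    out_dir = os.path.join(tmp, "nyse")
    paths = [output_path(out_dir, t) for t in tickers]
    folder_created = os.path.isdir(out_dir)

print(tickers)
```

Iterating with enumerate(tickers) would also allow a position-based check for the last item, if one is wanted, instead of comparing ticker values.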

Step 4: End Statement.

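The script ends with a raise SystemExit statement, which stops execution with an exit status of zero, the same effect as calling sys.exit() with no argument. Because it is the last statement, the script would also end normally without it. Note that it sits outside the if __name__ == "__main__": guard, so it also runs if the file is imported as a module.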

raise SystemExit
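As a small illustration of how this statement behaves, the sketch below catches SystemExit and inspects its code attribute, which the interpreter uses as the exit status (None is treated as zero). The helper names exit_code_of and bare_raise are hypothetical and exist only for this demonstration.

```python
import sys

def exit_code_of(action):
    """Hypothetical helper: run action and report the SystemExit code, if any."""
    try:
        action()
    except SystemExit as exc:
        return exc.code
    return "did not exit"

def bare_raise():
    raise SystemExit

code_from_raise = exit_code_of(bare_raise)            # bare raise, no status
code_from_sys_exit = exit_code_of(sys.exit)           # sys.exit() raises SystemExit
code_with_status = exit_code_of(lambda: sys.exit(2))  # explicit non-zero status

print(code_from_raise, code_from_sys_exit, code_with_status)
```

Passing a non-zero status, as in the last call, is the usual way to signal a failure to the shell or to a scheduler running the script.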
