DATA parsing with Python scripts.


This script uses the pandas_datareader library, which can fetch data from many different sources. You can read more on the official pandas_datareader website.

We will take a detailed look at downloading stock data for the New York Stock Exchange (NYSE) from Yahoo Finance.

You can find contract specifications, stock descriptions, tickers, and data in the corresponding article: NYSE: New-York Stock Exchange.

Libraries:

datetime – for dates (part of the Python standard library, no installation needed).
pandas_datareader – connection to Yahoo Finance.
schedule – for automating scripts.

pip install pandas-datareader
pip install schedule

# Necessary imports:

import datetime as dt 
from pandas_datareader import data as pdr
import schedule

Tickers:

If you need several tickers, put them in a list of ticker symbols:


stocks = ['GAPL', 'GIPL']

If all tickers are saved in a file, file_name.txt must contain a Python list literal, for example ['GAPL', 'GIPL']:

import ast

with open(r'C:\folder_name\file_name.txt') as file:
    stocks = ast.literal_eval(file.read())  # safer than eval(): accepts only Python literals
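eval() executes whatever is in the file, while ast.literal_eval only accepts Python literals such as lists and strings. If you also want to check that the file really holds a list of tickers, you could use something like the following minimal sketch. The helper name parse_tickers and its validation rules are assumptions and are not part of the original script.

```python
import ast


def parse_tickers(text):
    """Parse a ticker file that holds a Python list literal, e.g. ['GAPL', 'GIPL'].

    Hypothetical helper: rejects anything that is not a list of strings and
    normalises the symbols to upper case without surrounding spaces.
    """
    value = ast.literal_eval(text)  # raises ValueError/SyntaxError on non-literals
    if not isinstance(value, list) or not all(isinstance(t, str) for t in value):
        raise ValueError('expected a list of ticker strings')
    return [t.strip().upper() for t in value]


# Usage with the file from the article:
# with open(r'C:\folder_name\file_name.txt') as file:
#     stocks = parse_tickers(file.read())
print(parse_tickers("['GAPL', ' gipl ']"))  # ['GAPL', 'GIPL']
```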

File path:

path – Folder.
file_name – File name.

path = 'C:\\folder_name\\'  # regular string: the escaped backslashes give C:\folder_name\
file_name = 'file_name.csv'
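Joining a folder and a file name with + only works if the folder string ends in exactly one backslash. The standard pathlib module avoids that problem because it inserts the separator for you. This sketch uses PureWindowsPath so it produces Windows-style paths on any operating system. On Windows itself, plain Path behaves the same way, and df.to_csv accepts path objects as well as strings. The folder and file names are the article's placeholders.

```python
from pathlib import PureWindowsPath

path = PureWindowsPath(r'C:\folder_name')  # no trailing backslash needed
file_name = 'file_name.csv'

# The / operator joins path parts with the correct separator.
out_file = path / file_name
print(out_file)  # C:\folder_name\file_name.csv

# One file per ticker, named after the ticker.
for item in ['GAPL', 'GIPL']:
    print(path / (item + '.csv'))
```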

Period:

Fixed period of time.

start = dt.datetime(2022, 1, 18)
end = dt.datetime(2022, 12, 22)

Floating period.
The start date is calculated by subtracting a time delta from the end date.
In the example, the period starts one day (24 hours) before the current moment.

end = dt.datetime.now()
start = end - dt.timedelta(days=1)
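end - dt.timedelta(days=1) means "the last 24 hours", so the window shifts with the time the script runs. If you want exactly the previous calendar day, one option is to truncate to midnight first. The function name previous_day in this minimal sketch is hypothetical.

```python
import datetime as dt


def previous_day(now=None):
    """Return (start, end) covering the whole previous calendar day.

    start is yesterday at 00:00 and end is today at 00:00, so the window does not
    depend on the time of day the script happens to run.
    """
    now = now or dt.datetime.now()
    end = dt.datetime.combine(now.date(), dt.time.min)  # today at midnight
    start = end - dt.timedelta(days=1)                  # yesterday at midnight
    return start, end


start, end = previous_day(dt.datetime(2022, 12, 22, 15, 30))
print(start, end)  # 2022-12-21 00:00:00 2022-12-22 00:00:00
```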

Data:

Create one file for all data:
interval='d' – download daily quotes.
.stack("Symbols") – move the ticker level from the columns into the rows, so the saved file gets a Symbols column holding the ticker of each row.
df.to_csv(path + file_name) – save the file to the specified folder under the specified file name.

# One request for all tickers, so no loop is needed
df = pdr.get_data_yahoo(symbols=stocks, start=start, end=end, interval='d').stack("Symbols")
df.to_csv(path + file_name)
print('Done')
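The following pure-Python sketch shows the same wide-to-long reshape that .stack("Symbols") performs. It assumes the multi-ticker download has columns keyed by (attribute, ticker) with one row per date. The numbers are dummy values rather than real quotes, and stack_symbols is a hypothetical helper, not a pandas function.

```python
# Wide layout: one row per date, one column per (attribute, ticker) pair.
# Dummy values for illustration only.
wide = {
    '2022-12-21': {('Close', 'GAPL'): 10.0, ('Close', 'GIPL'): 20.0,
                   ('Volume', 'GAPL'): 100, ('Volume', 'GIPL'): 200},
    '2022-12-22': {('Close', 'GAPL'): 11.0, ('Close', 'GIPL'): 21.0,
                   ('Volume', 'GAPL'): 110, ('Volume', 'GIPL'): 210},
}


def stack_symbols(wide):
    """Move the ticker out of the column key and into the row key (wide -> long)."""
    long = {}
    for date, row in wide.items():
        for (attribute, symbol), value in row.items():
            long.setdefault((date, symbol), {})[attribute] = value
    return long


for (date, symbol), values in stack_symbols(wide).items():
    print(date, symbol, values)
# 2022-12-21 GAPL {'Close': 10.0, 'Volume': 100}
# 2022-12-21 GIPL {'Close': 20.0, 'Volume': 200}
# 2022-12-22 GAPL {'Close': 11.0, 'Volume': 110}
# 2022-12-22 GIPL {'Close': 21.0, 'Volume': 210}
```

Each (date, ticker) pair becomes one row. That is why the single CSV file has a Symbols column next to the date.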

To save each ticker in a separate file, loop over the list.
Important! Each file is named after its ticker.

for item in stocks:
    df = pdr.get_data_yahoo(item, start=start, end=end, interval='d')
    df.to_csv(path + item + '.csv')  # one file per ticker, e.g. GAPL.csv
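Downloading ticker by ticker also lets the script continue when a single symbol fails, and it can write the failed symbols to a file for a later retry. That looks like the purpose of error.csv in the example below, but the article does not show how that file is created, so the following is only a sketch. download_each and fetch are hypothetical. In practice, fetch would be something like lambda t: pdr.get_data_yahoo(t, start=start, end=end).

```python
import ast


def download_each(tickers, fetch):
    """Call fetch(ticker) for every ticker; collect results and failed symbols.

    fetch is a stand-in for the real download call, so the loop can be tried
    without network access.
    """
    results, failed = {}, []
    for ticker in tickers:
        try:
            results[ticker] = fetch(ticker)
        except Exception as exc:  # network error, unknown ticker, ...
            print(f'{ticker} failed: {exc}')
            failed.append(ticker)
    return results, failed


def fake_fetch(ticker):
    # Pretend that GIPL cannot be downloaded.
    if ticker == 'GIPL':
        raise ConnectionError('no data')
    return f'data for {ticker}'


results, failed = download_each(['GAPL', 'GIPL'], fake_fetch)
# repr() writes the list in the same ['...'] form that the script reads back,
# e.g. open(path + 'error.csv', 'w').write(repr(failed)) in the real script.
error_text = repr(failed)
print(error_text)                    # ['GIPL']
print(ast.literal_eval(error_text))  # ['GIPL']
```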

Automation function:

schedule.every(40).seconds.do(NYSE) – runs the NYSE function every 40 seconds.
schedule.every().day.at('01:00').do(NYSE) – runs the NYSE function every day at 1:00 AM.

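import time

import schedule


def NYSE():
    # SCRIPT: put the download code from above here
    pass


def main():
    # schedule.every(40).seconds.do(NYSE)
    schedule.every().day.at('01:00').do(NYSE)

    while True:
        schedule.run_pending()
        time.sleep(1)  # avoid a busy loop that keeps one CPU core at 100%


if __name__ == '__main__':
    main()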
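The loop above keeps calling run_pending() until a job is due. If you would rather not depend on the third-party schedule package, a standard-library alternative is to work out how long to sleep until the next run time. This is a different technique from the article's and is shown only as a minimal sketch. seconds_until is a hypothetical helper, and it uses naive local time, so daylight-saving changes are ignored.

```python
import datetime as dt


def seconds_until(hour, minute, now=None):
    """Seconds from now until the next hour:minute (today or, if already past, tomorrow).

    Naive local time: daylight-saving transitions are not taken into account.
    """
    now = now or dt.datetime.now()
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        target += dt.timedelta(days=1)
    return (target - now).total_seconds()


print(seconds_until(1, 0, dt.datetime(2022, 12, 22, 0, 30)))  # 1800.0
print(seconds_until(1, 0, dt.datetime(2022, 12, 22, 1, 30)))  # 84600.0

# A loop built on it could look like this (needs `import time`; NYSE is the job function):
# while True:
#     time.sleep(seconds_until(1, 0))
#     NYSE()
```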

Example:

Our script that collects data for NYSE: New-York Stock Exchange.

Every night at 3:10 AM it downloads the data (from the start date set in the script up to the previous trading day) and saves it in one file.
Then at 7:01 AM it runs again and downloads the data that was missing after the first run, using the tickers listed in error.csv.
You can read more about automation in a separate article (coming soon).

import ast
from time import sleep

from pandas_datareader import data as pdr
import datetime as dt
import schedule


def time():
    # Hourly heartbeat, so you can see that the scheduler is still alive
    end = dt.datetime.now()
    print('Time test', end)


def nyse():
    # The ticker file must contain a Python list literal, e.g. ['GAPL', 'GIPL']
    with open(r'C:\Pdata\NYSE\OUT\nyse_ticker.txt') as file:
        stocks = ast.literal_eval(file.read())
    directory = 'C:\\Pdata\\NYSE\\IN\\'
    file_name = 'nyse.csv'
    end = dt.datetime.now()
    start = dt.datetime(2022, 11, 4)  # fixed start; use end - dt.timedelta(days=1) for one day only
    # One request for all tickers; stack() adds the ticker as a Symbols column
    df = pdr.get_data_yahoo(symbols=stocks, start=start).stack("Symbols")
    df.to_csv(directory + file_name)
    print('NYSE DONE', end)


def nyse_error():
    # error.csv is expected to hold the tickers still missing after the 03:10 run
    with open(r'C:\Pdata\NYSE\OUT\error.csv') as file:
        stocks = ast.literal_eval(file.read())
    directory = 'C:\\Pdata\\NYSE\\IN\\'
    file_name = 'nyse_error.csv'
    end = dt.datetime.now()
    start = dt.datetime(2022, 11, 4)
    df = pdr.get_data_yahoo(symbols=stocks, start=start).stack("Symbols")
    df.to_csv(directory + file_name)
    print('NYSE error DONE', end)


def main():
    schedule.every(1).hour.do(time)
    schedule.every().day.at('03:10').do(nyse)
    schedule.every().day.at('07:01').do(nyse_error)

    while True:
        schedule.run_pending()
        sleep(1)  # check once per second instead of spinning the CPU


if __name__ == '__main__':
    main()

Finally, I would like to remind you that this is just one example of how a Python parser can be implemented. By changing the script's parameters, you can adapt it to whatever data you need.
Keep in mind that this is raw data: you still need to load it into a database.
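As a starting point for that step, the standard library's csv and sqlite3 modules can load a CSV written by df.to_csv into a SQLite table. This is a minimal sketch, not the author's method. The table name quotes and the sample rows are hypothetical, every value is stored as text, and the header is assumed to begin with Date and Symbols, as in the one-file export.

```python
import csv
import io
import sqlite3


def load_csv(conn, csv_file, table='quotes'):
    """Create `table` from the CSV header and insert all data rows."""
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ', '.join(f'"{name}"' for name in header)  # quote names like "Adj Close"
    placeholders = ', '.join('?' for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS {table} ({columns})')
    conn.executemany(f'INSERT INTO {table} VALUES ({placeholders})', reader)
    conn.commit()


# Stand-in for open(r'C:\folder_name\file_name.csv', newline=''), with dummy values.
sample = io.StringIO(
    'Date,Symbols,Close,Volume\n'
    '2022-12-21,GAPL,10.0,100\n'
    '2022-12-21,GIPL,20.0,200\n'
)
conn = sqlite3.connect(':memory:')  # use a file name such as 'nyse.db' to keep the data
load_csv(conn, sample)
print(conn.execute('SELECT COUNT(*) FROM quotes').fetchone())  # (2,)
```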

Done!
