New data.
Adding new data to nyse.nyse_data_new. This script is created in the PLpgSQL DB section.
select * from nyse.load_csv_files();
Add new data from nyse.nyse_data_new to nyse.nyse_data table. Copy only the new data for a certain period.
WITH rounded_values AS (
SELECT
date, symbols,
ROUND(close, 2) AS rounded_close,
ROUND(High, 2) AS rounded_high,
ROUND(Low, 2) AS rounded_low,
ROUND(Open, 2) AS rounded_open
FROM nyse.nyse_data_new_y
WHERE date >= '2023-01-20'
)
-- Insert only the new data into the table
INSERT INTO nyse.nyse_data (date, symbols, Open, High, Low, Close)
SELECT
rv.date, rv.symbols,
rv.rounded_close, rv.rounded_high, rv.rounded_low, rv.rounded_open
FROM rounded_values rv
WHERE NOT EXISTS (
SELECT 1
FROM nyse.nyse_data nd
WHERE nd.date = rv.date AND nd.symbols = rv.symbols
);
Missing tickers check.
This script compares all the data and searches for missing tickets. If any are found, it creates a file called nyse_error_tickers_sql.csv for future checking.
Copy missing ticker symbols to nyse_error_tickers_sql.csv file
COPY (
SELECT ticker
FROM nyse.nyse_ticker_list
WHERE ticker NOT IN (
SELECT symbols
FROM nyse.nyse_data
WHERE date = CURRENT_DATE-1
GROUP BY symbols
)
) TO '/var/lib/postgresql/nyse_error_tickers_sql.csv';
After creating the file nyse_error_tickers_sql.csv, the Python script named BeautifulSoup.py OR Pandas.py is executed again, but this time it only scrapes the missing tickers.
For a detailed explanation of how the entire process works, please refer to the article mentioned.
Leave a Reply