Server set up – SQL requests.

New data.

Adding new data to nyse.nyse_data_new. This script is created in the PLpgSQL DB section.

select * from nyse.load_csv_files();

Add new data from nyse.nyse_data_new to nyse.nyse_data table. Copy only the new data for a certain period.

WITH rounded_values AS (
  SELECT 
    date, symbols, 
    ROUND(close, 2) AS rounded_close, 
    ROUND(High, 2) AS rounded_high, 
    ROUND(Low, 2) AS rounded_low, 
    ROUND(Open, 2) AS rounded_open
  FROM nyse.nyse_data_new_y
  WHERE date >= '2023-01-20'
)

-- Insert only the new data into the table
INSERT INTO nyse.nyse_data (date, symbols, Open, High, Low, Close)
SELECT 
  rv.date, rv.symbols, 
  rv.rounded_close, rv.rounded_high, rv.rounded_low, rv.rounded_open
FROM rounded_values rv
WHERE NOT EXISTS (
  SELECT 1
  FROM nyse.nyse_data nd
  WHERE nd.date = rv.date AND nd.symbols = rv.symbols
);

Missing tickers check.

This script compares all the data and searches for missing tickets. If any are found, it creates a file called nyse_error_tickers_sql.csv for future checking.

Copy missing ticker symbols to nyse_error_tickers_sql.csv file

COPY (
  SELECT ticker 
  FROM nyse.nyse_ticker_list 
  WHERE ticker NOT IN (
    SELECT symbols 
    FROM nyse.nyse_data 
    WHERE date = CURRENT_DATE-1 
    GROUP BY symbols
  )
) TO '/var/lib/postgresql/nyse_error_tickers_sql.csv';

After creating the file nyse_error_tickers_sql.csv, the Python script named BeautifulSoup.py OR Pandas.py is executed again, but this time it only scrapes the missing tickers.

For a detailed explanation of how the entire process works, please refer to the article mentioned.

Kseno avatar

Leave a Reply

Your email address will not be published. Required fields are marked *