Server set up – SQL DB part.

To set up and create the necessary tables and functions for operating the PostgreSQL database, follow these steps. Make sure to modify the file paths according to your setup. To begin make shore that you have install PostgreSql data base on you Linux machine.

Here is Gdrive link to download necessary data if needed: Google Drive

Scripts

1 – Create the database and schema:

To create DB execute following script:

CREATE DATABASE name;

To create schema:

  • Right-click on “schemas” and select “Create Schema.”
  • In the description below, we’ll use the schema name “nyse.”

2 – Create the required tables:

Table – nyse.nyse_data:

This is the main table for storing all the data. Quotes are rounded to three decimal places, and the “adj_close” and “volume” columns are not included.

CREATE TABLE nyse.nyse_data
(
    date date,
    symbols varchar(5),
    open numeric(8,3),
    high numeric(8,3),
    low numeric(8,3),
    close numeric(8,3)
);

If you need all NYSE data since May 9, 1983, download it from the provided GDrive link and use the following script to add it to the table:

COPY nyse.nyse_data FROM 'C:\folder_name\file_name.csv' DELIMITER ';' CSV HEADER;

Remember to update the file path accordingly. After importing the data, you can delete the CSV file. The data file is updated monthly.

Table – nyse.nyse_data_new:

This table stores the new data in its original format.

CREATE TABLE nyse.nyse_data_new (
       ticker varchar(5),
       date date,
       open numeric(21,15),
       high numeric(21,15),
       low numeric(21,15),
       close numeric(21,15),
       adj_close numeric(21,15),
       vol varchar(10)
);

Table – nyse.ticker_list:

This table contains the symbols list and their descriptions. Download the file from the provided GDrive link.

CREATE TABLE nyse.ticker_list (
       ticker varchar(5) NOT NULL,
       name varchar(100)
);

Table – nyse.file_paths:

This table stores the path addresses for each ticker separately.
It allows you to manipulate file paths without worrying about data loss.

CREATE TABLE nyse.file_paths (
       id serial primary key,
       path text not null
);

Example of adding file paths in table:

INSERT INTO nyse.file_paths (path) VALUES ('/var/lib/postgresql/data/nyse/AAPL.csv');
INSERT INTO nyse.file_paths (path) VALUES ('/var/lib/postgresql/data/nyse/GOOG.csv');

Modify the file paths and add the necessary ticker file names. You can delete or update this table as needed. The ID field has auto-increment.

Table – error_log:

This table tracks all the manipulations between the new and main tables and includes tickers that don’t have data.

CREATE TABLE nyse.error_log (
       date timestamp DEFAULT CURRENT_DATE,
       error_path text NOT NULL,
       error_message text
);

Functions:

This function adds information from file path to nyse.nyse_data_new table, in the same time its ads some kind of audit log in to nyse.error_log

CREATE OR REPLACE FUNCTION nyse.load_csv_files()
RETURNS text AS $$
DECLARE
  file_path record;
  file_path_string text;
BEGIN
  FOR file_path IN SELECT path FROM nyse.file_paths LOOP
    BEGIN
      file_path_string := file_path.path;
      -- Creating a file path link, depending on the file name.
      EXECUTE 'COPY nyse.nyse_data_new FROM ''' || file_path_string

 || ''' DELIMITER '','' CSV HEADER';
    EXCEPTION -- Error log data
      WHEN OTHERS THEN
        INSERT INTO nyse.error_log (error_message, error_path) 
        VALUES (SQLERRM, file_path_string);
    END;
  END LOOP;

  RETURN 'Good';
END;
$$ LANGUAGE plpgsql;

That’s all you need to prepare in the PLpgSQL database before starting the automation process in Python.

Kseno avatar

Leave a Reply

Your email address will not be published. Required fields are marked *