DB – automation of the process.

Database automation with PgAgent.

If you have never worked with this application for PgAdmin, check out its features at this link: PgAgent documentation.

Lets start.

We write a script to add basic data to our main table. This script is written in the standard input window of PgAdmin.

copy nyse_data FROM 'C:\file_path\nyse_data.csv' 
DELIMITER ',' CSV HEADER;
copy nyse_ticker_list FROM 'C:\file_path\nyse_ticker.txt' 
DELIMITER ',' CSV HEADER;

Check our data. At the time of writing, the number of shares must be greater than 2700 and the number of attributes must be at least 11 million.

You can delete file nyse_data.csv from your hard disk after inserting!

select count(*) from nyse_data;
select * from nyse_ticker_list;

We have an empty table nyse_data_new to store new Yahoo Finance incoming data. This table will store all new quotes for the selected instruments. This is done to avoid duplicates in the main table. Since we use a daily interval, our table will store yesterday’s data.

We automate the addition of new data from the /IN python in folder to the temporary table nyse_data_new. To do this, we create a script inside PgAgent according to a certain script.

Suppose, the Python script parses new information at 3:11 am server time, let it take exactly 1 hour to parse new information and save it to a file. That means our script in PgAgent should run at about 5:40.

copy nyse_data_new FROM 'C:\file_path\nyse_data_new.csv' 
DELIMITER ',' CSV HEADER;

We created our first automated script in PL/pgSQL. Now every night at 5:40, the script will automatically take all new data from the nyse.csv file and add it to table nyse_data_new.

Now we need to link our main table nyse_data with our temporary nyse_data_new. To do this we will use a function and a trigger.

create or replace function nyse_trig() 
returns trigger AS $$
declare
  
  Date   DATE;
  Symbols   VARCHAR(10);
  Adj_Close NUMERIC(21, 15);
  Close     NUMERIC(21, 15);
  High      NUMERIC(21, 15);
  Low       NUMERIC(21, 15);
  Open      NUMERIC(21, 15);
  Volume    NUMERIC(13, 2);

BEGIN

 date = new.date;
 Symbols = new.symbols;
 Adj_close = new.adj_close;
 close = new.close;
 High = new.High;
 Low = new.Low;
 Open = new.Open;
 Volume = new.volume;

if(date = CURRENT_DATE -1) then
insert into nyse_data values(
       new.date, new.symbols, new.adj_close, new.close, new.High, new.Low, new.Open, new.volume
       );
end if;

return new;
END;
$$ language plpgsql

-- Trigger

create trigger nyse_trig_on_insert
after insert on 
nyse_data_new
for each row
execute procedure nyse_trig()

I will not explain all settings of these scripts because it will take a lot of time. If you want to know more about them, there are hundreds of PL/pgSQL function and trigger descriptions on the Internet. At this point, you can also choose which information you want to migrate from the temporary table to the main table. You can also find information on how to do this on the Internet or just think logically.

Now, we have to check if all tickers have updated.

Create an automatic script execution in PgAgent, for example at 6:30 am server time. The script compares our main data table with the list of tickers of the New York Stock Exchange.
The script automatically creates a list of tickers that are missing in the main table, also converting the information into an understandable language for the Python parser.

COPY (SELECT '[' || array_to_string(array_agg(''''||ticker||''','), ' ') || ']' FROM nyse_ticker_list 
WHERE ticker NOT IN 
    (select symbols from nyse_data 
     WHERE date = CURRENT_DATE-1 GROUP BY symbols))
TO 'C:\folder_name\file_name.csv';

After parsing the file again with the python parsing script, all we have to do is add the new data back into the temporary table. After that, our trigger and function will automatically pick up the necessary information for the main table.

This is done in exactly the same way we did the script for adding new information to the nyse_data_new, just change the execution time and file name.

As a result, we should have the following scheme.

Time.Description.
3:11 AmPython starts downloading new DATA.
5:40 AmPostgreSql add it to table nyse_data_new
6:30 AmChecking and creating file with missing tickers.
7:03 AmPython download missing data.
8:01 AmPostgreSql add missing tickers in to nyse_data_new

Remember this is just an example.

Wish you a great day and make your dream come thru.

Leave a Reply

Your email address will not be published. Required fields are marked *