Database Automation with PgAgent: How to Automate Data Collection and Storage

In today’s world, data is king, and automated data collection and storage are crucial for timely and efficient data analysis. If you are working with PostgreSQL, you can automate both using PgAgent, a job scheduler for PostgreSQL.

In this article, we’ll guide you through the process of setting up a PostgreSQL database automation system using PgAgent. We’ll cover the following steps:

  • Adding Basic Data to the Main Table
  • Automating the Addition of New Data
  • Linking the Main Table with the Temporary Table
  • Checking for Missing Tickers

If you have never worked with PgAgent, the job-scheduling agent for pgAdmin, check out its features at this link: PgAgent documentation.

Step 1: Adding Basic Data to the Main Table

The first step is to add basic data to the main table. You can do this by running a script in the pgAdmin Query Tool. For instance, you can load data from a CSV file with the following command:

COPY nyse_data FROM 'C:\file_path\nyse_data.csv' DELIMITER ',' CSV HEADER;
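Note that COPY ... FROM reads the file from the database server’s filesystem, so the path must be visible to the server process. If the CSV file lives on your client machine instead, psql’s \copy command (same options, but streamed from the client) is the usual alternative:

\copy nyse_data FROM 'C:\file_path\nyse_data.csv' DELIMITER ',' CSV HEADER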

You can then check that the data has been loaded. The second query below reads nyse_ticker_list, the reference table of instruments we track (it comes up again in Step 4):

select count(*) from nyse_data;
select * from nyse_ticker_list;

We also have an empty table, nyse_data_new, for incoming Yahoo Finance data. It will hold all new quotes for the selected instruments, which lets us keep duplicates out of the main table. Since we work on a daily interval, this table holds yesterday’s data.
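For reference, here is a plausible schema for both tables, a minimal sketch reconstructed from the column names and numeric precisions used by the trigger function in Step 3 (your actual definitions may differ):

-- Assumed schema, inferred from the trigger function in Step 3.
CREATE TABLE nyse_data (
    date      DATE,
    symbols   VARCHAR(10),
    adj_close NUMERIC(21, 15),
    close     NUMERIC(21, 15),
    high      NUMERIC(21, 15),
    low       NUMERIC(21, 15),
    open      NUMERIC(21, 15),
    volume    NUMERIC(13, 2)
);

-- Staging table for incoming quotes, with the same columns.
CREATE TABLE nyse_data_new (LIKE nyse_data);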

Step 2: Automating the Addition of New Data

To automate the process of adding new data to the nyse_data_new table, we will use PgAgent. We’ll create a job whose SQL step copies data from a CSV file into the nyse_data_new table. The job will run every night at 5:40am, assuming the Python script starts parsing new information at 3:11am server time and takes about an hour to parse and save the data to a file.

Here’s the SQL for the job step:

copy nyse_data_new FROM 'C:\file_path\nyse_data_new.csv' 
DELIMITER ',' CSV HEADER;

We created our first automated job. Now every night at 5:40am, the script will automatically take all new data from the nyse_data_new.csv file and add it to the nyse_data_new table.
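PgAgent itself runs as a separate service, and the pgagent extension must be installed in the maintenance database; the job and its daily 5:40am schedule are typically set up through pgAdmin’s Create Job dialog. As a sketch, the job’s SQL step could also clear out the previous day’s staging rows before loading (the TRUNCATE is our addition, not part of the original pipeline):

-- PgAgent job step (kind: SQL), scheduled daily at 5:40am.
-- TRUNCATE is an assumption: it keeps nyse_data_new to a single day's data.
TRUNCATE nyse_data_new;

COPY nyse_data_new FROM 'C:\file_path\nyse_data_new.csv'
DELIMITER ',' CSV HEADER;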

Step 3: Linking the Main Table with the Temporary Table

Next, we need to link the main nyse_data table with the temporary nyse_data_new table to avoid duplicates. We’ll use a function and a trigger to achieve this.

First, we’ll create a PL/pgSQL function called nyse_trig() that takes each new row inserted into the nyse_data_new table and copies it into the nyse_data table, but only if its date equals the previous day’s date.

create or replace function nyse_trig()
returns trigger as $$
begin
  -- Copy the new row into the main table only if it is yesterday's quote;
  -- anything else is ignored, which keeps duplicates out of nyse_data.
  if new.date = current_date - 1 then
    insert into nyse_data values (
      new.date, new.symbols, new.adj_close, new.close,
      new.high, new.low, new.open, new.volume
    );
  end if;

  return new;
end;
$$ language plpgsql;

Next, we’ll create a trigger called nyse_trig_on_insert that executes the nyse_trig() function every time a new row is inserted into the nyse_data_new table.

create trigger nyse_trig_on_insert
after insert on nyse_data_new
for each row
execute procedure nyse_trig();
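To sanity-check the trigger, you can insert a dummy row with yesterday’s date into the staging table and confirm it appears in the main table (the 'TEST' ticker and its prices here are purely hypothetical):

-- Hypothetical test row: (date, symbols, adj_close, close, high, low, open, volume)
insert into nyse_data_new
values (current_date - 1, 'TEST', 100.0, 100.0, 101.0, 99.0, 99.5, 10000);

-- The trigger should have copied the row into nyse_data.
select * from nyse_data where symbols = 'TEST' and date = current_date - 1;

Remember to delete the test row from both tables afterwards.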

Step 4: Checking for Missing Tickers

We also want to catch any tickers that failed to update and feed them back into the pipeline. A SQL script compares the symbols that reached the nyse_data table for yesterday against the full list in the nyse_ticker_list table, and writes the missing tickers to a file for the Python parser. The script will run every morning at 6:30am.

-- Build a Python-style list of the tickers that did not arrive yesterday,
-- e.g. ['AAPL', 'MSFT',], and write it to a file for the parser.
COPY (
  SELECT '[' || array_to_string(array_agg(''''||ticker||''','), ' ') || ']'
  FROM nyse_ticker_list
  WHERE ticker NOT IN (
    SELECT symbols
    FROM nyse_data
    WHERE date = CURRENT_DATE - 1
    GROUP BY symbols
  )
) TO 'C:\folder_name\file_name.csv';

After the Python script has parsed the missing tickers and saved them to a file, all we have to do is load the new data back into the temporary table. The trigger and function will then automatically pick up the necessary rows for the main table.

COPY nyse_data_new FROM 'C:\file_path\missing_tickers_data.csv' 
DELIMITER ',' CSV HEADER;
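Once the missing data has been loaded, the same comparison as above can confirm that the day is complete; this check is a sketch, not part of the original jobs:

-- A count of 0 means every ticker in nyse_ticker_list has yesterday's quote.
SELECT count(*)
FROM nyse_ticker_list
WHERE ticker NOT IN (
  SELECT symbols FROM nyse_data WHERE date = CURRENT_DATE - 1
);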

That’s it! Our data automation pipeline is now complete. Here’s a summary of the process:

3:11 AM - Python starts downloading new data
5:40 AM - Postgres adds new data to table nyse_data_new
6:30 AM - PgAgent script checks and creates a file with missing tickers
7:03 AM - Python downloads missing data
8:01 AM - Postgres adds missing tickers to nyse_data_new

This is just an example, and you can adjust the timings and frequency of the scripts based on your specific needs. With this automated process in place, you can have a reliable and up-to-date database without the need for manual intervention.

GitHub repository: https://github.com/KsenoLv/Server/blob/Server/PostgreSQL/Automation.sql
