Database Automation with PgAgent: How to Automate Data Collection and Storage
In today’s world, data is king, and automated data collection and storage are crucial for timely, efficient analysis. If you work with PostgreSQL, you can automate data collection and storage using PgAgent, a job scheduler for PostgreSQL.
In this article, we’ll guide you through the process of setting up a PostgreSQL database automation system using PgAgent. We’ll cover the following steps:
- Adding Basic Data to the Main Table
- Automating the Addition of New Data
- Linking the Main Table with the Temporary Table
- Checking for Missing Tickers
If you have never used PgAgent, the scheduling agent that ships alongside PgAdmin, check out its features at this link: PgAgent documentation.
Step 1: Adding Basic Data to the Main Table
The first step is to add basic data to the main table. You can do this by running a script in PgAdmin's Query Tool. For instance, you can load data from a CSV file with the following command:
COPY nyse_data FROM 'C:\file_path\nyse_data.csv' DELIMITER ',' CSV HEADER;
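Note that COPY assumes the nyse_data table already exists. The article doesn't show its definition, but a plausible schema can be inferred from the column names and types used in the trigger function later on:
-- Hypothetical schema, inferred from the columns used in nyse_trig() below;
-- adjust the names and types to match your actual data.
CREATE TABLE nyse_data (
    date      DATE,
    symbols   VARCHAR(10),
    adj_close NUMERIC(21, 15),
    close     NUMERIC(21, 15),
    high      NUMERIC(21, 15),
    low       NUMERIC(21, 15),
    open      NUMERIC(21, 15),
    volume    NUMERIC(13, 2)
);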
You can then check that the data has been loaded, and view the list of tracked tickers:
select count(*) from nyse_data;
select * from nyse_ticker_list;
We also have an empty table nyse_data_new to store new incoming Yahoo Finance data. This table will hold all new quotes for the selected instruments; this is done to avoid duplicates in the main table. Since we use a daily interval, the table stores yesterday's data.
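A minimal way to create this staging table, assuming the nyse_data schema sketched above (LIKE copies the column definitions from the main table):
-- staging table with the same columns as the main table
CREATE TABLE nyse_data_new (LIKE nyse_data);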
Step 2: Automating the Addition of New Data
To automate the process of adding new data to the nyse_data_new table, we will use PgAgent. We'll create a job step that copies data from a CSV file into the nyse_data_new table. The job will run every night at 5:40am, assuming the Python script starts parsing new information at 3:11am server time and takes about one hour to parse and save the new data to a file.
Here's the SQL for the job step:
COPY nyse_data_new FROM 'C:\file_path\nyse_data_new.csv'
DELIMITER ',' CSV HEADER;
We have created our first automated job. Now, every night at 5:40, the script will automatically take all new data from the nyse_data_new.csv file and add it to the nyse_data_new table.
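PgAgent logs every run in its own schema, which makes it easy to confirm the job actually fired at 5:40. Here's a quick check, assuming the default pgagent schema (the column names come from pgAgent's catalog tables; verify them against your version):
-- recent job runs; jlgstatus: 'r' running, 's' success, 'f' failed
select j.jobname, l.jlgstatus, l.jlgstart, l.jlgduration
from pgagent.pga_joblog l
join pgagent.pga_job j on j.jobid = l.jlgjobid
order by l.jlgstart desc
limit 10;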
Step 3: Linking the Main Table with the Temporary Table
Next, we need to link the main nyse_data table with the temporary nyse_data_new table to avoid duplicates. We'll use a function and a trigger to achieve this.
First, we'll create a PL/pgSQL function called nyse_trig() that takes the values from a new row inserted into the nyse_data_new table and inserts them into the nyse_data table if the row's date is equal to the previous day's date.
create or replace function nyse_trig()
returns trigger AS $$
BEGIN
    -- Move the new row into the main table only if it holds yesterday's
    -- quotes; anything else stays in the staging table, so the main
    -- table never receives duplicates or stale rows.
    if (new.date = CURRENT_DATE - 1) then
        insert into nyse_data values (
            new.date, new.symbols, new.adj_close, new.close,
            new.high, new.low, new.open, new.volume
        );
    end if;
    return new;
END;
$$ language plpgsql;
Next, we'll create a trigger called nyse_trig_on_insert that executes the nyse_trig() function every time a new row is inserted into the nyse_data_new table.
create trigger nyse_trig_on_insert
after insert on nyse_data_new
for each row
execute procedure nyse_trig();
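To sanity-check the pair, you can insert a test row for yesterday's date into the staging table and confirm the trigger copied it over. The values below are made up purely for illustration:
-- hypothetical test row; values invented for illustration
insert into nyse_data_new
values (CURRENT_DATE - 1, 'TEST', 100.0, 100.0, 101.0, 99.0, 99.5, 1000);
-- the trigger should have copied it into the main table
select * from nyse_data where symbols = 'TEST' and date = CURRENT_DATE - 1;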
Step 4: Checking for Missing Tickers
We also want to check for any missing tickers so they can be re-downloaded and added to the nyse_data_new table. We'll use a scheduled SQL script that builds a list of tickers missing from the nyse_data table by comparing it against the nyse_ticker_list table, and writes that list to a file. The script will run every morning at 6:30am.
COPY (
    -- Build a Python-style list of tickers with no row for yesterday,
    -- e.g. ['AAPL', 'MSFT']. string_agg is used here instead of
    -- array_to_string(array_agg(...)) to avoid a trailing comma.
    SELECT '[' || string_agg('''' || ticker || '''', ', ') || ']'
    FROM nyse_ticker_list
    WHERE ticker NOT IN (
        SELECT symbols
        FROM nyse_data
        WHERE date = CURRENT_DATE - 1
        GROUP BY symbols
    )
) TO 'C:\folder_name\file_name.csv';
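Before scheduling this, you can run the inner comparison on its own to see how many tickers are actually lagging:
-- count of tickers with no row for yesterday (same tables as above)
select count(*) as missing
from nyse_ticker_list
where ticker not in (
    select symbols from nyse_data where date = CURRENT_DATE - 1
);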
After the Python parsing script processes this file, all we have to do is add the new data back into the temporary table. From there, the trigger and function automatically pick up the necessary rows for the main table.
COPY nyse_data_new FROM 'C:\file_path\missing_tickers_data.csv'
DELIMITER ',' CSV HEADER;
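Once the backfill lands, one final query (against the same assumed schema) confirms that every ticker now has yesterday's row:
-- expected vs. actually loaded tickers for yesterday
select (select count(*) from nyse_ticker_list) as expected,
       count(distinct symbols) as loaded
from nyse_data
where date = CURRENT_DATE - 1;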
That’s it! Our data automation pipeline is now complete. Here’s a summary of the process:
- 3:11 AM - Python starts downloading new data
- 5:40 AM - Postgres adds new data to the nyse_data_new table
- 6:30 AM - PgAgent script checks for and writes a file with missing tickers
- 7:03 AM - Python downloads the missing data
- 8:01 AM - Postgres adds the missing tickers to nyse_data_new
This is just an example, and you can adjust the timings and frequency of the scripts based on your specific needs. With this automated process in place, you can have a reliable and up-to-date database without the need for manual intervention.
GitHub repository: https://github.com/KsenoLv/Server/blob/Server/PostgreSQL/Automation.sql