Database Automation with PgAgent: How to Automate Data Collection and Storage
In today’s world, data is king, and automated data collection and storage are crucial for timely and efficient analysis. If you work with PostgreSQL, you can automate data collection and storage using PgAgent, a job scheduler for PostgreSQL.
In this article, we’ll guide you through the process of setting up a PostgreSQL database automation system using PgAgent. We’ll cover the following steps:
- Adding Basic Data to the Main Table
- Automating the Addition of New Data
- Linking the Main Table with the Temporary Table
- Checking if All Tickers Have Updated
If you have never worked with this pgAdmin extension, check out its features in the PgAgent documentation.
Step 1: Adding Basic Data to the Main Table
The first step is to add basic data to the main table. You can do this by running a script in pgAdmin's query window. For instance, you can load data from a CSV file using the following code:
COPY nyse_data FROM 'C:\file_path\nyse_data.csv' DELIMITER ',' CSV HEADER;
You can then check if the data has been added to the table using the following code:
select count(*) from nyse_data;
select * from nyse_ticker_list;
We also have an empty table, nyse_data_new, to store incoming data from Yahoo Finance. This table holds all new quotes for the selected instruments, which lets us avoid duplicates in the main table. Since we use a daily interval, the table stores yesterday’s data.
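For reference, the staging table could be created with a sketch like the following. The column names and types here are inferred from the trigger function shown later in this article, so your actual schema may differ:

```sql
-- Hypothetical schema sketch for the staging table: columns mirror
-- those referenced elsewhere in the pipeline (date, symbols, prices, volume).
CREATE TABLE IF NOT EXISTS nyse_data_new (
    date       DATE,
    symbols    VARCHAR(10),
    adj_close  NUMERIC(21, 15),
    close      NUMERIC(21, 15),
    high       NUMERIC(21, 15),
    low        NUMERIC(21, 15),
    open       NUMERIC(21, 15),
    volume     NUMERIC(13, 2)
);
```

The main nyse_data table would use the same column layout, since the trigger copies rows between the two without any remapping.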
Step 2: Automating the Addition of New Data
To automate the process of adding new data to the nyse_data_new table, we will use PgAgent. We’ll create a SQL job step that copies data from a CSV file into the nyse_data_new table. The script will run every night at 5:40 AM, assuming the Python script that parses new information starts at 3:11 AM server time and takes about an hour to parse and save the new data to a file.
Here’s an example of the PL/pgSQL script:
copy nyse_data_new FROM 'C:\file_path\nyse_data_new.csv' DELIMITER ',' CSV HEADER;
We created our first automated script. Now, every night at 5:40 AM, it will automatically take all new data from the nyse_data_new.csv file and add it to the nyse_data_new table.
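To confirm that the nightly job actually ran, one quick check is to count yesterday's rows in the staging table. This is just a sketch; it assumes the daily-interval date column described above:

```sql
-- How many of yesterday's quotes landed in the staging table?
SELECT count(*)
FROM nyse_data_new
WHERE date = CURRENT_DATE - 1;
```

A count of zero the morning after a trading day would suggest the Python parser or the PgAgent job failed.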
Step 3: Linking the Main Table with the Temporary Table
Next, we need to link the main nyse_data table with the temporary nyse_data_new table to avoid duplicates. We’ll use a function and a trigger to achieve this.
First, we’ll create a PL/pgSQL function called nyse_trig() that takes the values from each new row inserted into the nyse_data_new table and inserts them into the nyse_data table if the row’s date equals the previous day’s date.
create or replace function nyse_trig() returns trigger as $$
begin
    -- Copy the new row into the main table only if it is yesterday's data
    if (new.date = current_date - 1) then
        insert into nyse_data
        values (new.date, new.symbols, new.adj_close, new.close,
                new.high, new.low, new.open, new.volume);
    end if;
    return new;
end;
$$ language plpgsql;
Next, we’ll create a trigger called nyse_trig_on_insert that executes the nyse_trig() function every time a new row is inserted into the nyse_data_new table:
create trigger nyse_trig_on_insert
    after insert on nyse_data_new
    for each row
    execute procedure nyse_trig();
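As a quick smoke test of the trigger (the values below are hypothetical, and 'TEST' is not a real ticker), you can insert one row dated yesterday into the staging table and confirm it propagates to the main table:

```sql
-- Insert a fake quote dated yesterday into the staging table...
INSERT INTO nyse_data_new
VALUES (CURRENT_DATE - 1, 'TEST', 1.0, 1.0, 1.0, 1.0, 1.0, 100);

-- ...and verify the trigger copied it into the main table.
SELECT *
FROM nyse_data
WHERE symbols = 'TEST' AND date = CURRENT_DATE - 1;

-- Clean up the test rows afterwards.
DELETE FROM nyse_data WHERE symbols = 'TEST';
DELETE FROM nyse_data_new WHERE symbols = 'TEST';
```

A row inserted with any other date should remain in nyse_data_new only, since the function's date check filters it out.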
Step 4: Checking if All Tickers Have Updated
We also want to check for any missing tickers and add them to the nyse_data_new table. We’ll use a SQL script to generate a list of tickers missing from the nyse_data table by comparing it against the list of tickers in the nyse_ticker_list table. The script will run every morning at 6:30 AM.
COPY (
    SELECT '[' || array_to_string(array_agg('''' || ticker || ''','), ' ') || ']'
    FROM nyse_ticker_list
    WHERE ticker NOT IN (
        SELECT symbols
        FROM nyse_data
        WHERE date = CURRENT_DATE - 1
        GROUP BY symbols
    )
) TO 'C:\folder_name\file_name.csv';
After the Python script parses this file and downloads the missing data, all we have to do is load it back into the temporary table. From there, our trigger and function will automatically pick up the necessary rows for the main table.
COPY nyse_data_new FROM 'C:\file_path\missing_tickers_data.csv' DELIMITER ',' CSV HEADER;
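Since the whole point of the staging table is to keep the main table duplicate-free, a periodic sanity check like the following sketch can confirm that no (symbol, date) pair appears twice in nyse_data:

```sql
-- Any (symbol, date) pair appearing more than once indicates a duplicate.
SELECT symbols, date, count(*) AS copies
FROM nyse_data
GROUP BY symbols, date
HAVING count(*) > 1;
```

An empty result means the trigger-based deduplication is working as intended.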
That’s it! Our data automation pipeline is now complete. Here’s a summary of the process:
3:11 AM - Python starts downloading new data
5:40 AM - Postgres adds new data to table nyse_data_new
6:30 AM - PgAgent script checks and creates a file with missing tickers
7:03 AM - Python downloads missing data
8:01 AM - Postgres adds missing tickers to nyse_data_new
This is just an example, and you can adjust the timings and frequency of the scripts based on your specific needs. With this automated process in place, you can have a reliable and up-to-date database without the need for manual intervention.
GitHub repository: https://github.com/KsenoLv/Server/blob/Server/PostgreSQL/Automation.sql