Import CSV from FTP to PostgreSQL Using Python


Automating CSV Imports from FTP to PostgreSQL with Python: Full ETL Walkthrough & Code Examples

In modern data-driven architectures—especially in IoT platforms, smart metering systems, and industrial automation—data rarely originates from a single source. Embedded devices often export measurements as CSV files and upload them to an FTP server because FTP is lightweight and universally supported by legacy hardware.

The challenge lies in moving this data into a structured PostgreSQL database for analysis. In this guide, I will show you how to build a Python script that monitors an FTP server, detects new files, and performs an "Upsert" (Update or Insert) into your database.


🛠️ Prerequisites & Architecture

Before diving into the code, make sure you have the required Python libraries. We use ftplib for the FTP connection (it ships with the Python standard library, so no installation is needed) and psycopg2 for high-performance database insertions:

pip install psycopg2-binary

The Workflow:

  1. Connect: Script logs into the FTP server.
  2. List: It retrieves a list of all CSV files.
  3. Compare: It checks a local "processed_files.json" log to see which files are new.
  4. Process: It downloads new files, parses the CSV, and pushes data to Postgres.
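Step 3 of the workflow can be sketched with two small helpers. This is a minimal illustration; the helper names (load_processed, mark_processed) and the log path are placeholders, not part of the GitHub project:

```python
import json
from pathlib import Path

# Local log of filenames that have already been imported,
# so a rerun of the script skips them.
LOG_PATH = Path("processed_files.json")

def load_processed() -> set:
    """Return the set of filenames already imported (empty on first run)."""
    if LOG_PATH.exists():
        return set(json.loads(LOG_PATH.read_text()))
    return set()

def mark_processed(filename: str) -> None:
    """Add a filename to the log and persist it to disk."""
    processed = load_processed()
    processed.add(filename)
    LOG_PATH.write_text(json.dumps(sorted(processed), indent=2))
```

Comparing the FTP listing against load_processed() then yields exactly the new files to download.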


💻 Core Code Logic: The FTP Connection

While the full source code is available on GitHub (linked below), here is the critical logic for establishing the connection and listing files. Note that plain FTP transmits credentials and data unencrypted; if your server supports it, swap ftplib.FTP for ftplib.FTP_TLS. This function ensures we only target .csv files:

import ftplib

def get_files_from_ftp(host, user, pwd):
    # Establish connection to the FTP server
    ftp = ftplib.FTP(host)
    ftp.login(user, pwd)

    files = []
    try:
        # List all entries in the current directory
        filenames = ftp.nlst()

        # Keep only CSV files
        for filename in filenames:
            if filename.endswith(".csv"):
                files.append(filename)

    except ftplib.all_errors as e:
        print(f"FTP Error: {e}")

    # Return the filenames along with the open connection,
    # so the caller can download the files and then close it
    return files, ftp
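The download step (step 4 of the workflow) can then reuse the open connection. A minimal sketch, assuming the caller passes the ftp object returned above; the local target directory is a placeholder:

```python
import ftplib
import os

def download_file(ftp: ftplib.FTP, filename: str, local_dir: str = ".") -> str:
    """Retrieve a remote file in binary mode and return its local path."""
    local_path = os.path.join(local_dir, filename)
    with open(local_path, "wb") as f:
        # RETR streams the remote file; f.write is called per chunk
        ftp.retrbinary(f"RETR {filename}", f.write)
    return local_path
```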

Once the files are identified, we use psycopg2.extras.execute_values to batch insert data, which is significantly faster than inserting row-by-row.


📺 Watch the Step-by-Step Tutorial

I have recorded a two-part series demonstrating this code in a live production environment.

Part 1: Setting up the Environment

Part 2: The Python ETL Script


📦 Get the Full Source Code

The complete project, including the database schema, config file handling, and error logging, is available on my GitHub.

🔗 GitHub Repository:
View Project on GitHub


✅ Conclusion

Building your own ETL pipeline in Python gives you full control over how data is handled. Unlike expensive ETL tools, this script is free, lightweight, and can be scheduled with a simple cron job or a systemd timer.
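For example, scheduling can be a single crontab entry; the script and log paths below are placeholders for wherever you install the project:

```shell
# Run the importer every 15 minutes; append stdout/stderr to a log.
# Install with: crontab -e
*/15 * * * * /usr/bin/python3 /opt/etl/ftp_to_postgres.py >> /var/log/ftp_etl.log 2>&1
```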

If this guide helped you, please consider starring the repository on GitHub and subscribing to the YouTube channel for more IoT data engineering tutorials.
