Skip to content

adolfojmnz/pandaspg

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

37 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Description

This library contains several functions that allow you to migrate data from a CSV file or Pandas Dataframe into a PostgreSQL database using the libraries Psycopg2 and Pandas. Specifically, it includes functions for:

  • Loading data from a CSV file to a Pandas dataframe
  • Mapping Pandas Dataframe columns to their datatypes
  • Mapping Pandas Dataframe columns to suitable PostgreSQL datatypes
  • Connect to a PostgreSQL database
  • Creating new tables on a PostgreSQL database
  • Inserting data from a Pandas DataFrame into a table in a PostgreSQL database

Installation

Install The Required Dependencies

pip install pandas psycopg2

Install The pandaspg Package

pip install pandaspg
💡 Make sure you have a PostgreSQL database up and running!

Usage

I will walk you through a step-by-step example of how to migrate data from a CSV file into a PostgreSQL database. For demonstration purposes, download the exoplanets file located in this repo.


Download The CSV file

wget https://raw.githubusercontent.com/adolfojmnz/pandaspg/main/exoplanets_07-04-2023.csv
💡 This file contains a total of 34,112 rows and 92 columns with a total size of 28.6 MB

Import The Library

import pandaspg

Create a Pandas dataframe with the CSV file data

dataframe = pandaspg.csv_to_dataframe('exoplanets_04-07-2023.csv')

Generate a dictionary mapping the dataframe columns to their datatype

column_datatypes_dict = pandaspg.get_dataframe_column_dtypes_dict(dataframe)

Generate a dictionary mapping the dataframe columns to a suitable PostgreSQL datatype

pg_column_datatypes_dict = pandaspg.map_pandas_to_postgresql_datatypes(
    column_datatypes_dict
)

Connect to an existing and running PostgreSQL database

connection = pandaspg.connect_to_postgresql(
    database='analysis',
    user='postgres',
    password='postgres',
    host='localhost',
    port=5432,
)
💡 Replace the above parameters as you see fit.

Create a PostgreSQL table

pandaspg.create_postgresql_table(
    connection, 'exoplanets_csv', pg_column_datatypes_dict
)
💡 The parameter `exoplanets_csv` is the name of the database to be created.

Insert the data from the dataframe into the recently created table

pandaspg.insert_dataframe_into_postgresql(
    connection, 'exoplanets_csv', dataframe
)

Close the connection with the database

connection.close()

The Full Example

import pandaspg

dataframe = pandaspg.csv_to_dataframe('exoplanets_04-07-2023.csv')

column_datatypes_dict = pandaspg.get_dataframe_column_dtypes_dict(dataframe)

pg_column_datatypes_dict = pandaspg.map_pandas_to_postgresql_datatypes(
    column_datatypes_dict
)

connection = pandaspg.connect_to_postgresql(
    database='analysis',
    user='postgres',
    password='postgres',
    host='localhost',
    port=5432,
)

pandaspg.create_postgresql_table(
    connection, 'exoplanets_csv', pg_column_datatypes_dict
)

pandaspg.insert_dataframe_into_postgresql(
    connection, 'exoplanets_csv', dataframe
)

connection.close()

Inspect The Results

Enter the PostgreSQL prompt

psql -U postgres -d analysis

List the tables in the analysis database

\dt

Retrieve the data from the exoplanets_csv table

SELECT * FROM exoplanets_csv;

About

This library contains several functions that allow you to migrate data from a CSV file or Pandas Dataframe into a PostgreSQL database using the libraries Psycopg2 and Pandas

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages