Introduction: After participating in a Python hackathon, I decided to apply my new skills by working with a hospital dataset. I wanted to analyze it using SQL, but I soon realized that directly using Excel files wasn't possible. I needed to convert them to CSV format, create tables, and then load the data. This seemed like a lot of work, which got me thinking about making the process more efficient. I remembered something I learned during the hackathon – connecting Jupyter notebooks with databases – and used that knowledge to simplify the task.
My work began with the following steps:
Pursuing Efficiency: I pondered whether there was a smarter way to handle this process. This is where the hackathon's lessons came into play. I recalled a way to connect Jupyter notebooks with databases, and that inspired a new approach to tackle the problem more efficiently.
I first loaded the 'HospitalDatabase.xlsx' Excel file into Jupyter using the code below:
df_hosp = pd.read_excel('HospitalDatabase.xlsx',sheet_name = None)
pats=df_hosp['Patients']
amb_visits=df_hosp['AmbulatoryVisits']
re_adm=df_hosp['ReAdmissionRegistry']
discharges=df_hosp['Discharges']
providers=df_hosp['Providers']
ed_unique=df_hosp['EDUnique']
ed_visits = df_hosp['EDVisits'] This involved reading the Excel file and assigning each sheet to a specific variable. Subsequently, I established a connection string and interacted with a PostgreSQL database using the following libraries:
import pandas as pd
import psycopg2 as ps
from sqlalchemy import create_engine
# Connect to the PostgreSQL database
conn_string = 'postgresql://postgres:postgres@localhost/Hospital'
db = create_engine(conn_string)
conn = db.connect()
Initially, I found myself loading each sheet separately, and yet, it still involved a significant amount of manual work.
for example
Driven by the desire to enhance performance, I came up with the following code idea. Streamlining Code and Workflow: Transitioning from manual data loading, I discovered a better way: for i in df_hosp.keys():
df_hosp[i].to_sql(i,con=conn,if_exists = 'replace',index=False)
In this approach, I utilized the key-value method to automate the loading of all values. Unlike my earlier approach, where I had to manually assign each sheet to a variable, this method streamlined the process.
This change saved me from the hassle of loading each sheet separately. It demonstrated how optimizing code could improve the process.
In this approach, I achieved the task without the need for manual reading of Excel files and loading each sheet individually. Users are able to input their database details and select their sheets for loading. The code allows users to choose two files at a time.
Innovating a Web App: With newfound insights, I envisioned an interactive web app to streamline most of the process. A combination of tools brought my vision to life.
Creating the Web App: # It will load all the necessary library
import pandas as pd
import panel as pn
from panel import widgets as pnwidgets
from sqlalchemy import create_engine
pn.extension()
# This 4 lines used to get necessary information from the user via Text input
dbUserName = pnwidgets.TextInput(name='dbUserName', placeholder='Enter a username here...')
dbPassword = pnwidgets.TextInput(name='dbPassword', placeholder='Enter a password here...')
dbHost = pnwidgets.TextInput(name='dbHost', placeholder='Enter a host address here...')
dbName = pnwidgets.TextInput(name='dbName', placeholder='Enter a db name here...')
# File Selector library
fc = pnwidgets.FileSelector(directory='~', file_pattern='*.xls*',only_files=True)
# This method will be called via on_Click action o a button
def loadData(event):
result = str("")
try:
result = "Started Loading Data Process <br>"
connection_string = str("postgresql://"+dbUserName.value+":"+dbPassword.value+"@"+dbHost.value+"/"+dbName.value)
db = create_engine(connection_string)
conn = db.connect()
for file in fc.value:
excelData = pd.read_excel(file, sheet_name=None)
for i in excelData.keys():
#exceldf = excelData[i]
excelData[i].to_sql(i, con=conn, if_exists='replace', index=False)
result = result + "File"+ file + " Data loaded successfully<br>"
conn.close()
result = result + "Completed Loading Data Process"
except Exception as e:
result = result + "Error occurred while loading.<br> Please check the inputs like excel file, user & db informations.<br> Please check the exception details below <br>" + str(e)
output_text.value = result
#This is the button to load the data into SQL
loadDataButton = pn.widgets.Button(name='Load Data into DB')
loadDataButton.on_click(loadData)
# This is a output text to print all the necessary information to the user
output_text = pn.widgets.StaticText()
# Template
fc_pane = pn.panel(fc, width=300, height=300)
header_pane = pn.panel('<H3>User & DB information</H3>', width=500, height=40)
user_info_pane = pn.panel(pn.Row(dbUserName, dbPassword),width=300, height=300)
db_info_pane = pn.panel(pn.Row( dbHost, dbName), width=300, height=300)
button_pane = pn.panel(loadDataButton, width=300, height=300)
layout = pn.Column(fc_pane, header_pane, user_info_pane,db_info_pane, button_pane, output_text)
#Display the layout
layout.servable()
This code will generate following Output
After selecting the folder click on the button on the right most corner
which allow user to select the excel file
After selecting the file click on the forward arrow in the middle ,you can also load multiple file altogether. The backward arrow helps to deselect incase if we choose wrong file
Give the user and DB information and press the load button.
Closing Thoughts and Future Plans: This journey, influenced by hackathon experiences and powered by Python, has given me tools to efficiently manage data. Thank you Numpy ninja providing opportunity to explore and learn more , I've achieved 70% of my goals. There's much more to learn and explore