How to Import Multiple Excel files into Pandas

Disclaimer: All proprietary company data has been completely anonymized and replaced with randomly generated counterparts, to respect the stipulations outlined in my non-disclosure agreement

As part of my role as a Data Analyst at Exempt Edge, I did a lot of data modelling and cleaning. Sometimes, we would receive up to fifty excel sheets from a client and would need to import that into our production database. To do this, I would use Pandas or sometimes Spark(for millions or rows) depending on the size. If the columns matched across all sheets I would then ensure all the files are in the same folder and the same file type (.xlsx in this case).

import glob
import pandas as pd 
import numpy as np

# get file names
path =r'/Users/oresttokovenko/Desktop/client_data/client/Data/Aug_24_2021/Offerings'
filenames = glob.glob(path + "/*.xlsx")

filenames
> ['/Users/oresttokovenko/Desktop/client_data/client/Data/Aug_24_2021/Offerings/placeholder_1.xlsx',
'/Users/oresttokovenko/Desktop/client_data/client/Data/Aug_24_2021/Offerings/placeholder_2.xlsx',
'/Users/oresttokovenko/Desktop/client_data/client/Data/Aug_24_2021/Offerings/placeholder_3.xlsx',
'/Users/oresttokovenko/Desktop/client_data/client/Data/Aug_24_2021/Offerings/placeholder_4.xlsx',

master_df = pd.DataFrame()
for file in filenames:
    master_df = master_df.append(pd.read_excel(file, sheet_name = 0, skiprows = 1), ignore_index=True)

master_df.sample(20)

If the relevant columns are not in the same position or do not match by name, that would require a more manual approach, reading in the columns from each sheet using usecols='A:T' or usecols='A,B,D,E', then concatenating them together using master_df = pd.concat([TFSA_df, RRSP_df, RESP_df, CASH_df])

How to Import Multiple Excel files into Pandas

Disclaimer: All proprietary company data has been completely anonymized and replaced with randomly generated counterparts, to respect the stipulations outlined in my non-disclosure agreement

End-to-End ELT with Airbyte, dbt, Dagster, Snowflake and Metabase

Generating Custom Anki Decks with GPT-4 and LangChain

Creating SQL Queries using Python

How to Import Multiple Excel files into Pandas

Disclaimer: All proprietary company data has been completely anonymized and replaced with randomly generated counterparts, to respect the stipulations outlined in my non-disclosure agreement

You may also like

End-to-End ELT with Airbyte, dbt, Dagster, Snowflake and Metabase

Generating Custom Anki Decks with GPT-4 and LangChain

Creating SQL Queries using Python