Breaking Up with Excel

A short narrative about my love for and subsequent betrayal of Excel (or my newfound appreciation for a handful of useful Python functions, clean APIs, Postgres, and Apache Superset).

Danny Kearney-Spaw
11 min read · Jan 13, 2021

Part 1

The First Encounter

For those of you who truly know me, you’ll know that I have always had an embarrassingly aggressive affinity for Excel. Given my degree in finance and background in operations, this love has made nothing but pure sense to me.

My first intimate moment with Excel, like many notable memories, was in high school. Specifically, it was in Ms. Weems’s “Computer Apps” class (aka a course on how to leverage Microsoft Office better). It is important to note that this was a purely transactional relationship between myself and that course—my participation in exchange for a necessary technology credit. Nothing more, nothing less.

If we fast forward to my first day as a finance major at the University of Kansas, “Computer Apps” and Ms. Weems were, sadly, nothing but distant memories.

The Renewal of an Age-old Spark

As I began to study more quantitative material in college, my irreverent use of lined notebook paper and my laziness about changing my own calculator batteries seriously began hindering my intellectual growth. I had enrolled in Introduction to Macroeconomics, where complex economic theories are almost always reduced to simple homework questions that require burdensome calculations on notebook paper. Within a few weeks of the first lecture, I became tired of reading my own handwriting and of taking batteries out of the school computer mice to juice up my calculator (an act of desperate poverty, not of intentional malice).

It was at this point in my life where I revisited Excel. Things have not been the same since.

My homework started to become much more enjoyable as I was forced to reduce financial theory to simple, clean calculations in a spreadsheet. Both sides of my brain became engaged, as I found formatting my models just as enjoyable as the math behind the sheets. AND OH, BROTHER! Imagine the moment when I accidentally hit the “Alt” key, exposed the keyboard shortcuts, and beheld the brilliance of the Excel engineering team! Every semester and internship after this moment only furthered this admittedly odd but beneficial infatuation.

Upon graduation from KU, I was, without question, a freak in the spreadsheets (in the most professional, LinkedIn-appropriate sense).

The Dangers of Love

All life choices have trade-offs, and my pursuit of Excel was no exception. My data analysis was limited by my ability to obtain a manageable CSV and by my computer’s processing power when running calculation-heavy models. Plus, every hour I spent in Excel was another hour I did not spend learning programming languages like Python, understanding database terminology, and taking my BI skills to the next level.

If I am being completely honest with myself: I was getting by, collecting a check, and my shortcut game was too strong to quit (on a PC, that is). Simply put, I was comfortable.

The Beginnings of Betrayal

About a year and a half ago (not counting a few moments in college), I started to seriously dabble in Python with the intent of maturing my data analysis skills. I say “dabble” because I stayed within Jupyter notebooks and wrote spaghetti code that output naked Matplotlib line graphs.

In November of 2019, I decided it was time to grow up. I enrolled in a General Assembly (GA) Python for Data Science course. This course taught me the fundamentals of version control via git, took me deeper into data analysis packages of interest like pandas, NumPy, and scikit-learn, and helped me mature my programming abilities. The cherry on top was that I had a phenomenal instructor (thank you, Joshua Bernhard, if you’re reading this).

After the GA course, I made a few lackluster attempts at incorporating Python into my work but felt that my productivity was still suffering too much in the short run to justify the long-run payoff of automating my work. So, back to spreadsheets and shortcuts I went.

Seeing Other People

In mid-November of 2020, I left my job at Lucidworks to do a few months of soul-searching, reading, spending time with family and honing my data chops. Slowly but surely, I have started to transform the way I think about interacting with data by branching out from Excel to other tools.

While I believe this specific journey has a number of oddities that make it unique to me, the general theme is not so different from that of many people in the business world. Hence this article.

Disclaimer:

The next part is a bit more technical than the part above. I’ve done my semi-best to simplify. It is intended for people who have dabbled in Python & pandas, know what a database is, and have tried (and failed) to effectively visualize various datasets quickly with matplotlib.

Part 2

Getting Out There Again

Given all of the time on my hands after leaving my job, outside of reading, walking my dog, and talking to my mom, I decided to study options investing. Not only is this topic interesting to me from a financial theory perspective, but it requires a heavy amount of quantitative analysis if you want to perform well. Plus, the exploration of this subject has now given me discrete problems I can solve with Python.

A quick note on options: An option is exactly what it sounds like. It is simply the ability to do something at some time in the future. That “something”, in our context, is like buying or selling a stock at a certain price. If you buy an option, all you are paying for is the “right” to do something. Given that nobody possesses a crystal ball, every option, even in life, has value. My strategy, in short, is to best trade the “value” of the option in the market.
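
If a toy example helps (the numbers here are completely made up): say you pay $5 for the right to buy a stock at $100 any time before some future date. A few lines of Python show why that right only pays off once the stock climbs past $105:

# Toy example, not investment advice: profit/loss of a call option at expiration.
# The $100 strike and $5 premium are made-up numbers for illustration.
def call_profit(stock_price, strike=100.0, premium=5.0):
    # The "right" to buy at the strike is only worth exercising
    # when the stock is trading above it.
    return max(stock_price - strike, 0.0) - premium

for price in [90, 100, 105, 120]:
    print(f"Stock at ${price}: profit of ${call_profit(price):.2f}")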

Now that you are experts in options investing, we can easily agree that one of the most important things to do when buying or selling options is to understand the underlying asset (stock) you are trading. So, it should be clear that I needed easily accessible company data, a place to store that data, and a means of visualizing it to produce insights for investment strategies.

Right Time, Right Place

Over the past year, I have been following the development of an API called Financial Modeling Prep (FMP). Around the time that I left my job in November, they began to aggressively market their paid API, and I drank the Kool-Aid without hesitation. Why? Well, for the first time since I started studying finance, someone had aggregated the necessary, but disparate, information needed to truly study the financial performance of a company AND offered it to the public at a low cost. Plus, the API is very clean and the data is as accurate as I could have hoped for at $20/month. This definitely beat paying $24,000/year for Bloomberg or $13,000/year for CapIQ. Regardless, I now had data. So I was quickly off to a good start.

My First Hiccup: JSON data

Shortly after I selected my data source, I ran into my first problem: the API naturally returns data in JSON format. Below is an example of the JSON output from a call to the FMP API:

An example of what JSON looks like from the FMP API. The things before the colon are called “keys” and the things after are the “values”.
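
If you prefer code to screenshots, here is a rough sketch of what such a call looks like in Python. The endpoint path, the key names, and the figures are illustrative stand-ins, so check FMP’s documentation for the exact routes and fields:

import requests

# Illustrative sketch of pulling an income statement from FMP.
# The endpoint and API key below are placeholders.
API_KEY = "your_fmp_api_key"
url = f"https://financialmodelingprep.com/api/v3/income-statement/AAPL?apikey={API_KEY}"

response = requests.get(url)
data = response.json()  # a list of key/value records, one per reporting period

# Each record looks roughly like:
# {"date": "2020-09-26", "symbol": "AAPL", "revenue": 274515000000, ...}
print(data[0])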

A few months before going into this project, I was terrified of JSON. For starters, it doesn’t easily “drag and drop” into a data tab in Excel like a CSV file does. Any time I needed to work with data in Python, it got put into a CSV file. Remember, relational tables were my home, and I did NOT enjoy straying from them — even if JSON’s structure lowered the CPU overhead associated with processing. Yet, after a few projects of having to handle JSON and realizing it was just like a dictionary in Python, I concluded that it wasn’t that difficult to use.

All of this being said, I encountered one issue with the whole JSON thing: I wanted to get more familiar using databases, and it didn’t make sense to simpleton me to put JSON data into a relational database without putting it into a table first. In my eyes, good data was only formatted in tables.

My Solution: pd.json_normalize()

Like Virgil leading Dante to the next canto, pd.json_normalize() helped me quickly progress past this frustrating juncture and into my next one. All the function requires is the JSON data itself and, optionally, how many levels of “nested” data you want flattened. Once you have done that, pandas turns your JSON into a dataframe. See below:

The output of pd.json_normalize() when used on the JSON data above.
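
Here is a minimal sketch of the idea with some made-up nested records (the tickers and figures are invented for illustration):

import pandas as pd

# Made-up sample of nested JSON-like data (a list of dictionaries).
records = [
    {"symbol": "AAPL", "date": "2020-09-26",
     "metrics": {"revenue": 274515000000, "netIncome": 57411000000}},
    {"symbol": "MSFT", "date": "2020-06-30",
     "metrics": {"revenue": 143015000000, "netIncome": 44281000000}},
]

# json_normalize flattens the nested "metrics" dictionary into columns like
# "metrics.revenue"; max_level controls how many levels deep it flattens.
df = pd.json_normalize(records, max_level=1)
print(df.columns.tolist())
# -> ['symbol', 'date', 'metrics.revenue', 'metrics.netIncome']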

Finally. Data the way it should be: in a table (kidding, of course… maybe…).

My Second Hiccup: Interacting with Postgres

My direct interaction with databases was so infrequent until two or three months ago that it may as well have been non-existent. However, that was about to change. After carefully reviewing my options, I ended up choosing Postgres. Why? Postgres is a bit more robust than a database like MySQL. Why else? Not sure. Even though I haven’t leveraged much of that robust functionality, I thought it would be best to familiarize myself with a heavyweight tool first and then work backwards when I need to. Unfortunately, another issue confronted me immediately after I “carefully” selected my database of choice.

The issue was this: Until this point, I had never “seeded” a database (aka populated it for the first time) or really even updated one directly. I am not saying that I haven’t pushed data into applications before, but I am saying that I have never interacted with the backend databases (which is the point of teams that build UIs and frontends and make life easier for people like me, duh).

My Solution(s): psql, sqlalchemy and pd.to_sql()

All three of these tools helped me in different ways, but are all ultimately used for interacting with Postgres in some manner. I’ll quickly explain them because some might fall asleep reading this part (plus, it will be clear I don’t know much else about this topic if I keep writing about it):

1.) psql: This is simply the interactive terminal tool for Postgres. Basically, I could access my “financialdata” database from the command line. See below:

The output of running “psql financialdata” from the command line.

2.) sqlalchemy: Think of this as a library that builds a “bridge” to a database and allows scripts to interact with it. All I am doing below is using a function called “create_engine” to point my script at my database:
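
For the curious, the whole thing boils down to a one-liner along these lines (the username, password, and port are placeholders; “financialdata” is the database from the psql example above):

from sqlalchemy import create_engine

# Connection string format: postgresql://user:password@host:port/database
# The credentials here are placeholders; "financialdata" is the database
# created earlier and accessed with psql.
engine = create_engine("postgresql://username:password@localhost:5432/financialdata")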

3.) pd.to_sql(): Discovering this was like listening to John Mayer’s rendition of Gravity on his “Live in LA” album for the very first time. Simply put, I was filled with awe and astonishment. Not only was this the easiest thing to use, but the function works exactly like it sounds: put your pandas dataframe into a SQL database. Below is the code I wrote to do just that:

This is saying “For each pandas dataframe in my list of dataframes, push them to the database we are referencing (hint, look at the engine variable above)”.
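
Sketched out (the dataframe and table names below are placeholders; “engine” is the SQLAlchemy engine from the previous step), the loop looks something like this:

# Placeholder names: each dataframe came out of pd.json_normalize() earlier,
# and "engine" is the SQLAlchemy engine created above.
tables = {
    "income_statements": income_df,
    "balance_sheets": balance_df,
    "cash_flows": cash_flow_df,
}

for table_name, df in tables.items():
    # if_exists="replace" rebuilds the table on every run; use "append"
    # if you want to keep adding rows instead.
    df.to_sql(table_name, con=engine, if_exists="replace", index=False)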

And just like that, the database was populated.

Some might be asking, “Danny, how on earth did you get your Postgres database there in the first place?” Thanks for asking, curious reader. Here is a good place to get started with Postgres if you are at ground zero. It only takes a few minutes.

Third and Final Hiccup: Visualizing My Data

I was tired of creating terrible matplotlib graphs and refused to dump my financial data into Excel. Some may try to point to Dash or React and tell me to “get learnt” and pick one up. However, recall that I am a finance major who just triumphantly wrote half an article on getting data into a database. So, easy. In fact, for those wondering how far I made it with Dash, below is the latest iteration of my dashboard from when I used it:

Yes, I couldn’t even get the alignment of the boxes fixed using CSS.

My troubles finding a way to visualize my data were not remedied until I thought back to when Marcus Eagan demoed Apache Superset on our data at Lucidworks. It had been months since that first encounter with the tool, but I decided to check it out. And oh boy, was I glad I did.

Superset was INCREDIBLY easy to install and start. If you download Docker Desktop and up the allocatable RAM to 6GB, per Superset’s recommendation, you can go to the incubator-superset folder, run the “docker-compose up” command, and start the program. No other computer tomfoolery. After that, Superset works just like an enterprise-grade BI tool. PLUS, IT’S FREE! Below is the login screen once it’s booted up:

Output when I typed “http://localhost:8088/login/”.

Are you kidding me? I get to manage credentials with this tool? I couldn’t even align an input box with Dash.

After I logged in, I simply pointed Superset at my database by going to the Data tab. See below:

Making another “bridge” or “connection” to my database with SQLAlchemy.
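
The URI Superset asks for here is the same flavor of SQLAlchemy connection string used with create_engine earlier (the username, password, and host below are placeholders):

postgresql://username:password@host.docker.internal:5432/financialdata

One wrinkle worth flagging: because Superset is running inside Docker, “localhost” refers to the Superset container rather than your machine, so a host like host.docker.internal (on Docker Desktop) may be needed to reach a Postgres instance running on your laptop.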

Once connected to my database, I brought in my datasets, made charts & tables on them, and BAM! My data was being visualized. It probably took me 10 minutes to build the example dashboard seen here:

Simple example of Superset’s dashboard functionality

END. TO. END. BABY. No Excel. No spreadsheets. No shortcuts. Just like the westward pioneers did it (sort of). Thank you, Apache Superset.

Ending Thoughts

Microsoft Excel taught me how to love data, numbers, and quantitative problems. If we are being honest, people in operations do not need Python, Postgres databases, and Superset to do their jobs effectively. Usually, Power BI or something of the sort will do the trick when paired with Excel. The people who insist on using them are either undercover data analysts/data scientists or are lying to themselves.

However, I feel that with the ever-increasing amount of emphasis placed on data gathering and analysis at the business ops level, this won’t be the case in 5–10 years.

Thanks for reading.

Side note: I will be offering up my Dash abilities for $250/hr. to companies really wanting to “transform” the way they look at their data.
