Stop Building Generic Portfolios
The Exact Data Engineering Project to Get You Hired
By Chris Gambill | Founder, Gambill Data
If you are trying to land a mid-level or senior data engineering role right now, I need to tell you a difficult truth: hiring managers are ignoring your portfolio.
I’ve spent the last 25 years navigating massive data shifts across telecom, cybersecurity, manufacturing, and aviation. I’ve seen thousands of resumes. And the one constant I see today is that theory only gets you so far. When a hiring manager opens your GitHub and sees another "Titanic Dataset" analysis or a generic Twitter sentiment scraper, they keep scrolling. Real-world business problems are messy. They involve ambiguous requirements, delayed cash flow, and complex architecture. If your portfolio doesn’t reflect that reality, you won't stand out.
To show you what an enterprise-ready portfolio project actually looks like, I ran a scenario through the Gambill Data Coaching App. We targeted a Staff Data Engineer role in the Healthcare industry.
Here is the exact Statement of Work (SOW) the AI generated, and why building this will get you the interview.
The Project: Healthcare Revenue Cycle Denial Prevention
Instead of just moving data from Point A to Point B, a real engineer builds systems that solve expensive business problems.
The Business Problem
Healthcare provider finance teams often know their cash is being delayed, but they lack a unified data product that connects payer mix, outpatient/inpatient utilization, and revenue-cycle efficiency indicators into one decision-ready view. Without this, leaders struggle to prioritize process improvement efforts or target the facilities where denial prevention would have the greatest financial impact.
The Technical Stack
To build this, you need to prove you can handle a modern data stack. This project utilizes:
Databricks & PySpark (for scalable data processing)
Delta Lake (for reliable storage with ACID transactions)
dbt (for gold-layer metric modeling)
Power BI (for business-facing dashboards)
GitHub (for version control and CI/CD basics)
The 5-Phase Implementation Plan
If you want to build this for your own portfolio, here is the exact step-by-step blueprint generated by our app.
Phase 1: Business Framing and KPI Definition
Don't write a line of code yet. First, define the fictitious client as a regional health system struggling with delayed reimbursement. Document 5-7 business questions, such as: Which facilities show the highest reimbursement leakage? Where does payer mix create cash-flow pressure? Create a source-to-KPI mapping so every metric is traceable.
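One way to keep that source-to-KPI mapping honest is to store it as data and check it, rather than leaving it in a slide. The sketch below is only an illustration: the KPI names, dataset names, and field names are hypothetical placeholders, not the actual column names in any CMS file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KpiMapping:
    """One row of the source-to-KPI mapping document."""
    kpi: str
    business_question: str
    source_datasets: tuple
    source_fields: tuple


# Hypothetical entries; replace with the KPIs and fields you actually define.
KPI_MAP = [
    KpiMapping(
        kpi="reimbursement_leakage_rate",
        business_question="Which facilities show the highest reimbursement leakage?",
        source_datasets=("cost_report",),
        source_fields=("gross_charges", "net_patient_revenue"),
    ),
    KpiMapping(
        kpi="payer_mix_pressure",
        business_question="Where does payer mix create cash-flow pressure?",
        source_datasets=("cost_report", "inpatient_utilization"),
        source_fields=("medicare_days", "medicaid_days", "total_days"),
    ),
]


def untraceable_kpis(mappings):
    """Return the KPIs that lack a source dataset or a source field."""
    return [m.kpi for m in mappings if not m.source_datasets or not m.source_fields]


if __name__ == "__main__":
    missing = untraceable_kpis(KPI_MAP)
    print("Untraceable KPIs:", missing or "none")
```

A check like this can run in CI, so a new metric cannot reach the dashboard without a documented source.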
Phase 2: Source Acquisition and Ingestion Framework
Download public healthcare datasets from the CMS (Centers for Medicare & Medicaid Services). Build reusable ingestion patterns that normalize these files into raw landing tables. Write Python utilities for schema cleanup, column standardization, and file-level metadata capture.
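Here is a minimal sketch of two such utilities, written with the Python standard library only. The function names and the sample header are my own illustrative choices, not something the app prescribes, and a real ingestion framework would add error handling and logging.

```python
import hashlib
import re
from datetime import datetime, timezone
from pathlib import Path


def standardize_column(name: str) -> str:
    """Convert a raw file header into lower snake_case."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    return cleaned.strip("_").lower()


def file_metadata(path: Path) -> dict:
    """Capture file-level metadata to store alongside the raw landing table."""
    data = path.read_bytes()
    return {
        "file_name": path.name,
        "size_bytes": len(data),
        # The content hash lets you detect a re-published or duplicate file.
        "sha256": hashlib.sha256(data).hexdigest(),
        "ingested_at_utc": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    # Illustrative header only; real CMS headers vary by file.
    print(standardize_column("Provider CCN #"))
```

Storing the hash and the ingestion timestamp with every load gives you a simple audit trail when a source file changes between runs.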
Phase 3: Silver Layer Standardization
Use PySpark to standardize provider IDs, facility names, geography fields, and reporting periods across all your sources. Resolve hospital entities using the Provider of Services file as your reference dimension. Your goal here is to build clean, fact-like tables for cost report financials and inpatient utilization.
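The matching logic is easier to reason about in plain Python before you scale it out, so the sketch below shows it that way. In the project itself you would express the same steps as a PySpark column transformation followed by a join. Two assumptions to flag: the field names are hypothetical, and the six-character zero-padded ID format is assumed here and should be verified against the files you download.

```python
def normalize_provider_id(raw) -> str:
    """Strip punctuation and whitespace, upper-case, and left-pad to 6 characters.

    Assumption: provider IDs are 6 characters and may have lost leading
    zeros when a file passed through a spreadsheet.
    """
    cleaned = "".join(ch for ch in str(raw) if ch.isalnum()).upper()
    return cleaned.zfill(6)


def resolve_entities(records, reference):
    """Attach the reference facility name to each record by normalized ID.

    Returns (matched, unmatched) so that unresolved rows are reported
    instead of being silently dropped.
    """
    ref_by_id = {normalize_provider_id(r["provider_id"]): r for r in reference}
    matched, unmatched = [], []
    for rec in records:
        pid = normalize_provider_id(rec["provider_id"])
        ref = ref_by_id.get(pid)
        if ref is None:
            unmatched.append({**rec, "provider_id": pid})
        else:
            matched.append({**rec, "provider_id": pid,
                            "facility_name": ref["facility_name"]})
    return matched, unmatched


if __name__ == "__main__":
    # Made-up rows for illustration only.
    reference = [{"provider_id": "010001", "facility_name": "Example Regional Medical Center"}]
    records = [{"provider_id": "10001", "discharges": 120},
               {"provider_id": "99-9999", "discharges": 5}]
    matched, unmatched = resolve_entities(records, reference)
    print(len(matched), "matched;", len(unmatched), "unmatched")
```

Keeping the unmatched rows in their own output is the part worth showing in an interview: it turns entity resolution into something you can measure and report on.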
Phase 4: Gold Layer Metric Engineering
This is where you develop Staff-level SQL and PySpark transformations. Create SQL models for reimbursement efficiency and year-over-year trend analysis. Use window functions to rank hospitals by deterioration in reimbursement-related indicators. Finally, create a "Cash Acceleration Opportunity" estimate using scenario logic.
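To make the window-function step concrete, here is a small sketch that runs the SQL through Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The table, the reimbursement_ratio metric, the sample values, and the days-saved scenarios are all hypothetical. In the project, similar SQL would live in a dbt model over your Gold tables.

```python
import sqlite3

# Made-up input: one row per hospital per reporting year.
ROWS = [
    ("A", 2022, 0.40), ("A", 2023, 0.35),
    ("B", 2022, 0.30), ("B", 2023, 0.31),
    ("C", 2022, 0.50), ("C", 2023, 0.40),
]

RANKING_SQL = """
WITH yoy AS (
    SELECT provider_id,
           report_year,
           reimbursement_ratio - LAG(reimbursement_ratio) OVER (
               PARTITION BY provider_id ORDER BY report_year
           ) AS yoy_change
    FROM hospital_metrics
)
SELECT provider_id,
       report_year,
       ROUND(yoy_change, 4) AS yoy_change,
       RANK() OVER (
           PARTITION BY report_year ORDER BY yoy_change ASC
       ) AS deterioration_rank
FROM yoy
WHERE yoy_change IS NOT NULL
ORDER BY report_year, deterioration_rank
"""


def rank_deterioration(rows):
    """Rank hospitals within each year by year-over-year decline (worst first)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE hospital_metrics "
            "(provider_id TEXT, report_year INTEGER, reimbursement_ratio REAL)"
        )
        conn.executemany("INSERT INTO hospital_metrics VALUES (?, ?, ?)", rows)
        return conn.execute(RANKING_SQL).fetchall()
    finally:
        conn.close()


# Hypothetical scenario assumptions: days of accounts receivable removed.
SCENARIO_DAYS_SAVED = {"conservative": 2, "moderate": 5, "aggressive": 10}


def cash_acceleration(annual_net_revenue: float, days_saved: float) -> float:
    """One-time cash released by collecting `days_saved` days sooner."""
    return annual_net_revenue / 365 * max(days_saved, 0)


if __name__ == "__main__":
    for row in rank_deterioration(ROWS):
        print(row)
    for name, days in SCENARIO_DAYS_SAVED.items():
        print(name, cash_acceleration(36_500_000, days))
```

The scenario math is deliberately simple: average daily net revenue multiplied by the number of days removed from the collection cycle. State that assumption on the dashboard so nobody reads the estimate as a forecast.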
Phase 5: Power BI Dashboard & Executive Narrative
Build a Power BI semantic model on top of your Gold tables. Create visual dashboards showing estimated cash acceleration opportunity and at-risk hospitals. Include a prioritization matrix that compares operational burden versus estimated financial upside.
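The prioritization matrix itself is easy to precompute so that Power BI only has to display it. Below is a rough sketch that splits hospitals into quadrants at the median of each axis. The quadrant labels, the burden score, and the sample numbers are hypothetical, and median cutoffs are just one reasonable choice.

```python
from statistics import median


def quadrant(upside: float, burden: float, upside_cut: float, burden_cut: float) -> str:
    """Classify one hospital by financial upside versus operational burden."""
    high_upside = upside >= upside_cut
    high_burden = burden >= burden_cut
    if high_upside and not high_burden:
        return "quick win"
    if high_upside and high_burden:
        return "strategic project"
    if not high_upside and not high_burden:
        return "low priority"
    return "deprioritize"


def prioritize(hospitals):
    """Add a 'quadrant' field to each hospital, using medians as cutoffs."""
    upside_cut = median(h["upside"] for h in hospitals)
    burden_cut = median(h["burden"] for h in hospitals)
    return [
        {**h, "quadrant": quadrant(h["upside"], h["burden"], upside_cut, burden_cut)}
        for h in hospitals
    ]


if __name__ == "__main__":
    # Made-up values: upside in dollars, burden as a 1-10 effort score.
    sample = [
        {"provider_id": "A", "upside": 900_000, "burden": 2},
        {"provider_id": "B", "upside": 800_000, "burden": 9},
        {"provider_id": "C", "upside": 100_000, "burden": 1},
        {"provider_id": "D", "upside": 50_000, "burden": 8},
    ]
    for row in prioritize(sample):
        print(row["provider_id"], row["quadrant"])
```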
Why This Project Wins Interviews
The biggest mistake engineers make in interviews is talking purely about the technology. Hiring managers want to know how your technology impacts the business.
When you build a project like this, you walk into the interview armed with real-world talking points, such as:
“Here is how I translated an ambiguous healthcare finance problem into measurable KPIs and a scoped 4-week delivery plan.”
“Let me walk you through how I used PySpark for scalable entity resolution across hospital datasets.”
“Here is the data quality checking I implemented to make the pipeline trustworthy for executive reporting.”
You are no longer a junior dev talking about a Kaggle dataset. You are a data engineer solving enterprise revenue problems.
Stop Guessing. Start Engineering.
You don't have to spend weeks trying to brainstorm the perfect project. I built the Gambill Data Coaching App to do the heavy lifting for you.
The app takes your current resume, your specific skill gaps, and a link to your absolute dream job, and generates a custom, high-quality Statement of Work exactly like the one above in about 60 seconds.
It gives you the data sources, the architecture, the deliverables, and the interview talking points. No more staring at blank screens.
Ready to build a portfolio that actually opens doors? Standard access is $50/month, but early adopters who join the beta testing group today get a 50% lifetime discount. (Note: Because we are in early beta, access is granted manually. Once you sign up, watch for an email from me with your personal access link.)
See exactly how we build a Senior-level project in 60 seconds.