Decision Intelligence Orchestration

Healthcare Staffing & Analytics Engine

A comparison of Payroll-Based Journal (PBJ) data processing across three environments: research prototyping, local single-node execution, and distributed enterprise execution.

Method 1

Google Colab

Zero-setup prototyping with an in-memory SQLite database.
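A minimal sketch of the in-memory SQLite pattern, using only Python's standard sqlite3 module. The table name, column names (provider_id, work_date, rn_hours) and sample rows are hypothetical placeholders, not the official PBJ schema; the point is that ":memory:" gives a throwaway SQL engine with no installation or file setup.

```python
import sqlite3

# Hypothetical sample rows: (provider_id, work_date, rn_hours).
# Column names and values are illustrative only, not the official PBJ schema.
SAMPLE_ROWS = [
    ("012345", "2023-01-01", 40.0),
    ("012345", "2023-01-02", 36.5),
    ("012346", "2023-01-01", 22.0),
]


def total_rn_hours(rows):
    """Load rows into an in-memory SQLite DB and total the hours per facility."""
    # ":memory:" creates a database that exists only while the connection is open.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE pbj (provider_id TEXT, work_date TEXT, rn_hours REAL)"
        )
        conn.executemany("INSERT INTO pbj VALUES (?, ?, ?)", rows)
        cur = conn.execute(
            "SELECT provider_id, SUM(rn_hours) AS total_hours "
            "FROM pbj GROUP BY provider_id ORDER BY total_hours DESC"
        )
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for provider_id, hours in total_rn_hours(SAMPLE_ROWS):
        print(provider_id, hours)
```

Because the database disappears when the connection closes, this suits notebook exploration rather than persistent storage.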

Method 2

Databricks Local

Vectorized pandas execution on the Databricks driver node for mid-sized datasets.

Method 3

Databricks Spark

Distributed PySpark SQL for reliable, production-scale processing of large datasets.

📊 Data Volume vs. Engine Selection

Decision insight: for datasets under 2 GB that fit in driver memory, Method 2 (pandas on the driver) outperforms Spark because it avoids distributed shuffle overhead.
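The engine-selection rule can be expressed as a small routing function. This is a hypothetical sketch: the 2 GB cutoff comes from the insight above, but whether "GB" means 10**9 or 2**30 bytes is not stated, so 2**30 is assumed, and the prototyping flag and the returned labels are illustrative names, not part of any Databricks API.

```python
# Assumed threshold: "2 GB" interpreted as 2 * 2**30 bytes (binary gigabytes).
PANDAS_MAX_BYTES = 2 * 2**30


def choose_engine(dataset_bytes: int, prototyping: bool = False) -> str:
    """Route a dataset to one of the three methods (illustrative rule of thumb)."""
    if dataset_bytes < 0:
        raise ValueError("dataset_bytes must be non-negative")
    if prototyping:
        # Method 1: zero-setup exploration in Colab with in-memory SQLite.
        return "colab-sqlite"
    if dataset_bytes < PANDAS_MAX_BYTES:
        # Method 2: small enough for a single driver node (assuming enough
        # driver memory), so no distributed shuffle is paid.
        return "databricks-pandas"
    # Method 3: at or beyond the threshold, distribute work across executors.
    return "databricks-spark"


if __name__ == "__main__":
    print(choose_engine(500 * 2**20))  # 500 MB
    print(choose_engine(5 * 2**30))    # 5 GB
```

A real cutoff would depend on driver memory and file format, so the constant is best treated as a tunable parameter rather than a fixed boundary.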

⚙️ Healthcare ETL Pipeline

Ingestion: remote CSV retrieval (Requests/IO)
Cleaning: regex-based dynamic header detection
Standardization: CMS ID zero-padding and fuzzy name matching
Output: PBJ staffing market share analysis
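The four stages listed above can be sketched end to end in standard-library Python. Several pieces are assumptions or substitutions: the source uses Requests for ingestion, while this sketch swaps in urllib.request to stay dependency-free; the header marker (a PROVNUM column), the 6-character CMS ID width, the STATE and Hrs_RN column names, difflib as the fuzzy matcher, and "state" as the market definition are all hypothetical choices. The sample data is made up.

```python
from __future__ import annotations

import csv
import difflib
import io
import re
import urllib.request
from collections import defaultdict

# Assumed header marker: the real header row is taken to contain a PROVNUM column.
HEADER_RE = re.compile(r"\bPROVNUM\b", re.IGNORECASE)


def fetch_csv_text(url: str, timeout: float = 30.0) -> str:
    """Ingestion: download a remote CSV as text (stdlib stand-in for Requests)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def read_with_detected_header(text: str) -> list[dict[str, str]]:
    """Cleaning: skip preamble lines until one matches HEADER_RE, then parse as CSV."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if HEADER_RE.search(line):
            return list(csv.DictReader(io.StringIO("\n".join(lines[i:]))))
    raise ValueError("no header row matched HEADER_RE")


def pad_ccn(raw: str, width: int = 6) -> str:
    """Standardization: restore leading zeros lost when IDs were parsed as numbers."""
    return raw.strip().zfill(width)


def match_name(name: str, known: list[str], cutoff: float = 0.8) -> str | None:
    """Standardization: fuzzy-match a facility name against reference names."""
    lookup = {k.upper(): k for k in known}
    hits = difflib.get_close_matches(name.upper(), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


def market_share(rows, market_col="STATE", hours_col="Hrs_RN"):
    """Output: each facility's share of total hours within its market (state assumed)."""
    hours = defaultdict(float)
    totals = defaultdict(float)
    for row in rows:
        market = row[market_col]
        h = float(row[hours_col] or 0)
        hours[(market, pad_ccn(row["PROVNUM"]))] += h
        totals[market] += h
    return {
        key: (h / totals[key[0]] if totals[key[0]] else 0.0)
        for key, h in hours.items()
    }


# Made-up sample: two preamble lines and one ID that lost its leading zero.
SAMPLE = """Payroll Based Journal demo extract
generated for illustration only
PROVNUM,PROVNAME,STATE,Hrs_RN
12345,OAK MEADOW CARE CENTER,AL,30
012345,OAK MEADOW CARE CENTER,AL,10
12346,RIVER BEND NURSING,AL,60
"""

if __name__ == "__main__":
    rows = read_with_detected_header(SAMPLE)
    print(market_share(rows))
    print(match_name("Oak Medow Care Center",
                     ["OAK MEADOW CARE CENTER", "RIVER BEND NURSING"]))
```

Zero-padding before aggregation matters here: without it, "12345" and "012345" would be counted as two different facilities and split the market share.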
Top 100: target facilities
3 methods: execution engines
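Selecting the top 100 target facilities is a top-N ranking. The source does not state the ranking criterion, so this sketch assumes each facility has already been reduced to a single numeric score (for example total staffing hours or market share); the function name and tie-breaking rule are hypothetical.

```python
from __future__ import annotations

import heapq


def top_facilities(scores: dict[str, float], n: int = 100) -> list[tuple[str, float]]:
    """Return the n highest-scoring facilities, largest first.

    Ties are broken by facility ID (ascending) so the shortlist is deterministic.
    """
    # nsmallest on (-score, id) matches sorted(..., key=...)[:n], and can avoid
    # a full sort when n is small relative to the number of facilities.
    return heapq.nsmallest(n, scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    demo = {"012345": 40.0, "012346": 60.0, "012347": 60.0, "012348": 5.0}
    print(top_facilities(demo, n=2))
```

When there are fewer than n facilities, the function simply returns all of them in ranked order.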