This project investigates housing sale data from Cook County with a dual focus: building predictive models for property valuation and examining the fairness of those models in real-world contexts. Using exploratory analysis and visual diagnostics, I identified both the key features that drive sale prices and the systemic patterns that lead to inequitable outcomes in tax assessments.
I began by exploring sale patterns and housing characteristics, identifying building square footage and number of bedrooms as strong predictors of sale price. Visualizations such as the jointplot and boxplot below helped support these relationships:
Although the predictive models achieved low RMSE, I found that their residuals were not evenly distributed across the price range. Lower-priced homes were systematically overestimated, and because property tax bills are based on assessed value, this leads to disproportionate tax burdens for low-income residents. These findings align with long-standing critiques of Cook County’s assessment system.
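The residual check described above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the function names (`rmse`, `residuals_by_price_bin`), the sign convention (residual = predicted minus actual), the choice of four equal-count price bins, and the toy numbers are all assumptions made for the example.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))


def residuals_by_price_bin(actual, predicted, n_bins=4):
    """Mean residual (predicted - actual) within equal-count bins of actual price.

    A positive mean in a bin means homes in that price range are
    overestimated on average; a negative mean means they are underestimated.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    order = np.argsort(actual)              # indices from cheapest to most expensive
    groups = np.array_split(order, n_bins)  # near-equal-sized groups of indices
    return [float(np.mean(predicted[g] - actual[g])) for g in groups]


# Toy, hand-made numbers (NOT the Cook County data), chosen only to show the pattern.
actual = np.array([100_000, 120_000, 300_000, 320_000,
                   700_000, 720_000, 1_500_000, 1_600_000])
predicted = np.array([130_000, 150_000, 310_000, 330_000,
                      690_000, 710_000, 1_450_000, 1_550_000])

overall_rmse = rmse(actual, predicted)
bin_means = residuals_by_price_bin(actual, predicted, n_bins=4)

print(f"RMSE: {overall_rmse:,.0f}")
for i, mean_residual in enumerate(bin_means, start=1):
    print(f"price quartile {i}: mean residual {mean_residual:+,.0f}")
```

A single RMSE figure hides the direction of the errors. Splitting the mean residual by price bin shows whether the cheapest homes are overestimated while the most expensive are underestimated, which is the pattern that makes an assessment regressive.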
Even accurate models can reinforce systemic inequity if their errors fall disproportionately on disadvantaged groups. Drawing from local journalism and policy investigations, I discuss how tax appeal processes, neighborhood demographics, and historical segregation intersect with model performance to compound unfair outcomes.
Predictive models must go beyond technical accuracy to incorporate fairness, transparency, and local context. Without thoughtful oversight, even the most statistically sound models risk amplifying racial and economic disparities. This project illustrates how data science can be used not only to predict, but also to question and improve the systems we measure.