At 10:00 am on 6/10/26, the request "Evaluate the analysis at https://econdataus.com/stem_workers1.htm and provide the evaluation in HTML" resulted in the following response from Gemini (in a Chrome browser):

Evaluation of Analysis: "STEM Workers, H-1B Visas, and Productivity in US Cities"

The analyzed text presents a methodological critique of a prominent 2015 study by economists Giovanni Peri, Kevin Shih, and Chad Sparber. Using R and a Shiny dashboard to replicate the original Stata-based findings, the author challenges the study's core conclusions by altering specific econometric specifications. Below is an evaluation of the primary arguments raised in this critique.


1. Software Replication and Minor Discrepancies

The author replicates the study's baseline two-stage least squares (2SLS) regression slopes and standard errors to within a tight margin (mostly under 1%), using standard R tools such as ivreg, sandwich, and lmtest.


2. The Mismatched Time Span Problem (The Strongest Critique)

The core of the author's critique rests on a structural data discrepancy: the original study mixes a 10-year growth period (1990–2000) with two 5-year growth periods (2000–2005 and 2005–2010) without annualizing or standardizing the rates.

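Methodological Validity: High. In a pooled linear regression on growth, failing to normalize the time blocks means the 10-year period shows roughly twice the change of the 5-year periods in both the independent and dependent variables. Because both variables are larger in the same rows, the pooled data can line up along a positive slope even when the underlying annual rates are unrelated. The author demonstrates this effectively via two methods: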

  1. Simulated Data: By generating two completely independent variables that both scale with time, the author shows that mixing 5-year and 10-year intervals creates a highly statistically significant false correlation ($p < 0.0001$). Standardizing the data to 1-year or 5-year blocks removes this false correlation.
  2. Re-Analysis of Study Data: When the author normalizes the actual study data to a 1-year growth rate, none of the regressions remain statistically significant ($p$-values rise to between 0.117 and 0.755). This strongly suggests that the original study's significant results were an artifact of mismatched time horizons.
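
The mechanism behind the simulated-data argument can be sketched in a few lines of Python. This is a minimal illustration, not the author's simulation: the 219 units (giving 657 rows over three periods), the mean annual rates, the noise level, and the use of linear (non-compounded) growth are all assumptions chosen for the example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_units=219, spans=(10, 5, 5), seed=0):
    """Pool one 10-year and two 5-year periods for independent x and y."""
    rng = random.Random(seed)
    raw_x, raw_y, ann_x, ann_y = [], [], [], []
    for _ in range(n_units):
        for years in spans:
            # Independent annual growth rates (assumed means and noise).
            gx = rng.gauss(0.02, 0.005)
            gy = rng.gauss(0.01, 0.005)
            # Total change over the period grows with its length.
            raw_x.append(gx * years)
            raw_y.append(gy * years)
            # Annualized change removes the dependence on period length.
            ann_x.append(gx)
            ann_y.append(gy)
    return pearson(raw_x, raw_y), pearson(ann_x, ann_y)


raw_r, annual_r = simulate()
print(f"pooled raw correlation:        {raw_r:.3f}")
print(f"pooled annualized correlation: {annual_r:.3f}")
```

Under these assumptions the raw pooled correlation is clearly positive even though x and y are independent by construction, while the annualized correlation stays close to zero. The only thing the two variables share is the length of the period.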

3. Overfitting via Metropolitan Area Dummy Variables

The analysis criticizes the study's inclusion of i.metarea (metropolitan area fixed effects), pointing out that adding 218 dummy variables to a dataset of only 657 observations leaves roughly 3 observations per parameter. This falls well short of Harrell's general rule of thumb, which recommends a minimum of 10 observations per parameter to avoid overfitting.
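
The arithmetic behind that ratio is easy to check. The short sketch below uses only the counts quoted above; it counts only the metropolitan-area dummies, so including the other regressors would make the ratio lower still.

```python
observations = 657    # rows in the study dataset, as quoted above
area_dummies = 218    # metropolitan-area dummy variables, as quoted above
min_per_param = 10    # minimum per parameter under the cited rule of thumb

obs_per_dummy = observations / area_dummies
needed = area_dummies * min_per_param

print(f"{obs_per_dummy:.2f} observations per dummy")   # 3.01
print(f"{needed} observations needed for {min_per_param} per dummy")  # 2180
```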


4. Mathematical Rigor: Raw Growth vs. Natural Logs

The author correctly points out that when standardizing growth rates across intervals, natural-log growth should be used rather than raw percentage growth. Log growth is additive over time, so converting from a 1-year to a 5-year baseline multiplies both the independent and dependent variables by the same constant, whereas raw growth compounds and does not rescale linearly. As demonstrated in the author's final tables, applying log transforms ensures that the regression slopes stay constant regardless of whether the data is scaled to a 1-year or 5-year baseline, changing only the intercept. This adjustment grounds the critique in sound financial and economic mathematics.
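
The scaling property described above can be illustrated with synthetic data. This is a sketch under stated assumptions, not the author's calculation: the data-generating slope of 0.5, the intercept, the noise levels, and the sample size are invented for the example, and raw growth is derived from log growth as exp(years * log_rate) - 1.

```python
import math
import random


def ols(xs, ys):
    """Return (intercept, slope) of a simple least-squares fit of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


rng = random.Random(1)
# Assumed 1-year log growth rates for x, and y linearly related to x.
log_x1 = [rng.gauss(0.02, 0.01) for _ in range(200)]
log_y1 = [0.005 + 0.5 * x + rng.gauss(0, 0.005) for x in log_x1]

# Log growth: a 5-year baseline is exactly 5 times the 1-year value.
a1, b1 = ols(log_x1, log_y1)
a5, b5 = ols([5 * x for x in log_x1], [5 * y for y in log_y1])


def raw_growth(log_rate, years):
    """Compounded raw growth over `years` implied by an annual log rate."""
    return math.exp(years * log_rate) - 1


# Raw growth: compounding makes the 5-year value a nonlinear function.
ra1, rb1 = ols([raw_growth(x, 1) for x in log_x1],
               [raw_growth(y, 1) for y in log_y1])
ra5, rb5 = ols([raw_growth(x, 5) for x in log_x1],
               [raw_growth(y, 5) for y in log_y1])

print(f"log growth: slope {b1:.4f} (1-year) vs {b5:.4f} (5-year)")
print(f"log growth: intercept {a1:.5f} (1-year) vs {a5:.5f} (5-year)")
print(f"raw growth: slope {rb1:.4f} (1-year) vs {rb5:.4f} (5-year)")
```

In this setup the log-growth slope is identical at both baselines and the intercept is multiplied by 5, while the raw-growth slope shifts when the baseline changes.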


Conclusion

The critique is analytically robust and highly persuasive. By combining open-source replication, simulated falsification, and parameter stress-testing, the author demonstrates that the paper's foundational claims, namely that foreign STEM workers drastically boost native wages without harming employment, rely on fragile econometric specifications, over-parameterization, and a fundamental failure to normalize growth measured over different time intervals.