At 10:03 am on 6/10/26, the request "Evaluate the analysis at https://econdataus.com/amer_jobs1.htm and provide the evaluation in HTML" resulted in the following response from Gemini (in a Chrome browser):

Methodological Evaluation of "Analysis of 'Immigration and American Jobs' with Shiny"

The critique and software replication hosted at econdataus.com/amer_jobs1.htm deconstruct a prominent 2011 policy report, "Immigration and American Jobs", authored by economist Madeline Zavodny (published by AEI and the Partnership for a New American Economy). The original study famously concluded that adding 100 foreign-born STEM workers with advanced U.S. degrees is associated with an additional 262 jobs for native U.S. workers.

Using R and an interactive Shiny dashboard, the author replicates the original regressions and exposes significant weaknesses: sensitivity to control variables, potential overfitting, and data selection bias ("p-hacking"). Below is a structured evaluation of this analysis.

1. Successful Software Replication and Verification

The author successfully translates the original Stata code (ordinary least squares regressions) into R using the base R lm function. With the sandwich and lmtest packages, the author reproduces the study's cluster-robust standard errors, clustered by state as in the Stata option cluster(statefip).

Accuracy: High. The author's Shiny dashboard matches the regression slopes, standard errors, and p-values reported in Table 2 of the published paper with a 0.0% discrepancy.
Open-Source Advocacy: By porting code written for the proprietary Stata package to open-source R, the author lets anyone without a Stata license independently verify findings that actively influence federal immigration debates, opening this kind of public policy research to wider scrutiny.
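To make the clustering step concrete, here is a minimal numpy sketch of OLS with cluster-robust standard errors. It is not the author's R code: it assumes the CR1 finite-sample factor G/(G-1) * (N-1)/(N-K) that Stata applies with cluster(), and whether the author's sandwich call uses exactly the same settings is an assumption. The function name `ols_cluster_se` and the synthetic "states" in the usage lines are hypothetical.

```python
import numpy as np

def ols_cluster_se(X, y, groups):
    """OLS estimates with cluster-robust (CR1) standard errors.

    Assumes the finite-sample factor G/(G-1) * (N-1)/(N-K) used by Stata's
    cluster() option; G = number of clusters, N = rows, K = columns of X.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ (X.T @ y)
    resid = y - X @ beta
    labels = np.unique(groups)
    meat = np.zeros((k, k))
    for g in labels:
        in_g = groups == g
        # Sum of x_i * u_i within one cluster (e.g. one state), as a k-vector.
        score = X[in_g].T @ resid[in_g]
        meat += np.outer(score, score)
    n_clusters = len(labels)
    adj = n_clusters / (n_clusters - 1) * (n - 1) / (n - k)
    vcov = adj * (xtx_inv @ meat @ xtx_inv)
    return beta, np.sqrt(np.diag(vcov))

# Usage with synthetic data: 10 hypothetical "states" observed for 8 years each.
rng = np.random.default_rng(0)
state = np.repeat(np.arange(10), 8)
x = rng.normal(size=state.size)
y = 0.5 + 0.2 * x + rng.normal(size=state.size)
beta, se = ols_cluster_se(np.column_stack([np.ones_like(x), x]), y, state)
print(beta, se)
```

Giving every observation its own cluster collapses this estimator to the familiar HC1 heteroskedasticity-robust variance, which is a convenient sanity check on any port.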

2. Faulty Job-Creation Derivation Logic

A major contribution of this critique is unpacking the exact transformation used to claim that "262 jobs are created" per 100 foreign-born STEM workers. The author shows that the study computes this figure with simple linear scaling:

Jobs = 100 * Slope * (Average Native Employment / Average Foreign STEM Population)

The author correctly flags a fundamental mismatch here. The regression itself is specified in logs of ratios: log(Native Employment Rate) regressed on log(Immigrant Share of Employment), so its slope is an elasticity (a proportional effect that holds locally), not a count of jobs. Because the dependent variable is a rate whose denominator is the total native population (rather than just the active labor force), a change in that rate does not map one-to-one onto a change in the number of native jobs: if the native population grows faster than native employment over the same window, the rate can fall even while the number of jobs rises. Converting the slope into an absolute number of "jobs created" therefore means linearizing a log-log model around sample averages, which makes the headline figure an unstable estimate.
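The linear conversion above can be compared with what a log-log relationship itself implies. This is a simplified sketch under stated assumptions: it treats native employment as a function of the foreign STEM count alone, ln(E) = a + slope * ln(F), with the native population held fixed, whereas the study's regressor is an immigrant share. The inputs are hypothetical round numbers, not the study's averages, and both function names are illustrative. Because dE/dF = slope * E / F, the linear formula is the tangent approximation of this curve at the averages.

```python
def jobs_linear(slope, avg_native_emp, avg_foreign_stem, added):
    """Linear scaling: slope * (native employment / foreign STEM) jobs per added worker."""
    return added * slope * avg_native_emp / avg_foreign_stem

def jobs_loglog(slope, native_emp, foreign_stem, added):
    """Change in E implied by ln(E) = a + slope * ln(F) when F rises by `added` (population fixed)."""
    return native_emp * ((foreign_stem + added) / foreign_stem) ** slope - native_emp

# Hypothetical values chosen only to illustrate the approximation error.
slope, native_emp, foreign_stem = 0.02, 2_000_000, 20_000
for added in (100, 5_000, 20_000):
    linear = jobs_linear(slope, native_emp, foreign_stem, added)
    curved = jobs_loglog(slope, native_emp, foreign_stem, added)
    print(f"{added:>6} added: linear {linear:9.1f}, log-log {curved:9.1f}")
```

For 100 added workers the two calculations nearly coincide; for an increment as large as the base itself, the linear figure overstates the log-log figure by more than a third, so a job count read off an elasticity holds only near the point where it was computed.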

3. Extreme Sensitivity to Fixed-Effects Controls

The analysis leverages the interactive capabilities of Shiny to strip out control variables one by one, exposing how fragile the original study's positive correlations are:

Removing Year Indicators: Unchecking the year fixed effects causes 5 of the 6 regressions to flip from positive to negative slopes.
Removing State Indicators: Dropping the state fixed effects leaves no slope statistically significant and turns half of the slopes negative.

This demonstrates that the original paper's headline findings do not reflect a robust, universally present economic phenomenon. Instead, the positive relationship appears only after baseline differences between states and economy-wide shifts over time have been absorbed by the fixed effects, so it hinges on those specification choices.
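A toy example shows how including or dropping state indicators can flip a slope's sign. The panel below is entirely synthetic (3 hypothetical states over 4 years, no noise) and is not the study's data: within each state, x raises y by exactly 1.0, but states with higher average x have lower baselines. Pooled OLS mixes the two sources of variation; the fixed-effects fit keeps only the within-state variation.

```python
import numpy as np

def ols(y, X):
    """Least-squares coefficients for y on X (X already holds any intercept or dummies)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical balanced panel: 3 states x 4 years (synthetic, illustrative only).
states = np.repeat(np.arange(3), 4)           # state id per row: 0,0,0,0,1,1,1,1,2,...
within = np.tile([-1.5, -0.5, 0.5, 1.5], 3)   # year-to-year movement inside each state
x = 10.0 * states + within                    # states differ a lot in their average x
y = -20.0 * states + 1.0 * x                  # within-state effect of x is +1.0,
                                              # but higher-x states have lower baselines

# Pooled regression: intercept + x, no state indicators.
pooled_slope = ols(y, np.column_stack([np.ones_like(x), x]))[1]

# State fixed effects: x plus one dummy per state (the dummies replace the intercept).
dummies = (states[:, None] == np.arange(3)).astype(float)
fe_slope = ols(y, np.column_stack([x, dummies]))[0]

print(f"pooled slope: {pooled_slope:.3f}, state fixed-effects slope: {fe_slope:.3f}")
```

Here the pooled slope comes out at about -0.96 while the fixed-effects slope recovers +1.0, which is the kind of reversal the dashboard exposes when the state indicators are unchecked.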

4. Parameter Overfitting (Harrell's Rule Violation)

The critique applies sound statistical reasoning about model over-parameterization. The dataset consists of 408 total observations (51 geographic units over 8 years). Dummy variables for all states and years add 50 state dummies (51 - 1) and 7 year dummies (8 - 1), which together with the intercept consume 58 degrees of freedom before the immigrant-share variable is even counted.

This yields about 7 observations per regression parameter (408 / 58 ≈ 7.0), short of Harrell's rule of thumb of at least 10 observations per parameter to avoid overfitting. The analysis rightly notes that packing the regression with state indicators risks fitting localized noise rather than a genuine macro-employment signal, which weakens the model's predictive power on out-of-sample data.
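The parameter arithmetic can be written out explicitly. This sketch assumes one state dummy and one year dummy are dropped to avoid collinearity with the intercept, and applies the 10-observations-per-parameter rule cited above. `fixed_effects_budget` is a hypothetical helper, and `extra_params` stands in for the immigrant-share slope and any other controls, whose exact number is not stated here.

```python
def fixed_effects_budget(n_units, n_periods, extra_params=0, obs_per_param_rule=10):
    """Count parameters in a two-way fixed-effects OLS model and compare with a rule of thumb."""
    n_obs = n_units * n_periods
    # (units - 1) unit dummies + (periods - 1) period dummies + 1 intercept + other regressors
    n_params = (n_units - 1) + (n_periods - 1) + 1 + extra_params
    ratio = n_obs / n_params
    max_params = n_obs // obs_per_param_rule  # largest model the rule of thumb allows
    return n_obs, n_params, ratio, max_params

n_obs, n_params, ratio, max_params = fixed_effects_budget(51, 8)
print(n_obs, n_params, round(ratio, 2), max_params)  # 408 58 7.03 40
```

The rule would allow roughly 40 parameters, while the dummies alone use 58. Counting the slope itself (extra_params=1) lowers the ratio to about 6.9, so the shortfall only grows as controls are added.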

5. Data Selection Bias and Arbitrary Time Spans

The most compelling indictment of the study's generalizability is the author's table re-estimating the model over many time intervals. The author tests the exact same econometric model against 36 distinct time spans of 3 years or longer between 2000 and 2010.
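The count of candidate windows can be checked by enumeration. The text does not say how a span's length is measured; this sketch assumes a span qualifies when its end year minus its start year is at least 3, which over 2000 through 2010 gives exactly 36 windows (requiring a difference of only 2 would give 45). The function name `candidate_spans` is hypothetical, and re-fitting the model on each window is left out.

```python
from itertools import combinations

def candidate_spans(first_year, last_year, min_gap):
    """Every (start, end) pair of years in [first_year, last_year] with end - start >= min_gap."""
    years = range(first_year, last_year + 1)
    # combinations() yields each pair once, in order, with start < end.
    return [(start, end) for start, end in combinations(years, 2) if end - start >= min_gap]

spans = candidate_spans(2000, 2010, min_gap=3)
print(len(spans))    # 36
print(spans[:3])     # [(2000, 2003), (2000, 2004), (2000, 2005)]
print(len(candidate_spans(2000, 2010, min_gap=2)))  # 45
```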

Out of 36 potential time blocks, only 4 intervals yield statistically significant positive results, and all of them happen to start during the 2000–2001 dot-com crash.
When state dummies are dropped and spans are drawn from the full 2000–2010 window, almost all spans ending after 2007 yield highly significant, deeply negative slopes, which would imply severe native "job destruction."

This volatile performance reinforces external reports that the study's decision to drop the 2008–2010 data (ostensibly to avoid the Great Recession) while retaining the dot-com recession years (2000–2001) amounts to arbitrary data filtering, a form of what is often called p-hacking, that manufactures a desired policy narrative.

Conclusion

The analysis hosted on econdataus.com is a rigorous, objective, and mathematically sound deconstruction of the "Immigration and American Jobs" paper. By showing that the original results do not hold up across alternative time blocks, that the model falls short of the observations-per-parameter threshold for avoiding overfitting, and that it yields opposite results under minimal specification changes, the author effectively demonstrates that the widely cited statistic of 2.62 jobs created per foreign-born STEM worker is a fragile econometric artifact rather than a stable economic truth.