第11章のPythonコード

第11章 操作変数法

サンプルデータ

cigarette.csv:母親の喫煙と新生児の出生体重に関する架空データ.

proximity.csv:Card (1995)のデータ.

パッケージの読み込み

import pandas as pd
from statsmodels.formula.api import ols
from linearmodels.iv import IV2SLS, compare

11.2 操作変数法の実施:2段階最小二乗法

11.2.1 2段階最小二乗法

cigdata = pd.read_csv('cigarette.csv')

result_ols = ols('bw ~ cigpacks', data = cigdata).fit()
print(result_ols.summary().tables[1])
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept   3423.2281     17.023    201.094      0.000    3389.823    3456.633
cigpacks     -62.4019      1.544    -40.405      0.000     -65.433     -59.371
==============================================================================
first_stage = ols('cigpacks ~ price', data = cigdata).fit()
cigdata = cigdata.assign(fitted = first_stage.predict())

result_iv = ols('bw ~ fitted', data = cigdata).fit()
print(result_iv.summary().tables[1])
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept   3331.8940     22.218    149.961      0.000    3288.294    3375.494
fitted       -53.2676      2.034    -26.192      0.000     -57.258     -49.277
==============================================================================
print(first_stage.summary().tables[1])
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     42.0637      0.340    123.751      0.000      41.397      42.731
price         -5.0058      0.053    -95.219      0.000      -5.109      -4.903
==============================================================================

Rによるデータ演習

proxdata = pd.read_csv('proximity.csv')

proxdata.head()
lwage educ exper black south smsa nearc4
0 6.306275 7 16 1 0 1 0
1 6.175867 12 9 0 0 1 0
2 6.580639 12 16 0 0 1 0
3 5.521461 11 10 0 0 1 1
4 6.591674 12 16 0 0 1 1
formula = 'lwage ~ 1 + educ + exper + black + south + smsa'
OLS = IV2SLS.from_formula(formula, data = proxdata).fit(cov_type='unadjusted')
print(OLS.summary.tables[1])
                             Parameter Estimates                              
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
Intercept      4.9133     0.0631     77.917     0.0000      4.7897      5.0369
black         -0.1882     0.0178    -10.604     0.0000     -0.2230     -0.1534
educ           0.0738     0.0035     20.908     0.0000      0.0669      0.0807
exper          0.0393     0.0022     17.924     0.0000      0.0350      0.0436
smsa           0.1647     0.0157     10.509     0.0000      0.1340      0.1955
south         -0.1291     0.0152    -8.4829     0.0000     -0.1589     -0.0992
==============================================================================
formula = 'lwage ~ 1 + exper + black + south + smsa + [educ ~ nearc4]'
TSLS = IV2SLS.from_formula(formula, data = proxdata).fit(cov_type='unadjusted')
print(TSLS.summary.tables[1])
                             Parameter Estimates                              
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
Intercept      3.9398     0.8309     4.7414     0.0000      2.3112      5.5684
black         -0.1296     0.0532    -2.4357     0.0149     -0.2339     -0.0253
exper          0.0623     0.0197     3.1663     0.0015      0.0237      0.1008
smsa           0.1348     0.0303     4.4555     0.0000      0.0755      0.1941
south         -0.1093     0.0232    -4.7186     0.0000     -0.1546     -0.0639
educ           0.1318     0.0495     2.6624     0.0078      0.0348      0.2289
==============================================================================
print(compare({'OLS':OLS, '2SLS':TSLS}, precision = 'std-errors', stars = True))
                 Model Comparison                 
==================================================
                                OLS           2SLS
--------------------------------------------------
Dep. Variable                 lwage          lwage
Estimator                       OLS        IV-2SLS
No. Observations               3010           3010
Cov. Est.                unadjusted     unadjusted
R-squared                    0.2788         0.2140
Adj. R-squared               0.2776         0.2127
F-statistic                  1163.4         673.47
P-value (F-stat)             0.0000         0.0000
==================     ============   ============
Intercept                 4.9133***      3.9398***
                           (0.0631)       (0.8309)
black                    -0.1882***      -0.1296**
                           (0.0178)       (0.0532)
educ                      0.0738***      0.1318***
                           (0.0035)       (0.0495)
exper                     0.0393***      0.0623***
                           (0.0022)       (0.0197)
smsa                      0.1647***      0.1348***
                           (0.0157)       (0.0303)
south                    -0.1291***     -0.1093***
                           (0.0152)       (0.0232)
==================== ============== ==============
Instruments                                 nearc4
--------------------------------------------------

Std. Errors reported in parentheses