This package brings Epi Info statistical analysis routines, plus a log binomial regression analysis routine for computing adjusted relative risks of associations with binary outcome data, to the Python environment. It has methods for converting datasets into Python objects and for passing those objects to analysis functions.
pip install path/epiinfo-1.1.0.8-py3-none-any.whl
Full path to pip may be necessary.
Contains functions for importing data sources to Python objects (lists of dictionaries) for consumption by analysis functions.
from epiinfo.ImportData import *
dataset = eicsv('path/filename.csv')
Reads a CSV file and creates a list of dictionaries called 'dataset'.
dataset = eijson('path/filename.json')
Reads a JSON file and creates a list of dictionaries called 'dataset'.
initVector = 'D1041B94D49F66DF120AD40269E102EA'
passwordSalt = '15A4D4258940BD492EE2'
dataset = eisync('path/filename.epi7', initVector, passwordSalt, 'password')
Reads a sync file (encrypted XML exported from Epi Info mobile apps) and creates a list of dictionaries called 'dataset'. initVector is the Init Vector that was generated by the mobile device to encrypt the data on the mobile device. passwordSalt is the Salt that was generated by the mobile device to encrypt the data on the mobile device. password is the password that was used to encrypt the data on the mobile device.
This function can only be use with data that was encrypted using the Epi Info iOS app's Custom Keys option. See here for details.
Performs MxN Tables Analyses of Outcome and Exposures, with statistical tests of association.
from epiinfo.TablesAnalysis import *
ivdict = {'outcomeVariable' : 'Ill'}
ivdict['exposureVariables'] = ['ChefSalad', 'Workdays']
ta = TablesAnalysis()
taResults = ta.Run(ivdict, dataset)
'dataset' is the list of dictionaries containing the analysis data. 'ivdict' is a dictionary of analysis variables.
The result of Run() is this dict:
{'Variables': ['Ill * ChefSalad', 'Ill * Workdays'],
'VariableValues': [[[1, 0], [True, False]],
[[1, 0], [0, 1, 2, 3, 4, 5, 6, 7]]],
'Tables': [[[150, 79], [43, 37]],
[[7, 4],
[21, 10],
[21, 12],
[29, 16],
[18, 15],
[25, 9],
[15, 12],
[57, 38]]],
'RowTotals': [[229, 80], [11, 31, 33, 45, 33, 34, 27, 95]],
'ColumnTotals': [[193, 116], [193, 116]],
'Statistics': [{'OR': 1.633794524580512,
'ORLL': 0.974133414274391,
'ORUL': 2.740163215258917,
'MidPOR': 1.6311162097465466,
'MidPORLL': 0.9691190383377568,
'MidPORUL': 2.7422471723180135,
'FisherORLL': 0.9399306300659928,
'FisherORUL': 2.826676338762746,
'MidPExact1Tail': 0.032696255288936306,
'FisherExact1Tail': 0.04214460217937938,
'FisherExact2Tail': 0.08083332652929953,
'RiskRatio': 1.2186452726718797,
'RiskRatioLL': 0.9741280246023387,
'RiskRatioUL': 1.5245391397211574,
'RiskDifference': 11.752183406113536,
'RiskDifferenceLL': -0.789033558172958,
'RiskDifferenceUL': 24.293400370400033,
'UncorrectedX2': 3.4922535822177045,
'UncorrectedX2P': 0.0617,
'MHX2': 3.4809517906894913,
'MHX2P': 0.0621,
'CorrectedX2': 3.0090265314124536,
'CorrectedX2P': 0.0828},
{'ChiSq': 3.922874010476977,
'ChiSqDF': 7,
'ChiSqP': 0.7886,
'FishersExact': 0.7683844417832235}]}
Results are from the fictional Salmonellosis dataset often used in Epi Info training sessions, with the 8-value variable "Workdays" added to demonstrate 2xN analysis.
(MxN Fisher's Exact test is adapted from the 1983 Fortran (FORTRAN 77) routine by Mehta and Patel.)
Contains functions for Logistic Regression analysis. In addition to computing identical results to Epi Info Dashboard on identical datasets, this routine also computes interaction odds ratios and confidents limits for interaction effects that are not also main effects.
from epiinfo.LogisticRegression import *
ivdict = {'Ill' : 'dependvar'} #, 'intercept' : True , 'includemissing' : False} # intercept and includemissing are optional keys
ivdict['exposureVariables'] = ['column0', 'column1', column2, 'column0*column1']
# ivdict['Group'] = 'matchgroupcolumn' # optional for matched case control studies
reg = LogisticRegression()
rslts = reg.doRegression(ivdict, dataset)
'dataset' is the list of dictionaries containing the analysis data. 'ivdict' is a dictionary of analysis variables and options.
rslts.show()
LOGISTIC REGRESSION RESULTS
Variable Coefficient Standard Error Odds Ratio Lower Upper
ChefSalad 1.145 0.3429 3.1424 1.6046 6.1539
EggSaladSandwich 1.0418 0.3146 2.8343 1.53 5.2506
CONSTANT -0.7644 0.3602
Number of Iterations 4
Minus Two Log-Likelihood 393.37
Number of Observations 309
Fit Test Value DF P
Score 14.7777 2 0.0006
Likelihood Ratio 15.5999 2 0.0004
Individual result functions provide greater precision, as well as meaningful odds ratios for interaction terms.
print(rslts.Variables)
print(rslts.Beta)
print(rslts.SE)
print(rslts.OR)
print(rslts.ORLCL)
print(rslts.ORUCL)
print(rslts.Z)
print(rslts.PZ)
print(rslts.Iterations)
print(rslts.MinusTwoLogLikelihood)
print(rslts.CasesIncluded)
print(rslts.Score, rslts.ScoreDF, rslts.ScoreP)
print(rslts.LikelihoodRatio, rslts.LikelihoodRatioDF, rslts.LikelihoodRatioP)
for ior in rslts.InteractionOR:
print(ior)
['ChefSalad', 'EggSaladSandwich', 'CONSTANT']
[1.1449910849486071, 1.041809442441627, -0.7644353230444337]
[0.34290905726619875, 0.31456430262135265, 0.3601766334566801]
[3.1424133419378486, 2.834340954032848]
[1.6046436032972575, 1.5300107079664744]
[6.153865937145245, 5.250609425070682]
[3.3390517418143193, 3.311912488988536, -2.1223901053991483]
[0.0008406490630090172, 0.0009266052827040479, 0.03380499507237897]
4
393.3736390464802
309
14.777703590233484 2 0.0006
15.599932822362803 2 0.0004
Results are from the fictional Salmonellosis dataset often used in Epi Info training sessions.
rslts.showHTML()
Formats rslts.show() as HTML for future use within Epi Info Classic Analysis.
Contains functions for Log-Binomial Regression analysis to provide adjusted risk ratios when the independent variable is binary.
from epiinfo.LogBinomialRegression import *
ivdict = {'Ill' : 'dependvar'} #, 'intercept' : True , 'includemissing' : False} # intercept and includemissing are optional keys
ivdict['exposureVariables'] = ['column0', 'column1', column2, 'column0*column1']
#ivl['StartValues'] = [0.0, 0.0, 0.0, 0.0, -0.45] # Optional user-supplied starting beta values
reg = LogBinomialRegression()
rslts = reg.doRegression(ivdict, dataset)
'dataset' is the list of dictionaries containing the analysis data. 'ivdict' is a dictionary of analysis variables and options.
rslts.show()
LOG BINOMIAL REGRESSION RESULTS
Variable Coefficient Standard Error Risk Ratio Lower Upper
EggSaladSandwich 0.287 0.0863 1.3325 1.1251 1.578
ChefSalad 0.3359 0.1156 1.3992 1.1155 1.755
CONSTANT -0.8579 0.1275
Number of Iterations 7
Log-Likelihood -197.81
Number of Observations 309
Individual result functions provide greater precision, as well as parameter iteration history and meaningful risk ratios for interaction terms.
print(rslts.Variables)
print(rslts.Beta)
print(rslts.SE)
print(rslts.OR)
print(rslts.ORLCL)
print(rslts.ORUCL)
print(rslts.Z)
print(rslts.PZ)
print(rslts.Iterations)
print(rslts.MinusTwoLogLikelihood)
print(rslts.CasesIncluded)
for ior in rslts.InteractionOR:
print(ior)
# To see start values and subsequent beta parameter values in the iteration history.
print()
for ph in rslts.ParameterHistory:
print(ph)
['EggSaladSandwich', 'ChefSalad', 'CONSTANT']
[0.28702656278093225, 0.33587378874771817, -0.8578882747952187]
[0.08629620218172233, 0.11560278700279132, 0.127540265427839]
[1.3324596068382382, 1.399162423625005]
[1.1251193436636948, 1.1154930317062357]
[1.5780091364122923, 1.7549688183079077]
[3.3260625094081475, 2.90541255497247, -6.726411238971455]
[0.0008808217459920762, 0.003667693292066409, 1.7389864495598986e-11]
7
-197.80804698840117
309
[0.015412272950627526, 0.018694620739870826, -0.1434745513345579]
[0.047175480510386114, 0.053313801454389464, -0.2716280030939807]
[0.12025288102095905, 0.13274826883584623, -0.4755035906059664]
[0.22117400493838188, 0.24761248723857526, -0.7059060377582919]
[0.2784656944889591, 0.3214894897321413, -0.8352770236314679]
[0.2868833735148343, 0.33549848896238826, -0.857374553660365]
[0.28702656278093225, 0.33587378874771817, -0.8578882747952187]
Results are from the fictional Salmonellosis dataset often used in Epi Info training sessions.
rslts.showHTML()
Formats rslts.show() as HTML for future use within Epi Info Classic Analysis.
Contains the ComplexSampleTables class, which contains the functions ComplexSampleTables and ComplexSampleFrequencies.
from epiinfo.EICSTables import *
csTables = ComplexSampleTables()
results = csTables.ComplexSampleTables(ivdict, dataset)
results = csTables.ComplexSampleFrequencies(ivdict, dataset)
'dataset' is the list of dictionaries containing the analysis data. 'ivdict' is a dictionary of analysis variables.
Contains the ComplexSampleMeans class, which contains the ComplexSampleMeans function.
from epiinfo.EICSMeans import *
csMeans = ComplexSampleMeans()
results = csMeans.ComplexSampleMeans(ivdict, dataset)
'dataset' is the list of dictionaries containing the analysis data. 'ivdict' is a dictionary of analysis variables.