Welcome to PhiloLogic  

Information Gain (Weka)

Corpus One (date=1950-2006 keywords=american verbose="on"): 394 documents (word count greater than 5,000).
Corpus Two (date=1950-2006 keywords=NOT+american verbose="on"): 303 documents (word count greater than 5,000).
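The two corpus definitions above select documents by date range and length, then split them on whether the keyword `american` is present. A minimal Python sketch of that selection follows. It assumes a hypothetical `Doc` record with `year`, `keywords` and `word_count` fields; PhiloMine's actual metadata model and query semantics are not shown in this log.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    # Hypothetical document record; field names are illustrative only.
    title: str
    year: int
    keywords: set = field(default_factory=set)
    word_count: int = 0


def select(docs, keyword, negate=False, start=1950, end=2006, min_words=5000):
    """Return docs dated within [start, end] with more than min_words words
    that carry the keyword (or, if negate=True, that do NOT carry it)."""
    out = []
    for d in docs:
        if not (start <= d.year <= end):
            continue  # outside the date range
        if d.word_count <= min_words:
            continue  # too short
        has_kw = keyword in d.keywords
        if has_kw != negate:  # keep matches, or non-matches when negated
            out.append(d)
    return out
```

Because one query uses the keyword and the other uses its negation, a document can land in at most one of the two corpora.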

User exclude feature list submitted. Generating user exclude feature list from input pattern:
color colour honor honour center centre favor favour

Found 9 features to exclude:

ceńter center centre color colour favor favour honor honour
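Eight words were submitted, but nine features were excluded. The extra match is `ceńter`. One plausible explanation is that the pattern is matched after diacritics are folded away, though this is an assumption and not documented PhiloMine behaviour. A minimal Python sketch of such accent-insensitive matching follows, using hypothetical `fold` and `exclude_features` helpers.

```python
import unicodedata


def fold(word: str) -> str:
    # Decompose accented characters (NFD) and drop the combining marks,
    # so "ceńter" folds to "center". Assumed matching rule, not PhiloMine's.
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def exclude_features(pattern: str, vocabulary):
    """Return every vocabulary feature whose folded form matches a folded
    word from the whitespace-separated input pattern."""
    wanted = {fold(w) for w in pattern.split()}
    return sorted(f for f in vocabulary if fold(f) in wanted)
```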

Run name: bldr

Reading raw data files for Corpus One.
Reading raw data files for Corpus Two.

Pruning instance feature vectors...
Excluding features found in more than 557 or fewer than 139 documents...
From 95494 global features, removed 390 found in more than 557 documents and 93034 found in fewer than 139 documents.
2070 features remaining for this classification task.
Done pruning all document vectors. Retained 563197 features.
Deleted 950161 features in too many or too few documents.
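The log's thresholds are consistent with keeping features that occur in at least 20% and at most 80% of the 697 documents: 0.2 × 697 = 139.4 and 0.8 × 697 = 557.6, truncated to 139 and 557. These percentages are an inference from the numbers and are not stated in the log. The counts also add up, since 95494 − 390 − 93034 = 2070. A minimal Python sketch of document-frequency pruning under that assumption follows, using a hypothetical `prune_by_df` helper.

```python
from collections import Counter


def prune_by_df(doc_vectors, max_frac=0.8, min_frac=0.2):
    """doc_vectors: list of dicts mapping feature -> count for one document.
    max_frac/min_frac are assumed values inferred from the log's thresholds."""
    n_docs = len(doc_vectors)
    upper = int(n_docs * max_frac)  # drop features in more than this many docs
    lower = int(n_docs * min_frac)  # drop features in fewer than this many docs
    df = Counter()
    for vec in doc_vectors:
        df.update(vec.keys())  # each feature counted once per document
    keep = {f for f, n in df.items() if lower <= n <= upper}
    # Strip dropped features from every document vector.
    pruned = [{f: c for f, c in vec.items() if f in keep} for vec in doc_vectors]
    return keep, pruned, (upper, lower)
```

If the 80%/20% reading is right, `upper` and `lower` come out as 557 and 139 for the 697 documents in this run.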

Using 2070 features


Running Weka Information Gain selector.

Executing: java -cp /projects/datamining/weka-3-4-8a/weka.jar -Xmx2000m weka.attributeSelection.InfoGainAttributeEval -S "weka.attributeSelection.Ranker" -I /projects/datamining/philomine/runs/bldr/arff/data/bldr.arff > /projects/datamining/philomine/runs/bldr/wekaig/output/bldr.output
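`InfoGainAttributeEval` scores each attribute by how much knowing it reduces uncertainty about the class, IG(Class, Attribute) = H(Class) − H(Class | Attribute). `Ranker` then sorts the attributes by that score. Weka discretizes numeric attributes internally. The sketch below instead simplifies each feature to present or absent in a document, so it illustrates the measure but will not reproduce the scores exactly. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(present, labels):
    """present: list of bools (does the feature occur in each doc?);
    labels: class label of each doc. Returns H(C) - H(C | feature)."""
    n = len(labels)
    h_cond = 0.0
    for value in (True, False):
        subset = [y for p, y in zip(present, labels) if p == value]
        if subset:
            h_cond += len(subset) / n * entropy(subset)
    return entropy(labels) - h_cond


def rank(features, labels):
    """features: dict name -> list of bools. Returns (name, gain), best first."""
    scores = {name: info_gain(col, labels) for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With 394 documents in Corpus One and 303 in Corpus Two, the class entropy is about 0.988 bits, which is an upper bound on any feature's gain. On that scale, the top feature `gonna` at 0.39341 removes roughly 40% of the uncertainty about which corpus a document belongs to.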


Weka Information Gain Results

Feature       Information gain
gonna         0.39341
ain'          0.33455
folks         0.31313
em            0.30981
huh           0.28309
colored       0.27006
yeah          0.22664
eh            0.22661
toward        0.21629
guess         0.20137
cause         0.18499
wanna         0.16454
nothin'       0.16399
ass           0.15704
kinda         0.15684
pretty        0.15666
outta         0.15039
gotta         0.14956
baby          0.14357
talkin'       0.13851
doin'         0.13703
shall         0.13545
wasn'         0.13449
cannot        0.13113
naw           0.1298
negro         0.12816
towards       0.12587
y'            0.12428
honey         0.12061
daddy         0.12012
comin'        0.1166
neighborhood  0.11298
somethin'     0.11087
tryin'        0.10975
goin'         0.10868
stuff         0.10788
lookin'       0.10781
til           0.10545
probably      0.10517
they'         0.10486
blues         0.10482
gettin'       0.10417
uh            0.10302
she'          0.09902
figured       0.09608
somebody      0.09569
aw            0.09493
goat          0.09275
awhile        0.09123
gotten        0.08939
couldn'       0.08721
nigger        0.08556
negroes       0.08497
shit          0.08408
whom          0.08331
scared        0.08316
bet           0.08225
stairs        0.08154
hurt          0.08084
crosses       0.08077
who'          0.08039
village       0.08026
wives         0.07957
dig           0.07917
somebody'     0.07856
crazy         0.07728
supposed      0.076
chief         0.0736
hit           0.07313
damn          0.07263
bout          0.07222
started       0.06817
dollars       0.06811
nerve         0.06723
liked         0.06635
fine          0.06562
sounds        0.06486
bush          0.06422
southern      0.06412
market        0.06374
street        0.06367
how'          0.06303
goddamn       0.06295
kids          0.06167
beat          0.06163
starts        0.06028
mere          0.06026
check         0.06019
kiss          0.06
ole           0.05985
beg           0.05939
niggers       0.0593
fix           0.05886
york          0.05818
washington    0.05788
gods          0.05705
matters       0.05649
palm          0.05649
nobody'       0.05522
quit          0.05498
american      0.0547
fuck          0.05343
hey           0.05313
n             0.05265
sometimes     0.05231
miss          0.05197
idiot         0.05156
bullshit      0.05148
reaches       0.05139
dim           0.05078
mess          0.05049
hip           0.04966
hung          0.04947
blue          0.04883
return        0.04882
bust          0.04854
fathers       0.04801
apartment     0.04794
dollar        0.04793
brown         0.04776
kid           0.04749
joint         0.04746
where'        0.04726
low           0.04696
looked        0.04652
needed        0.04649
learned       0.04643
message       0.04624
minute        0.04614
whose         0.04612
exits         0.04609
coat          0.04585
loose         0.04584
lay           0.04524
properly      0.04487
butt          0.04464
mama          0.04424
anymore       0.04412
arrive        0.04402
store         0.04381
government    0.04371
ought         0.0433
bill          0.04315
phone         0.04253
top           0.04195
wealth        0.04187
bitch         0.0414
burned        0.04107
ma'           0.04107
upstairs      0.04099
sacrifice     0.04092
south         0.04061
remain        0.04014
preacher      0.04
laid          0.0397
behave        0.03963
line          0.03937
therefore     0.03936
dumb          0.03913
leaves        0.03909
starting      0.03862
silence       0.03844
insult        0.0384
college       0.03837
hat           0.03824
force         0.03801
purse         0.03787
bills         0.03773
barely        0.03766
pulls         0.03758
obey          0.03748
sweet         0.03747
block         0.03744
hadn'         0.03742
brings        0.03726
difficult     0.03713
forth         0.03713
greet         0.03709
bloody        0.03706
straighten    0.037
search        0.03694
worked        0.03687
window        0.03663
forest        0.03656
hangs         0.03639
offstage      0.03585
o             0.03562
slightly      0.03543
backs         0.03534
pants         0.03502
upstage       0.035
broke         0.03485
smart         0.03472
ancestors     0.03459
although      0.03442
round         0.03442
couple        0.0341
ball          0.03408
named         0.03403
fun           0.03399