Information Gain (Weka)
Corpus One (date=1950-2006 keywords=american verbose="on"): 394 documents (count greater than 5000 words).
Corpus Two (date=1950-2006 keywords=NOT+american verbose="on"): 303 documents (count greater than 5000 words).
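The two corpora are defined by metadata filters: the same date range, a keyword that is either present or negated, and a minimum length. The following is a minimal sketch of that selection. The `Doc` record and `select_corpus` function are hypothetical, and three details are assumptions: the year range is inclusive, "count greater than 5000 words" means a strict `>`, and `NOT+american` means "not tagged american".

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Hypothetical document record; PhiloMine's real metadata schema may differ."""
    title: str
    year: int
    word_count: int
    keywords: set[str] = field(default_factory=set)

def select_corpus(docs: list[Doc], keyword: str, negate: bool = False,
                  years: tuple[int, int] = (1950, 2006),
                  min_words: int = 5000) -> list[Doc]:
    """Keep documents inside the (assumed inclusive) year range, longer than
    min_words, and tagged with keyword (or NOT tagged, when negate=True)."""
    lo, hi = years
    return [d for d in docs
            if lo <= d.year <= hi
            and d.word_count > min_words
            and (keyword in d.keywords) != negate]

docs = [
    Doc("A", 1960, 12000, {"american"}),
    Doc("B", 1975, 8000, {"british"}),
    Doc("C", 1940, 9000, {"american"}),   # outside the date range
    Doc("D", 1999, 3000, {"american"}),   # too short
]
corpus_one = select_corpus(docs, "american")
corpus_two = select_corpus(docs, "american", negate=True)
print([d.title for d in corpus_one], [d.title for d in corpus_two])
```

Because the second filter is the exact negation of the first, no document can land in both corpora. Disjoint classes are what a two-class information-gain comparison needs.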
User EXCLUDE Feature List Submitted ... generating user exclude feature list from input pattern: color colour honor honour center centre favor favour
Found 9 features to exclude: ceńter center centre color colour favor favour honor honour
Run name: bldr
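The input pattern contains eight words, but nine features were excluded; the extra one is the accented corpus token ceńter. The log does not explain how it was matched. One plausible explanation, offered only as an assumption, is that matching ignores diacritics. The following is a minimal sketch of such accent-insensitive matching. `fold` and `build_exclude_list` are hypothetical names, not PhiloMine functions.

```python
import unicodedata

def fold(word: str) -> str:
    """Lowercase, decompose (NFD), and drop combining marks, so 'ceńter' -> 'center'."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def build_exclude_list(pattern: str, corpus_features: set[str]) -> list[str]:
    """Return every corpus feature whose folded form equals a folded pattern word."""
    targets = {fold(w) for w in pattern.split()}
    return sorted(f for f in corpus_features if fold(f) in targets)

pattern = "color colour honor honour center centre favor favour"
# Hypothetical feature set: the first nine come from the log, the last two are extra.
features = {"ce\u0144ter", "center", "centre", "color", "colour", "favor",
            "favour", "honor", "honour", "colored", "gonna"}
excluded = build_exclude_list(pattern, features)
print(len(excluded), excluded)
```

Because the match is on whole folded words, "colored" is not excluded. This is consistent with the log, since colored still appears in the results table.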
Reading raw data files for Corpus One.
Reading raw data files for Corpus Two.
Pruning instance feature vectors... Excluding features found in more than 557 or fewer than 139 documents.
From 95494 global features, removed 390 found in more than 557 documents and 93034 found in fewer than 139 documents; 2070 features remain for this classification task.
Done pruning all document vectors. Retained 563197 feature entries across all document vectors; deleted 950161 entries for features found in too many or too few documents.
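The two thresholds fit about 80% and 20% of the 697 documents in both corpora (394 + 303), rounded down: 557.6 becomes 557 and 139.4 becomes 139. That rule is inferred from the numbers; the log does not state it. The sketch below implements document-frequency pruning under that assumption. The function names are hypothetical. It also shows why the "retained" and "deleted" counts are much larger than 2070: under this reading they count feature entries summed over every document vector, not distinct features.

```python
import math
from collections import Counter

def df_thresholds(n_docs: int, upper_frac: float = 0.8,
                  lower_frac: float = 0.2) -> tuple[int, int]:
    """Assumed rule: thresholds are fractions of the document count, rounded down."""
    return math.floor(n_docs * upper_frac), math.floor(n_docs * lower_frac)

def prune(docs: list[dict[str, int]], upper: int,
          lower: int) -> tuple[list[dict[str, int]], set[str]]:
    """Drop features found in more than `upper` or fewer than `lower` documents."""
    # Each document vector contributes a feature at most once: document frequency.
    df = Counter(f for doc in docs for f in doc)
    keep = {f for f, n in df.items() if lower <= n <= upper}
    pruned = [{f: c for f, c in doc.items() if f in keep} for doc in docs]
    return pruned, keep

upper, lower = df_thresholds(394 + 303)
print(upper, lower)                # 557 139
print(95494 - 390 - 93034)         # 2070 features remain, as in the log

# Toy vectors: "the" occurs in all 3 documents, so with upper=2 it is pruned.
toy = [{"gonna": 3, "the": 10}, {"the": 7}, {"the": 2, "folks": 1}]
pruned, keep = prune(toy, upper=2, lower=1)
retained = sum(len(d) for d in pruned)          # entries kept across all vectors
deleted = sum(len(d) for d in toy) - retained   # entries removed
print(sorted(keep), retained, deleted)
```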
Using 2070 features
Running Weka Information Gain selector.
Executing: java -cp /projects/datamining/weka-3-4-8a/weka.jar -Xmx2000m weka.attributeSelection.InfoGainAttributeEval -S "weka.attributeSelection.Ranker" -I /projects/datamining/philomine/runs/bldr/arff/data/bldr.arff > /projects/datamining/philomine/runs/bldr/wekaig/output/bldr.output
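InfoGainAttributeEval scores each attribute by how much it reduces uncertainty about the class, IG(C; F) = H(C) - H(C | F). Ranker then orders the attributes by that score. By default, Weka discretizes numeric attributes before computing the gain. The sketch below simplifies that step away by treating each feature as merely present or absent in a document, so it will not reproduce the scores below exactly. It uses log base 2. The per-feature document counts in the usage line are made up for illustration.

```python
import math

def entropy(counts: list[int]) -> float:
    """Shannon entropy (log base 2) of a class distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

def info_gain(present: tuple[int, int], totals: tuple[int, int]) -> float:
    """IG of a binary present/absent feature in a two-class problem.

    present: documents containing the feature in (Corpus One, Corpus Two)
    totals:  documents per class, e.g. (394, 303)
    """
    absent = (totals[0] - present[0], totals[1] - present[1])
    n = sum(totals)
    # H(C | F): class entropy within each branch, weighted by branch size.
    h_cond = sum(sum(part) / n * entropy(list(part))
                 for part in (present, absent) if sum(part))
    return entropy(list(totals)) - h_cond

# Class entropy for 394 vs 303 documents: about 0.988, an upper bound on any score.
print(round(entropy([394, 303]), 4))
# Hypothetical counts: a feature in 300 Corpus One and 20 Corpus Two documents.
print(round(info_gain((300, 20), (394, 303)), 4))
```

A feature that occurs in every Corpus One document and no Corpus Two document would score the full class entropy. A feature spread evenly across both classes would score near zero.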
Weka Information Gain Results
| Feature | Information gain |
| gonna [c1] [c2] | 0.39341 |
| ain' [c1] [c2] | 0.33455 |
| folks [c1] [c2] | 0.31313 |
| em [c1] [c2] | 0.30981 |
| huh [c1] [c2] | 0.28309 |
| colored [c1] [c2] | 0.27006 |
| yeah [c1] [c2] | 0.22664 |
| eh [c1] [c2] | 0.22661 |
| toward [c1] [c2] | 0.21629 |
| guess [c1] [c2] | 0.20137 |
| cause [c1] [c2] | 0.18499 |
| wanna [c1] [c2] | 0.16454 |
| nothin' [c1] [c2] | 0.16399 |
| ass [c1] [c2] | 0.15704 |
| kinda [c1] [c2] | 0.15684 |
| pretty [c1] [c2] | 0.15666 |
| outta [c1] [c2] | 0.15039 |
| gotta [c1] [c2] | 0.14956 |
| baby [c1] [c2] | 0.14357 |
| talkin' [c1] [c2] | 0.13851 |
| doin' [c1] [c2] | 0.13703 |
| shall [c1] [c2] | 0.13545 |
| wasn' [c1] [c2] | 0.13449 |
| cannot [c1] [c2] | 0.13113 |
| naw [c1] [c2] | 0.1298 |
| negro [c1] [c2] | 0.12816 |
| towards [c1] [c2] | 0.12587 |
| y' [c1] [c2] | 0.12428 |
| honey [c1] [c2] | 0.12061 |
| daddy [c1] [c2] | 0.12012 |
| comin' [c1] [c2] | 0.1166 |
| neighborhood [c1] [c2] | 0.11298 |
| somethin' [c1] [c2] | 0.11087 |
| tryin' [c1] [c2] | 0.10975 |
| goin' [c1] [c2] | 0.10868 |
| stuff [c1] [c2] | 0.10788 |
| lookin' [c1] [c2] | 0.10781 |
| til [c1] [c2] | 0.10545 |
| probably [c1] [c2] | 0.10517 |
| they' [c1] [c2] | 0.10486 |
| blues [c1] [c2] | 0.10482 |
| gettin' [c1] [c2] | 0.10417 |
| uh [c1] [c2] | 0.10302 |
| she' [c1] [c2] | 0.09902 |
| figured [c1] [c2] | 0.09608 |
| somebody [c1] [c2] | 0.09569 |
| aw [c1] [c2] | 0.09493 |
| goat [c1] [c2] | 0.09275 |
| awhile [c1] [c2] | 0.09123 |
| gotten [c1] [c2] | 0.08939 |
| couldn' [c1] [c2] | 0.08721 |
| nigger [c1] [c2] | 0.08556 |
| negroes [c1] [c2] | 0.08497 |
| shit [c1] [c2] | 0.08408 |
| whom [c1] [c2] | 0.08331 |
| scared [c1] [c2] | 0.08316 |
| bet [c1] [c2] | 0.08225 |
| stairs [c1] [c2] | 0.08154 |
| hurt [c1] [c2] | 0.08084 |
| crosses [c1] [c2] | 0.08077 |
| who' [c1] [c2] | 0.08039 |
| village [c1] [c2] | 0.08026 |
| wives [c1] [c2] | 0.07957 |
| dig [c1] [c2] | 0.07917 |
| somebody' [c1] [c2] | 0.07856 |
| crazy [c1] [c2] | 0.07728 |
| supposed [c1] [c2] | 0.076 |
| chief [c1] [c2] | 0.0736 |
| hit [c1] [c2] | 0.07313 |
| damn [c1] [c2] | 0.07263 |
| bout [c1] [c2] | 0.07222 |
| started [c1] [c2] | 0.06817 |
| dollars [c1] [c2] | 0.06811 |
| nerve [c1] [c2] | 0.06723 |
| liked [c1] [c2] | 0.06635 |
| fine [c1] [c2] | 0.06562 |
| sounds [c1] [c2] | 0.06486 |
| bush [c1] [c2] | 0.06422 |
| southern [c1] [c2] | 0.06412 |
| market [c1] [c2] | 0.06374 |
| street [c1] [c2] | 0.06367 |
| how' [c1] [c2] | 0.06303 |
| goddamn [c1] [c2] | 0.06295 |
| kids [c1] [c2] | 0.06167 |
| beat [c1] [c2] | 0.06163 |
| starts [c1] [c2] | 0.06028 |
| mere [c1] [c2] | 0.06026 |
| check [c1] [c2] | 0.06019 |
| kiss [c1] [c2] | 0.06 |
| ole [c1] [c2] | 0.05985 |
| beg [c1] [c2] | 0.05939 |
| niggers [c1] [c2] | 0.0593 |
| fix [c1] [c2] | 0.05886 |
| york [c1] [c2] | 0.05818 |
| washington [c1] [c2] | 0.05788 |
| gods [c1] [c2] | 0.05705 |
| matters [c1] [c2] | 0.05649 |
| palm [c1] [c2] | 0.05649 |
| nobody' [c1] [c2] | 0.05522 |
| quit [c1] [c2] | 0.05498 |
| american [c1] [c2] | 0.0547 |
| fuck [c1] [c2] | 0.05343 |
| hey [c1] [c2] | 0.05313 |
| n [c1] [c2] | 0.05265 |
| sometimes [c1] [c2] | 0.05231 |
| miss [c1] [c2] | 0.05197 |
| idiot [c1] [c2] | 0.05156 |
| bullshit [c1] [c2] | 0.05148 |
| reaches [c1] [c2] | 0.05139 |
| dim [c1] [c2] | 0.05078 |
| mess [c1] [c2] | 0.05049 |
| hip [c1] [c2] | 0.04966 |
| hung [c1] [c2] | 0.04947 |
| blue [c1] [c2] | 0.04883 |
| return [c1] [c2] | 0.04882 |
| bust [c1] [c2] | 0.04854 |
| fathers [c1] [c2] | 0.04801 |
| apartment [c1] [c2] | 0.04794 |
| dollar [c1] [c2] | 0.04793 |
| brown [c1] [c2] | 0.04776 |
| kid [c1] [c2] | 0.04749 |
| joint [c1] [c2] | 0.04746 |
| where' [c1] [c2] | 0.04726 |
| low [c1] [c2] | 0.04696 |
| looked [c1] [c2] | 0.04652 |
| needed [c1] [c2] | 0.04649 |
| learned [c1] [c2] | 0.04643 |
| message [c1] [c2] | 0.04624 |
| minute [c1] [c2] | 0.04614 |
| whose [c1] [c2] | 0.04612 |
| exits [c1] [c2] | 0.04609 |
| coat [c1] [c2] | 0.04585 |
| loose [c1] [c2] | 0.04584 |
| lay [c1] [c2] | 0.04524 |
| properly [c1] [c2] | 0.04487 |
| butt [c1] [c2] | 0.04464 |
| mama [c1] [c2] | 0.04424 |
| anymore [c1] [c2] | 0.04412 |
| arrive [c1] [c2] | 0.04402 |
| store [c1] [c2] | 0.04381 |
| government [c1] [c2] | 0.04371 |
| ought [c1] [c2] | 0.0433 |
| bill [c1] [c2] | 0.04315 |
| phone [c1] [c2] | 0.04253 |
| top [c1] [c2] | 0.04195 |
| wealth [c1] [c2] | 0.04187 |
| bitch [c1] [c2] | 0.0414 |
| burned [c1] [c2] | 0.04107 |
| ma' [c1] [c2] | 0.04107 |
| upstairs [c1] [c2] | 0.04099 |
| sacrifice [c1] [c2] | 0.04092 |
| south [c1] [c2] | 0.04061 |
| remain [c1] [c2] | 0.04014 |
| preacher [c1] [c2] | 0.04 |
| laid [c1] [c2] | 0.0397 |
| behave [c1] [c2] | 0.03963 |
| line [c1] [c2] | 0.03937 |
| therefore [c1] [c2] | 0.03936 |
| dumb [c1] [c2] | 0.03913 |
| leaves [c1] [c2] | 0.03909 |
| starting [c1] [c2] | 0.03862 |
| silence [c1] [c2] | 0.03844 |
| insult [c1] [c2] | 0.0384 |
| college [c1] [c2] | 0.03837 |
| hat [c1] [c2] | 0.03824 |
| force [c1] [c2] | 0.03801 |
| purse [c1] [c2] | 0.03787 |
| bills [c1] [c2] | 0.03773 |
| barely [c1] [c2] | 0.03766 |
| pulls [c1] [c2] | 0.03758 |
| obey [c1] [c2] | 0.03748 |
| sweet [c1] [c2] | 0.03747 |
| block [c1] [c2] | 0.03744 |
| hadn' [c1] [c2] | 0.03742 |
| brings [c1] [c2] | 0.03726 |
| difficult [c1] [c2] | 0.03713 |
| forth [c1] [c2] | 0.03713 |
| greet [c1] [c2] | 0.03709 |
| bloody [c1] [c2] | 0.03706 |
| straighten [c1] [c2] | 0.037 |
| search [c1] [c2] | 0.03694 |
| worked [c1] [c2] | 0.03687 |
| window [c1] [c2] | 0.03663 |
| forest [c1] [c2] | 0.03656 |
| hangs [c1] [c2] | 0.03639 |
| offstage [c1] [c2] | 0.03585 |
| o [c1] [c2] | 0.03562 |
| slightly [c1] [c2] | 0.03543 |
| backs [c1] [c2] | 0.03534 |
| pants [c1] [c2] | 0.03502 |
| upstage [c1] [c2] | 0.035 |
| broke [c1] [c2] | 0.03485 |
| smart [c1] [c2] | 0.03472 |
| ancestors [c1] [c2] | 0.03459 |
| although [c1] [c2] | 0.03442 |
| round [c1] [c2] | 0.03442 |
| couple [c1] [c2] | 0.0341 |
| ball [c1] [c2] | 0.03408 |
| named [c1] [c2] | 0.03403 |
| fun [c1] [c2] | 0.03399 |