As promised, I divided Prosper's pre-2008 loans into two randomly chosen halves and ran the analysis on each half. My findings for words that perform poorly are pretty similar to my original findings. Words that perform well aren't as clear-cut. Because of that, I'm not going to post the well-performing words here. For the curious, feel free to take a look at the post on my blog here:
http://lendingtuber.blogspot.com/2011/09/bad-and-good-words-revisited.html.
I will, however, post the words that performed poorly (taken from the same post):
Percent Paid (By Loan): The percent of loans containing the indicated word at least once that finished with a status of Paid.
Percent Paid (By Word): The percent of the time that a loan ended with the status Paid, weighted by the frequency of the word in the listing. (For example, a loan with the title "Help, help, help, help!" that did not pay would count four times as much as a loan with "Help" listed only once.)
Word Count: The number of listings containing the word at least once. (Notably not the total number of times the word was used--the maximum here is once per listing.)
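To make the three definitions above concrete, here is a rough Python sketch of how they could be computed. This is not the script I actually ran; the function name `word_stats`, the `(text, paid)` input format, and the simple lowercase tokenizer are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

def word_stats(listings):
    """listings: iterable of (text, paid) pairs; paid is True if the loan ended as Paid.

    Hypothetical helper: the tokenizer below (lowercase letters, digits,
    apostrophes) is an assumption, not necessarily how the original
    analysis split listings into words.
    """
    loans = defaultdict(int)       # listings containing the word at least once
    loans_paid = defaultdict(int)  # ...of those, how many were Paid
    uses = defaultdict(int)        # total occurrences of the word
    uses_paid = defaultdict(int)   # ...occurrences that sit in Paid listings

    for text, paid in listings:
        counts = Counter(re.findall(r"[a-z0-9']+", text.lower()))
        for word, n in counts.items():
            loans[word] += 1       # once per listing, however often it repeats
            uses[word] += n        # weighted by frequency within the listing
            if paid:
                loans_paid[word] += 1
                uses_paid[word] += n

    return {
        word: {
            "pct_paid_by_loan": 100 * loans_paid[word] / loans[word],
            "pct_paid_by_word": 100 * uses_paid[word] / uses[word],
            "word_count": loans[word],
        }
        for word in loans
    }

# Toy data only, echoing the "Help" example above.
listings = [
    ("Help, help, help, help!", False),  # did not pay, "help" used four times
    ("Help me consolidate", True),       # paid, "help" used once
]
stats = word_stats(listings)
print(stats["help"])
```

With this toy input, "help" has a Word Count of 2, a Percent Paid (By Loan) of 50%, and a Percent Paid (By Word) of 20%, because the unpaid listing counts four times and the paid one counts once.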
Group 1 Worst Performing Words
Word | Percent Paid (By Loan) | Percent Paid (By Word) | Word Count |
(average Paid) | 61.1% | | |
payday | 38.9% | 38.6% | 596 |
behind | 42.9% | 43.6% | 592 |
mother | 43.5% | 44.8% | 566 |
chance | 44.5% | 42.1% | 631 |
track | 46.8% | 45.6% | 581 |
son | 47.1% | 44.8% | 597 |
daughter | 48.1% | 46.3% | 516 |
child | 48.7% | 47.9% | 520 |
husband | 49% | 51.3% | 896 |
single | 49.5% | 49.7% | 707 |
Group 2 Worst Performing Words
Word | Percent Paid (By Loan) | Percent Paid (By Word) | Word Count |
(average Paid) | 59.6% | | |
payday | 37.5% | 39.2% | 595 |
behind | 42.4% | 41.3% | 566 |
chance | 43.5% | 41.5% | 575 |
son | 45.7% | 44.2% | 514 |
mother | 46.6% | 46.7% | 601 |
children | 47% | 45.6% | 854 |
daughter | 47.7% | 44.6% | 539 |
DELETED | 47.7% | 46.6% | 507 |
child | 47.8% | 46.3% | 552 |
30000 | 48.3% | 47.6% | 532 |
In both groups, 'payday' sits at the very bottom, followed by 'behind', 'chance', and family words like 'mother', 'child', etc. That is very similar to my original findings.
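For anyone who wants to check the overlap themselves, here is a small Python sketch of the split-and-compare idea. The word lists are copied from the two tables above; the helper `split_in_half` and its `seed` parameter are hypothetical and only illustrate one way to cut a set of loans into two random halves, not necessarily the way I did it.

```python
import random

def split_in_half(items, seed=0):
    """Shuffle a copy of items and cut it into two halves.

    Hypothetical helper; the seed value is arbitrary and only makes
    the split repeatable.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

# Ten worst-performing words per group, copied from the tables above.
group1 = ["payday", "behind", "mother", "chance", "track",
          "son", "daughter", "child", "husband", "single"]
group2 = ["payday", "behind", "chance", "son", "mother",
          "children", "daughter", "DELETED", "child", "30000"]

in_group1 = set(group1)
in_group2 = set(group2)

shared = [w for w in group1 if w in in_group2]           # in both lists
only_group1 = [w for w in group1 if w not in in_group2]  # Group 1 only
only_group2 = [w for w in group2 if w not in in_group1]  # Group 2 only

print(len(shared), shared)
print(only_group1)
print(only_group2)
```

Seven of the ten words appear in both lists (payday, behind, mother, chance, son, daughter, child). Group 1 alone has track, husband, and single; Group 2 alone has children, DELETED, and 30000.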