Let's share some knowledge (two ideas this time)... I've been monitoring which pages Googlebot visits the most in order to understand what is meant by "quality". On my site that was affected by the latest update, 29 of the 30 URLs most visited by Googlebot were also among the top 200 entrance pages from Google search (mostly USA traffic; monitoring started on 24th Feb 2011).
So by monitoring which pages Googlebot crawls the most, you can get an idea of what is considered good and what is not. Pages with zero Googlebot traffic are usually the low-value ones.
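If you want to try this on your own logs, here is a rough sketch in Python. It assumes your server writes the common "combined" access log format and that a user-agent containing "Googlebot" really is Googlebot (the string can be faked, so treat the counts as approximate). The names googlebot_hits and never_crawled are just made up for this example.

```python
import re
from collections import Counter

# Assumption: "combined" log format, i.e.
# ip - - [date] "METHOD path HTTP/x" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Count requests per path whose user-agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group("agent"):
            hits[match.group("path")] += 1
    return hits


def never_crawled(all_paths, hits):
    """Return the known paths that Googlebot did not request at all."""
    return sorted(path for path in all_paths if hits[path] == 0)
```

Feed googlebot_hits the lines of your access log, then look at hits.most_common(30) for the most-crawled URLs and at never_crawled(your_url_list, hits) for the pages with zero Googlebot traffic.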
I may be wrong, but this is what I think - at least my internal statistics show this.
Bear in mind that there may be URLs on your site that you think don't exist. Scraper sites (and similar) may point to your site via broken URLs... say your URL is http...something.../index.php?some_parameters, and the scraper site (or whatever) adds some additional parameter to it. Google Webmaster Tools won't show you all errors (I know this already), and this can cause problems. My site contained about 16000 pages that are surely considered low-value, and they're slowly disappearing from Google's index now...
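One way to spot such URLs is to check every requested URL for query parameters your site never generates. A minimal sketch, assuming you can list the parameter names your own pages use (the function name unexpected_params and the example parameters "id" and "ref" are hypothetical):

```python
from urllib.parse import urlsplit, parse_qsl


def unexpected_params(url, allowed):
    """Return the query parameter names in url that are not in allowed."""
    query = urlsplit(url).query
    # keep_blank_values=True so that "?foo=" or "?foo" is still reported
    names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
    return sorted(names - set(allowed))
```

Run it over the URLs Googlebot requested; anything that returns a non-empty list is a URL variant your site did not create itself and is worth a closer look.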
Also, there must be some ratio between "good quality" and "poor quality" pages that triggers the "content farm penalty". Say your site contains 41 pages, of which 15 were entrance pages from Google during the last month: in such a case the ratio is good, and thus no penalty is applied. I have another site with about 81 pages, about 30 of them entrance pages - still no penalty. And so on... But if you have thousands of "useless" pages in Google's index, then most likely you're a victim of the penalty.
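To put numbers on the two examples above, here is the arithmetic as a tiny Python sketch. The function name entrance_ratio is made up for illustration, and I don't know what threshold (if any) Google actually uses, so this only compares sites against each other.

```python
def entrance_ratio(entrance_pages, total_pages):
    """Share of a site's pages that were entrance pages from search."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    return entrance_pages / total_pages


# The two unaffected sites from the examples above
small_site = entrance_ratio(15, 41)   # roughly 0.37
other_site = entrance_ratio(30, 81)   # roughly 0.37
```

Both unaffected sites come out at a bit over one third of their pages being entrance pages.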
Just my thoughts... They make sense to me, since I operate only one site that has been affected (it contained thousands of "non-entrance pages"), while my other sites (about 10, all of them with a nice ratio of entrance pages vs. all pages on the site) behave normally.