Are Ebook Samples Really Useful?

Why did I do this?

One of the biggest problems with books these days – and I guess I really mean ebooks – is there’s just too much freaking choice. The rise of self-publishing is undoubtedly a good thing: it means that anyone and everyone can get their words online and into a form you can conveniently download onto your phone, tablet or ereader device. But not everyone and anyone can write, or has something interesting to say, or, apparently, can use a spell-checker. And that’s before we get into issues of taste and preference.

One of the tools that sites like Amazon use to counter this problem – along with ratings and reviews – is the availability of free samples. Basically every ebook available from Amazon also has a sample – usually the first chapter or so – that you can download for free. A try-before-you-buy option with no commitment. Good idea huh?

Yes. Well, I mean I think so in principle but I seem to almost never use them in practice. This post will be partly about why that is. Maybe.

However, what really inspired this post was the way samples get used in the recurring arguments over the relative quality of indie versus trad-published books. The claim is a sub-section of the wider argument about quality, and it basically says that even if there is a lot of unreadable junk out there, it’s possible to find the “gems” by using, amongst other things, samples.

Let’s just say I’m sceptical about this – surely it simply takes too much time to read samples to use them as anything other than a final filter? But that’s a gut reaction. So I thought I’d test it. Sort of.

What did I do?

I decided to throw a few numbers together and see what came out.

On 16th August 2012 I went to amazon.co.uk and looked at the available fiction ebooks (I almost never read non-fiction). I read mostly from the following genres (these are Amazon’s categories): SciFi, Fantasy, Crime & Thrillers and Action & Adventure. I looked for a “comedy” category, but although I found “humour” as a category for paper books, I couldn’t find one in the Kindle store. In any case, the paper “humour” category included non-fiction humour – books of essays, memoirs and so on – which I’m less inclined to read.

Anyway here’s a list of how many titles there were:

Genre	Total
Action & Adventure	38,375
Crime & Thrillers	74,605
Fantasy	38,790
SciFi	33,904
All four	185,674
SciFi/Fantasy	72,694
All Fiction	561,178

Clearly, even without further analysis that’s too many books. Fortunately Amazon gives me lots of ways to filter these. I can look at just the ones with a 4star or higher review average (I want to read the good ones right?), or the ones which came out in the last 30days (let’s assume I check regularly) or I could look at what’s about to come out. Or combine two or more of these.

Genre	Total	4star	30days	Coming Soon	4star+30
Action & Adventure	38,375	4,508	1,435	70	70
Crime & Thrillers	74,605	12,987	3,035	509	250
Fantasy	38,790	6,383	1,813	178	136
SciFi	33,904	4,102	1,427	102	10
All four	185,674	27,980	7,710	859	466
SciFi/Fantasy	72,694	10,485	3,240	280	146
All Fiction	561,178	67,690	22,813	3,253	1,409

Now some of those numbers look less scary but what do they mean in terms of reading samples?

What did I assume?

I needed to make a mathematical model (i.e. a spreadsheet) and for that I needed some generalisations or assumptions.

First, let’s assume that it takes me on average 5 minutes to read a sample. Sample sizes vary and I am a slow reader, so I think this is on the low end, but erring low favours the proposition that samples are a good way to filter.

So let’s plug that into our model and here’s the time taken to read all those samples:

Genre	Total	4star	30days	Coming Soon	4star+30
Action & Adventure	133d 5h55m	15d 15h40m	4d 23h35m	5h50m	5h50m
Crime & Thrillers	259d 1h05m	45d 2h15m	10d 12h55m	1d 18h25m	20h50m
Fantasy	134d 16h30m	22d 3h55m	6d 7h05m	14h50m	11h20m
SciFi	117d 17h20m	14d 5h50m	4d 22h55m	8h30m	50m
All four	644d 16h50m	97d 3h40m	26d 18h30m	2d 23h35m	1d 14h50m
SciFi/Fantasy	252d 9h50m	36d 9h45m	11d 6h00m	23h20m	12h10m
All Fiction	1948d 12h50m	235d 0h50m	79d 5h05m	11d 7h05m	4d 21h25m
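
The arithmetic behind this table can be sketched in a few lines of Python. This is only a minimal sketch, not the original spreadsheet: the function names are hypothetical, and it assumes 24-hour days counted by integer division, so a part-day never rounds up to a whole one.

```python
SAMPLE_MINUTES = 5  # assumption from the post: 5 minutes per sample


def format_duration(total_minutes):
    """Format whole minutes in the table's style, e.g. '11d 6h00m'."""
    days, rem = divmod(total_minutes, 24 * 60)  # whole 24-hour days, rounded down
    hours, minutes = divmod(rem, 60)
    if days:
        return f"{days}d {hours}h{minutes:02d}m"
    if hours:
        return f"{hours}h{minutes:02d}m"
    return f"{minutes}m"


def sample_reading_time(book_count, minutes_per_sample=SAMPLE_MINUTES):
    """Time to read one sample for every book in a list."""
    return format_duration(book_count * minutes_per_sample)


print(sample_reading_time(10))     # SciFi, 4star+30: 50m
print(sample_reading_time(70))     # Action & Adventure, Coming Soon: 5h50m
print(sample_reading_time(67690))  # All Fiction, 4star: 235d 0h50m
```

Changing `minutes_per_sample` is the quickest way to see how sensitive the totals are to that single assumption.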

Whoops! The power of multiplication has turned what had seemed reasonable book numbers into unreasonable lengths of time. I’m clearly not going to spend days (or months, years!) reading samples to decide my next “full” book read. About the only thing that seems reasonable is 4star SciFi from the last 30 days.

How did I refine the model? (assumptions #2)

OK, so I’ve got some numbers now, but are they at all useful? Would any sane person really try to read all the samples from a particular category? Probably not. We can refine the model with a couple of additional assumptions. Let’s say I go to Amazon and look at the list for my particular category – it shows me the books in pages of 12, where I get the covers, titles and authors. Probably what I would do is page through this list, click on a few likely-looking ones and read the blurb, and if that didn’t immediately disqualify the book I’d then download the sample.

So let’s assume it takes 5 seconds to scan each page of 12 book titles and covers.

Let’s assume that for any list 10% of the books are worth reading the blurb for, and that it takes 15 seconds to skim-read each blurb.

Remember this is based on testing the idea that samples are actually the way to go so the blurb-reading is really to confirm that the cover/title has given the correct impression as regards genre and probable content.

Finally let’s assume that we commit to read the samples of half the ones where we read the blurb i.e. 5% of the list overall.

Plugging those numbers into our new model, the overall time taken per list is:

Genre	Total	4star	30days	Coming Soon	4star+30
Action & Adventure	7d 12h19m	21h11m	6h44m	19m	19m
Crime & Thrillers	14d 14h34m	2d 13h01m	14h15m	2h23m	1h10m
Fantasy	7d 14h16m	1d 5h59m	8h31m	50m	38m
SciFi	6d 15h19m	19h16m	6h42m	28m	2m
All four	36d 8h29m	5d 11h28m	1d 12h13m	4h02m	2h11m
SciFi/Fantasy	14d 5h35m	2d 1h16m	15h13m	1h18m	41m
All Fiction	109d 21h01m	13d 6h04m	4d 11h12m	15h17m	6h37m
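
For anyone who wants to play with the assumptions, here is a minimal Python sketch of the refined model. It is one plausible reading of the assumptions above rather than the original spreadsheet formula, which the post doesn’t give: the names are hypothetical, fractional pages and fractional books are allowed, seconds are truncated to whole minutes, and days are whole 24-hour days.

```python
PAGE_SIZE = 12               # titles shown per results page
SECONDS_PER_PAGE = 5         # time to scan one page of covers and titles
BLURB_FRACTION = 0.10        # share of listed books whose blurb gets read
SECONDS_PER_BLURB = 15       # time to skim-read one blurb
SAMPLE_FRACTION = 0.05       # share of listed books whose sample gets read
SECONDS_PER_SAMPLE = 5 * 60  # 5 minutes per sample, as in the first model


def filtering_seconds(book_count):
    """Estimated seconds to work through a list of book_count titles."""
    scan = book_count / PAGE_SIZE * SECONDS_PER_PAGE
    blurbs = book_count * BLURB_FRACTION * SECONDS_PER_BLURB
    samples = book_count * SAMPLE_FRACTION * SECONDS_PER_SAMPLE
    return scan + blurbs + samples


def format_duration(seconds):
    """Format seconds in the table's style, truncating to whole minutes."""
    total_minutes = int(seconds // 60)
    days, rem = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(rem, 60)
    if days:
        return f"{days}d {hours}h{minutes:02d}m"
    if hours:
        return f"{hours}h{minutes:02d}m"
    return f"{minutes}m"


print(format_duration(filtering_seconds(280)))  # SciFi/Fantasy, Coming Soon: 1h18m
print(format_duration(filtering_seconds(466)))  # All four, 4star+30: 2h11m
```

Almost all of the time still goes on the samples themselves: per listed book that is 15 seconds of sample reading against about 2 seconds of scanning and blurb reading combined.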

Still a lot of large numbers there. I’m automatically rejecting anything over a day. However an hour and a half to check out upcoming SciFi/Fantasy seems doable, as does a couple of hours to review the 4star+ books in my favourite genres from the past 30 days.

So, whilst the numbers overall confirm my gut instinct, limit the scope a little and it may actually be a viable method.

Hold on a second your model is wrong because…

I can think of two main reasons someone may object to the way I’ve set this up:

The numbers in your assumptions are wrong. Obviously it’s true that if we vary these numbers we can come out with different answers. All I can say is I think the assumptions are roughly true for me, and I’ve tried to err on the side that would lessen the time taken so that I’m giving sampling as a method a fair chance.
In reality, no-one would do it that way. Clearly when you have a nice simple equation you can plug in whatever numbers you like and get an answer. A human being, however, would react differently given 10 books to sample rather than 10,000. In other words, the assumptions don’t scale. I think this is true. I think that the larger the number of books you have, the more you would want to use other filters first OR the more likely you are to simply bail out early, i.e. read the first 25 samples, say, and pick the best of those. However, I think the numbers are still useful because they show the difficulty of getting your book read, based on sampling alone, if it’s lower down that list. I think that just confirms what indie authors already know: the importance of getting as many good reviews and ratings as possible, and of getting as high up those popularity lists as possible.
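
The “lower down that list” point can be put into rough numbers with the same assumptions. This is just a sketch: the helper names are hypothetical, and it reuses the refined model’s figures (5 seconds per page of 12, 10% of blurbs read at 15 seconds each, 5% of samples read at 5 minutes each).

```python
# Average cost per listed book under the refined model's assumptions.
SECONDS_PER_LISTED_BOOK = 5 / 12 + 0.10 * 15 + 0.05 * 5 * 60  # roughly 16.9 s
SAMPLE_PERCENT = 5  # share of listed books whose sample gets read


def books_covered(budget_minutes):
    """How many listed books a reader gets through in budget_minutes."""
    return int(budget_minutes * 60 // SECONDS_PER_LISTED_BOOK)


def list_depth_for_samples(sample_count):
    """How far down the list a reader is after reading sample_count samples."""
    return sample_count * 100 // SAMPLE_PERCENT


print(books_covered(60))           # 212
print(list_depth_for_samples(25))  # 500
```

So under these assumptions an hour of browsing covers only the first couple of hundred titles in a list, and a reader who bails out after 25 samples never gets past about the 500th.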

Have I learnt anything?

I think so. I had assumed that if I wanted to find something new to read I should follow the usual routes – reviews from trusted sources and recommendations from family/friends – methods which haven’t changed since I started reading (well before the advent of ebooks). I hadn’t expected sampling would help because I hadn’t expected that the numbers would ever dip to low enough levels to be reasonable. Turns out that may not be true and scanning the latest 4star books in my chosen genres once a month for samples might be a worthwhile investment.

Or not. Intellectually I can see the merit, but psychologically an hour reading samples when I could be reading my next book seems like an hour wasted.