“Can you really tell the difference?” - My wife, expressing fighting words
This experiment was inspired by an argument with my wife, a stylish but atrocious water filter, and the explosion of start-ups attempting to turn everything you purchase into a subscription service.
About a year ago I was growing tired of our tap water and its overly chlorinated taste. Initially I thought just to buy a Brita, but Brita filters always seemed like something you’d shove in your dorm room mini-fridge and not display on your kitchen counter. I looked around to see if there was anything better out there, and lo and behold, there was a water filter company called Soma with a beautifully designed water filter that seemed to fit the bill. They emphasize how their filters are “plant-based” and “sustainable”, but I just cared about the design. They also put their CEO's head in a circle (along with all their other pictures), which is the universal "Hey Millennials! Our Corporation is Different" indicator, so as a millennial I believed that they believed in their mission statement and just wanted to deliver me an effective and stylish water jug. The initial reviews on Amazon seemed fine, so I went to their website and placed an order.
The filter arrived a few days later and it indeed looked great. I followed all the instructions for prepping the water filter, and then turned on the tap to fill it up. Immediately I was struck by how quickly the filter seemed to be “filtering” the water. My experience with other filters was filling the upper chamber and coming back in five minutes after it slowly dripped through, but the water in this filter seemed to be traveling relatively unimpeded from the upper chamber to the lower. Impressed by the speed at which Soma was able to filter their water, I waited for the flow to stop and poured myself a glass.
It did nothing.
“Can you really tell the difference? I think you’re crazy. They all taste fine to me.”
A statement which I couldn’t refute, as the filter I did have was used and I certainly wasn’t going to order a new one.
A few months later, I got a package in the mail from Soma. It turns out if you order from their site, you agree to sign up to their filter subscription service, where they “helpfully” send and charge you for a new filter every few months. This is part of a larger trend of startups subtly signing you up for subscription services when you purchase their products, a practice pioneered in the 90s by “8 CDs For A Penny!” Columbia House and now adopted online for everything from lingerie to kids’ clothing to women’s activewear (who is trying to make subscription clothing a thing? Stop trying to make subscription clothing a thing).
Anyway, armed with a fresh Soma filter (one which they touted as improved over their previous filter), I cancelled my subscription and set out to design an experiment to test my ability to distinguish between water types. I hoped to show both that I could tell the difference between filtered and unfiltered water and, empirically, how bad the Soma filtered water tasted.
I went online, did some research to see what other people had done, and found this post where someone actually tested the chlorine and impurity content of various filters. The important takeaway from that post is that while other similar filters reduced chlorine content by 95%, the Soma filter only reduced it by about half. Their testing methodology was good, but they based their overall decisions on a subjective ranking system that doesn’t emphasize how poorly the Soma filtered water tasted. The Soma filter did indeed filter less chlorine out of the water, but was there a more objective way to show how terrible it tasted? In addition, how do you design an unbiased experiment when you're predisposed to a certain outcome (in this case, that the Soma filter does not change the taste of tap water)?
I decided to perform a series of blind pairwise comparisons between four types of water: Pur filtered, Soma filtered, tap, and bottled water. The goal was to see how distinguishable each type of water was from each other type. I wouldn’t know which types of water were being compared in each round, and the drive to prove to my wife that I could indeed distinguish filtered water from tap (and thus scientifically and indisputably win a marital argument, a rare event) would keep me honest. If I showed I could distinguish between types of water, but couldn’t tell the difference between tap and the Soma water, then I would have objectively shown that the Soma filter did little to change the taste of the water.
For types of water that were indistinguishable, I would only be able to classify them correctly about 50% of the time, by chance. For types of water that were distinguishable, I should be able to correctly categorize a significantly higher percentage of them. Here, I define that percentage as 75%. Along with an acceptable false negative rate (\(\beta\)) of 0.20 and a false positive rate (\(\alpha\)) of 0.05, this sets my minimum required sample size at 23 runs. You can see this in the following figure, where the power (which is 1-\(\beta\)) crosses the 0.80 threshold at 23 runs. There is no formal industry standard for power (where “industry” loosely means “researchers”), but 80% is the usual cutoff for an acceptable design. With an \(\alpha\) of 0.05, this means we accept four times as many false negatives as false positives, with the idea that a false negative is usually not as bad as a false positive.
Usually, power is a monotonically increasing function of sample size, but for a binomial test you get the odd case where a slightly lower number of runs will occasionally have higher power than a design one or two runs larger. This is due to the discrete nature of the binomial distribution.
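# Candidate numbers of runs and the design parameters
experimentsize = 1:25
alpha = 0.05
baseprob = 0.5
thresholdprob = 0.75
# Power: chance of exceeding the critical count when the true success rate is 75%
power = 1 - pbinom(qbinom(1 - alpha, experimentsize, baseprob), experimentsize, thresholdprob)
poweranalysis = data.frame(experimentsize, power)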
(An easy way to calculate sample size if you don’t want to do it by hand is the tool G*Power, which is free and available on most platforms)
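As a minimal sketch of the same calculation, the snippet below wraps the power computation in a function and looks up the smallest design that reaches 80% power. The helper name `binomial_power` and the variables `min_runs` and `dips` are illustrative and not part of the original analysis.

```r
# Power of a one-sided exact binomial test of H0: p = p0 when the truth is p1.
# binomial_power is an illustrative helper name, not part of any package.
binomial_power <- function(n, p0 = 0.5, p1 = 0.75, alpha = 0.05) {
  # Smallest count k with P(X <= k | p0) >= 1 - alpha; H0 is rejected when X > k
  critical <- qbinom(1 - alpha, n, p0)
  # Probability of exceeding the critical count when the true rate is p1
  1 - pbinom(critical, n, p1)
}

sizes <- 1:25
power <- binomial_power(sizes)

# Smallest number of runs that reaches the 80% power target
min_runs <- min(sizes[power >= 0.80])

# Sizes where adding one run lowers the power (the discreteness effect)
dips <- sizes[-1][diff(power) < 0]
```

Under these settings `min_runs` should come out as the 23 runs quoted above, and `dips` lists the sample sizes where one extra run actually costs power.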
Each round now involves filling up 48 cups of water, 23 with water type A and 23 with water type B, along with a taste calibration cup for each type before starting each round. The cups are lined up side by side in 23 pairs, one pair per run, and then a random number generator tells my wife how to switch them. Correctly determining when the cups were switched indicates that I successfully distinguished the two waters. Here’s an image of the set-up:
Here, I controlled for several variables. First, I filled all the pitchers the previous night and let the temperature settle to 73 degrees, and confirmed they were all the same with a laser thermometer before the start of the experiment. I had my wife randomize (with a random number generator) both the cup switches and the order in which the water types were compared. Between each round, I left the room while she filled and arranged the water, and had her hide the pitchers during each round. I dried the cups after each round to remove any traces of the previous water. I also did a control round, where all 48 cups were filled with the same type of water and I was tasked with the (futile) goal of trying to classify them. This made it less clear to me (as the subject) which two waters I was drinking when two waters tasted the same. Finally, and most importantly, my wife agreed to participate (she's a good sport).
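For anyone who wants to script the randomization, here is a rough sketch. It assumes a fair coin flip per pair decides whether the two cups are swapped and that the control round is shuffled in with the six comparisons; the seed and all variable names are made up for illustration and are not the procedure we actually used.

```r
# A sketch of the randomization, assuming one coin flip per pair of cups.
# The seed and all names are illustrative assumptions.
set.seed(42)  # arbitrary seed so the schedule can be reproduced

water_types <- c("Pur", "Soma", "Tap", "Bottled")
runs_per_round <- 23

# All pairwise comparisons between the four water types (6 pairs)
comparisons <- combn(water_types, 2, FUN = paste, collapse = " vs ")

# Add the same-water control round, then shuffle the order of the rounds
rounds <- sample(c(comparisons, "Control"))

# For each round, flip a fair coin for every pair of cups: TRUE means swap
switch_schedule <- lapply(
  setNames(rounds, rounds),
  function(round) sample(c(TRUE, FALSE), runs_per_round, replace = TRUE)
)
```

The person running the test would read `rounds` for the order and `switch_schedule[[round]]` for the 23 swap decisions in that round.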
And here are the actual results of the experiment, shown in decreasing order of distinguishability:
We take this data, count the number of successes, and test whether the success rate is statistically different from 50%. We calculate the p-values directly, but we can also show this visually with the lower confidence bounds. If the lower confidence bound crosses 50%, the success rate is not significantly different from chance and we cannot say the water types are distinguishable. The upper confidence bounds are not shown because it is a one-sided test (they just stretch to 1).
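library(dplyr)

# Count correct classifications per comparison and run a one-sided binomial test.
# conf.int holds the lower and upper confidence bounds.
results = experiment %>%
  group_by(Comparison) %>%
  filter(Truth == Data) %>%
  summarize(successes = n(),
            pval = binom.test(n(), 23, p = 0.5, alternative = "greater")$p.value,
            lowerci = binom.test(n(), 23, p = 0.5, alternative = "greater")$conf.int[1],
            upperci = binom.test(n(), 23, p = 0.5, alternative = "greater")$conf.int[2])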
Soma filtered water performed the worst, having a taste statistically indistinguishable from tap water. Bottled water performed the best, being distinguishable from tap 100% of the time. Pur did almost as well against tap, with only two misclassifications. It did even better against Soma filtered water, with only one misclassification. Bottled and Pur filtered water were harder to distinguish, but the test shows there is a difference. In this case, I described the Pur filtered water during the test as “smooth” and the bottled water as “slightly alkaline,” and I actually preferred the Pur water’s taste to the bottled. Bottled vs Soma filtered are also not statistically different, but only one more success would have made them so. If you look at the actual data above, you can see the first two runs accounted for 2/8 of the errors made in that round. The first couple of runs come right off the initial calibration cups, so it’s possible that I was not fully “calibrated” to the taste. In addition, I designed the test with an acceptable false negative rate of 20%. If we assume everything but Tap vs Soma is distinguishable, one false negative out of five is within the designed sensitivity of the experiment.
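To make the "one more success" remark concrete, here is a small sketch that recomputes the one-sided p-values from the success counts implied by the misclassification counts above. The helper `one_sided_p` and the vector names are illustrative; only the counts that the text states explicitly are included.

```r
# One-sided exact binomial p-value for a given number of correct calls.
# one_sided_p is an illustrative helper; the counts come from the text above.
one_sided_p <- function(successes, runs = 23) {
  binom.test(successes, runs, p = 0.5, alternative = "greater")$p.value
}

correct_calls <- c(
  "Tap vs Bottled"  = 23,  # distinguished every time
  "Pur vs Soma"     = 22,  # one misclassification
  "Pur vs Tap"      = 21,  # two misclassifications
  "Bottled vs Soma" = 15   # eight misclassifications
)

p_values <- sapply(correct_calls, one_sided_p)

# Smallest number of correct calls (out of 23) that is significant at 0.05
threshold <- min(which(sapply(0:23, one_sided_p) < 0.05)) - 1
```

With 23 runs the cutoff works out to 16 correct calls, so the 15 correct calls in the bottled vs Soma round fall one short of significance.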
Here we look at the actual p-values of all the comparisons:
Looking at the tap water row, it’s obvious here that the Soma filter does little to nothing in improving the taste from ordinary tap water. Their marketing campaign spouts how environmentally friendly their product is, but I doubt the environmental worth of a worthless (but pretty) piece of plastic.
In terms of arbitrary ranking scales: I rate it 0/5 empty water cups. Thanks, Soma.
As for recommendations, the Pur water filter is cheap and the filters themselves don’t cost a lot. It still just looks like a water filter and won’t win any style awards, but it actually does the one job that it’s supposed to do. As for me and my wife’s argument, she no longer thinks I’m crazy for thinking the Soma water tastes bad–now I’m crazy because I spent three hours on a Sunday night sipping glasses of water. ¯\_(ツ)_/¯
Haha thanks! great post, I have been in there offices. Maybe this I
Will create some buzz and they will improve there filters once again, and and auto change you….
Do you speak English?
Too bad you couldn’t have also included Brita in the test.
I would have, but it would have almost doubled the size of the test!
It seems paradoxical that you could distinguish tap from bottle 100% of the time, but you couldn’t seem to tell the difference between tap/soma or bottle/soma even though you state in the article that water from the soma tastes just like tap water. Surely comparing bottled to soma should give comparable results to bottle vs. tap?
Any thoughts on why this isn’t the case?
The author said researchers found Soma filtered about 50% of the chlorine but that still was enough to make it slightly better than tap water.
The issue is that water quality is a spectrum, taste isn’t one dimensional, and comparisons are not transitive.
As a simple toy example: if tap water started at an impurity concentration of 100 ppm and a person could distinguish changes in water of 60 ppm, then Soma reducing the concentration of impurities from 100 ppm to 50 ppm would still leave it indistinguishable from tap water. It would also be indistinguishable from bottled, with zero ppm. However, bottled water with zero ppm would clearly be distinguishable from tap. This shows how comparisons are not necessarily transitive.
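The toy example can be written out in a few lines of R. This is only a sketch of the one-dimensional model in the paragraph above, with made-up variable names, and not a claim about how taste actually works.

```r
# Toy model from the paragraph above: two waters are distinguishable only if
# their impurity levels differ by at least the taster's threshold (60 ppm).
impurity_ppm <- c(Tap = 100, Soma = 50, Bottled = 0)
threshold_ppm <- 60

# Pairwise absolute differences, then compare against the threshold
differences <- abs(outer(impurity_ppm, impurity_ppm, "-"))
distinguishable <- differences >= threshold_ppm
```

In the resulting matrix only the Tap/Bottled entries are TRUE, while Tap/Soma and Soma/Bottled are both FALSE, which is the non-transitive pattern described above.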
More importantly, most human senses are not linear in detection thresholds, so in reality the detection threshold changes depending on the base impurity level. The point of this test was to move away from simply comparing impurity concentrations or chlorine levels and instead focus on the taste, which is a complex multidimensional sense that is hard to quantify but easy to understand in terms of comparisons.
And the last point is that the test was designed with an acceptable 20% false negative rate, so out of the 6 rounds one false negative is well within the design tolerance. Doing the test with more comparisons in the future (and thus gaining more statistical power) would help mitigate this issue.
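As a back-of-the-envelope check, and assuming each of the five distinguishable comparisons independently misses with probability exactly 0.20 (the design target, not a measured rate), the chance of seeing at least one false negative can be sketched as:

```r
# Chance of at least one false negative among the distinguishable comparisons,
# assuming independent rounds that each miss with probability beta.
beta <- 0.20
n_distinguishable <- 5  # the six comparisons minus Tap vs Soma

p_at_least_one_miss <- 1 - (1 - beta)^n_distinguishable
expected_misses <- beta * n_distinguishable
```

Under that assumption one miss is the expected number, and at least one miss happens about two times out of three.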
the real crime is that you need to sit there, pour some water in, wait for it to go through filter then keep filling up. any water filter product that makes you fill it up more than once per container is useless.
I have one major issue (that you do bring up) and it’s that you can distinguish between tap and bottled water 100% of the time, and can distinguish between tap and Soma a statistically significant amount of time, but not between the bottled and the Soma. Now, you clearly noticed this and brought it up. Your possible explanation is that perhaps you were not yet calibrated as you missed the first two cups. However I do not think your accuracy changes depending on what cup you are on. If you count the total number of errors across all trials in the first half (1-11) it is actually less than the number of errors in the second half (13-23). I left out #12 since you didn’t have an even number of trials and there were two errors there so even if averaged it would not make a difference.
I’m still pretty impressed with the effort put forth here but not 100% convinced given these issues.
Glad you enjoyed the read! As to your concern, see my response above to Tom Anderson’s comment. What the issue comes down to is that comparisons are not necessarily transitive, and the test was sized with an allowable 20% false negative rate. My comment about the first two cups being off was not expressing any particular poignant insight, but rather a personal observation that I became more confident in my classifications after a few trials into each round.
Skimming through this I didn’t see you list the municipality that provided the tap water. I would be interested to see how this test would vary across the states. For example, I know when my brother was living in Oakland he would always tell me the water tasted good and they would just filter for impurities, but the water here in San Diego is very hard and tastes much more chlorinated IMHO.
Washington, DC. Surprise: An area famous for being a swamp doesn’t have the greatest tasting tap water.
Part of the issue here may be the type of filtering performed.
For example, you’re basing most of your categorization around the presence or lack of chlorine, but there are many other factors in how water tastes, how hard or soft the water is being the obvious one. The Soma filter may be excellent at filtering nasty tasting hard water into something more palatable, as long as the chlorine levels are relatively low to start with.
How’d you put together the dynamic graphics? I WANT.
Someone who has spoken at length with water filtration experts will tell you the way these charcoal and ion exchange resin filters work is for the water to spend as much time in contact with the filtering matter as possible. It sounds like the water moved too fast through the filter.
Another issue is that your water likely wasn’t filtered at all, due to the poor design of the Soma, both in getting the filter to sit flush and in the bottom being completely blocked so that you can’t even see if the water is coming through the filter or around it. Given how fast your water filtered, the vast majority of it almost certainly went around the filter instead of through it. If the only design flaw were the difficulty in getting the filter to sit flush, then, as with a Brita, we could at least see the bottom of the filter itself to check whether the water was coming through or around it. Instead, the Soma pitcher inexplicably blocks our view of the bottom of the filter, showing only the same 3 holes that echo the bottom of the filter itself, and leaves us no way to see whether the water is actually coming through the filter or around it. Since yours seemed to filter so fast, I’m sure from my Soma experience that it just didn’t filter at all and mostly went down the sides of the not-flush seal. When it does actually filter, after lots of adjustments and trial and error to learn how to push in the filter and check the seal, it definitely does taste better in terms of chlorine, though this doesn’t really override its flaws.
Never mind the taste, I bought one of these Soma water filtration pitchers (not cheap) because of its glass pitcher and apparently eco friendly filter. Over a short time the suction areas of the funnel and filter developed black mold which did not come off with any washing. I live in an area with really good water quality though it is a bit acidic and so we do have mitigation for that but when I alerted the company about this mold problem, offered to mail and/or photo the product, they never responded. Surprise!!! I now use the glass pitcher as a vase. This product stinks, both because it does not work and their customer service stinks.
A really good way to blind taste two competing products is the “duo-trio” test.
Fill two glasses with one of the products and one with the other one and mix the glasses. Taste them blind.
It is amazing how often with just two cups you are able to pick the difference but fail on the duo-trio.
I am completely unable to work out the stats for this kind of test. I would love to see a new blog post with such an analysis!
Millennial here, checking up about Soma before purchasing… and now I won’t be! Wasn’t expecting to find such a deep experiment and such a hilarious article. Thank you!
When you buy a SOMA filter, you need to throw the water away the first 2 times. I personally can taste the difference for about 2 days, and then I either get used to the taste or it doesn’t improve the taste. I have to say that the water in Montreal is pretty good tasting to start with.
I’m sad to read this study; I’ve had my Soma for a long time. But I have always had my doubts about the filter’s effectiveness, and when they sped up the flow rate I had my doubts (although it was a pleasure not to wait around for the thing to fill up). I initially disliked that the filter is not recyclable, but rationalized that it was less plastic than a month’s worth of bottles. Also, as noted, the customer service is not good, with filter deliveries erratic. Though I have had no mold issues, from time to time I have seen green algae. After years I am giving up and will use the lovely carafe for another purpose.
Such an in-depth discussion on water filters. Thanks.
Yea I bought one and stopped using it cuz the fresh taste just doesn’t last past a couple of fill-ups. I gave it 6 months and didn’t feel it was worth the extra cost of buying new filters when they just don’t last long at all. I could clearly taste the water getting worse and worse after the initial great taste. It’s a great concept but not cost-effective at all.