In post 1 and post 2, I summarized the first three lectures of Google's A/B Testing course on Udacity. Briefly, the first post reviewed the statistics background, specifically the binomial distribution, along with the policy and ethical concerns to consider when designing and running an experiment. The second post covered three factors to consider when choosing and characterizing metrics. In this post, I continue with four factors to consider when designing an experiment.
Four factors to consider when designing an experiment: choosing the subject, choosing the population, size, and duration vs. exposure.
1. Choose subject
When choosing the subject of an experiment, we need a unit that represents it. This unit is called the unit of diversion: it defines which users or events are assigned to the control group or the experiment group. The choice of unit of diversion depends on three important considerations:
- User consistency: If we are making a user-visible change, we want each user to have a consistent experience throughout the experiment, so a user ID or a cookie is a good choice. If the change is not user-visible, an event-based diversion such as a pageview makes more sense. This matters because if we use pageviews as the unit of diversion for a user-visible change, the user may be reassigned every time they reload the page: a user who was initially in the experiment group may end up in the control group.
- Ethical considerations: Since real people take part in the experiment as experimental units, it's very important to consider the ethics of the experiment carefully. Key considerations include risk, benefit, and privacy. If the risk exceeds minimal risk, i.e. it involves physical, psychological, emotional, social, or economic harm, then getting informed consent becomes crucial. If users would benefit once the study is completed, those benefits should be stated. If internal processes for collecting new data are already well established, privacy won't be a major issue; if not, additional safety measures are needed.
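To make the consistency point concrete, here is a minimal sketch of deterministic diversion by hashing. The function name `assign_group`, the experiment name `new_button`, and the ids are hypothetical, and hashing the unit id with SHA-256 is just one common way to get a stable split, not necessarily what the course or any particular platform uses.

```python
import hashlib


def assign_group(unit_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a unit of diversion to 'control' or 'experiment'.

    Hashing (experiment name + unit id) means the same unit always lands in
    the same group, while different experiments get independent splits.
    Hypothetical helper for illustration only.
    """
    key = f"{experiment}:{unit_id}".encode("utf-8")
    # First 8 bytes of the SHA-256 digest, scaled to a bucket in [0, 1).
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") / 2**64
    return "experiment" if bucket < treatment_share else "control"


# User-based diversion: every reload by user "u42" hashes the same id,
# so the set of groups seen always has exactly one element.
user_groups = {assign_group("u42", "new_button") for _ in range(5)}
print(user_groups)

# Pageview-based diversion: each reload has its own id, so the same person
# can bounce between control and experiment across page loads.
pageview_groups = {assign_group(f"u42-pv{i}", "new_button") for i in range(20)}
print(pageview_groups)
```

The design choice here is that assignment is a pure function of the unit id, so no lookup table is needed to keep a user's experience consistent.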
2. Choose population
2.1. Decide about population: inter- vs intra-user experiments
When choosing the population, there are two kinds of user experiments: inter-user and intra-user. In an intra-user experiment, the same users are exposed to both the control and the experiment conditions (for example, at different times). In an inter-user experiment, different users are placed in the control group and the experiment group.
2.2. Who are your target users?
In some cases you define the target users in advance, for example to avoid overlapping with other running experiments or to see the effect in each segment. In other cases, you may not want to define target users in advance.
Sometimes you want to use a cohort instead of the whole population. The population is the whole group of users; within it, you can define a cohort, a group of users locked in based on criteria such as geographic region, device, browser, or time of access.
When to use a cohort instead of the whole population:
- Looking for learning effects
- Examining user retention
- Wanting to increase user activity
- Anything that requires users to be established
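A cohort is essentially a fixed filter applied to the population before the experiment starts. A minimal sketch, where the `User` fields, the `build_cohort` helper, and the four users are made-up illustrations rather than data from the course:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class User:
    user_id: str
    region: str
    device: str
    first_seen: date


def build_cohort(users, region=None, device=None, first_seen_from=None, first_seen_to=None):
    """Return the users matching every criterion that is not None (hypothetical helper)."""
    cohort = []
    for u in users:
        if region is not None and u.region != region:
            continue
        if device is not None and u.device != device:
            continue
        if first_seen_from is not None and u.first_seen < first_seen_from:
            continue
        if first_seen_to is not None and u.first_seen > first_seen_to:
            continue
        cohort.append(u)
    return cohort


# Made-up population for illustration.
population = [
    User("a", "US", "mobile", date(2024, 1, 3)),
    User("b", "US", "desktop", date(2024, 1, 5)),
    User("c", "DE", "mobile", date(2024, 1, 4)),
    User("d", "US", "mobile", date(2024, 2, 1)),
]

# Lock in US mobile users who first appeared in the first week of January.
cohort = build_cohort(population, region="US", device="mobile",
                      first_seen_from=date(2024, 1, 1), first_seen_to=date(2024, 1, 7))
print([u.user_id for u in cohort])  # ['a']
```

Because membership is fixed up front, the same users can be followed over the whole experiment, which is what retention and learning-effect analyses need.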
3. Size
Variability affects size: the more variable a metric is, the larger the sample needed to detect the same change.
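For a binomial metric such as a click-through probability, the standard binomial standard error shows this trade-off: the variance term p(1 - p) sits in the numerator, and the sample size N enters only under a square root, so halving the standard error takes four times as many units. The rate of 0.1 below is an assumed, illustrative value.

$$
SE = \sqrt{\frac{\hat{p}\,(1-\hat{p})}{N}}, \qquad
\sqrt{\frac{0.1 \times 0.9}{1000}} \approx 0.0095, \qquad
\sqrt{\frac{0.1 \times 0.9}{4000}} \approx 0.0047
$$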
If you do not have enough users to run the experiment, there are several ways to reduce the required size:
increase d_min (the practical significance level), alpha, or beta; change the unit of diversion; or target the experiment at the relevant traffic.
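+ alpha is the probability of finding a significant result when no real effect exists (a false positive).
+ beta is the probability of failing to reject the null hypothesis when it is false (a false negative); 1 - beta, the statistical power, is the probability of rejecting the null hypothesis when it should be rejected.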
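To see how d_min, alpha, and beta drive the required size, here is a sketch using one common normal-approximation formula for comparing two proportions with a two-sided test. This formula is an assumption on my part and is not necessarily the exact calculator used in the course; `sample_size_per_group` is a hypothetical name.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(p_baseline: float, d_min: float,
                          alpha: float = 0.05, beta: float = 0.2) -> int:
    """Approximate units per group for a two-sided test of two proportions.

    p_baseline: baseline probability (e.g. click-through) in control.
    d_min: minimum absolute change worth detecting (practical significance).
    alpha: false-positive rate; beta: false-negative rate (power = 1 - beta).
    """
    p1, p2 = p_baseline, p_baseline + d_min
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(1 - beta)        # quantile for the desired power
    pooled = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * pooled * (1 - pooled))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / d_min ** 2)


base = sample_size_per_group(0.10, 0.02)
print(base)
print(sample_size_per_group(0.10, 0.04))              # larger d_min -> fewer units
print(sample_size_per_group(0.10, 0.02, alpha=0.10))  # larger alpha -> fewer units
print(sample_size_per_group(0.10, 0.02, beta=0.30))   # larger beta -> fewer units
```

The formula only captures the d_min, alpha, and beta levers; changing the unit of diversion or targeting the experiment reduces size by changing the variability and the baseline of the metric itself.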
4. Duration vs Exposure
When do you want to run it? Weekends, weekdays, holidays, etc.? Traffic volume may differ from day to day.
What percentage of traffic do you want to expose? Some experiments are risky, so their exposure should be limited.
How long should it run? For safety reasons, we might run the experiment on a subset of the population and extend the running time to collect enough users.
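The trade-off between exposure and duration is simple arithmetic. A minimal sketch, assuming each day brings new eligible units (a simplification when diverting by user, since returning users are not new units); `days_needed` and the traffic numbers are hypothetical:

```python
from math import ceil


def days_needed(total_sample: int, daily_eligible_units: int, exposure: float) -> int:
    """Days to collect `total_sample` units when only `exposure` of traffic is diverted.

    total_sample: units needed across control and experiment combined.
    daily_eligible_units: units arriving per day that could enter the experiment.
    exposure: fraction of that traffic diverted into the experiment, in (0, 1].
    """
    if not 0 < exposure <= 1:
        raise ValueError("exposure must be in (0, 1]")
    return ceil(total_sample / (daily_eligible_units * exposure))


# Hypothetical numbers: 2 x 3841 units needed, 5,000 eligible units per day.
for share in (1.0, 0.5, 0.1):
    print(share, days_needed(2 * 3841, 5000, share))
# 1.0 2
# 0.5 4
# 0.1 16
```

Cutting exposure to 10% for safety makes the experiment roughly ten times longer. In practice you may also want to round the duration up so that both weekday and weekend traffic are covered.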
When you make a change, users take time to adapt to it. There are two opposite initial reactions: change aversion, where users dislike the change simply because it is new, and the novelty effect, where users engage with it more simply because it is new. Either way, the initial reaction may fade, and over time user behavior tends to plateau at a level that can be quite different. When measuring learning effects, consider:
- Unit of diversion: it should be a user ID or a cookie, so that the same users stay in the same group and can be tracked over time.
- How long and how often users see the change.
- Risk: as mentioned above, a bigger change potentially carries a bigger risk, and users may find it harder to adapt to.
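You can detect learning effects retrospectively by running A/A tests on the same groups of users before the experiment (pre-period) and after it (post-period). If the pre-period A/A test shows no difference between the groups but the post-period A/A test does, the difference can reasonably be attributed to learning that happened during the experiment period.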
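One way to run that comparison is a two-proportion z-test on each period, where group a is the users who were in the experiment group and group b is the users who were in control, both seeing the control experience during the A/A periods. The choice of test, the helper names, and the counts below are assumptions for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(clicks_a: int, n_a: int, clicks_b: int, n_b: int) -> float:
    """Two-sided p-value for H0: both groups share the same underlying rate."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def learning_effect_suspected(pre, post, alpha=0.05):
    """Hypothetical check: groups looked alike before, but differ afterwards.

    pre/post: (clicks_a, n_a, clicks_b, n_b), where a = former experiment-group
    users and b = former control-group users, both on the control experience.
    """
    pre_p = two_proportion_p_value(*pre)
    post_p = two_proportion_p_value(*post)
    return pre_p >= alpha and post_p < alpha


# Made-up counts for illustration.
pre = (500, 5000, 505, 5000)   # 10.0% vs 10.1%: no meaningful difference
post = (600, 5000, 500, 5000)  # 12.0% vs 10.0%: former experiment users still differ
print(learning_effect_suspected(pre, post))  # True
```

Checking the pre-period first matters: if the groups already differed before the experiment, a post-period difference says nothing about learning.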