Bugs happen. We do our best to write clear, concise, tested, bug-free code, but we accept that bugs are unavoidable. We use Pivotal Tracker to track our work. Our projects are organized into two areas, web and apps. Web is any technology involved with our front-end site and database. Apps are any integrations with applications such as PowerPoint, Keynote, or Google Slides, or with operating systems such as Android, iOS, OSX, and Windows.
Whenever a bug is discovered by a team member or reported by a customer, we write up a Pivotal story using a customized bug template. Sometimes a bug is related to multiple projects. In such cases, it’s fine to create a bug for each project to track work in the respective area. After the bug is created, the QA team triages it and assigns it to the appropriate individual.
Each repo has bug thresholds determined by the QA team and repo admins. They are based on severity labels to ensure the highest priority bugs get addressed first. Thresholds vary across repos depending on their level of bug tolerance.
Thresholds act as a circuit breaker for bugs. We normally don’t let bugs interrupt our development work, due to the high cost of context switching. When a threshold is exceeded, the Product team along with the repo admins discuss the bugs at standup. They may be resolved right away or in a bug sprint.
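The circuit-breaker check described above can be sketched in a few lines of Python. This is a minimal illustration, not our actual tooling: the threshold numbers, the `exceeded_thresholds` function, and the shape of the bug records are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical per-repo thresholds: maximum open bugs allowed per severity label.
# The numbers are illustrative only; real thresholds are set by QA and repo admins.
THRESHOLDS = {
    "quebert": {"showstopper": 0, "week": 3},
}


def exceeded_thresholds(repo, open_bugs, thresholds=THRESHOLDS):
    """Return {severity: (open_count, limit)} for each severity over its limit.

    `open_bugs` is an iterable of dicts with a "labels" list, e.g.
    {"id": 1, "labels": ["quebert", "week"]} (an assumed record shape).
    """
    limits = thresholds.get(repo, {})
    # Count severity labels only on bugs that carry this repo's label.
    counts = Counter(
        label
        for bug in open_bugs
        if repo in bug["labels"]
        for label in bug["labels"]
        if label in limits
    )
    return {
        severity: (counts[severity], limit)
        for severity, limit in limits.items()
        if counts[severity] > limit
    }


# Four open "week" bugs on quebert, plus one bug that belongs to another repo.
bugs = [{"id": i, "labels": ["quebert", "week"]} for i in range(4)]
bugs.append({"id": 99, "labels": ["other-repo", "week"]})

tripped = exceeded_thresholds("quebert", bugs)
print(tripped)
```

An empty result means the repo is under every threshold and development continues uninterrupted. A non-empty result is the signal to raise the bugs at standup.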
We schedule four bug sprints each quarter, each lasting an entire sprint. The QA team determines the bug sprints and the priority order.
Getting below the threshold is a minimum; if the project is still not below the threshold after the bug sprint, an emergency second sprint will be called.
It’s also good practice for engineers to fix bugs proactively before a threshold is reached, as this helps the company be more empathetic toward our customers’ needs and stay on track with the project.
Bug priority naming conventions must be followed to make sure a bug is triaged:
Repo name - The GitHub repo name should be used as a label to associate a bug with a GitHub project so the repo admins can take a look at it. For example, if the source of the bug is the https://github.com/polleverywhere/quebert repo, the label quebert should be applied to the bug.
Severity - the amount of time we can go before fixing the bug. A time-based approach to severity was chosen because it’s more understandable by customer service, sales, and account management (as opposed to high, medium, low).
critical - should be fixed immediately by the engineering team, even if that means working evenings and weekends. During stand-up, all critical bugs appear on a dashboard and are statused at least daily. All critical bugs are tracked as part of our downtime.
showstopper - should be fixed immediately (within business hours) by the engineering team. During stand-up, all showstopper bugs appear on a dashboard and are statused at least daily.
week label. Weekly bugs are statused at least weekly.
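The naming conventions above lend themselves to a simple automated check. The following is a rough sketch under stated assumptions: the lower-case label spellings, the `triage_problems` helper, the set of repo labels (only quebert comes from this document), and the rule that a bug carries exactly one severity label are all hypothetical.

```python
# Severity labels named in this document; the lower-case spelling is an assumption.
SEVERITY_LABELS = {"critical", "showstopper", "week", "month", "eventually"}

# Hypothetical set of repo labels; only "quebert" appears in the text.
REPO_LABELS = {"quebert"}


def triage_problems(labels, repo_labels=REPO_LABELS):
    """Return a list of problems with a bug's labels; empty means ready for triage."""
    labels = set(labels)
    problems = []
    if not labels & repo_labels:
        problems.append("missing repo label")
    severities = labels & SEVERITY_LABELS
    if not severities:
        problems.append("missing severity label")
    elif len(severities) > 1:
        # Assumed rule: a bug should carry exactly one severity label.
        problems.append("multiple severity labels: " + ", ".join(sorted(severities)))
    return problems


print(triage_problems(["quebert", "week"]))
print(triage_problems(["week"]))
print(triage_problems(["quebert", "week", "critical"]))
```

A check like this could run when a story is created, so that a bug with missing labels is flagged before it reaches the QA team.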
The one who first discovers a critical bug is responsible for immediately:
Even if the critical bug is identified outside of business hours, work should begin on fixing the problem immediately. The engineers working on a critical bug should not continue working on an epic or other project until the critical bug has been resolved.
It is not necessary to fix the root cause to resolve a critical bug. If something can be done quickly to stop the severe disruption to our users, then the critical tag can be removed from the bug. For example, if the commit that caused the bug can be safely and quickly reverted, reverting it is a good way to resolve the critical bug. The root cause can be fixed later as a regular bug.
Critical bugs appear on our daily stand-up dashboard. Each critical bug is explicitly statused daily until it is fixed and deployed to production.
Once a critical bug is resolved, the engineer who fixed the issue documents the following in our critical bug tracking spreadsheet:
Over a longer period of time the team works to reduce the number of critical bugs that ship to production and reduce critical bug response times. This is measured by periodically reviewing the aggregate data from the critical bug spreadsheet.
If a bug can’t be reproduced, tag someone from the QA team. They will work with the person or customer service team member who created the ticket to figure out the details needed to reproduce the bug.
During the development phase, bugs will happen. We differentiate pre- and post-release bugs with separate labels. A pre-release bug should carry the label of the epic that outlines the feature work. Epic feature labels are purple, which makes them easy to tell apart from our post-release labels, which are green.
Any bugs that remain after an epic release need to be relabeled and prioritized in the general process because they are now affecting live customers. It is the responsibility of the PM and engineering lead of the epic to label and prioritize these. It is expected that the vast majority of these bugs (if not all) will be addressed as part of the cooldown sprint.
What happens if a bug sits around for a while? The QA team, along with the repo admins, defines a threshold for the Week, Month, and Eventually labels. If a repo passes its threshold, one of two things will happen. The team owning the repo may go directly into a bug sprint. If that team is focused on something time-critical, or if there is a big context-switching cost, the next viable sprint is chosen to begin a bug sprint.
However, our customer-facing teams (Sales, Account Management, Customer Support) are encouraged to advocate to the QA team to expedite and reprioritize a bug for the well-being of our customers. Even if a particular repo is beneath the threshold, certain bugs may be introduced into sprint planning.