Bugs

Bugs happen. We do our best to write clear, concise, bug-free, and tested code when possible, but we accept that bugs are unavoidable. We use Pivotal Tracker to track our work. Our projects are organized into two areas, web and apps. Web covers any technology involved with our front-end site and database. Apps covers any integration with applications like PowerPoint, Keynote, or Google Slides, or with operating systems like Android, iOS, OSX, and Windows.

Creating bugs

Whenever a bug is discovered by a team member or reported by a customer, we write up a Pivotal story using a customized bug template. Sometimes a bug is related to multiple projects. In such cases, it’s fine to create a bug for each project to track work in the respective area. After the bug is created, the QA team triages it and assigns it to the appropriate individual.

Bug sprints & thresholds

Each repo has bug thresholds determined by the QA team and repo admins. They are based on severity labels to ensure the highest priority bugs get addressed first. Thresholds vary across repos depending on their level of bug tolerance.

Thresholds act as a circuit breaker for bugs. We normally don’t let bugs interrupt our development work, due to the high cost of context switching. When a threshold is exceeded, the Product team and the repo admins discuss the bugs at standup. The bugs may be resolved right away or in a bug sprint.
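The circuit-breaker check can be sketched in a few lines of Python. This is an illustration only: the label strings, the threshold numbers, and the function name exceeded_thresholds are assumptions, not values taken from any of our repos.

```python
from collections import Counter

# Hypothetical per-repo limits: the maximum number of open bugs allowed
# for each severity label. Real values are set by QA and the repo admins.
THRESHOLDS = {"week": 3, "month": 10, "eventually": 25}


def exceeded_thresholds(open_bug_severities, thresholds=THRESHOLDS):
    """Return {severity: (open_count, limit)} for each severity over its limit."""
    counts = Counter(open_bug_severities)
    return {
        severity: (counts[severity], limit)
        for severity, limit in thresholds.items()
        if counts[severity] > limit
    }


# Example: four open "week" bugs trip the hypothetical limit of three.
open_bugs = ["week"] * 4 + ["month"] * 2
tripped = exceeded_thresholds(open_bugs)
if tripped:
    print("Discuss at standup:", tripped)
```

An empty result means no threshold has been exceeded and development work continues uninterrupted.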

We have four scheduled bug sprints each quarter, and each one lasts an entire sprint. The QA team determines the bug sprints and the priority order.

Getting below the threshold is the minimum requirement. If the project is still not below the threshold after the bug sprint, an emergency second sprint will be called.

It’s also good practice for engineers to fix bugs proactively before a threshold is reached, as this helps the company be more empathetic toward our customers’ needs and stay on track with the project.

Severity labels

Bug priority naming conventions must be followed to make sure a bug is triaged:

  • Repo name - The GitHub repo name should be used in the label to associate a bug with a GitHub project so the repo admins can take a look at it. For example, if the source of the bug is the https://github.com/polleverywhere/quebert repo, the label quebert should be applied to the bug.

  • Severity - the amount of time we can go before fixing the bug. A time-based approach to severity was chosen because it’s more understandable by customer service, sales, and account management (as opposed to high, medium, low).

    • Critical - A critical bug is one that causes severe disruption to our users. Bugs tagged critical should be fixed immediately by the engineering team, even if that means working evenings and weekends. During stand-up, all critical bugs appear on a dashboard and are statused at least daily. All critical bugs are tracked as part of our downtime.
    • Show stopper - A show stopper bug is one that has a risk of causing severe disruption to our users. Bugs tagged showstopper should be fixed immediately (within business hours) by the engineering team. During stand-up, all show stopper bugs appear on a dashboard and are statused at least daily.
    • Week - If the bug is a constant source of pain for customers, it will likely be tagged with the week label. Weekly bugs are statused at least weekly.
    • Month - A monthly bug is a minor defect that only impacts a few customers. This could be something as small as an alignment issue in a feature that’s buried deep within the product. These bugs are statused at least weekly.
    • Eventually - Nice-to-have, less important bugs are typically tagged with “eventually” and regularly reviewed.

  • Epic name - If the bug is discovered during the pre-release development phase, add a label that matches the epic tracking that feature. Don’t tag it with the repo name.
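The labeling conventions above could be checked automatically before triage. The sketch below is a rough illustration, not an existing tool: the function name label_problems, the exact label strings for month and eventually, the example epic name, and the rule that a bug carries exactly one severity label are all assumptions.

```python
# Severity label strings. "month" and "eventually" are assumed spellings.
SEVERITIES = {"critical", "showstopper", "week", "month", "eventually"}


def label_problems(labels, repo_names, epic_names):
    """Return a list of human-readable problems with a bug's labels."""
    labels = set(labels)
    problems = []

    # Assumption: every bug carries exactly one severity label.
    severities = labels & SEVERITIES
    if len(severities) != 1:
        problems.append(
            f"expected exactly one severity label, found {sorted(severities)}"
        )

    # A bug is tied either to a repo (post-release) or to an epic (pre-release).
    has_repo = bool(labels & set(repo_names))
    has_epic = bool(labels & set(epic_names))
    if has_repo and has_epic:
        problems.append("pre-release bugs take the epic label, not the repo label")
    elif not (has_repo or has_epic):
        problems.append("missing a repo label or an epic label")

    return problems


# Hypothetical usage: "new-editor" stands in for a real epic label.
print(label_problems(["quebert", "week"], {"quebert"}, {"new-editor"}))
```

An empty list means the bug follows the conventions and is ready for triage.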

Critical bugs

Whoever first discovers a critical bug is responsible for immediately:

  1. Creating a Pivotal Tracker story with the tag critical
  2. Finding QA or a PM and an engineer who can help fix the issue
  3. Notifying all customer-facing employees, usually via email, that there is an open issue affecting production

Even if the critical bug is identified outside of business hours, work should begin on fixing the problem immediately. The engineers working on a critical bug should not continue working on an epic or other project until the critical bug has been resolved.

It is not necessary to fix the root cause to resolve a critical bug. If something can be done quickly to stop the severe disruption to our users, then the critical tag can be removed from the bug. For example, if the commit that caused the bug can be safely and quickly reverted, reverting it is a good way to resolve the critical bug. The root cause can be fixed later as a regular bug.

Critical bugs appear on our daily stand-up dashboard. Each critical bug is explicitly statused daily until it is fixed and deployed to production.

Once a critical bug is resolved, the engineer who fixed the issue documents the following in our critical bug tracking spreadsheet:

  • How long was the bug in production?
  • How long did it take for us to fix the bug?
  • What was the root cause of the bug?
  • What was the fix for the bug?
  • Can measures be taken system-wide to prevent the bug from happening again in the future?

Every week during scrutineering, the team looks at the critical bugs recorded in the spreadsheet for an in-depth postmortem. These postmortems are treated as learning experiences and are blameless.

Over a longer period of time the team works to reduce the number of critical bugs that ship to production and reduce critical bug response times. This is measured by periodically reviewing the aggregate data from the critical bug spreadsheet.
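As a rough illustration of that periodic review, the sketch below aggregates a handful of critical bug records into a bug count and two averages. The record fields, the class name CriticalBug, and the function name summarize are assumptions made for this example; they are not the columns of the actual spreadsheet.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class CriticalBug:
    shipped_at: datetime   # when the bug reached production
    reported_at: datetime  # when the critical story was created
    resolved_at: datetime  # when the critical tag was removed

    @property
    def hours_in_production(self):
        return (self.resolved_at - self.shipped_at).total_seconds() / 3600

    @property
    def hours_to_fix(self):
        return (self.resolved_at - self.reported_at).total_seconds() / 3600


def summarize(bugs):
    """Aggregate one review period's critical bugs into headline figures."""
    if not bugs:
        return {"count": 0, "mean_hours_in_production": None, "mean_hours_to_fix": None}
    return {
        "count": len(bugs),
        "mean_hours_in_production": mean(b.hours_in_production for b in bugs),
        "mean_hours_to_fix": mean(b.hours_to_fix for b in bugs),
    }
```

Comparing these summaries from one period to the next shows whether fewer critical bugs are shipping and whether response times are improving.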

Bugs that can’t be reproduced or need more detail

If a bug can’t be reproduced, tag someone from the QA team. They will work with the person who created the ticket, or with a customer service team member, to figure out the details needed to reproduce the bug.

What about bugs during the development phase?

During the development phase, bugs will happen. We differentiate pre- and post-release bugs with separate labels. The pre-release label should be the label associated with the epic outlining the feature work. Epic feature labels are purple, to easily differentiate them from our post-release labels, which are green.

Any bugs that remain after an epic release need to be relabeled and prioritized in the general process because they are now affecting live customers. It is the responsibility of the PM and engineering lead of the epic to label and prioritize these. It is expected that the vast majority of these bugs (if not all) will be addressed as part of the cooldown sprint.
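The relabeling step amounts to swapping the epic label for the repo label and a severity label. Here is a minimal sketch, assuming labels are plain strings; the function name relabel_after_release and the example labels are hypothetical.

```python
def relabel_after_release(labels, epic_label, repo_label, severity):
    """Swap a bug's epic label for its repo label and a severity label."""
    # Drop the pre-release epic label and keep everything else in order.
    kept = [label for label in labels if label != epic_label]
    # Add the post-release labels unless they are already present.
    for label in (repo_label, severity):
        if label not in kept:
            kept.append(label)
    return kept


# Hypothetical example: a leftover bug from a "new-editor" epic.
print(relabel_after_release(["new-editor", "ui"], "new-editor", "quebert", "week"))
```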

The backlog buildup

What happens if a bug sits around for a while? The QA team and the repo admins define a threshold for the Week, Month, and Eventually labels. If a repo passes its threshold, one of two things will happen. The team owning the repo may go directly into a bug sprint. If that team is focused on something time-critical, or if there is a big context-switching cost, the next viable sprint is chosen to begin a bug sprint.

However, our customer-facing teams (Sales, Account Management, Customer Support) are encouraged to advocate to the QA team to expedite and reprioritize a bug for the well-being of our customers. Even if a particular repo is beneath the threshold, certain bugs may be introduced into sprint planning.