Bugs

Bugs happen. We do our best to write clear, concise, well-tested code, but we accept that bugs are unavoidable. We use Pivotal Tracker to track our work. Our projects are organized into two areas, web and apps. Web covers any technology involved with our front-end site and database. Apps covers any integration with applications like PowerPoint, Keynote, and Google Slides, or with operating systems like Android, iOS, OS X, and Windows.

Creating bugs

Whenever a bug is discovered by a team member or reported by a customer, we write it up. This means creating a bug in Pivotal Tracker within the appropriate project. Sometimes a bug involves multiple projects. In such cases, it’s fine to create a bug for each project to track work in the respective area. After the bug is created, it is triaged and assigned to the appropriate individual(s).

Bugs remaining percentage

The following formula is used to calculate a repo’s “bugs remaining” percentage.

(Bugs created - bugs accepted) / bugs created = bugs remaining

This percentage is calculated over a rolling 90-day window (to align with quarters). A target “bugs remaining” percentage is set per quarter by the product and business teams.

This approach acknowledges the reality that not all bugs are fixed. For example, we typically set a 15% “bugs remaining” target rather than aiming for 0%.
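As a worked example of the formula above, here is a minimal sketch in Python. The function names, the sample counts, and the handling of a window with no bugs created are assumptions made for illustration; only the formula and the 15% example target come from this document.

```python
def bugs_remaining(created: int, accepted: int) -> float:
    """Return (created - accepted) / created as a fraction between 0 and 1.

    Both counts are assumed to come from the same rolling 90-day window.
    """
    if created == 0:
        # Assumption: with no bugs created in the window, nothing remains.
        return 0.0
    return (created - accepted) / created


def meets_target(created: int, accepted: int, target: float = 0.15) -> bool:
    """True when the bugs remaining fraction is at or below the target."""
    return bugs_remaining(created, accepted) <= target


# Hypothetical quarter: 40 bugs created, 34 accepted, so 6 of 40 remain.
print(f"{bugs_remaining(40, 34):.0%}")
```

With these sample counts, 6 of 40 bugs remain, which is exactly the 15% target. Accepting only 30 of the 40 would leave 25% remaining and miss it.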

Bug sprints & thresholds

Each repo has bug thresholds determined by the repo admins. They are based on severity levels to ensure the highest priority bugs get addressed first. Thresholds vary across repos depending on our level of bug tolerance.

Thresholds act as a circuit breaker for bugs. We normally don’t let bugs interrupt our development work, due to the high cost of context switching. When a threshold is met, it is discussed at stand-up. The repo admins may decide to fix the bugs right away; if the project falls back below its thresholds by the start of the next sprint, no bug sprint starts. This is okay because it still means bugs are being fixed on a regular basis. If the project is still above its threshold at the start of the next sprint, that sprint becomes a bug sprint: normal work is put on hold while all repo admins work on bugs.

A bug sprint lasts the entire sprint. Bugs get worked on in priority order, including the Eventually bugs. Repo admins with urgent work may be excused from the bug sprint by a PM as long as all other admins are informed.

Getting below the threshold is a minimum; if the project is still not below the threshold after the bug sprint, an emergency second sprint will be called.

It’s also good practice for repo admins to be proactive about fixing bugs before a threshold is reached as this helps the company stay on track with project work.
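The circuit-breaker rule described in this section can be sketched roughly as follows. This is an illustration only: it assumes a threshold is a maximum count of open bugs per severity label, that reaching the count is what “meets” the threshold, and that the check runs once at the start of each sprint. The function names and sample numbers are hypothetical.

```python
from typing import Dict, List


def breached_labels(open_counts: Dict[str, int], thresholds: Dict[str, int]) -> List[str]:
    """Return the severity labels whose open-bug count has met its threshold."""
    return [
        label
        for label, limit in thresholds.items()
        if open_counts.get(label, 0) >= limit
    ]


def plan_next_sprint(
    open_counts_at_sprint_start: Dict[str, int],
    thresholds: Dict[str, int],
    previous_was_bug_sprint: bool = False,
) -> str:
    """Decide what kind of sprint starts, based on counts at sprint start."""
    if not breached_labels(open_counts_at_sprint_start, thresholds):
        # Bugs were fixed in time (or never piled up): normal work continues.
        return "normal"
    if previous_was_bug_sprint:
        # Still not below the threshold after a bug sprint.
        return "emergency bug sprint"
    return "bug sprint"


# Hypothetical thresholds chosen by a repo's admins.
thresholds = {"week": 5, "month": 10, "eventually": 20}
print(plan_next_sprint({"week": 6, "month": 3}, thresholds))
```

Because the decision uses the counts at the start of the next sprint rather than at the moment a threshold is first met, admins who fix bugs right away avoid the bug sprint entirely.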

Severity labels

Bug priority naming conventions must be followed to make sure a bug is triaged:

  • Repo name - The GitHub repo name should be used in the label to associate a bug with a GitHub project so the repo admins can take a look at it. For example, if the source of the bug is the https://github.com/polleverywhere/quebert repo, the label quebert should be applied to the bug.

  • Severity - Severity is the maximum amount of time we can go before fixing the bug. A time-based approach to severity was chosen because it’s easier for customer service, sales, and account management to understand than high, medium, and low.

    • Critical - A critical bug is one that causes severe disruption to our users. Bugs tagged critical should be fixed immediately by the engineering team, even if that means working evenings and weekends. During stand-up, all critical bugs appear on a dashboard and are statused at least daily. All critical bugs are tracked as part of our downtime.
    • Show stopper - A show stopper bug is one that risks causing severe disruption to our users. Bugs tagged showstopper should be fixed immediately (within business hours) by the engineering team. During stand-up, all show stopper bugs appear on a dashboard and are statused at least daily.
    • Week - If the bug is a constant source of pain for customers, it will likely be tagged with the week label. Weekly bugs are statused at least weekly.
    • Month - A monthly bug is a minor defect that only impacts a few customers. This could be something as small as an alignment issue in a feature that’s buried deep within the product. These bugs are statused at least weekly.
    • Eventually - Nice-to-have, less important bugs are typically tagged with “eventually”. They are reviewed when a bug threshold is hit.
  • Epic name - If the bug is discovered during the pre-release development phase, add a label that matches the epic tracking that feature. Don’t tag it with the repo name.
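The labeling rules above could be checked mechanically before triage. The sketch below is one possible reading, not an existing tool: the exact spelling of the month label, the rule that a post-release bug carries exactly one severity label, and the epic name new-editor are assumptions made for the example.

```python
from typing import List, Set

# Severity tags named in this document; the spelling of "month" is assumed.
SEVERITY_LABELS = {"critical", "showstopper", "week", "month", "eventually"}


def check_labels(labels: Set[str], repo_names: Set[str], epic_names: Set[str]) -> List[str]:
    """Return a list of labeling problems; an empty list means ready for triage."""
    problems = []
    repos = labels & repo_names
    epics = labels & epic_names
    severities = labels & SEVERITY_LABELS

    if epics:
        # Pre-release bug: tracked by its epic label, not by repo name.
        if repos:
            problems.append("pre-release bug should not carry a repo label")
        return problems

    # Post-release bug: needs a repo label and (assumed) exactly one severity.
    if not repos:
        problems.append("missing repo label")
    if len(severities) != 1:
        problems.append("expected exactly one severity label")
    return problems


# "quebert" is the repo from the example above; "new-editor" is a made-up epic.
print(check_labels({"quebert", "week"}, {"quebert"}, {"new-editor"}))
```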

Critical bugs

Whoever first discovers a critical bug is responsible for immediately:

  1. Creating a Pivotal Tracker story with the tag critical;
  2. Finding an engineer who can fix the issue; and
  3. Notifying all customer-facing employees that there is an open issue affecting production, usually via email.

Even if the critical bug is identified outside of business hours, work should begin on fixing the problem immediately. The engineers working on a critical bug should not continue working on an epic or other project until the critical bug has been resolved.

It is not necessary to fix the root cause to resolve a critical bug. If something can be done quickly to stop the severe disruption to our users, the critical tag can be removed from the bug. For example, if the commit that caused the bug can be reverted safely and quickly, reverting it is a good way to resolve the critical bug. The root cause can be fixed later as a regular bug.

Critical bugs appear on our daily stand-up dashboard. Each critical bug is explicitly statused daily until it is fixed and deployed to production.

Once a critical bug is resolved, the engineer who fixed the issue documents the following in our critical bug tracking spreadsheet:

  • How long was the bug in production?
  • How long did it take for us to fix the bug?
  • What was the root cause of the bug?
  • What was the fix for the bug?
  • Can measures be taken architecturally to prevent the bug from happening again in the future?
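To make the two duration questions concrete, here is a hedged sketch of how one spreadsheet row might be modeled. The field names are hypothetical, and so are the definitions: it assumes “time in production” runs from the deploy that shipped the bug until it was resolved, and “time to fix” runs from the moment the critical story was created until it was resolved.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CriticalBugRecord:
    shipped_at: datetime   # when the bug reached production (assumed field)
    reported_at: datetime  # when the critical story was created (assumed field)
    resolved_at: datetime  # when the critical tag was removed (assumed field)
    root_cause: str
    fix: str
    prevention: str = ""   # architectural measures, if any

    @property
    def time_in_production(self) -> timedelta:
        return self.resolved_at - self.shipped_at

    @property
    def time_to_fix(self) -> timedelta:
        return self.resolved_at - self.reported_at


# Made-up incident used only to show the arithmetic.
record = CriticalBugRecord(
    shipped_at=datetime(2020, 1, 1, 9, 0),
    reported_at=datetime(2020, 1, 1, 12, 0),
    resolved_at=datetime(2020, 1, 1, 13, 30),
    root_cause="example root cause",
    fix="reverted the offending commit",
)
print(record.time_in_production, record.time_to_fix)
```

Storing timestamps rather than hand-entered durations keeps the aggregate review of response times consistent across incidents.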

Every week during scrutineering, the team looks at the critical bugs recorded in the spreadsheet for an in-depth postmortem. These postmortems are treated as learning experiences and are blameless.

Over a longer period of time the team works to reduce the number of critical bugs that ship to production and reduce critical bug response times. This is measured by periodically reviewing the aggregate data from the critical bug spreadsheet.

Bugs that can’t be reproduced or need more detail

If a bug can’t be reproduced, the review tag should be applied to it. The person working on the bug, the product manager, or the QA lead should work with the person or customer who reported it to figure out the details needed to reproduce the bug.

What about bugs during the development phase?

During the development phase, bugs will happen. We differentiate pre-release and post-release bugs with separate labels. The pre-release label should be the label associated with the epic outlining the feature work. Epic feature labels are purple, which makes them easy to tell apart from the labels for our post-release work, which are green.

Any bugs that remain after an epic release need to be relabeled and prioritized in the general process because they are now affecting live customers. It is the responsibility of the PM and engineering lead of the epic to label and prioritize these. It is expected that the vast majority of these bugs (if not all) will be addressed as part of the cooldown sprint.

The backlog buildup

What happens if a bug sits around for a while? Well, it depends on the team that owns the repo. Each team defines a threshold for the Week, Month, and Eventually labels. If a repo passes its threshold, one of two things will happen. The team owning the repo may go directly into a bug sprint. If that team is focused on something time-critical, or if there is a big context-switching cost, the next viable sprint is chosen to begin a bug sprint. These rules are determined by the repo admins.

However, our customer-facing teams (Sales, Account Management, Customer Support) are encouraged to advocate to the PMs and QA team to expedite and reprioritize a bug for the well-being of our customers. Even if a particular repo is beneath the threshold, certain bugs may be introduced into sprint planning.