Before we dive back into defining an approach that mixes Product and Customer focus, we should probably ask “How does one focus on the customer?”
In an ideal world there are no defects; but since there are, and customers expect them to be resolved, an ideal approach balances being proactive with being reactive: proactively resolve reported and discovered defects, and appropriately manage escalations. On the surface, one focuses on the customer by making sure that defects are fixed before a customer reports them and that, if a customer does report one, it is addressed promptly (top priority, of course) and the customer is satisfied. Below the surface, defect management and prioritization play an important role in our customer focus, as they tell us how soon we can (and will) actually resolve defects.
The list below doesn’t capture all possible approaches in resolving defects; it captures approaches that I have had some experience with (I recommend that you certainly do not try the first one):
- Don’t resolve them.
- Push the defect to the original developer, or the developer most familiar with the functionality for resolution.
- Allocate time for the first available developer to pull a defect in and resolve.
- Have a defect duty rotation where a person (or team) resolves only defects for a time period
- Have a dedicated team for defect resolution.
- … Some other creative ways to resolve reported defects.
Always push to most familiar
Pushing the defect to the most familiar developer sounds like a great idea, and in many cases it is, because the one most familiar can resolve it quickest, which positively impacts customer experience. The issue with this approach is that the most familiar person might be involved with something that has a higher priority than this defect, or might be out on PTO and unable to resolve it for another 2 weeks. Let’s say it takes someone familiar with the code 1 hour to resolve but takes someone not familiar with the code 6 hours. If the defect goes to the person who is most familiar but currently tied up, the customer will have to wait a minimum of 2 weeks + 1 hour; however, if it goes to the person who is least familiar but not currently tied up, the customer will have to wait a minimum of 6 hours. Being collaborative and available for the team may improve the turnaround time on average, but a hard-coded “always send to the one who created it” may not be the best approach.
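To make the arithmetic above concrete, here is a minimal sketch in Python. The `Developer` fields, the function names and the choice to count 2 weeks as 14 calendar days are my own assumptions for illustration; real availability and effort numbers would come from your own tracking.

```python
from dataclasses import dataclass
from typing import Iterable

HOURS_PER_DAY = 24  # assumption: the customer waits in calendar time


@dataclass
class Developer:
    name: str
    hours_until_available: float  # how long before they can start
    hours_to_fix: float           # effort once they start


def turnaround_hours(dev: Developer) -> float:
    """Minimum time the customer waits if this developer takes the defect."""
    return dev.hours_until_available + dev.hours_to_fix


def fastest(devs: Iterable[Developer]) -> Developer:
    """Pick whoever gets the fix to the customer soonest."""
    return min(devs, key=turnaround_hours)


# The two people from the example: 2 weeks out + 1 hour vs. free now + 6 hours.
most_familiar = Developer("most familiar", 14 * HOURS_PER_DAY, 1)
least_familiar = Developer("least familiar", 0, 6)

for dev in (most_familiar, least_familiar):
    print(f"{dev.name}: {turnaround_hours(dev)} hours")
print("assign to:", fastest([most_familiar, least_familiar]).name)
```

The point is not the code itself but that the routing rule looks at availability plus effort, rather than at familiarity alone.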
Allocate time to pull
Allocating time towards the end of an iteration (or development cycle, or whatever) for “defect resolution” allows the team to first get the scheduled work out of the way, and “if” there is time, someone may pick something up from the backlog of defects. The issue with this approach is that if the development cycles are fully booked (maybe there is a hard date) and there is any risk or complexity that might lead to developers putting in all they have to meet the delivery, defect resolution gets thrown on the back burner. In most cases where the approach is “allocate time to pull defects”, the unspoken rule is that new products come first, defects second, unless… where the “unless” is for escalations and chaos/fire-fighting instances. For agile teams, if the time allocated towards defect resolution does not change from sprint to sprint, then there is no impact to velocity; however, if the time allocation is not fixed, velocity can be impacted depending on how much time is spent on defect resolution (usually measured in hours) vs. features (measured in story points). You could estimate defects in story points, but this can lead to additional issues that will need to be worked out, e.g. do you really want to hold a sprint back?
Defect resolution duty rotation
For a given time (usually the span of a sprint) a developer within a team, or a whole team, will be on defect resolution duty; once the time span is over, someone else (or a different team) takes on defect resolution duty, and so on. This helps cross-train and helps make everyone familiar with the code base. It can also help improve code quality, since everyone is learning from everyone’s mistakes, and it provides a great collaboration platform. While it does have some great benefits, it does introduce some challenges. A significant issue is that developers and teams lose traction as they switch focus from “new product” to “old product”; the interruption can cause a delay since the developer(s) will need to get back to where they were after the rotation is over. For organizations that have many teams or larger teams this may be less of a concern, since the rotation might happen every few months; but even then, when it does happen it does have a negative impact.
Dedicated team for defect resolution
The thought on this one is that if there is one team solely focused on supporting “released software” (defect/engineering sustaining team) and other team(s) focused on creating “new software” (feature teams), you end up with a two-tiered development approach where both the product and the customer can be focused on. The feature (new software) team is rarely impacted by defects from the “live” world and can always focus on delivering new product; the defect/engineering sustaining team is dedicated to resolving defects and is not tied up with new features. The issue with this approach is that no one aspires to be a “defect fixer”; developers want to “develop” new and innovative “features” (or at least I did). It is possible to make this work if more attention is given to down-time cross-training, root cause analysis, collaboration, role rotations, etc. (I have seen teams evolve this approach into a “defect resolution duty rotation” approach.)
In addition to the above, there can also be hybrid approaches that mix various approaches, e.g. defect resolution duty rotation with an added “pager duty” where someone (not on defect duty) is on-call. In general there is no “incorrect approach”; however:
Any approach can become incorrect when developers are forced to accept an approach that they do not agree with (or understand).
Any approach that is going to be implemented should be discussed with the teams that will be implementing it, focus on and explain the “why”. When an approach does not work, try to adjust it or try something else!
“if the code repository is an “elephant” and new code is peanuts being fed to this elephant by the other guys, then I am always cleaning up after the elephant; who wants to be a shit cleaner forever?”.
The takeaway from the previous post on KPI and metrics was that we should proactively monitor process and optimize as needed; just because it worked when you were a startup does not mean it will work when you are “startedup”, you will need it to scale and by capturing metrics and KPI’s you will be able to perform analysis when/if things go wrong. This however does not mean that you need process for the sake of having process or that you should focus on process over people; agility is important and being lean goes a long way.
The chicken and egg problem: What came first, the chicken, or the egg?
You have great team(s) and you have great product(s). Your team(s) is/are at capacity enhancing and maintaining the current product(s), but you need to create more product(s). In order to grow product(s) you need to add more people, but these people need to be grown as well. Hiring people and not growing them will make product growth challenging, as there will be a longer ramp-up time, or they will disengage and leave (or you end up with an us vs. them culture); and redirecting your current team to grow people rather than the current products will grow the new hires at a rapid pace, but your current products will stop growing. What do you do? Going back to the chicken and egg problem, I think in the long scheme of things it is irrelevant what came first; what is more important is the realization and existence of the chicken and egg, or the “idea” of a chicken and/or an egg, and that you need to ensure that the cycle continues: chickens give eggs and eggs (eventually) give chickens.
Single points of failure (Single Threads)
As engineers and architects we focus on identifying single points of failure within architecture; as managers we need to identify single points of failure within team members, processes and tools. Ask yourself: if I were to randomly start pulling people out (PTO, resignations, etc.), what would be the impact? Would we still meet delivery? Do we lose key subject matter experts? Most of the time people end up becoming single threads because there are many hats to wear and things need to get done; documentation and knowledge transfer become a “will get to it” task that many never get to.
Single points of failure and knowledge silos end up becoming a real impact to growth when you bring on new hires who need to be brought up to speed and grown, because the same resources that need to help grow others are already busy with their existing work. Not only does it impact growth, it also negatively impacts collaboration and team culture; when people do not grow they disengage, and this causes further issues. As you grow from a startup company with a smaller team to a “startedup” company with a larger and growing team, your single points of failure can grow and teams/members can get frustrated as they get pulled in different directions.
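The “what if I start pulling people out?” question can be asked of a simple knowledge map. The sketch below is only an illustration under my own assumptions: the names, the areas and the idea of keeping a person-to-areas map at all are hypothetical, and the helper names are made up for this example.

```python
from collections import defaultdict


def single_threads(knowledge):
    """Return {area: person} for every area that only one person knows.

    `knowledge` maps a person to the set of areas they can support.
    """
    owners = defaultdict(list)
    for person, areas in knowledge.items():
        for area in areas:
            owners[area].append(person)
    return {area: people[0] for area, people in owners.items() if len(people) == 1}


def impact_of_losing(knowledge, person):
    """Areas left with nobody who knows them if `person` is pulled out."""
    return sorted(
        area for area, owner in single_threads(knowledge).items() if owner == person
    )


# Hypothetical team: names and areas are invented for the example.
team = {
    "Ana": {"billing", "checkout", "deploy"},
    "Ben": {"checkout"},
    "Cy": {"deploy", "search"},
}

print(single_threads(team))
print(impact_of_losing(team, "Ana"))
```

Even a rough map like this makes the conversation easier: every area that shows up in `single_threads` is a candidate for documentation, pairing or knowledge transfer.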
A few simple approaches to reducing and/or eliminating single points of failure are:
- Focus on collaboration and knowledge sharing among teams (culture), the more people share what they learn the more people know.
- Work-load for single threads can be split between product development and people development.
- On-boarding programs and training documentation can be built as part of a product backlog.
- New hires can be paired with senior resources to create mentor-ship and knowledge transfer programs.
Each organization is different and each has its unique attributes that require a solution or several approaches that solve the problem for that specific organization; a silver bullet approach doesn’t really work.
- To grow new product(s) outside your current capacity you need to grow team(s).
- To grow new team(s) who will grow new product(s), your current team(s) can be impacted.
- Your current team(s) can end up becoming single threads and/or single points of failure.
- Recognize that this can become a problem.
- Focus on culture, collaboration, knowledge transfer, documentation, etc. so that the impact to the current team(s) and product(s) will be minimal and your new team(s) will rapidly grow and be engaged.
Several years ago I had a Volvo (88 760 GLE), and one day I noticed little streams of smoke from under the hood every time I got back home; I had little experience with cars back then, so I took it to a friend’s dad’s shop. I should have probably left when I got there, because there was a customer yelling at him for messing up his Beetle and charging him extra to fix it; apparently he had put some hoses on wrong and then had to redo the work. I wasn’t there for the whole story, just the last 20 minutes of it, and then the customer drove off.
My friend’s dad asked me to start my car, pull up next to him and leave the car running. He popped the hood and started to look around; he checked the hoses, looked at the pump and lines, drove it around. Several hours passed by… he went from suggesting that there was a coolant leak, to a transmission leak, to a radiator oil leak… several more hours passed by as he came up with theories and looked at things… After being there for about 6 hours I decided to stick my head in and look at the engine block, near the side where I had told him to look and where I thought the smoke was coming from. Sure enough, just as I looked, I saw bubbles near the engine block’s cover and pointed it out to him, and he said “ah, it’s loose! oil is getting out”; he brought over his tools, tightened it, and the problem was solved.
I had been there for 7 hours, he wasn’t the type of guy who would say “this is my son’s friend, I am going to help him out”; he was a business man and to him I was a customer. I was upset with him for wasting 7 hours of my time; but what was really on my mind at that time was “how much will these 7 hours cost me?”, especially since he had a sign posted that had “Service hour rate: $45/hr” in big letters…. I will get back to this later in the post.
KPI and metrics
Man hours, hours, T-shirt sizes, story points, etc. are measurements. I will try my best to not go down a rabbit hole with scrum, story points vs. hour estimates… I will not! And hopefully I won’t lose my original message in all of this. Let’s start with this: at some point or other, the focus and bottom line for a company will be “shipping product” against a “delivery schedule”. People, process, culture, story points, hour estimates, etc. will eventually stop existing if the “startedup” company cannot ship product and closes down (the focus here is shipping product according to other people’s expectations, i.e. investors, C-suite, etc.). With this at the back of our minds, let’s continue on.
Story points are a measure of risk and/or effort and/or complexity (the and/or is there for the ones who disagree that story points measure complexity and/or risk).
Work/Task estimates are hours (usually) it takes to complete a task (with risk, complexity and effort already factored in).
Some argue that story points are just a block of time that provide the developers with padding; some argue that story points and hour estimates are not the same; some argue that time estimates need to be detailed and you should only use blocks of time (i.e. story points). I am not here to argue about any of these.
If you look at what part story points and sprints play: Story points (representing stories) go into sprints and sprints are boxed in time; at the end of the day, we are basically fighting for time. Others (Non-developers) usually want to know “how long will it take”, “when will it be done” because they need to set schedules, communicate to others, but (most) developers just want to work 🙂
How can I tell you how long it will take to fix when I have not even looked at the code; code that someone else wrote years ago!
When you have your team of 10 who have been working on a project (or two); the story points, velocity charts and estimations work out great. The team of 10 will negotiate points and the best suited person will do the work; everyone starts getting a good idea of what others and they themselves can do with improved accuracy (and velocity).
What risk can come up when you throw in 65 new hires and 2 new projects?
One thing that can happen is that the wrong person can get the wrong story; it is also possible that the new team may incorrectly estimate story points.
This happens, or can happen, because it takes experience and familiarity to get both of these (story point estimates and story assignment) right. Let’s say everyone is working on their tasks for a sprint, and there is 1 story (something to do with SSL) remaining that MUST make the sprint (which closes sooner than the story can be worked). It is a simple change measured as 2 story points, but the one developer available knows NOTHING about SSL; there is a developer on the team who knows about SSL, but she is already working on a different feature that requires her knowledge of encryption.
Why did this get set as 2 story points when it was obvious to the team that there was risk? Rather than negotiate for 5 points, the team settled for 2 because they expected the more senior programmer to have a better idea of how many story points it would take; the senior programmer saw no problem with 2 because, in the past, her team was comfortable with it being 2. Repeat this several times and you have a pattern. How do you break this pattern? How would you even recognize this pattern? How would someone have suggested 5? How do you further refine estimating story points so that it’s not just based on gut, experience or familiarity? You start building and tracking additional KPI’s/metrics. Velocity and Burn Up/Down charts are common KPI’s that most use; you need more to help fish out patterns and gaps.
I think it’s important to acknowledge that in a true scrum setup (a perfect world, which is possible) these things may not happen (or happen rarely); if something doesn’t make a sprint, it moves to the next, but in most (all) of the places I have worked at, true scrums do not exist, shit happens and you cannot NOT make the dates; unless the team pulls together, works OT and possibly burns out (if it keeps happening).
As many others do, I like to base my estimation on experience and collectively agree with a team; but wouldn’t it be easier if there was another set of metrics that provided extra assurance or a reality check, i.e. historical data? Either metrics against tasks or metrics against similar stories, e.g. a story around “user login” averages out to be a 4 point story based on previous similar stories; a task to “check credentials against db” takes 2 hours. The metrics can be captured after sprints/projects are done, in ad hoc meetings or release review meetings; the data would be used for new hires and for times when things are under- or over-estimated. New hires and others could use this data to help estimate and to understand gaps between what it takes on average and how the teams perform; the KPI’s would further detail the teams’ health…
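As a rough illustration of what such a “reality check” could look like, here is a small Python sketch. Everything in it is an assumption on my part: the history entries are invented (chosen so that “user login” averages 4 points, as in the example above), and the 50% tolerance is an arbitrary threshold, not a recommendation.

```python
from statistics import mean

# Hypothetical history captured in release reviews: (story category, points).
HISTORY = [
    ("user login", 3),
    ("user login", 5),
    ("user login", 4),
    ("password reset", 2),
    ("password reset", 3),
]


def reference_points(history, category):
    """Average points of past stories in the same category, or None."""
    points = [p for c, p in history if c == category]
    return mean(points) if points else None


def reality_check(history, category, proposed, tolerance=0.5):
    """Compare a proposed estimate against the historical average."""
    ref = reference_points(history, category)
    if ref is None:
        return "no history"
    if proposed < ref * (1 - tolerance):
        return "possibly underestimated"
    if proposed > ref * (1 + tolerance):
        return "possibly overestimated"
    return "in line with history"


print(reference_points(HISTORY, "user login"))
print(reality_check(HISTORY, "user login", proposed=1))
```

The team still negotiates the estimate; the history only gives a new hire (or a quiet team member) something to point at when a number feels off.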
Going back to the Volvo; I was waiting at his desk while he put his tools away, then he walked back to his desk, opened up a book, flipped pages to a section that read “Diagnostics”, found a line item for “Oil Leak” and sub-item “Gasket”, and said “2 hours, so you owe me $90 for fixing the problem”. Even though he spent 7 hours on it, he charged me for 2 because that’s what the book that had metrics for that type of service said it should take.
The Volvo example is important to me because it identifies performance issues; i.e. he should have done it in less than 2 hours if he was a good mechanic, because that’s how long it takes on average; he should be asking himself “why did it take me more than 3 times as long and how can I do this better”, because that’s what we would use similar development metrics for: “I seem to always underestimate UX changes, I need to pay better attention”.
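The mechanic’s book boils down to a lookup plus two small calculations. A minimal sketch, assuming a rate book keyed by line item and sub-item (the dictionary layout and function names are mine; the numbers are the ones from the story):

```python
HOURLY_RATE = 45  # dollars per hour, from the sign in the shop

# Rate book: (line item, sub-item) -> standard hours for the job.
RATE_BOOK = {("Oil Leak", "Gasket"): 2}


def billed_amount(item, rate=HOURLY_RATE):
    """What the customer pays: book hours, not hours actually spent."""
    return RATE_BOOK[item] * rate


def overrun_ratio(item, actual_hours):
    """How many times longer the job took than the book says it should."""
    return actual_hours / RATE_BOOK[item]


job = ("Oil Leak", "Gasket")
print("billed:", billed_amount(job))
print("overrun:", overrun_ratio(job, actual_hours=7))
```

The bill comes out at the $90 from the story, and the overrun works out to 3.5 times the book value, which is the number the mechanic (or a developer looking at similar metrics) should be curious about.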
The example I used has so far revolved around a startup company of 10 growing to a startedup company of 65; let me use a different example: A software development boutique is agile and they have client projects captured in back to back sprints. There are account executives that double as product owners who talk to the customers and based on experience and some dialogue with a few dev leads, they estimate effort and agree to a schedule and budget. Once they are ready to start the sprint (for a new project) the dev leads will update the team and as soon as sprint planning (stories placed) is done they start rolling.
A few issues:
- The dev lead and account executive time-boxed the maximum amount of time it can take based on their meetings with the customer; there was no team review
- Account executives double as product owners; their stories aren’t reviewed by developers until the work actually starts since developers are already busy on other projects
- There is no room for scope creep; things cannot get thrown out since this is a client project, and it must meet a date
- When there is scope creep, because the work is time-boxed, resources will work overtime and burn out since the cycle just repeats itself; regardless of what your story points are, they have to fit in the sprints.
You could point out that the issue here is that there isn’t team involvement with the original estimation (for the time box) but this is because of how the company chooses to operate, so you cannot change this. You could state that the issue here is that there will always be some sort of scope creep so you cannot expect a hard stop but this is also because of how the company caters to customers expectations and needs to operate.
I would argue that the issue here is that the account executives do not have a “rate book” or “performance history” for similar tasks/stories that can help them come up with better estimates and factor in complexity when needed. In addition to that, since this company is in a pattern of running over (and solving it by having people work overtime, every time), there should be some sort of analysis done after each project to come up with “mistakes made” or “lessons learnt” so that people can learn from the patterns and put out better estimates. There will be times where you cannot change the entire company’s culture, so instead you need to look for what can be improved.
With focus on additional KPI and metrics; one can identify issues with process or gaps before they become a bigger problem; Don’t just stick with how things worked when you were a startup and expect things to continue to always be perfect once you start growing, when you are “startedup” you need to start looking at adding new KPI’s and measurements that will help the bigger team work better and scale.
The Startup and the 3 P’s: Product, Process and People
I will not pretend to know everything about startups and startup culture, but I will list the reasons why startup culture is exciting, at least for me:
You meet great people, people who have ideas and want to try things, people who have passion and want to make an impact, people who will challenge you to do better. There is passion for working together as a team, passion for building trust within the team and passion for collectively making an impact in other people’s lives; or sometimes, passion for just making something happen, to create. There is passion for possibly creating something that could go big and disrupt everything, all built from the ground up with the team’s sweat, blood and tears, where everyone is high on adrenaline. Suits? Offices? As long as you are connected with your team and are working well together, those things don’t matter. There is no red tape and there are no big top-down structures; everyone and anyone has access to all. Anyone can start working on anything; there are many hats to choose from; wear all. You don’t get bored as things are evolving and stay fresh; there are new ideas, old ideas, odd ideas; anything can change anytime.
At the end of the day, a startup is defined by its growth; when a startup doesn’t grow, it dies; it stops.
There can be several growth stages for a startup, and startups evolve; once they start growing they are now “startedup” and will hopefully grow exponentially. In a perfect world, the cultural values that made the startup fun would remain, and in some cases they do (depending on where the growth has led the startup), but there are times where the very culture that helped the startup grow and evolve starts conflicting with what is needed to grow to the next level.
Let’s say you follow agile and you end up with iterations, planned work, release schedules and a clear pipeline of what needs to be built. This all worked great when you had 2 products and a team of 10; since you have grown, the expectations of what you can or will deliver have also grown. Some brilliant folks in your team have discovered 2 more products that should be added to your portfolio; how do you grow your current 2 products (since they have a feature and defect backlog) and also work on these 2 new products without increasing your team size, changing delivery for current products or burning out resources? Before you grew, you may have had your own expectations of when and how you would bring on these two new products; now that you’ve grown, others may have different expectations from you and your team(s). Maybe you say “we need more people”, which brings me to the next point
With the growth of the startup, either through sales, funding or more investment and the need to create more product it is decided that you bring on more people, and you do. You end up facing the same issue, how do you grow people with the same 10 resources you had who are busy working the two existing products; some of the people you bring on may be self-starters and will figure everything out by themselves but what about the ones who don’t? So now you say “we need some process and automation to free up some of the manual work so that we can do more with the same resources”, which brings us to…
How do you focus on process and automation to free up time when the people you have are busy with supporting the existing two products, or are supporting the existing two products and are also trying to bring the new hires on-board?
A part of me says that the above three growth challenges are not really challenges and that they are part of what it means to be a startup culture and are expected. However; there are a few by-products that the 3 P’s create that can become toxic, stop growth and hurt the culture if they are not accounted for when trying to grow.
The Frat party & the first team
The first team consists of the people that built the startup; it was their teamwork and effort that made the startup grow; anyone who comes later is an outsider and “we need to be careful about who we let into our frat party” (once upon a time I lived on frat row). This one is not intentional, but when you work closely in teams and blur the line between friendship and co-workers, you end up creating an inner circle and make it challenging for an outsider to easily integrate and feel welcomed. This by-product is a blocker for People growth.
The golden simple process
At some point there was predictability and little chaos in what all needed to be done (smaller team, fewer products), so everyone starts expecting things to always be perfect. Even though you have grown, you have kept your process simple and did not optimize for KPI’s and other metrics that can help with predictability, complexity, risk and estimation. There will be times where things change, dates get reset and/or product scope creeps. If you had built a roadmap of what releases when, had committed the teams to that and put all these releases with their iterations back-to-back (because of all the product that had to get pushed out to show growth and maturity), and dates or requirements change on you (usually not for the better), the team and its happy culture will get disrupted, as it will take effort to get things back on track; when/if this happens all the time, it gets hard to get away from the domino effect and people burn out, get disengaged and/or leave. This by-product is a blocker for Process growth.
When you were small, everyone knew what everyone else was doing, everyone shared and individuals had their skillsets. Now you have grown, 2 months ago you were 10 people, today you are 75, the 65 newer ones don’t understand the code base or the original design, there is some good documentation but they need more information and there are 3 key people who know different things about the original products; original products that you want the new 65 people to work on so that the first team can work on the two new ones; how do you distribute the knowledge known by the 3 key people, make them available to the 65 and allow the 3 key people to focus on their new projects? If they are constantly being pinged by others and cannot get their work done; their sense of accomplishment doesn’t scale much; especially if you did not plan for them to set time aside and help others. This by-product is a blocker for Product growth.
Each blocker is situational (just like leadership) and can be solved; we will examine and solve for each, before we move onto other “StartedUp” culture challenges. The next post goes into process KPI’s and metrics, addressing the golden simple process blocker.
So what does come after Scrum/Kanban/Agile?
The evolution of the SDLC continues as expected and if you look at the trend, the focus has been to get things done quicker – Rapid.
Andy @ Assembla calls it Scalable Agile and has great content explaining the concept behind it; I call it Rapid (Real-time Actionable Prioritized Individual Delivery), or Rapid Agile, and the basics around the process are very similar. The focus or goal is to prioritize individual efforts rather than a team sprint, act upon real-time feedback and deploy much quicker; often deploying new features and bug fixes several times a day rather than every other week or so.
Case Study: Blank Label
At Blank Label, when we were much younger, we manually deployed whenever we wanted as we were trying to rapidly enhance and stabilize our offering. With every feature came a lot of bugs, and it was usually the usability changes that brought inconsistency, as we had limited resources and a lot of “to-do’s”. Eventually we settled on a 2 week delivery cycle, but as we were attempting to find out who our customer was, we made some drastic changes only to see orders drop from several a day to almost none every week. We had no idea which of the 20 things we changed in those 2 weeks killed it, as A/B testing had told us that our sales would go up, not vanish.
Fast-forward many months – we saw that we had gone back to our old habits, but this time we had process and we were not disrupting things when we updated, we deployed to a staging server, tested things out there and then released to the live server. This still required manual builds and deployments, many times fixes just didn’t make it to the live system when issues were discovered because it required someone to build and upload…
Now fast-forward to yesterday: we now build the backlog in Jira and make use of GreenHopper for its kanban board. We go through a to-do, doing, build and live workflow, where Bamboo will automate code check-ins and deploy with AWS EC2 to our staging server; once testing passes staging, the deployment to live is a simple click of a button and live is then refreshed.
For test, we have gone from 1-3 pushes to test per day to 5-8 pushes per day and for live will be going from one refresh per day to 2-3 refreshes. We still have a small team of developers, once this team grows we will probably see a large increase in pushes to test, but will probably maintain the 2-3 refreshes to live per day (depending on the functionality).
In the not so distant future, I will provide the development workflow process along with the tutorial to build all the Atlassian stuff that will get you to a rapid continuous integration, deployment and delivery model, as I did not see a lot of solutions that focused on the .NET MVC3/4 Razor stack.
“We are committed to being very agile in making sure that our waterfall process is iterative”
While machines might give the same level of output everyday, the same doesn’t apply to people.
In a real environment, each member’s momentum and motivations will change without notice. This requires management to step in and make sure they are constantly evolving the processes so that members will remain motivated. Leadership is important to motivate key players, channel information to keep everyone engaged and make working horizontally routine. To accomplish this, management might bring in an outside champion, who will help bring in new ideas and a “fresh” stream of motivation. Instead of looking at a project as a whole, it should be broken down into smaller goals that are realistic and achievable. Management should encourage and build on small successes, to demonstrate that each was an important milestone and to motivate members. Progress and results should be demonstrated so that members can all acknowledge the ongoing progress and success. Management can keep the members’ interest by ensuring that they are always focused on achieving the goals that were set and that, if deviation is detected, corrective action is taken to bring things back on track.
A horizontal team is dynamic as it is always changing; management needs to account for its dynamic nature and make necessary adjustments to adapt to the change. Money is probably the best motivator, but too much money too early in the process can hinder individual initiative and prevent people from innovating. Since horizontal structures focus on team collaboration, a downfall is that meetings can go on forever and schedules can be missed; this can demotivate some members who feel that decisions are not being made in a timely manner, which is why management can implement deadlines, which will help practitioners develop a common schedule and manage workloads effectively so that the project’s momentum is kept at a steady pace.
[Source: Moving from the Heroic to the everyday: Hopkins, Couture, Moore]