Joel Liu's Library tagged mindset

24 Jan 08

Five whys - Joel on Software

  • After some internal discussion we all agreed that rather than imposing a
    statistically meaningless measurement and hoping that the mere measurement of
    something meaningless would cause it to get better, what we really needed was a
    process of continuous improvement. Instead of setting up an SLA for our
    customers, we set up a blog where we would document every outage in real
    time, provide complete post-mortems, ask the five whys, get to the root
    cause, and tell our customers what we're doing to prevent that problem in
    the future. In this case, the change is that our internal documentation
    will include detailed checklists for all operational procedures in the
    live environment.
    • Our link to Peer1 NY went down
    • Why? – Our switch appears to have put the port in a failed state
    • Why? – After some discussion with the Peer1 NOC, we speculate that it was
      quite possibly caused by an Ethernet speed / duplex mismatch
    • Why? – The switch interface was set to auto-negotiate instead of being
      manually configured
    • Why? – We were fully aware of problems like this, and have been for many
      years. But we do not have a written standard and verification process
      for production switch configurations.
    • Why? – Documentation is often thought of as an aid for when the sysadmin
      isn’t around or for other members of the operations team, whereas it should
      really be thought of as a checklist.
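
The last two whys call for a written standard plus a verification step for switch ports. As a rough illustration only, here is a minimal Python sketch of such a check. The `PortConfig` type, the `check_port` function, and the 1000 Mbps / full-duplex standard are all hypothetical and are not taken from the post; a real check would read the settings from the switch itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PortConfig:
    """Hypothetical record of one switch port's link settings."""
    name: str
    speed_mbps: Optional[int]  # None means the speed is auto-negotiated
    duplex: str                # "full", "half", or "auto"


def check_port(actual: PortConfig, standard: PortConfig) -> List[str]:
    """Return checklist violations for one port (an empty list means it passes)."""
    problems = []
    # The written standard in this sketch forbids auto-negotiation outright.
    if actual.speed_mbps is None or actual.duplex == "auto":
        problems.append(
            f"{actual.name}: auto-negotiation enabled, standard requires manual settings"
        )
    # Manually configured values must match the standard exactly.
    if actual.speed_mbps is not None and actual.speed_mbps != standard.speed_mbps:
        problems.append(
            f"{actual.name}: speed {actual.speed_mbps} != {standard.speed_mbps}"
        )
    if actual.duplex != "auto" and actual.duplex != standard.duplex:
        problems.append(
            f"{actual.name}: duplex {actual.duplex} != {standard.duplex}"
        )
    return problems


# Assumed standard for the uplink port; the values are illustrative only.
STANDARD = PortConfig(name="uplink", speed_mbps=1000, duplex="full")

if __name__ == "__main__":
    observed = PortConfig(name="uplink", speed_mbps=None, duplex="auto")
    for problem in check_port(observed, STANDARD):
        print(problem)
```

Running a check like this after every configuration change turns the documentation into the checklist the final why asks for, rather than a reference that is only consulted when the sysadmin is away.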