sconover-code

Process kills developer passion - O'Reilly Radar

Simon Farnsworth [11 May 2011 08:47 AM]

I'm just going to remind people of what Agile means (from http://agilemanifesto.org/ ):

It means:

Individuals and interactions, rather than processes and tools.

Working software, rather than documentation.

Customer collaboration, rather than contract negotiation.

Responding to change, rather than following a plan.

Practically, this means that as soon as your process is more important than your developers, you've failed, and aren't Agile any more. In an Agile world, the point of your process is to make your individual developers work better, and make interactions between developers easier; if process is killing things, it's gone too far.

Note that this doesn't mean no process at all; the concept of Agile processes is to capture ways good software engineers work, and formalize them in a way that all software engineers can use to get better results. TDD, for example, works well because it's one way to get you to understand the problem before you try and solve it.

As soon as you *must* use a process, even when it's inappropriate, you've missed the point; the processes are formalized so that an inexperienced developer can use them without prior experience, not so that you can insist on (say) Scrum for all software. And note that lack of a formal process is in itself a process.

Yes, this is hard; it involves having some competent, experienced developers around to tell whether the junior guys are reasonable when they're saying "this process is wrong for this bit of project; I'm going to use this other process instead". It involves understanding enough about what you want to do to know what "working software" means for you. It involves being flexible, and responding when things change underneath you. But it's what Agile is supposed to be about, not endless process that doesn't help.

Posted May 11, 2011

InfoQ: Agile 10 Years On

More Agile papers and chapters were written about the need to write comprehensive tests than ever were written about writing good, working code; only Uncle Bob and friends of late have come to the rescue. (And as much as the literature talks about tests, it offers precious little about what makes a good test.) As for embracing change, don’t you dare threaten the XP practices, and don’t look for a place to suggest improvements to the Scrum framework. Freedom is slavery. (There is, on the other hand, a process for adding new CMMI practices.)

Beyond that, Agile became a substitute for thinking. That the Manifesto said X was adequate to justify doing X. To be non-Agile was to be a Communist. I’ll never forget the black arm bands at OOPSLA one year. (Today, they seem to have devolved into more innocent and prettier woven wrist bracelets.) Agile was not so much a response as a reaction - a reaction against the excesses of the management practices that arose in the 1960s and 1970s, and again among the CASE tools and methodologies of the 1980s. And it was a violent action. As Mary Poppendieck envisions the movements of our discipline as being a pendulum that is rarely centered, so the more extreme (to coin a phrase) programmers slammed the pendulum as far to one end as possible. While the 1980s methodologies were preoccupied with plans, and by inference the thought and planning behind them, the Agile world frowned on anything but doing. Each of the Manifesto’s dichotomies relates to doing. None of them are fundamentally about the thinking we find in Lean. Just drop the first step of the Plan-Do-Check-Act business cycle. Ignorance is strength. The proverbial baby and bathwater come to mind.

Posted May 6, 2011

Exploration Through Example » Blog Archive » A question about some people

Daniel Hinz Says:

My observation is that some in the development community have ended up feeling shortchanged by the way Agile is being realized in many organizations. Large IT, product, or embedded development shops have been ‘adopting’ Agile as a way to increase the productivity of producing code. However, moving the development teams to Agile and transforming a large organization to Agile are not the same thing. Accounting systems, budgeting, project initiation and approval are huge bureaucracies to move.

Much of the P.O., Scrum Master or other Agile program management training is based on creating laser focus on customer value. (I note that the question you put out earlier in the week asked about business value, which is a wider net.) The question that keeps getting asked is what value does the customer get from paying back this technical debt? What value does the customer get from simplifying this design? What value does the customer get from cleaning this code? The answer is almost universally none. So, the P.O. keeps pulling those activities out of the backlog because these are all internal codebase issues; the customer does not see them or realize value from them, at least not directly. The P.O. motivation is that if they just get this one out in a way that is deemed successful, then they can look forward to a promotion and a change of responsibilities. That means that too often they are not around to have to deal with the future value of these decisions. Who is? The development teams!

When we look at the 12 Agile principles we get the impression that the ones at the top of the list are all about the customer and the ones at the bottom of the list are all about the development team. Due to the customer-only focus, sustainability, simplicity, technical excellence, emergent architecture and design are all perceived to have fallen off the table. The development teams end up feeling shortchanged. They felt they finally had an approach that would support their intrinsic need to build software systems they could be proud of. When that did not happen they went looking for a way to meet their own needs. Thus, the Craftsmanship movement.

Posted May 5, 2011

What Martin Fowler saw in the Agile Manifesto - SD Times: Software Development News

I look at it as a historic statement, one that really helped turn the industry away from its view of what professional software development should look like. In that, it achieved much more than I would have imagined, and that is quite enough.

But I think it also provides a great starting point for understanding what the key to agile thinking is, what Jim Highsmith refers to as "Being Agile rather than Doing Agile."

There's always been talk of tweaking it to better express things. And people often snicker and say that as an unchanged document, it's hardly agile itself. But that snickering misses the point; the whole philosophy of agile isn't about interpreting some dusty document, but about making your own journey of discovery. The manifesto is a useful part of that journey, but in the end you have to think for yourself. (And, sadly, many people dislike that message.)

Posted May 3, 2011

Summary of the Amazon EC2 and Amazon RDS Service Disruption

Primary Outage

At 12:47 AM PDT on April 21st, a network change was performed as part of our normal AWS scaling activities in a single Availability Zone in the US East Region. The configuration change was to upgrade the capacity of the primary network. During the change, one of the standard steps is to shift traffic off of one of the redundant routers in the primary EBS network to allow the upgrade to happen. The traffic shift was executed incorrectly and rather than routing the traffic to the other router on the primary network, the traffic was routed onto the lower capacity redundant EBS network. For a portion of the EBS cluster in the affected Availability Zone, this meant that they did not have a functioning primary or secondary network because traffic was purposely shifted away from the primary network and the secondary network couldn’t handle the traffic level it was receiving. As a result, many EBS nodes in the affected Availability Zone were completely isolated from other EBS nodes in their cluster. Unlike a normal network interruption, this change disconnected both the primary and secondary network simultaneously, leaving the affected nodes completely isolated from one another.

When this network connectivity issue occurred, a large number of EBS nodes in a single EBS cluster lost connection to their replicas. When the incorrect traffic shift was rolled back and network connectivity was restored, these nodes rapidly began searching the EBS cluster for available server space where they could re-mirror data. Once again, in a normally functioning cluster, this occurs in milliseconds. In this case, because the issue affected such a large number of volumes concurrently, the free capacity of the EBS cluster was quickly exhausted, leaving many of the nodes “stuck” in a loop, continuously searching the cluster for free space. This quickly led to a “re-mirroring storm,” where a large number of volumes were effectively “stuck” while the nodes searched the cluster for the storage space they needed for new replicas. At this point, about 13% of the volumes in the affected Availability Zone were in this “stuck” state.

After the initial sequence of events described above, the degraded EBS cluster had an immediate impact on the EBS control plane. When the EBS cluster in the affected Availability Zone entered the re-mirroring storm and exhausted its available capacity, the cluster became unable to service “create volume” API requests. Because the EBS control plane (and the create volume API in particular) was configured with a long time-out period, these slow API calls began to back up and resulted in thread starvation in the EBS control plane. The EBS control plane has a regional pool of available threads it can use to service requests. When these threads were completely filled up by the large number of queued requests, the EBS control plane had no ability to service API requests and began to fail API requests for other Availability Zones in that Region as well. At 2:40 AM PDT on April 21st, the team deployed a change that disabled all new Create Volume requests in the affected Availability Zone, and by 2:50 AM PDT, latencies and error rates for all other EBS related APIs recovered.
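As a rough illustration of why a long time-out leads to thread starvation: by Little's law, the number of requests in flight is approximately the arrival rate multiplied by the time each request holds a thread. The sketch below applies that arithmetic. The pool size, request rate, and time-out values are made-up numbers chosen for illustration, not figures from the report, and the function names are hypothetical.

```javascript
// Little's law: requests in flight ~= arrival rate x time each request holds a thread.
// Every number passed to these functions below is hypothetical, not an AWS figure.
function threadsHeld(stuckRequestsPerSecond, timeoutSeconds) {
  return stuckRequestsPerSecond * timeoutSeconds;
}

// True when stuck requests alone can occupy every thread in a shared pool.
function poolIsStarved(poolSize, stuckRequestsPerSecond, timeoutSeconds) {
  return threadsHeld(stuckRequestsPerSecond, timeoutSeconds) >= poolSize;
}

// Hypothetical shared pool of 500 threads and 20 stuck "create volume" calls per second.
console.log(poolIsStarved(500, 20, 60)); // long time-out: 1200 threads wanted -> true
console.log(poolIsStarved(500, 20, 5));  // short time-out: 100 threads wanted -> false
```

Under these assumed numbers, the same stream of slow calls either fills the whole pool or leaves most of it free, depending only on how long each call is allowed to wait.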

Two factors caused the situation in this EBS cluster to degrade further during the early part of the event. First, the nodes failing to find new nodes did not back off aggressively enough when they could not find space, but instead, continued to search repeatedly. There was also a race condition in the code on the EBS nodes that, with a very low probability, caused them to fail when they were concurrently closing a large number of requests for replication. In a normally operating EBS cluster, this issue would result in very few, if any, node crashes; however, during this re-mirroring storm, the volume of connection attempts was extremely high, so it began triggering this issue more frequently. Nodes began to fail as a result of the bug, resulting in more volumes left needing to re-mirror. This created more “stuck” volumes and added more requests to the re-mirroring storm.
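One common way to make retries back off more aggressively is capped exponential backoff with random jitter. The following is a minimal sketch of that general technique, not the change made to EBS; the base delay, the cap, and the function names are assumptions chosen for illustration.

```javascript
// Capped exponential backoff with full jitter (a generic technique, not EBS code).
// baseMs and capMs are illustrative assumptions.
function backoffCeilingMs(attempt, baseMs = 100, capMs = 30000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Pick a random delay in [0, ceiling) so that many retrying nodes do not all
// search again at the same instant. `random` is injectable for testing.
function backoffDelayMs(attempt, baseMs = 100, capMs = 30000, random = Math.random) {
  return Math.floor(random() * backoffCeilingMs(attempt, baseMs, capMs));
}

// Show how the maximum wait grows over the first six failed attempts.
for (let attempt = 0; attempt < 6; attempt++) {
  console.log(`attempt ${attempt}: wait up to ${backoffCeilingMs(attempt)} ms`);
}
```

The cap keeps waits bounded, while the jitter spreads retries out in time so that a large group of nodes that failed together does not retry together.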

By 5:30 AM PDT, error rates and latencies again increased for EBS API calls across the Region. When data for a volume needs to be re-mirrored, a negotiation must take place between the EC2 instance, the EBS nodes with the volume data, and the EBS control plane (which acts as an authority in this process) so that only one copy of the data is designated as the primary replica and recognized by the EC2 instance as the place where all accesses should be sent. This provides strong consistency of EBS volumes. As more EBS nodes continued to fail because of the race condition described above, the volume of such negotiations with the EBS control plane increased. Because data was not being successfully re-mirrored, the number of these calls increased as the system retried and new requests came in. The load caused a brown out of the EBS control plane and again affected EBS APIs across the Region.

Summary of the Amazon EC2 and Amazon RDS Service Disruption in the US East Region

Now that we have fully restored functionality to all affected services, we would like to share more details with our customers about the events that occurred with the Amazon Elastic Compute Cloud (“EC2”) last week, our efforts to restore the services, and what we are doing to prevent this sort of issue from happening again. We are very aware that many of our customers were significantly impacted by this event, and as with any significant service issue, our intention is to share the details of what happened and how we will improve the service for our customers.

Kicking off the Slow Software Movement

From: http://www.agileproductdesign.com/writing/slow_software.pdf

Over breakfast, an old friend who happens to run a software development company was complaining that too many of his people focus on software development—engineering, requirements, and project management stuff. When confronted with a problem, they jump in and start gathering requirements, putting together project plans, and developing.

What is now a problem used to be cause for celebration. He went on to explain that his staff members don’t take enough time to understand and appreciate the problems they’re trying to solve. They are quick to launch into a project, but the result—no matter how quickly or effectively it was built—just isn’t right.

My friend wants people to better understand what success means before starting to build. He said, only half joking, “I want to start the Slow Software movement!” He was alluding to the Slow Food movement, a non-profit international group trying to bring diversity and quality back into the food we eat. His comments made sense to me and set me to wondering why we’re in such a hurry.

Years ago it was incredible that we could build software at all. We were constrained by the tools we used—both the development languages and the computer systems. Clever people worked within these constraints to create software applications that allowed users to perform tasks that were nearly impossible before.

But constraints are changing.

...

I’ve observed over the past dozen or so years a drift in the way we think about, design, and build software. While in the past there was more emphasis on engineering discipline and process, now they’re not sufficient. When we sit down to use software today, we’re not impressed by the engineering discipline that went into it, we don’t ponder the process used to build it, and we don’t wonder whether the product delivered on time with its intended scope. We’re no longer amazed that we have software to use. We now expect software to work well. We expect that the value we get from using it is worth the price we pay for it.

Our development approaches have changed, too. Popular processes now place emphasis on delivering business value, not lines of code. The quality of software is increasingly judged not by its lack of bugs but by its usability. Writing high-quality, bug-free code no longer seems to be a measure of success. Delivering on time now seems less important than delivering value.

If value realized and quality of use really are our new measures of success, maybe it is time to slow down. Before we plan and build, we should take time to understand what value is. Make sure the first thing we gather is how the people paying for the software will get value from it. That’s likely not a list of features but rather a list of goals or a description of a world that’s a little better because of this software. Then we will decide what to build based on what best meets those goals.

We also might want to understand better the people using our software. They’ve got goals, too, which likely are met using other software or manual processes today. What we decide to build should outperform what users already have. The quality of what we build will be judged alongside the other tools on which users currently rely.

Gathering business goals, talking to and observing people, and validating software with usability testing likely will slow us down. But maybe it wasn’t really “fast” that we wanted in the first place.

Widespread Application Outage

2) Block storage is not a cloud-friendly technology. EC2, S3, and other AWS services have grown much more stable, reliable, and performant over the four years we've been using them. EBS, unfortunately, has not improved much, and in fact has possibly gotten worse. Amazon employs some of the best infrastructure engineers in the world: if they can't make it work, then probably no one can. Block storage has physical locality that can't easily be transferred. That makes it not a cloud-friendly technology. With this information in hand, we'll be taking a hard look at how to reduce our dependence on EBS.

defunkt/dotjs - GitHub

..................... dotjs ........................

dotjs is a Google Chrome extension that executes JavaScript files in ~/.js based on their filename.

If you navigate to http://www.google.com/, dotjs will execute ~/.js/google.com.js.

This makes it super easy to spruce up your favorite pages using JavaScript.

Bonus: files in ~/.js have jQuery 1.5 loaded, regardless of whether the site you're hacking uses jQuery.

Double bonus: ~/.js/default.js is loaded on every request, meaning you can stick plugins or helper functions in it.

GreaseMonkey user scripts are great, but you need to publish them somewhere and re-publish after making modifications. With dotjs, just add or edit files in ~/.js.

Example

$ cat ~/.js/github.com.js
// swap github logo with trollface
$('#header .logo img')
  .css('width', '100px')
  .css('margin-top', '-15px')
  .attr('src', '//bit.ly/ghD24e')

How It Works

Chrome extensions can't access the local filesystem, so dotjs runs a tiny web server on port 3131 that serves files out of ~/.js.

You don't have to worry about starting or stopping this web server because we put a pretty great plist into ~/Library/LaunchAgents that handles all that for us.
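To make the file-serving side concrete, here is a rough sketch of such a server written for Node. It is not the server dotjs actually ships; the names `resolveScriptPath` and `createDotjsServer`, the file-name check, and the 404 behavior are all assumptions made for illustration. Only the port and the ~/.js directory come from the description above.

```javascript
const fs = require('fs');
const http = require('http');
const os = require('os');
const path = require('path');

// Directory the README says scripts live in.
const SCRIPT_DIR = path.join(os.homedir(), '.js');

// Map a request path such as "/github.com.js" to a file inside the script directory.
// Returns null for anything that is not a plain "<name>.js" file name, so a
// request like "/../secret" can never escape the directory (assumed policy).
function resolveScriptPath(requestPath, scriptDir = SCRIPT_DIR) {
  const name = requestPath.replace(/^\//, '');
  if (!/^[A-Za-z0-9.-]+\.js$/.test(name) || name.includes('..')) return null;
  return path.join(scriptDir, name);
}

// Build (but do not start) a server that returns the matching script or a 404.
function createDotjsServer(scriptDir = SCRIPT_DIR) {
  return http.createServer((req, res) => {
    const file = resolveScriptPath(req.url, scriptDir);
    if (!file || !fs.existsSync(file)) {
      res.writeHead(404);
      res.end();
      return;
    }
    res.writeHead(200, { 'Content-Type': 'text/javascript' });
    fs.createReadStream(file).pipe(res);
  });
}

// To run it locally on the port the README mentions:
// createDotjsServer().listen(3131, '127.0.0.1');
```

Binding to 127.0.0.1 in the commented-out call is a deliberate choice in this sketch, so that the scripts are reachable only from the local machine.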

The dotjs Chrome extension then makes ajax requests to http://localhost:3131/convore.com.js any time you hit a page on convore.com, for example, and executes the returned JavaScript.
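The mapping from a page to the script it requests can be sketched as a pair of small functions. This is an illustration rather than the extension's actual code: the function names are hypothetical, and stripping a leading "www." is an assumption inferred from the google.com example earlier in the README.

```javascript
// Port taken from the README; everything else here is an illustrative assumption.
const DOTJS_PORT = 3131;

// Derive the script file name for a page, e.g. "google.com.js".
// Dropping a leading "www." is inferred from the README's google.com example.
function scriptNameFor(pageUrl) {
  const host = new URL(pageUrl).hostname.replace(/^www\./, '');
  return `${host}.js`;
}

// Full URL the extension would request from the local web server.
function scriptUrlFor(pageUrl) {
  return `http://localhost:${DOTJS_PORT}/${scriptNameFor(pageUrl)}`;
}

console.log(scriptNameFor('http://www.google.com/'));      // google.com.js
console.log(scriptUrlFor('http://convore.com/some/page')); // http://localhost:3131/convore.com.js
```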