Hello everybody
Thank you for coming
We had a survey recently, maybe some of you have participated
Majority has voted to hear about code refactoring
And I hoped it will not!
Because I felt sick while planning this presentation
I have noticed that many of you have downloaded code from GitHub
That is the code I made about two years ago
I had plans to make a lecture about it, but that code was so ugly...
... that in the end I never made that next step
You may have seen what this example is about
It generates, and then solves, Sudoku puzzles in plain English
Is there anyone here who likes to play Sudoku?
1, 2, 3... 4... 5, 6
I'll prepare chess for the next meeting
I'll see if it's possible to write an entire chess server in one function...
... and then refactor it
Now it turned out the funny way in the end
As I kept postponing that work for a year...
... then it happened to me, what usually happens...
I couldn't remember how it works
That was exactly what I wanted, originally
I wanted to see if I can untangle that mess
It took three days of coding to prepare this presentation
And you will see, there will be progress by the end
For those who never heard of me...
People mostly recognize me from my Pluralsight courses
There are about 40 hours of my courses there
I've been specializing functional C# recently
There are two courses at the top belonging that field
Before those two, there is one very popular course...
"Making Your C# Code More Object-oriented" is the most popular of them all
If you wish to improve your OOP skills, then watch that one
And so, the topic of this lecture...
How does a 1000-lines-of-code function come into being?
One line at the time, but...
Not all lines are the same, and not all of them are equally sticky
There are a few common patterns there
I will show you a couple of them here
Suppose that you must add a few lines more to this
Will you append that code here?
If you choose to append, then the reason will usually lie with variable reuse
To avoid having to pull out many arguments to a function...
... you simply append code in-place
Later you add more...
You had five lines of code; now you have 17
You're still not afraid of future maintenance
The next thing that happens...
You start changing the reused variable
This code is one block, but it is three segments
The second segment inherits a variable
The third segment observes the result of its predecessor...
... and it's already hard to pull any of these into a separate function
There is a hard-wired communication between code segments
You are sinking into a quicksand, up to your ankles so far
Soon enough you'll get stuck
It will become worse when conditionals are added
That is where pressure raises, as the third segment cannot make assumptions anymore
An attempt to pull out a function ends up in having 7 arguments, 3 of them in/out
The last piece of code depends on everything that precedes it
After this, the third act commences...
... where all this happens in the loop
Adding more code now works in a different way
Some code comes inside the loop; some related code comes outside of it
I'm sure you have seen this a hundred times already
This last piece of code - when exactly will it execute?
This block inside the loop might execute twice, or seven times, before the last one runs once
Execution dynamic becomes hard to grasp
Functions structured like this one are a true challenge to refactor
Planting a bug becomes so easy while trying to untangle such a code
Quickly, you lose motivation to proceed, with the function grown to 70 lines or so
When the next feature is to be implemented, there might be a younger colleague with you
Then you tell your colleague that infamous sentence...
"Do that the same way I did before"
One immediately recognizes the copy-paste pattern in your prior code
We will now see what happens after you get into this situation
We already have an abnormally large and convoluted piece of code...
What will you see in that huge function?
Nested branching instructions...
... loops inside loops, some of them inside else branches, others nesting if-elses...
... seven nesting levels deep...
There is one useful technique, where you reduce font size to 3 or so
And then you observe the code structure
If indentation goes wild, than that is really bad
Code should have one level of indentation
Possibly having one branching as a shortcut out when function is not applicable
One look at such code tells you what it does
Two levels of nesting deep, you already have four execution scenarios
Adding functionalities further leaves you with little options
You quickly get into that proverbial situation...
"I had 76 bugs; fixed two of them. Now I have 78 bugs."
Touch one piece of code, and you'll break some random feature that used to work fine
And so, what makes us want to refactor/redesign code?
For one thing, extending convoluted code is hard and slow
That happens when user asks you for a new feature and you give him one-month estimate
The second issue is that changes are affected totally unrelated segments of code
That is what we observe as brittle code
It makes you fear from making a change
This first issue is related to lacking decoupling
The second issue causes lots of regressions
Regression occurs when a prior feature stops working
To make things worse, ugly code usually doesn't come with tests
That keeps you in that vicious circle where adding new features breaks existing ones
The first issue means that the code is not extensible
The second issue means that development is slowed down
Which all brings us to the code example
What does this application do?
I'll start the application and you will see...
This is all the result of running one single function
It constructs one Sudoku board, fully populated
Then it removes some values from it
And then it starts solving it while trying to explain all its moves
Like telling that cell in row 4, column 1 can only contain value 5
... and so on...
Here it says row #6 can only accommodate value 9 in cell #2
And so, it keeps thinking until the very end
In this case, it succeeded in solving the entire board
But the problem is that sometimes it hits an obstacle
Here it has found that there are multiple solutions and it must choose one
Occasionally, it will fail to find any solution using the rules, I have implemented
Here it has happened; after lots of thinking, it has figured that it knows no solution
Those are the issues we are facing so far
Now if you wondered how does all this work
Here you can see that one single function spanning between lines 10 and 1012
That makes it a 1001 line of code long function, including comments
It first constructs the board
Let me show you what that code looks like
That is it...
This is the depth-first search algorithm implementation...
... with an additional heuristic named most constrained variable first
This algorithm can solve nearly any Sudoku in less than 100 states visited...
... which is fascinating, knowing that the entire board has 81 cells
And here we find this strange code which prints the board out
That leads to the next code segment
Here a heuristic is telling how many numbers should remain on the board
Many Sudoku implementations are using a similar heuristic, leading to poor results...
... but I simply didn't know what else to do in such an ugly code as this one here
Further on, the algorithm is making some changes, forming the initial board in the end
Here you can see that piece of code repeated
Half of all this code already exists somewhere above
When you catch yourself writing a long function...
... you will notice loads of duplicated code
That's because you need that code in multiple places, often with slight variations
Here I'm building some structures that I need
This code is so strange that it's very hard for anyone to understand it
After that, the loop begins in which solution is sought
As long as it makes changes, it keeps going
Calculating candidates for all cells...
Not very long piece of code, but very ugly
After that... now look, some changes require a loop inside a loop
Right after those nested if-elses above...
Here we have a loop, a branching, and inside of it another loop...
Then, inside that main loop, which seems to be solving the board...
... then comes another loop which deals with separate steps
There was that printout with all the thinking before the program concludes one number
Well, those intermediate steps, which do not produce the number themselves...
Those steps are applying concrete rules
Those of you who are solving Sudoku, they know this:
There are cells with only one candidate
That is one rule here, some 30-40 lines of code
All this code is that one rule
The next rule to apply...
In a group of cells - a row, a column, a block...
One digit only appears in one place, even if there are other candidates in that place
We conclude that that one number occupies that cell
Here it took over 100 lines of code to draw that conclusion
Look at this code. It's disastrous...
It ends in this line
Then, you know that trick, when you observe a row, for example...
... some two digits you only find in two cells
E.g. (1, 5) you find in only two cells; then you remove (1, 5) from all other cells
That is what this rule is doing
It is about 80 lines long
Generalization of that rule is N numbers in N cells
That adds another 100 lines of code, much more ugly code
Look how deeply this code nests
It ends in a loop inside a loop inside a loop
And one more loop for the end
And only after that there comes a 200 lines long code segment...
... where I'm trying to figure if the board has multiple solutions
If it does, then I'm trying to choose one of them
That is one ugly code
I wasn't trying to make it ugly
I was only trying to write it in one function
All this is the consequence
Finally, the printout...
That makes it a 1000 lines of code function
The question: What is the problem here?
The problem is, whatever I touch here, something will break
While I was preparing for this lecture...
... I made some manual mistake
And I figured that there is some NullReferenceException being thrown
There are stacks there, used by those depth-first traversals to push board states
Some state was missing on one stack
I simply couldn't understand what's wrong
That is the situation you know for sure from your own practice...
When you spend a day looking at a 100 lines long function...
... to understand what made it reach that illegal state
Then I used a diff program to find what changes I made there
It was one of those algorithms I broke
Try to imagine me staring at the diff for half an hour...
... quitting in the end
It turned out to be one word mistyped in a line full of changes...
I referenced the wrong object, causing subsequent execution to break
Bottom line is that one typing mistake had costed me an hour and a half of work
What adds more charm to it, and you will laugh when you hear it, it all happened today
That is what usually happens at work
With code like this, that is the kind of troubles that happen you
Now I'm going to show you two examples...
... which illustrate our inability to extend the application further
This is the situation which makes Sudoku players nervous
What you can see is the actual printout from the program
It says that there are two solutions
The program might pick this one solution and solve the board
Or, it might go the other way around and solve it differently
This is the point when the customer says: No more boards with multiple solutions!
And my response is: I cannot make that happen
I mean, I can...
That depth-first traversal algorithm I used is efficient in finding a solution...
... but it's very slow in finding the second solution
It must touch millions of states during traversal before it hits the second solution
In fact, I can use that algorithm to remove digits as long as the board has a unique solution...
... but then, initial board generation would take 10-15 seconds to complete
Then I ask the customer: Can we hit the production with what we have?
The second example is even more interesting
Here is how the application thinks; this is actual printout
It makes a note regarding column #9
It has noticed (1, 3) only appearing in these two cells
It removes 1 and 3 from the remaining three cells
Then it observes the third row, after removing 1 as the candidate
Then it notices that 2 and 9 only appear in these two cells
It automatically removes 2 and 9 from the other cells
Now when this one here stays without candidates 2 and 9...
... then it notices that 9 can only appear in row #8
Then it concludes that this cell's value is 9, full stop
This is the actual printout from the program, where it concluded that 9 fits into this cell
I forgot to tell you what the greatest tragedy in long functions is
Do you know what's so tragic about them?
While you're thinking... let's see what's wrong with this printout
Those of you who solve Sudoku will know that ...
... this effect does not depend on these two premises
It does depend on these two...
But not on this one, and that one...
In other words, the printout should have been more compact
If users complained about long explanations...
... that would be the feature I am utterly unable to add
So, there are two features which I am unable to add to that code
That is the point at which your project starts falling apart
When customers bring you money and ask how much does it cost to implement...
... and you say: "We're unable do it, unless we start it all over again"
And the answer to: What makes working with large functions tragic is... that they work
If they weren't operational, you would instantly receive a budget to do them all over again
As long as they are operational, you are sentenced to maintain them
To make sure, I did not tell you should bring your long code into defunct state
That is why we have gathered here... I will demonstrate the process on this example
There's the whole bunch of issues here
Let's start fixing them
I presume that all of you know the basics of refactoring
For example, introducing a variable, extracting a method...
... extracting an interface...
... moving a method to another class...
... introducing a dependency...
I'll skip a survey how many of you have read the Refactoring book from Martin Fowler
I don't know what your impressions were while you read it...
Fowler has imagined refactoring as a...
... formally provable method of constructing code semantically identical to the original
The author has been observing code as an object...
... and he was constructing another object which satisfies the same predicates as the first one
I may have lost you already on this...
Anyway, basic refactoring techniques: extracting a variable, a method, an interface...
... they produce code which compiles to a different output...
E.g. if you had a concurrency or locking problem, it will usually grow more severe
But that is not what refactoring is meant to deal with
Refactoring deals with semantics
If you had one thread, this input, and that system state...
Then the final state will be the same when two versions of code run
That is the essence of refactoring
You will then quickly realize that refactoring is of little use on a huge function like this one
Refactoring operates in two planes
The one plane is that basic one, and it works well when writing fresh function
Refactoring helps keep the new code from going wild
When a new function grows beyond 100 lines in length, you're toast
Your goal is to keep it below 15 lines in length
You want it to properly show its dependencies
When the dependency shows up, you want it to have a proper name
That's what you find out while writing the function
You should have that practice in your fingertips
That is the muscle memory... You don't use brain for that
For years already, I've been coding like that
I never write functions longer than 15 lines
And even if it happens that I hit that ceiling, I cut that out right away...
... leaving the function with no more than five lines of code
At least 80% of all functions I write are one liners
That is another story...
In my opinion, a function should perform one transformation at one level of abstraction
A function collects results from lower levels of abstraction and puts them into that transform
In fact, every function is just a formula...
... applied to data collected from other parties, that's all
Done that way, your code is very easy to combine
... applied to data collected from other parties, that's all
But that is another story...
With this, we will close the topic of pure refactoring...
... which is meant to lead to creation of semantically identical code
What I plan to do is not going to leave us with semantically identical code...
... because that is exactly what I do not want. This code's semantic is awful
I will combine refactoring techniques with redesign...
... and you will witness how I walk on the edge, trying not to stumble and fall
What's the difficulty in pulling the code out of a 1000 lines long function?
The difficulty is that every once in a while you'll plant a bug in it
You may have changed the function
There's another huge issue relating to complex code
The issue is that you don't know the bugs that are already in
Users often see bugs as features
Users are... using the application and they see patterns in its behavior
They accept strange behaviors and use the application as if that was normal
You're bound to not fix the bug, because users expect the application to keep running that way
There's also that concept of "our dear bugs"
User knows exactly what doesn't work in your application
Then he learns to work around damaged features
If you choose to fix the bug, you will render the user's workaround non-applicable
Ultimately, the user will ask you to please return the bug back in
And so, let's see...
I will close all these regions for now
Which reminds me to mention regions
Do you know that regions are pure evil?
When you put a region in code, you are admitting bad design
And I have added regions to this design to emphasize how bad it is
This first part, constructing a full board, is full of garbage
It constructs something strange... here it is
If you tried to read this code before arriving, I doubt that you knew what's going on here
This code is in fact constructing a matrix of chars, which can be printed out to console
Down below, I'm really printing that matrix
But look here...
Right in the middle of calculation which digit can fit where, in this depth-first algorithm...
... right in there, I'm also calculating where to put a char - for it to appear on the screen!
I'm mixing presentation logic with domain logic
In the end, I'm joining those into strings, splitting them with new lines and printing
I will remove these region directives before refactoring
The first step - this whole block, which constructs the solved board...
... I will pull it out into a new method
I'm sure you know how extracting a method operates
Refactor -> Extract -> Extract Method
This is ReSharper I'm using, this is its Extract Method feature
And now I need you to focus...
It asks me whether to extract a local function or a method
Method it will be
But read this at the bottom...
"To enable undo, open all files with changes for editing"
Make sure to tick that checkbox
Ctrl+Z is your savior when you make changes
When refactoring, it should be mandatory to open all documents with changes...
... so that you don't have to git-pull yesterday's code because there were changes in 15 files
And so, we arrive at the dialog where the tool suggests the extracted function
These are all the variables the new function will need
It says that these two variables must be the output variables
Variable Random - this is the randomized algorithm, though I don't like this
But this one here - that is the state stack, used to traverse states in depth...
... that stack is indicates as an out parameter
When you see out parameters in refactoring, you should issue one precise hit on Cancel
Let me see - state stack
One useful feature in Visual Studio, an old one, is this highlighting
Select the variable you want to see and then scroll down
This is where it's used... here again...
I want to see usages outside this block, because this will all go into the new function
This part is what comes after...
Here, I'm really using the stack
That algorithm above leaves in this stack the path of states as it populated the board
Then this code peeks at the top of the stack and starts from that state
Further down we might encounter something even worse
Still nothing interesting
Some of you might ask why didn't I right-click the variable and choose find all usages
I had that feeling... But alright, this has gone too far
This is the point where "find all usages" becomes useful
But in majority of cases you want to scroll a bit, just to get the feeling about your code
There was one ugly piece of code around here, I can't find it now
Alright, I give up... Right-click, find usages
There are 14 usages out there
I don't remember where that code segment was ending
Stack peek is where I used to read the final state
Now observe this view which comes from ReSharper, and these little arrows it shows
Arrow pointing to the right represents reading from the variable
Arrow pointing to the left, which is red, represents writing to the variable
This is exactly what I have shown in the first slide
Right in the middle I'm creating a new stack and reusing the variable
I'm overwriting the variable, and then I keep using it as if it were a new one
You will encounter this pattern frequently
This leaves me with 780 lines of code using one stack...
... and then another 220 lines using the other stack, while still calling it the same
That must be cut somewhere...
... but unfortunately, at least as far as I have seen...
... there is no tool support to rename a variable from this point onwards
Except by hand, I haven't found tools support for that
I will show you a trick which I'm using in situations like this
I will right-click the entire block again
There is that useful technique named "lean on the compiler"
Again, I select "Extract Method"
I'll give it name CreateSolvedBoard
The method will, as you can see, return a char matrix - the solved board
But then I will remove the out parameters
I want to detach everything outside from what's inside this function
This also means to break the build
And then, I'll follow all the things in red and fix them
Very often, ReSharper and similar tools will eventually burry you
Why is it so?
It's because their goal is produce code that compiles
While your goal is to know what you have changed
Interests of the two of you do not match
My desire is to break all the points where these variables are used
My desire is to break all the points where these variables are used...
This will force me read every line with great attention to detail
The red color is popping up, uncovering 44 compile errors
That doesn't mean 44 changes to make
You would usually fix one line and dismiss a dozen of compile errors
So, let's see these errors...
First, this 'var' keyword, I don't like it here
It's also too good to me, it also helps things compile easily
I'll put char matrix instead, and then see what happens
You want to hear from the compiler what's wrong
Later, we can make another pass to put IEnumerable or something similar there
Right now, I want to produce a worst-case scenario
I want compiler to force me enter each line which used to touch this
Here you can see that - do you see this Peek?
It was the first one to touch that stack...
That looked very suspicious, it's what I call making phone calls
The first entity pushes the state to stack...
... then this one comes to peek, not even a proper pop, as it leaves to others to pop...
The result is your inability to change some algorithm above...
... because somebody depends on what was left on top of the stack
That is wrong - consumer must depend on function result, not on state it produces
Now I must manage this somehow
I have that char matrix, I know, and I will use it here
I will invent some arithmetic transform to obtain an array of ints
I've just traded one dangerous piece of code for an ugly piece of code
I'll have to come back to this point later, but right now, I'm detached from that stack
Let me tell you one very important point about refactoring...
And a disclaimer again - this is not refactoring, this is refactoring plus redesign
You should only work on one code block, doing one thing at one level of abstraction
Don't make fixes down below, then back up again, then down for some more...
Once I get back to the beginning, I won't be able to remember what I wanted to do
I'd already have at least one half of a proper bug by then
I may have forgotten some mandatory changes while I was digging down below for half an hour
You have the picture. Don't go into that trap
Stick with one single target which occupies your thoughts
While keeping one thought in your head, you'll ultimately complete it
If you start jumping form one task to another, you'll make a bug
Remember, this code has no tests; functions like this have no tests
To test a function like this, you'll need something like 700 tests
That would be if it had 700 execution branches
A sloppy refactoring and redesigning will lead to bugs which are hard to see
A week will pass before they surface
For that reason, I have added this piece of code only to keep it correct
My only interest here is to get grasp of an int array, with 81 numbers, representing 81 cells
Let's move on, and we'll get back to this point later
But since I'm strictly focused on doing one thing at a time, you'll see that this code will disappear
I will not even have to address it for being ugly
Look at the next problem
I'm using that random number generator here, because this next algorithm is also randomized
I'm forced to declare a new variable of Random type to use it lager
Any of you who has been writing randomized algorithms will know this is also half a bug
Seed for this random generator is current time
If prior function was fast enough, then two random sequences would be identical
Random class is dangerous because it is not thread-safe
You would be safe to create one Random object per thread
In that case, every thread initializes one Random instance
But threads initialized simultaneously, which is common, will all observe the same sequence
The truth is, whatever you do about the Random class, it's somehow flawed
This lecture is not about random numbers...
... though I guess many programmers would need one
The question right now is: What do you do when you see a line of code like this?
This line was only added to satisfy the compiler
This is the clear case of a problem not discoverable by ReSharper
The tool would hide this problem from me
Quite contrary is when you must deal with the situation
Like here where I see a new Random variable, and I know of troubles related to Random...
... that is the perfect moment for you to report a bug to yourself
How can you apply this refactoring plus redesign process and stay safe?
For all the time, you focus on making only one change in one pass
But you know that problems will surface as you go
And you don't try to solve those problems, but you create a bug report and write...
In line 39, a new Random instance is created, with small delta-T compared to previous one...
... consider this a non-functional bug, etc. Assign to yourself in current sprint and it's done
Otherwise, if you know that you're going to finish all that right now...
... write aside what awaits you before you can close the current ticket
That brings us to these reused variables
I will only declare them again
I have detached them from that function
The function has its own stacks, this algorithm which follows has own stacks and that's all
The code compiles fine and it should be working fine... Let's see
It looks like it's thinking, talking... and it has solved the board
This outlines the basic technique of extracting a new function...
... and to detach it correctly from the rest of the code
To make the compiler force you to pay attention to all errors
We will move faster from now on; the next topic will be small joys, so to call them
You may find three ugly lines of code like these and turn them into one line which looks nice
Little improvements like these always pay off, and pay off quickly
Look at this - this code forms one string you can print out, which represents that whole matrix
Right in the middle of that code, we find this
This operation cries out for a name, and I'll give it a name ToPrintableString
I want everybody to know that the string it produces is printable, not just any string
The code has become a bit nicer
I will use expression bodied syntax here
As soon as you put code on a separate line, it immediately becomes prettier
Next task - what is this code?
I can write this code in a different way
I'm preparing it to become a call to something
I want to format the message as a label, followed by a new line and then the board
And then I give that concept a name PrintToConsole
I'll fix the new function to be prettier, too
Up at the calling site, I'll transform it into one line and there you go
That initial LINQ expression, with string concatenations, new lines, and so on...
That has turned into a nice, domesticated call to a function
In no more than five minutes time, you will witness how good a decision this was
That same board - select - new string code appears in two more places
One line of code here
Here at the bottom, where the modified board is printed, we find that code again
Two lines of code this time and that's all
Those are the little joys, where fixing one little thing makes you feel better
Four lines of code less is not motive I needed to do this
Not having to watch that code means more to me
As I said, that investment will return quickly
Now we'll start dealing with more serious issues
For a long time, my problem lied in returning this char matrix
We uncover the question why board printing logic is mixed with state calculation logic?
I'm in pain all the time, my algorithms are awful because of that, let's fix it now
I say: I don't want chars, I want an int array with 81 numbers in it
For advanced users - I want a class named Board
We won't do that today, but we will arrive at the point where introducing Board is the next step
I want you to focus on this idea
When you have a function returning a type, you have the place to introduce a new type
And then you change the type...
Same "cold turkey" style as before, and compiler reports 12 errors
Also, make sure that there is no valid assignment from prior type to the new one
Do your best to break compilation
All usages must go red
And not only the type; change the variable name, too!
Even if you chose a bad name, you can always rename it back to what it was when it all ends
It is mandatory to overwrite the name with new name
The goal is to make all usages show their face
You want to see all usages, for example, a call to ToString
Or a variable used in interpolated string - that would compile just fine
What if I replaced an object by an array, then interpolated string must change
Now we only have to follow the compiler
PrintToConsole will, from now on, print this new variable
This is where extract function step pays off, even though the function is only two lines long
Now we can simply pass the new object to the function and expect it to cope with that
It used to cope with the prior object, it will cope with the new one, too, and off we go
Here? Ah, this is that expression! Look at it now
The function is now returning what it needs; the whole expression has disappeared, see?
That whole expression was a design flaw, it was never meant to exist
My advice to you is to endure a few cuts and bruises as you refactor...
... because the next time you return, you will come with vengeance
Next occurrence - this is where presentation logic used to mix in
I'm working with state, and then repeating that same change in the printout matrix
I don't have to do that, because matrix doesn't exist anymore
Same thing here, printing out the new board object
But beware! - this line was a trap
I am sending an int array, true, but which one, exactly?
I had to read code carefully; see this 'board' variable? It used to be 'board' before, too
This is what makes this change dangerous; you cannot lean on automatic changes
I had to find what this "Starting look" is... here I initialized a new variable for it
I remember now, this was the line where I pulled out the new variable
That is where I parted ways with prior code, and I must be careful to not plant a bug
Same issue here, writing to matrix - farewell
Writing to matrix again - deleting it
What's this? Calculating matrix position, writing to it - delete all
Deleting this, too
What else do we have? Oh, this is something terrible
This is probably that rule which finds N numbers in N cells...
... or the other one, which finds multiple solutions
It's one of those two algorithms, it had to calculate multiple changes
Can you see how complicate this is?
Right in the middle of an algorithm which is complex in its own right
I'm deleting it all
I'm printing the new state in the end
You can see that I have changed all variable names
I'm leaving no room for accidental successful compile
All these lines are broken because that old variable 'board' doesn't exist anymore
(Shouldn't it be that [inaudible] variable?)
Oh no, not a chance, if you observe closer, you'll see that this is the loop inside the loop
So, it definitely is the 'state' variable...
Listen, guys, you cannot do this job and keep your hands clean
To cut the long story short, it's the 'state'; I've been reading my code, too
Current state of affairs is that lots of things above the cursor are red...
... but it's only the function calls that are still red
This is where we push the problem down to the next abstraction level, the next function
We solve the problem by telling the function that it receives int array from now on
With this change, all code above becomes green, while some code below turns red
That happened because this function is now calling the next one in line
Now the next function called receives the new type
There is that wave, you have to push the change all the way through to the bottom of the call tree
What's good about it is that the compiler is leading the way
Remember when I told you not to go deep and then return, you'll forget to do something
With this technique, there's nothing to forget - compiler is telling you
At any given time, you're only dealing with one function
I may have broken 30 functions, but by fixing one at the time, I'll fix them all in a day
And that is all there is, really
Untangling 1000 lines of code is half an art, half a craft
In your toolbox you carry a hammer, pincers, tin snips and stuff like that; you don't play nice
And now look what I'm going to do; I bet none of you expected to see this
This code used to concatenate those chars, but I don't have chars anymore, I have ints
Well, this code does the same thing...
LINQ... you can construct amazing expressions with it
This expression, it worked right away; I'm never mistaken with expressions like this
LINQ is like that...
It puts everything in its place for you
It helps you build enormously complex expressions in a steady, systematic way
You'll never be able to modify these huge expressions
When somebody asks you for a change, you start all over again instead
But LINQ is great to prove the concept, or to put a problem under the rug
I've even added this Join extension method
That's one of my quirks...
Even in production code, I'm very fond of console applications
I can easily prove concepts in console
And for all that time I always missed the Join of IEnumerable of string
They'll never add this one line of code to .NET Framework for me
And so, in all my projects...
... you'll find enumerable extensions with the Join method on IEnumerable of string
Let's join in celebration in this moment...
... for the first time since this lecture started we have extracted a class
Not a big one, but it's the first one There will be more of them soon
Right now, everything compiles, except this one
It's that the function is not returning an int array, as it still returns that char matrix
The time has come to change it, but look...
CreateSolvedBoard is home for a fairly complex algorithm
There is an explanation on that site of mine (codinghelmet.com), and it's not easy at all
Now imagine me adding presentation logic on top of all that complexity
By doing that, I've added one component more to puzzle the reader
This code here is that printable matrix initialization; I'm deleting it
Now that I've deleted it, I find that it was the matrix declaration
Now the compiler tells me where I used to address it
I'm simply following the trace
This I need not
Nor this
I don't need this either
Next, here I find the state
Do you remember that stack peek we had before?
This is the new place for it
That's because this algorithm has formed the stack of states
That stack was leading the algorithm through the space of states until the board was solved
I'm calling Peek right here, because I know that the stack top is the state I'm looking for
There's no telepathy anymore, when the caller knew exactly how this algorithm operates
Such behavior is forbidden
It looks like I got carried away with this Sudoku, but don't get me wrong...
... but let me tell you before I forget: Business-related code should look the same
You will find all these effects there, too
It's not just that I'm solving a board game and that's somehow different...
In financial software, games, and so on, you'll find all these principles and mistakes
Let's apply the technique once again
CreateInitialBoard function, let me find it
It receives a solved board and returns the board with holes
That code at the beginning, let me find it, those two regions if you remember
It's four lines of code now
It creates a solved board and prints it
From that it starts off to construct the initial board, then prints it and then goes on
Again, we have numerous compile errors
I've pulled the guts out of my code, and now I'm going to put them back in
I'll go fast from now on, as this is more of a tinman's job
What is this code doing? I don't know... I can't remember this code
I think I know, one state variable is missing...
... because that one used to... no... I don't remember... I was doing this code yesterday
The next piece of code is also using some random number generator
It's the same story as before, we've been running in circles
I'm still only making it compile
This is a good example of what I've been doing - 'finalState' and 'solvedState', a different name
That made the compiler make me think more closely
I had to think deep before selecting the right variable
I'm only satisfying the compiler, expecting the compilation to pass
OK, your turn, are there any compile-time errors? No, not any more.
I have just succeeded in extracting yet another function
By this point, we have completed the demonstration of one refactoring pattern
There is one more pattern left for you to see, and after that the lecture will be finished
What follows will be the most beautiful, as for the first time we will see objects
This loop is producing that string of sentences you could see on the slides
Each of these blocks is telling something
When I find the beginning of this code, you will see the whole Sudoku solving algorithm
As long as there are changes made to the board...
... it calculates candidate numbers for all (empty) cells
And then, while there are small changes...
... those that couldn't decide which number to write in a cell
Then it applies these rules...
It loops until these small changes stop happening
In the end, it tries to see if there are multiple solutions
It tries to make some changes in that way, too
Ultimately, if it was successful in solving a cell, then it prints the board out
Otherwise, it gives up
The process repeats itself as long as there are changes made; that is that algorithm
Now observe closely... a loop
More loops... loops are evil, you don't write loops yourself
Loops are always somebody else's job to do
Your job is to deal with logic
Then you turn to LINQ, and let it apply the logic to all of them
As soon as you start looping on your own, your code grow uglier, more complex...
... there's a lot more to think about...
That's the code which looks like that first slide...
... add a bit of code inside the loop, a bit more after the loop...
In a blink of an eye, your code will be ripe for refactoring, but refactoring will become difficult
For that reason, avoid loops
Here we find a loop I made, with four operations in it, 320 lines long all together
These blocks of code are of the same kind, and that is where I want to see objects
We haven't been extracting classes so far, only functions
If you look down, you'll find all the code I used to extract
I have turned a 1000 lines long function into a 1000 lines long class
That is still not what I want, I want these functions to disappear
We can achieve that by extracting objects, and then handing the work over to them
That is the greatest benefit from redesign
But now we are in a difficult position, because it is not easy at all to pull out objects from here
You'll see in a minute - those new objects will overwrite each other's state
That's because all code here overwrites all the state
But we will still do it
Because that catastrophic design problem, that red flashing light, will become obvious
We will then write new classes, an entirely new solution...
... localized to that subsystem, which will then separate responsibilities correctly...
... encapsulate states correctly, ensure that only dedicated objects can make changes
Each of these four steps can be a call to a function located in what I'll call a board change
I'm creating an interface and thinking what it will do? The change will apply itself!
Right now, I am entering that morass, which I wanted to get out of...
... as I have three states, two booleans and a board, producing new booleans and board
I'm allowing this as I'm constantly trying to address one problem at one level of abstraction
Later, we will give all this a name, pull it out to a class, and so on
And then I read this first rule, which looks for the sole candidate in one cell and makes it official
I'm creating an implementation of this interface, and naming it single candidate
I've given a name to the concept, and now it has the Apply function
Can you imagine what Apply will be? It will be that whole block, copied and pasted here
Then I simply run through the pasted code and look for the red color
This is the same situation as before, in which I do not use ReSharper to help me...
... because it will pull all dependencies for me; I want a bare function signature instead
From that I want to see what's missing before it compiles - all that I will do manually
You will recognize two kinds of issues here
This random number generator, what is it? - it is a dependency
I don't need that as function argument; one instance will suffice for the lifetime of an object
What else do we have? - we have this candidate masks array
A loop before this function call used to calculate it
That changes with every call, and it will become another argument
Observe what I'm doing with it. I'm removing a candidate from the mask
Each candidate mask is a bitmap, I didn't want to bother you with that
Digit 1 at one position indicates that corresponding number is a candidate
Anyway, what really counts is that I'm overwriting that
It turns out that this object is overwriting the global state, state residing outside of it
That is the design flaw, half a bug. The other half waits for you to write it, if you haven't already
But I won't do anything about this!
I cannot deal with that issue while in the middle of refactoring
If I did, I would probably forget to do something else which was important
On the other hand, I have a couple of helper variables
To make these relatively slow algorithms a bit faster...
... I'm using this lookup telling where one bit resides in the bitmask
I also have this counter telling how many ones there are in a bitmask
I have pre-calculated those lookups, but they will never change
Take a number, like 11011, it has this many ones, and that will never change
I have calculated those lookups out there, and they will become my dependencies here
That is what ReSharper, or any other tool for that matter, cannot do
It knows not what changes with every call and what remains the same for the object's lifetime
That is why I'm keeping that decision for myself, doing that 'dry' copy-paste...
And then we start, follow the red color, follow the trail of blood, and solve the crime...
On the other hand, here we find another piece of terrible code
When I see writing data somewhere, this writes to Console, but that doesn't change anything...
... as it could be any transmission here
When such code is extracted along with other adjacent code, that is a huge design flaw
But I won't address that issue either
I will define dependencies in the class, and set them through the constructor
I will reference private properties in all usage points
That leaves with this single identifier in red letters
Now it becomes clear that this abstraction must receive one additional argument
That will be an int array with candidate masks; I still haven't turned that into a class
A lot of dirt remains, but I'm not cleaning it up as I remain focused on this single abstraction level
I'm stepping into the interface, adding the argument there and now everybody's dirty
But when I step back to the origin, I can select this entire block and replace it with - this
It's ugly, but it works
I'm defining the rule object, and asking it to apply to current state and return the next state
(Q: Why didn't we define a class for the rule, and then extract interface later, if needed?)
Because I have already seen that all of them are the same
But you were right to ask that. You shouldn't haste with introducing new interfaces
I have been reading this code to such level of detail that I have figured that...
... all rules are cracking the same three variables (and the fourth one) in the same way
Let's finish it up, then; all the code that remains follows this same pattern
Here we have the next rule in line
This one looks for a digit which can only appear in one cell inside a set of cells
Nothing special here, a new implementation created, which you don't even have to read
Pulling out dependencies, selecting the original block...
... replacing it with this; we can already see an organized structure emerging
The next region...
This rule finds two candidates in two cells, and removes them from all colliding cells
I've assigned it a name
Copy and paste of that code from over there
Introducing dependencies...
You don't want to read this detail, it's a...
It's a pre-calculated index mapping cells to rows, columns and blocks
You'll need an hour to get grasp of this code
That would be subject of a thorough redesign, not refactoring
Next thing to do is to use this new rule, and the last rule I have is...
N numbers in N cells, everything looks the same as in previous cases
I only want to show you all four rules in their extracted form
The same dependencies are missing as in the previous rule, because this rule is the same...
... only raised to a more abstract level
Now I want you to observe the structure of this loop
It used to be over 300 lines of code
I could even add an effort to improve formatting
The point is that, as soon as you invent an abstraction, it only takes one step...
... to invent a composition of that same abstraction
Composite pattern is the most powerful of them all
This is what it looks like...
You would define one additional concrete board change
And call it something like composite change
It has the same method signature as before
But internally, it consists of two things
The first change, and the second change, which is to say all the rest
Composite change then first applies the first change...
And then subdues its result to whatever change may follow
Isn't this great?
Also, whenever I make a composite, I make sure to create one more component
That is the extension method attached to the original abstraction
The method accepts those two board changes
Look closely what it is doing...
It is the extension on the first one, it receives whatever follows, and its name is 'Then'
This lets me write code like A, then B, then C, then D, then E, and it turns great
Let me show it applied to this code here... This is what it looks like
As soon as the first abstraction appeared in the project, I was suddenly able to do some good
If you wanted to know how this works, I'm creating the first board change here
Then I instruct it which rule to apply next
And the one that comes after that, and one more...
Finally, take that entire chain, and apply it to the state
Begin with 'stepChangeMade' set to False, and run till the end
The next possibility here, which I won't apply, is to make this whole loop disappear
I could introduce a new abstraction which represents a sequence of changes
I would then start from this chain...
The next call would be some extension method which says "keep applying" to the initial state
It could return an IEnumerable of this triplet
My loop would then turn into TakeWhile - stepChangeMade, and disappear entirely
Then you encapsulate that into an object, and all this becomes one line of code...
... this whole chain and a few lines surrounding it
I will leave the code in this stage, I don't know if I can run it - today it worked fine
This is the final stage which we have reached mainly through refactoring
As you can see, it's still thinking, talking, finding multiple solutions, that's all it ever did
This is the current state of the function...
... I'll close this block...
You can try to extract a method from this region... And from this region, too
These are those helper objects I used...
And here is the loop...
This block is the worst of them all, looking for multiple solutions, about 280 lines of code total
In any case, I have reduced those initial 1000 lines of code to some 400 lines
Had I extracted the rest of the methods, this function would only be about 30 lines long
Had that been my target, then my target would be accomplished
If you knew that original function well, then this process would take some two days to complete
That was not a joke, that is simply what it is
Where exactly will then we find ourselves after those two days?
To be honest, we have attained an object model which any person should be deeply ashamed of
It's because all the objects I have right now are overwriting each others' state
That is the ticking bomb
That is the moment when we realize that the next step is a thorough redesign
We could start that endeavor this way...
What is this int array here? There will be no int arrays
A new class named Board is what we need
All the code below would turn red again
Constructing an initial board means that some digits are removed from fully populated board
Constructing a board means that the Board object needs modifications
A Board would expose an API to construct a new board - mutable or immutable, your choice
If Board is immutable, it would return a new board with one digit less on it
Further down we meet these lookup structures
Those structures are revealing which subsets of board cells can cause collisions
Those are the definitions of rows, columns and blocks
Board class would have to expose properties returning groups of cells
That would allow me to write algorithms that are avoiding collisions
The board would have to consist of cells, not of ints
A cell could have a value, or be empty, or have a set of candidates
Do you understand? We would build objects, a proper object model
This existing code would start regrouping, there are heaps of duplicated code here
All these algorithms are traversing rows, columns and blocks, to recognize satisfied rules
That practice would now end - all of them would ask the Board for cell groups
Each algorithm would then apply its logic onto every group of cells
Roles reverse; those candidates, which Rule objects used to overwrite, would belong to cell
None of the algorithms would hold it
In the end, another two days of work later, you would finally reach a very clean object code...
... code which doesn't require further refactoring
That would be a true object-oriented programming, which is much nicer than this
What we have suffered so far is a mandatory step for design flaws to become obvious




Không có nhận xét nào:
Đăng nhận xét