A broadly applicable JavaIdiom: Avoid aliasing object state by returning copies of state data from accessors. It is also an idiom for ObjectiveCee.
Many accessor methods need to return values which are references to objects which represent a part of the state of the called object; directly returning the state in this form is called aliasing: the same object has two apparent identities (aliases). Both the original owner and the client can change the state of the object in ways that are unpredictable to the other causing the program to behave incorrectly.
The problem is that the value being returned by the accessor is conceptually immutable while the actual object being returned is not. We want to give the client the data it needs in the form that it expects; the client should not be constrained in the ways that it may manipulate the returned object; and the object should fully implement its contract without surprising side effects. In other words we must give the client a fully functioning, mutable, object. The inital state of this object must reflect the state of the queried object at the time of the query and must vary independently thereafter.
Therefore:
ReturnNewObjectsFromAccessorMethods. Whenever the state of the object is queried create a new object to satisfy the query. Often it is best to store the elements of the objects state in the form of primitive member variables and construct more complicated objects as the return values for the accessors.
For example:
public Point getPoint() { return new Point( _x, _y ); }It is also common to use clone() methods or copy constructors to duplicate member variables in accessors.
Once you've done this, the immediate problem is solved: there is no aliasing between the objects used to implement the state of the queried object and the objects returned from its accessors. You will be creating and deleting more objects than before and extensive use of this kind of idiom can ultimately cause performance degradation. But there is another problem that deserves more immediate attention: this idiom assumes that you will be returning MutableValueObjects. In Java ValueObjectsShouldBeImmutable so you should consider the possibility that your design needs further evolution before it is complete.
The primary indicator for use of this pattern over other alternatives is the need for a mutable return value. If that is not a requirement then you can ReturnImmutablesFromAccessorMethods instead. This may require inventing a new ImmutableValueObject and so should probably be done as a separate step. If efficiency is a concern you might UseaCollectingParameterInAccessorMethods. which shifts the burden of object lifetime management to the client. MartinFowler suggests: "...that one of the reasons this issue is so awkward is that there are some objects that should be immutable, and some that should not. I've found useful to distinguish between ValueObject and ReferenceObject. Then ValueObjectsShouldBeImmutable while ReferenceObjects should not." This clarifies the role of the object in the system: if it's immutable it should be treated as a value, otherwise it should not. In general clients should not query for ReferenceObjects at all. As a rule they should use TellDontAsk and the LawOfDemeter in order to get their work done. In practice, however, TightGroupsOfClasses often violate these rules and will want to share ReferenceObjects. You should not ReturnNewObjectsFromAccessorMethods that give access to ReferenceObjects. Instead you should ensure that all clients of a ReferenceObject honor the semantic constraint that it is non-locally mutable.
Also known as DefensiveCopy
I should like to investigate this theme a bit. It seems I have a different view of the world concerning "directly returning the state in this form will expose the internals of the called object, and the caller will have the ability to modify that state unpredictably" .
I like to create objects to "localize design decisions", not to "hide information". I really don't mind giving out the state information of an object. I think maybe I just have never been on a project on which I couldn't trust my colleagues with my state information. What I care about is that when I learn something new and need to effect a change, then I only have to go to one place to make that change. Perhaps someone who disagrees can tell me what kind of a project they were on that motivates their worry. It seems that if the other objects shouldn't be updating the point, then they shouldn't. And if it is ok, then it is ok.
Perhaps taking an example one step more complex might help (and then again, perhaps not, we'll see). If I am working with a bank account sort of object, it has a Journal, which is a log of all the Transactions, perhaps with balance snapshots. If someone asks for a Transaction from the Journal, I am quite content to give them the Transaction. Now that they have it, they probably should not go changing it - but I don't think that is for the Journal to decide. And the reason I give them back a Transaction object instead of an array holding the same values is not that the Transaction stops them from changing the values, but that the Transaction is a holder of design decision, and in very many tests, I find that there is less work in upgrading the design if I have Transaction-level design units compared with strings and arrays.
When I look at the code of people who to like to protect their objects' internals from their colleagues objects, I see large (enormous) amounts of what seems to me as unnecessary text, copying objects and replicating interfaces.Before signing up for this new, heavier and more complicated design style, I should like some more motivation on just why it is needed. -- AlistairCockburn (thanking you for a good topic)
This idiom is a symptom of a development organization that is uncomfortable moving responsibility around. Speaking technically, if an object holds another object, but it performs all the operations anyone in the system ever want done with the child object, then there is never any reason for the accessor in the first place. Speaking organizationally, I most often see idioms like this in conjunction with "code ownership" and developers working alone. I remember saying to one client, "If this object we are working with only supported XXX, we could delete half of our methods." "Yes, but that's Joe's object (Joe sat right across the aisle, for goodness sake)."
Some might say this idiom is a sign of bad design. If objects were designed to do what they needed to do in the first place, then the problem wouldn't arise. I can only make good decisions about tactical division of responsibility from the code. I'm not smart enough to see them before I have any experience. That's why I said that shifting responsibility is the problem.
"Shifting responsibility" means both that the code has to be able to move to where it belongs in the system, and that the developers have to be able to quickly accept responsibility for small bits of new work. PairProgramming is one excellent way to achieve fine grained and low cost shifts of people and hence program responsibility. -- KentBeck
I don't really understand that characterization, since I believe that any class must ensure its own behavior. In this case, it is a Java idiom to return certain types of objects, such as Point, rather than just the attributes of those objects, since so many AWT classes expect them. Then, too - this is not a question of secrecy or trust, as much as it is one of making it easier for users of the class to do the right things and harder to do the wrong things. We don't want clients to manipulate returned internal state directly because that would create excessive coupling between the class and its clients (who would have to know the class's internal validation rules). For example, it may be that the Point returned has some required relationship to other Points in the class. If the client changes the returned Point, that might invalidate an invariant of the class - the rules of which could change independently of the use of the class. -- RussellGold
I guess that is the part I am questioning - whose responsibility is it to see that the "client" does not change a value? Maybe it is exactly the client's responsibility; perhaps we don't yet know whose it is. If an object shouldn't be changed, shouldn't it be that object's responsibility? What I am struggling with is the notion that it is one class's responsibility to keep anyone else from changing any of its referents' contents. When I have seen this done, I see massive amounts of code, harder to use, not easier to use. I am searching for a quantizable or demonstrable or citable benefit, and not a cute answer on either side of the issue. So I wish to make sure I offend no one with my phrasing, and I still want to be convinced. -- AlistairCockburn
In the transaction example, I think the idea is that when I ask for the transaction, its because I'm about to do something with or to it. So the first stage is to ask if the system is trying to tell us that the *bank account* object needs new protocol. The system might be better factored if we extend the bank account object instead of trying to find safe ways to export a transaction from its journal.
The second stage is to recognize that by exporting the transaction, we've implicitly wired into the code the assumption that bank account objects have journals and that journals have transactions. The implicit nature of this assumption means that, in my experience, we've planted a time bomb...sooner or later, someone will change the structure of the journal or its transactions and break our code. Either that or they'll be unable to do something they need because our use of their transaction precludes it.
In any case, we've violated the encapsulation of the bank account objects, no matter how careful we are about protecting it from change.
Consider a display object that has a selection region that needs to be highlighted. One approach (the one discussed here) is to surface the selection region, which is then operated on, as follows:
aDisplayObject selectionRegion highlight
Another approach is to define the highlight operation on the display object itself:
aDisplayObject highlightSelection
In the latter approach, the display object no longer reveals its internal structure. Instead, its protocol is extended to support the client's need.
My suggestion (perhaps Kent agrees) is that the latter tends towards lighter-weight solutions. -- TomStambaugh
The original description of this idiom did not address when it should and should not be used.
This idiom is generally used when returning small common objects that might well be returned by value in C++. Since JavaDoesntPassByValue (or return by value), that option is not available.
The example that TomStambaugh gives is clearly one in which this idiom is inappropriate - where the whole idea of the accessor is to provide access to an object which the client should manipulate. -- RussellGold, PhilGoodwin
This is much more than a Java idiom. This discussion runs rampant in Smalltalk groups. Around 1994, it seemed that only Ward was willing to say Accessors Are Possibly Not Wonderful; the rest of my contacts all swore by them. By mid-1995 that changed, and the infrastructure team on my project went through the whole "ReturnNewObjectsFromAccessorMethods, and also MakeCompleteProtectiveInterfaces? while you're at it," roughly as you described on the immutables page. Suddenly, accessor fans were in the minority, and a friend of mine nearly ate me alive at a meeting for even suggesting Accessors Are Possibly Useful - my environment had reversed position. Accessors are only part of the story - the real story is accessibility and mutability. This issue happens in all of the languages, and what I am trying to sort out just now is, under what circumstances do I care about handing out a mutable object? I personally, rarely care. So I am looking for an example of when I should really care.
I withdraw my world-dominating-conspiracy theory. If an object has instance variables _x and _y, it certainly makes more sense for the external protocol to consist of "Point location()" than "integer x(), integer y()". Reduced surface area and all that. Why wouldn't you just store the Point in the first place? If you then have aliasing problems, you should consider using ValueObject (immutable). -- KentBeck
What you really want is a ConcreteDataType?: an object whose value (not identity) is transferred by assignment. In languages like C++ that support ConcreteDataType?s ReturnNewObjectsFromAccessorMethods is automatic. Clients ofa CDR can modify its value without allocating a new instance. Languages like Java and Smalltalk use ValueObjects instead. With a ValueObject each operation allocates a new object (which extends the ReturnNewObjectsFromAccessorMethods idiom to include mutators as well). This can be used to give the appearance of object mutation without aliasing problems. -- RussellGold, KentBeck, PhilGoodwin
I too have often thought about problems associated with accessor functions. I think language extension is required to improve the encapsulation process.
There are two main reasons for having accessor methods:
I think there is a better way of providing both (1) and (2). It would mean extending the language, but might go like this:
For any data member object 'a' in a class we could provide the following syntatic method of access (note there is no need to generate actual methods) :
objectA.a = c ; // Assignment (already provided in Java) objectA.a( c );// Pseudo assignment via function-like syntax c = objectA.a();// Pseudo value via function-like returnThen allow the pseudo functions to be overridden (providing B above) in derived classes. The opposite would also be true. That is the two functions
void myFunctionA( int a ); int myFunctionA();Could be used by the class user as a variable, as in
Obj.myFunction = 10;or
int a = Obj.myFunction;This isn't that absurd. You couldn't do this in C++ because a function without ( ) was a function pointer. This restriction does not apply to Java.
If an object is declared with an additional keyword (lets call it 'snapshot'), by value copying (or new instance generation) could be carried out for us. This could either be achieved by returning a new instance or by extending Java to provide a method of creating byValue copies (using a copy - constructor perhaps?).
So the declaration in some class X of:
snapshot Point mypoint;Could provide fully encapsulate state with intervention, without all the additional messy code.
-- BillMurray?
Firstly: I found this area to be my biggest single source of Smalltalk bugs (it far outweighed those caused by the lack of static type checking). For example, I'd want to position a tooltip just above a button:
Point aPoint = button.getTopLeft(); aPoint.y -= 3; // Up by pixels. aToolTip.setTopLeft( aPoint );And this reasonable-looking code would have the effect of moving the original button.
Secondly: I agree that it's better to use immutable objects. An alternative is to have conventions about who does the copying, but I think that fails when objects get passed around a lot and you can no longer keep track of whether they are the original or a copy. You end up copying twice just to make sure. I don't believe programmer trust is an issue.
Thirdly: Tom's issue of whether is it good to have accessors is separate from the question of how to implement them on those occasions you've decided they are good. I mostly agree with his points. However, it can lead to very large objects and/or interfaces.
For example, a SelectionRegion? can be a complex object in its own right, with dozens of methods. It will then have its own ChangeVelocity. When we add a new method to the SelectionRegion? we don't necessarily want to have to update the DisplayObject? class as well. -- DaveHarris
CsharpLanguage makes a distinction between types that are passed by value and those that are passed by reference. "Small" things, like points, will typically be C# "structs", which are always passed by value. I.e. if you ask an object for it's top left Point, you know you're getting a copy because you know that Points are structs. If you ask it for an "ordinary" (non-struct) object then (unless the documentation or accessor name suggests otherwise) you'll assume you're getting a reference to the internal object because objects are passed by reference. I.e. the language itself contains some useful conventions about when you get a copy and when you get the "original" object.
If an object has to ReturnNewObjectsFromAccessorMethods then it usually has to CopyMutableParameters in methods that change the state of their parameters.
-- NatPryce
I recently ran into this quite heavily. I was (well, am) building a framework for doing complex layouts. And in it, visual things naturally talk to each other and ask about their locations and attachments.
What I've found reasonable is to have two accessor methods. Basically, using the above example (of points), I do things like the following.
public Point getLocation(); // returns reference to internal state public void getLocation(Point returnValue); // copys internal state-- WilliamGrosso
The syntactic convenience of code like:
aToolTip.setOrigin( button.getTopLeft() );over:
Point origin = new Point(); button.getTopLeft( origin ); aToolTip.setOrigin( origin );is quite significant in my humble opinion. I like the concise, functional style.
-- DaveHarris
The "syntactic convenience" objection strikes me as irrelevant. They're both easy to read, easy to parse, and easy to understand. People sometimes complain about the amount of lines they have to type but, really, most of my time is not spent typing in code. If I cut the time I spend typing in code by half, I haven't saved all that much. -- WilliamGrosso
It's not the time to type so much as the ease of reading. The 3-line version is more cluttered than the 1-line version. It is more redundant. It pollutes the namespace with a spurious variable name. The variable itself is mutable and its value is different at different points in the code. All of this stuff distracts from what the code is actually doing. It is not as easy to read. -- DaveHarris
I think that one of the prime motivations for this page was something "We need accessor methods and we don't want to allow the client to change an object's state without at least using a method on the object." Alistair's comments at least imply a sort of "why not expose the state" philosophy. And my response was: I do this using two methods. One which exposes the state and one which doesn't. Among other things, it lets the client decide what role the client is playing and establishes clear ownership of the returned value (you pass it in, you own it. Otherwise, you don't).
I absolutely hate the JDK Collections notion of immutable. Wherein objects implement the full mutable interface but throw exceptions when you try to change things. That, I say, is an abomination unto all good programmers (at least, in a strongly typed language).
What you want is to define Mutable as a subclass of Immutable. Then you can return internal (mutable) state as immutables and the compiler catches unintentional alterations.
It's easy to do what you describe in Java, you just need to turn the typing upside-down, and make the Immutable an interface to the Mutable. For example,
interface Point { int getX(); int getY(); int translate(); ...} class MutablePoint? extends Point {int setX() {...} ...} class ValueObjectPoint? extends Point {...}Everyone passes around Points unless they have a good reason to do otherwise.
Oops, but wait, isn't this merely ReturnImmutablesFromAccessorMethods? And, if we stick with one implementor of Point, are we using InterfaceImplementation? pairs? It begins to look like something more fundamental is going on here....
Good idea. I think it works well for Points, but does it work in more complicated hierarchies? I don't see many counterexamples floating around - the interface hierarchy gets kinda complicated, but not more so than the implementation hierarchy would have.
The key idea seems to be to have two hierarchies. One is the mutable (implementation) hierarchy where we have single inheritance. And the other is the MI interface hierarchy. Which lets us do, essentially, casting to the const value just fine.
The deep question: what's the more fundamental thing?
You can also UseaCollectingParameterInAccessorMethods. That way the client can reuse the same object when it makes sense instead of forcing the server to allocate a new object every time it's called.
"Who should make the copy?" sounds awfully similar to CallerVsCalleeSave? in AssemblyLanguage. As in that situation, we have a classic safety-vs-performance tradeoff, with no policy that's optimal for all situations.
I'd initially favor safety, since (a) aliasing bugs are expensive to find and fix and (b) you don't know where you need performance until your code is done (see ProfileBeforeOptimizing). If it turns out later that (whether for performance or some other reason) you really do need to get the actual internal object, you can add a getFooNoCopy() method (or getFoo(fooType ret)).
By the way, I wish I'd read this a couple months ago. I dealt with this very problem with some code that models physical objects and their locations. We actually stored the locations internally as Point objects, except in some special situations:
We had an Arrow object, for which we stored the head and tail locations. It implemented a general MovableObject interface, which had getPosition() and setPosition() methods. The twist is that the "position" of the arrow was neither its head nor its tail, but its center. So we couldn't expose the center even if we wanted to: it didn't really exist. I ended up writing some simple code to calculate the center on demand for getPosition(), and some less-simple code to compare the old center to the new and shift the head and tail for setPosition(). Caching was involved, since the previous implementation had (incorrectly) done it.
In retrospect, maybe I should have gone with ReturnImmutablesFromAccessorMethods - especially since the guy who'd implemented Point had included an immutable PointReadOnly superclass.
On the other hand, of course I had to "copy" the center Point: it didn't actually exist in the Arrow object. Anything I returned from the accessor would either not be mutable, or not be "live" mutable (i.e. changes to it would be ignored by the Arrow).
A final pressure to return copies is that the containing object might need to determine when it's been changed. Certainly that's the case for setHead() in the Arrow class: if the head changes, the cached Center needs to be invalidated. More broadly, your object might participate in a notification scheme, in which case forcing callers to go through a setFoo() accessor is your only choice (short of making the contained object part of the notification tree, which just pushes the catching accessor down a level anyway).
-- GeorgePaci
All ok and work, how about passing collection, vector? -- milh
We define two interfaces for many objects (mostly for ValueObjects), one is TheThing and the other is TheThingModifiable. TheThing interface defines only accessors ("getters"); TheThingModifiable extends this interface and adds the mutators ("setters"). Concrete classes implement each interface. Factories or aggregating classes use the setters, but return the TheThing interface to consumers. This ensures that no unexpected state changes occur, but doesn't require creating a new object. Thanks to Sajid Mumtaz for introducing this idiom/pattern to our team. -- Scott Bockelman
HaveThisPattern for ObjectiveCee, in fact it according to most authors it's almost assumed. Returning immutable versions of mutable objects is a good thing too. The downside to immutable interfaces where the underlying instance is really mutable is casting. In those languages where you can't cast to an unrelated type it's safe, but in Objective-C (and I'm sure other C derivations) any cast is OK, so the type system can be subverted by a careless programmer. More importantly for Objective-C is the reference counting and memory management problem. If you return an alias (pointer) to an object and it gets deallocated out from under you, or the reference count drops to zero -- Watch Out! --StevenNewton
If A asks B for C, A should get C. If A then asks C to do something that modifies C's state, C should do that. When C's state changes, no matter what triggered the change, B shouldn't break. If B breaks when C's state changes it's a smell indicating that C ought to notify B of state changes or that some of B's behavior probably belongs in C.
In a recent project, I too fell under the spell that returning a pointer to a mutable object was evil. I dutifully made defensive copies in the setter methods and returned copies of object in the getter method. Like all paradigms, things can get taken to a pedantic extreme. There is simply no one pattern that matches all situations - and the return new objects or create defensive copies falls into that category. If all you have is a container, or holder, object with no business logic contained within it, then returning modifiable references is quite satisfactory. I think the problem with any type of "best practices" is that they are not universally applicable to all situations. If you rely on the state of contained objects for business processing logic, then yes, exposing these outside your class is a bad idea. However, if you are only acting as a value holder with no internal processing taking place, then the penalty for creating new objects - especially if they are composite objects - quickly adds up. Not only in terms of performance, but in terms of difficult to follow code where new objects are created and modified, only to be set again in the value object. To summarize, look at the problem first, and then see what patterns make sense rather than slavishly going from patterns to the problem domain.
-- FrankHellwig