Avoid Direct Access Of Members

In many places I have seen the following.

    public void doSomething() {
        this.member_ = "Some value";
        ...
        ...
    }

This is done even when a getter/setter already exists for the member. Direct assignment causes problems when the functionality of the class is extended later. Therefore, when you have to access a member variable, do the following.

    public void doSomething() {
        setMember("Some value");
        ...
        ...
    }

This enables the inherited classes to override the "setMember" and do some additional tasks. Also, this way some additional checks can be easily added later without breaking the existing code.
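A minimal sketch of both points, assuming hypothetical classes Widget and AuditedWidget (neither the names nor the null check come from any real code base):

```java
// Hypothetical names throughout: Widget, AuditedWidget, and the null check
// are illustrative assumptions.
class Widget {
    private String member_;

    public String getMember() {
        return member_;
    }

    public void setMember(String value) {
        // A check added later; every caller that goes through the setter gets it.
        if (value == null) {
            throw new IllegalArgumentException("member must not be null");
        }
        member_ = value;
    }

    public void doSomething() {
        // Goes through the setter rather than assigning this.member_ directly.
        setMember("Some value");
    }
}

class AuditedWidget extends Widget {
    private int changeCount = 0;

    @Override
    public void setMember(String value) {
        super.setMember(value);
        changeCount++; // the "additional task" a subclass can hook in
    }

    public int getChangeCount() {
        return changeCount;
    }
}
```

Because doSomething() calls setMember, the subclass sees the change. Had it assigned this.member_ directly, the override would never run.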

-- vhi


What about AspectJ (AspectJay), where you can override even this.member = "Some Value"?

DoTheSimplestThingThatCouldPossiblyWork: unless you are having problems with cross-cutting concerns, AspectJ is a lot of work for something that is handled nicely by inheritance.

DoTheSimplestThingThatCouldPossiblyWork: Having the code in a class depend on the class' own implementation strategy is not that bad of a CodeSmell. Just go ahead and use the member variables until such a time as you find that you need to redesign the implementation, then change all member access to accessor functions.

Anyway, AccessorsAreEvil. ;->


The SelfLanguage does not expose member variables (slots) to the programmer. All access to slots is done through accessors. Declaring a slot defines the accessors in some implementation-dependent manner. E.g. declaring a slot called "slot" would define a getter called "slot" and a setter called "slot: new_value".

RubyLanguage has something similar with its accessors, but also allows direct member access from within the object.


Using your own variables (outside of accessors) is one thing. Using your parent class' variables is truly evil.

I spent four days once (intermittently, in between doing real work) tracking down a bug that was driving me nuts. A member variable kept getting changed in the production system, and I couldn't see how. Putting a breakpoint at the setter didn't show me anything unexpected. A line-by-line inspection of the code didn't help me either. Then, all of a sudden, I noticed that the variable wasn't declared as private, but as protected. When I changed this, my jaw dropped as the approximately 500 compile errors this caused crept out of the woodwork (about 120 of them changed the value!)

Sometimes, good coding habits (like making your non-final variables private) can really leave you blindsided to the evil practices that others use.
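A small sketch of how this kind of bug hides, assuming hypothetical classes Account and PromoAccount (this is not the production code described above):

```java
// Hypothetical classes illustrating the anecdote.
class Account {
    protected int balance; // protected: any subclass (or same-package class) can write it

    public void setBalance(int newBalance) {
        // A breakpoint or log line here only sees writes that go through the setter.
        System.out.println("setBalance(" + newBalance + ")");
        balance = newBalance;
    }

    public int getBalance() {
        return balance;
    }
}

class PromoAccount extends Account {
    public void applyPromo() {
        // Bypasses setBalance entirely; compiles only because the field is protected.
        balance += 100;
    }
}
```

Declaring balance as private would turn the line in applyPromo() into a compile error, which is how the hidden writes surface.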

You can discover all sorts of interesting things about a program by simply commenting out a variable...


One could argue that Java is a flawed language and ideally one should be able to hide the distinction between changing variables and using an accessor. This can simplify syntax and allow one to swap one for the other. If an accessor is simply a formal wrapper around a variable, then by itself it's not providing any value and is just repetitious bloat that hogs eye real-estate. It only provides value if we LATER want to add more processing or control. Thus, it's more logical to start out with direct variables, but change them to accessors later if and when needed. But with Java you'd have to change the interface to do this. In a "proper" language, you wouldn't: you could switch to accessors from variables without changing existing calling code. A side-effect of this is that you wouldn't be able to tell whether it's a direct reference or an accessor by looking at the call interface alone, unless perhaps you make it "read-only" or add parameters. (Somewhere there is a related existing topic on this.)

But if one is stuck with a language that forces an external distinction, like Java, I would suggest leaving variables "naked" if there will be relatively few users (other classes) of the class and these associations will be relatively stable. If, however, there will be a lot of using classes and/or the associations change fairly often, then go ahead and wrap them in accessors up front. -t

Take a look at C#'s Properties, and please reflect on whether your advice is based on good practice or your personal preference. There are good reasons -- that you've not mentioned -- not to do what you suggest.

Let's stick with Java for now. I'll come back to C# later. Per Java, it's to avoid bloat. One only should create such bloat if the cost of change is great enough to justify it. Bloat creates risk by confusing and/or distracting "the eye", but visiting many callers to convert direct attributes to accessors when later needed also creates risk (and work). Thus, there is a balancing point of risk. If you only are likely to have a few and stable callers, then the bloat doesn't prevent enough risk & rework to justify its own risk (by being bloat). Granted, I have no formal studies on confusion/distraction caused by bloat, and these are based on my personal observations of my own work and others'.

General illustration of trade-offs:

Scenario A: 2 Callers

Scenario B: 40 Callers

Different people may assign different values to the 1) cost of bloat (see FastEyes), 2) the probability that we'll later need accessors, and 3) the cost of changing the interface (and changing the callers). But do notice the "c * 40" (c times 40) in B.1. It's probably a high total by most scenarios (readers applying their estimated costs). If we don't wrap and have few callers (A.1), then the cost of changing the interface is relatively low, and arguably lower than the cost of bloat (A.2). I believe most would agree that under scenario B it's probably a good idea to wrap, but the "best" choice in scenario A is probably subject to heavy debate, based on estimations of one's own WetWare and/or that of typical developers in their shop. It's roughly a wash by my estimate, leaning toward A.1 in the name of YagNi. Related: DecisionMathAndYagni, SimulationOfTheFuture.
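One possible way to put rough numbers on this trade-off is an expected-cost comparison. The sketch below is an assumption, not an established formula: it multiplies the probability of later needing accessors by the per-caller conversion cost and the caller count, then compares that with a flat bloat cost. The class name, method names, and every figure are made-up placeholders.

```java
// Rough sketch of the trade-off; all inputs are hypothetical placeholders.
class WrapDecision {
    /**
     * Expected cost of leaving the variable naked: probability p of needing
     * accessors later, times per-caller conversion cost c, times the callers.
     */
    static double expectedCostUnwrapped(double p, double c, int callers) {
        return p * c * callers;
    }

    /** Wrapping up front pays the bloat cost regardless of what happens later. */
    static boolean shouldWrap(double p, double c, int callers, double bloatCost) {
        return expectedCostUnwrapped(p, c, callers) > bloatCost;
    }
}
```

With placeholder values p = 0.25, c = 1.0 and a bloat cost of 1.0, shouldWrap is false for 2 callers and true for 40. Different estimates of p, c, and bloat cost will move the break-even point.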

Do you feel that's a balanced assessment? The only reason you mention to avoid direct access of member variables is that the interface might change.

I am not following. Please elaborate, perhaps with a specific example.

It's not an issue of a specific example, but of the fact that your scenarios are focused almost entirely on "bloat" and "visual complexity". There are good reasons to avoid direct access of member variables, such as to increase encapsulation and reduce coupling. Why do you not mention these?

"Coupling" is an ill-defined concept, and encapsulation for encapsulation's sake is a waste of code. I am not suggesting that one don't wrap/hide variables, but rather only do it WHEN there is known and existing reason, or if there is likely to be one in the future. Thus, I am not arguing against encapsulation in general. It's a matter of if and when to encapsulate. I'm not a YagNi purist in that I append "likely to need" in my criteria versus "until actually needed" in pure YagNi.

Coupling is a very well-defined ComputerScience/SoftwareEngineering concept. Coupling exists wherever there is a dependency such that altering A affects B. In that case, we say that A and B are coupled. Where coupling is necessary, we try to group it together whenever possible. That's cohesion. To reduce accidental or intentional (but unnecessary) coupling between defined units of cohesive code, we use encapsulation. Your argument appears to be based on the presumption that code will be static and neither re-used nor modified, and is simple enough that accidental coupling is unlikely.

Also, lack of encapsulation can create "accidents", but so can bloated code. If the accidents caused by bloat exceed those caused by lack of encapsulation, then encapsulation is not giving us a net benefit. Bloat can also slow general productivity by making more code to read and change.

What is "encapsulation for encapsulation's sake"? Isn't encapsulation for reducing coupling and increasing cohesion?

It depends. See above.

Encapsulation always reduces coupling and increases cohesion. What you appear to believe "depends" is whether it's worth the additional code of get/set/etc. or not.

Without a clear definition/metric of "coupling" and "cohesion", I cannot confirm nor deny that claim. But this is NOT the topic to define/debate coupling and cohesion, as a reminder.

I've given a clear definition of CouplingAndCohesion. They are abstract and often qualitative (though specific quantitative metrics may be defined for specific cases, but not in general), but that doesn't mean they're vague.

If there are no consensus numeric or Boolean metrics for it, or a definition clear enough to lead to that, then it's "vague" in my book. The existing proposals have too much dependency on damned English, and we know how that turns out.

Many things have no established numeric or boolean metric, and yet they're clear enough to make decisions. For example, "programming language" has no established metric, and yet millions of people use them and create them every day.

Many successfully use YagNi also.

YagNi is about only implementing requirements that you need to implement. It isn't advice to write what would generally be considered bad code that violates encapsulation.

That's your opinion. Again, encapsulating before encapsulation is actually needed can indeed be interpreted as a violation of YagNi. But I don't want to make this into a "principle war" but rather explore the ACTUAL costs versus benefits with something more concrete.

That sounds like a highly nuanced and personal interpretation of YagNi. I'd be curious to see if ExtremeProgramming, or any other Agile methodology that endorses YagNi, advocates it.

As long as they don't rely on ArgumentFromAuthority, I would indeed like to see wider opinions also.


Map Perspective

I consider objects to be "glorified maps" (dictionary structures). It's acceptable to have maps without wrapping each element of a map. If we say "always wrap" all "public" elements of an object, then as soon as we add a single method to the map, we would then be obligated to wrap every element, creating an all-or-nothing DiscontinuitySpike. It's like that one method is a poison pill that suddenly triggers some grand encapsulation rule. The boundary between "map" and "object" can and should be fuzzy: I see no reason to hard-classify them (forced dichotomy). See also MergingMapsAndObjects. -t

That's also a nuanced and personal interpretation of ObjectOriented programming. OO is defined by encapsulation, inheritance, and polymorphism, not that objects are like maps.

Encapsulation is NOT a hard requirement of OOP. And NobodyAgreesOnWhatOoIs. Thus, my preferred viewpoint of OO is not less valid than others. And RobertMartin's "jump table" definition/description of OOP can be viewed the same or similar to a map view. A "jump table" is simply a map from name/key to a function (implementation and/or reference to).

Are you referring to the exchange at http://objectmix.com/object/312321-object-oriented-programming-explained-51-lines-6.html ?

The debates over OO definitions fall into established categories. Your "preferred viewpoint" appears to be unique to you, and not in any of the established categories.

Established by who? I only see differing opinions. I'm sticking by my working definition until a consensus is reached. In practice I do see maps morph into objects as more requirements are added (at least for languages that help blur the distinction).

Established by multiple acts of general consensus, which has resulted in groups of agreement. Your working definition doesn't fall into any of them. It appears to be unique to you. Do you therefore consider any program which uses maps to be object oriented?

No. Again RM's "jump table" is essentially a map of behavioral pointers. (It's an odd way to describe it, but it just may reflect too much time spent working with one language.) Thus it is NOT unique to me. (See [1] below for my working definition.)

What's a "behavioral pointer"? The "jump table" definition appears to be unique to Bob Martin, and unique to that conversation. Are you sure he wasn't making fun of something you'd written before?

      Raw-Data
        |
        V
      General-Abstraction (tables) -----------> Stack (specific abstraction)
                   \-----------------------> Queue (specific abstraction)

Could you provide an example of what you mean by "maps morph into objects as more requirements are added"?

One starts with a typical record "structure" (AKA map), and various actions seem to group naturally with that "structure". The map may still be used in map-ish ways, but we now have methods that are specific to that structure. For example, it might be app configuration info that is typically only set by field technicians. We later want a "list_to_screen" method for it to display it for easier trouble-shooting by field technicians (using a "hidden" back-room UI).
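A hedged sketch of that config example in Java, assuming a hypothetical AppConfig class; listToScreen stands in for the "list_to_screen" method, and returning a string rather than drawing a UI is a simplification:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: AppConfig and listToScreen are illustrative names.
class AppConfig extends LinkedHashMap<String, String> {
    /** The one method added later; entries are still read and written map-style. */
    public String listToScreen() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : entrySet()) {
            out.append(e.getKey()).append(" = ").append(e.getValue()).append('\n');
        }
        return out.toString();
    }
}
```

Callers keep using put() and get() as with any map; only the troubleshooting screen calls the added method.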

By map, do you mean a collection of named elements, aka a dictionary? I don't know what your field technician example is intended to illustrate, but it's trivially obvious that operations may be defined to manipulate a given structure. That doesn't make it object oriented, only that you've defined a structure and a set of associated operations.

That gets back to how one defines "object oriented". I don't define it by "wrap-ness" or "encapsulation" level. I'd argue that full wrapping is really creation of an AbstractDataType and not (just) object orientation. OO is wider than ADT. You appear to be conflating the two. If they are one and the same, then we should dispense with the term "object oriented".

An AbstractDataType is a mathematical model for a category of data structures. It is isomorphic to certain applications of object oriented programming, but not equivalent. Particularly, they are not the same because AbstractDataType is not defined in terms of polymorphism, inheritance, or encapsulation. Other OO definitions are either too amorphous or too individuated to consider.

That may depend on how one defines polymorphism, inheritance, or encapsulation (the "big 3" for reference). Anyhow, it's reasonable that one may wish to use one or two of those three without having to subscribe to them all. There's no reason I can see to force an artificial DiscontinuitySpike in order to match somebody's category system.

Sure, you can use one or two of polymorphism, inheritance or encapsulation, but then it's not OO.

I have to disagree with your view of what "OOP" is. It doesn't matter anyhow here, for one should design software based on the best design choices, NOT based on vocabulary. You can't make something more efficient or more parsimonious or more economical by redefining it. (Caveat: changing the definition of the goals/metrics or "economics" may affect such.)

I see no logic of the universe that forces a hard distinction between maps and OOP as far as how to use them, even IF I buy your definition. It's not in for a penny, in for a pound. Even if I buy your def, there is a continuum between a map and a "true" object and no clear reason to ignore the continuum or pretend like it doesn't exist or pretend like if we are 70% fitting "true OO" we should go 100% because 70% is "bad" or 30% is "bad" and 100% is "good".

If maps are sufficient for OO, then do the static maps in C (i.e., the 'struct' construct) mean C is object oriented?

That depends on how one defines "OO"[1]. The definition is not really what matters and I don't want to get caught up in another term fight. The point is that useful code constructs can exist that cover the full gamut between and including a pure map ("is" or "used as") and a fully encapsulated object (no public "variables", only methods). What we call these things is irrelevant and shouldn't dictate how we lay out our code. It's silly to say that as soon as one introduces a single method into a map (or object used like a map), then one is suddenly obligated to wrap every key of the map or map-like thing. If I understand your argument correctly, then this all-or-nothing rule would apply under it. I find it a ludicrous and highly artificial "boundary". -t

By the way, if full encapsulation is always the "proper" way to do OO, then an "OOP language" technically shouldn't allow public variables in classes at all: only methods would be able to read and change class variables.

Damn straight. OOP languages shouldn't allow public variables in classes at all. Ever. I don't know why they do allow it.

Because unless the language is carefully-designed to avoid such, it creates bloat, and bloat slows down reading and creates errors due to bloat-induced reading mistakes. They don't do it because they probably don't want the bloat-related problems associated with it.

By the same argument, structured programming is bloat compared to the simplicity of GOTOs, and slows down reading and creates errors due to bloat-induced reading mistakes. Typical OO languages don't force member variables to be private purely for historical reasons. Modern OO practice does not make member variables public.

How the heck is that the same argument? For one, goto programs are not shorter.

They're simpler, by your metric. For example, they don't risk the reading mistakes that are possible from putting the initialisation, test, and increment sections of a "for" loop close together.

Incidentally, "behavior-oriented programming" or "verb-oriented programming" or "interface-oriented programming" may be a better way to describe what you have in mind. Your ADT-like view of OOP came after OOP.

None of those are established terms. "ObjectOriented" is the recognised term.

Also recognized to be a mess as far as terminology. Anyhow, you still haven't addressed the question whether something can be in an in-between state of a map and an object. You still seem to be encouraging a forced and/or artificial dichotomy. -t

I didn't know there was an open question about "whether something can be in an in-between state of a map and an object". I'm not sure why it would matter. Whilst "object" is frequently used to refer to any identifiable language construct, particularly one that defines something to hold data (like a struct, class, table, variable, whatever) as opposed to (say) a control structure like a 'for' loop (which is not normally called an "object"), the loose and general use of "object" is quite distinct from the usual meaning of ObjectOriented. I see no evidence that the industry or academia generally considers Map (as in a kind of container) and Object (as in ObjectOriented) to be equivalent in any defining sense.

I don't want to get caught up in classification slots here; it's likely a wasteful LaynesLaw dance. My point is that there can be a wide range of "structures" between those that are treated/used like a typical map, and those treated/used like a typical "object". I give an example above (config info) of something that starts out like a map, but a method or two is later added on. Whether it's called/labelled/classified as a "map", "object", or a "frippokof" doesn't matter. The point is that "in between" things exist with behaviors/conventions/designs/usage-patterns that straddle both the "map" and "object" world. Your "rule" seems to reject this in-between state, and/or its rule(s) for when "object-ness" kicks in are ill-defined. I'm looking for something clear like, "If it has more than 3 methods, then The Rule kicks in: all public attributes should now be wrapped", or the like (along with the rationale of the rule and its trigger point of 3, of course). -t

Classes and prototypes -- i.e., constructs which serve as a template for instances -- should not publicly expose member variables. In C++ and C#, 'struct' is effectively an alias for 'class', so the same "rule" applies. Other non-class constructs that may be evocative of classes -- like Python's tuples, or various Map or Map-like collections (apparently) -- are not classes or prototypes, nor are they a template for instances, so the "rule" does not apply.

Okay, that's clear enough for my satisfaction. Thank you. However, I won't deviate from my recommendation above that wrapping only be done if there are likely to be a sufficient quantity of instances/clones. -t

Why would the quantity of instances make any difference? Does it make a difference whether 'new Blah()' gets called once or a thousand times? Do you perhaps mean the quantity of references to an instance?

Whether they are subclasses, clones, or instances probably depends on the language used and/or programming style since dynamic languages may blur the distinction between instances and sub-classing. I'm not sure of a compact way to word it that makes sense in all languages and coding styles. The cost-of-change to go back and wrap dependent usages (when the need arises) is generally higher the more "coded" references. I generally wouldn't count quantities in "automated" references, such as a loop that allocates 500 references/clones/instances. I'd only count that once (unless something really unusual is being done). Thanks for bringing up that wording point, though. The main factor that matters here is the cost-of-change, which we are weighing against the cost of bloat. Again, I approach it similar to "investment math" where we are weighing trade-offs based on our best prediction of future events. Without having a working time-machine, that's the best we can do.

Would it be correct to say that it's ok to allow direct access of members if the number of dependencies on a given member is low, and not ok if the number of dependencies on a given member is high?

"Dependencies" is too open-ended. I look at probability and cost first, not "dependencies". If a given "dependency" is unlikely to cause a problem, then it should be given less attention/weight than a factor that is likely and/or costly.

I would have described it the other way around. Probability is inherently unknown, and cost is often unpredictable, but dependency is straightforward. You have a dependency between A and B if changing A affects B. In most imperative programming languages, dependencies are defined by relationships between identifiers. Given some definition or declaration z assigned an identifier 'p', every reference to 'p' represents a dependency upon z. Improving coupling means reducing references to 'p'. Improving cohesion means grouping references to 'p' together. This means that if z changes, the impact is minimised. The question, if any, is assuming 'p' is a member variable, how many references to 'p' do there have to be, and/or how ungrouped do they have to be, before you hide 'p' behind a wrapper?

I have to disagree. Probabilities, cost-of-change, and cost of reading bloat can be roughly estimated. Focusing only on easy-to-measure factors is the SovietShoeFactoryPrinciple. I'll stick with SimulationOfTheFuture as the most rational way to make design decisions, which generally follows investment theories. Focusing on the existing code alone is too narrow a viewpoint. -t

I'd be interested to see code written in the conventional focus-on-CouplingAndCohesion style compared to a style driven by SimulationOfTheFuture. For example, could you change PayrollExampleTwo -- which was written based on focus on CouplingAndCohesion -- to be based around SimulationOfTheFuture? You can even base it on the actual future, because the payroll formulae change every six months.

I don't know enough about that domain to make a confident estimate of change patterns. The kind of changes to the formulas may play a role in the calculations also. I've never worked directly on a payroll app. I can tentatively agree that without sufficient estimates of change patterns, wrapping may be the better default. But if you don't have enough knowledge of the domain, you should probably talk to somebody who does before making that coding decision, and/or study past formulas & changes. AND past coding mistakes. It's quite possible they were caused by BloatInducedReadingConfusion.

Changes occur every six months and can occur anywhere, but the numeric literals change the most frequently, the switch statements change next, then the structure of the formulae (including adding or removing factors), then provinces/territories are added. The last one happened once in the ten years that I maintained the real code upon which PayrollExampleTwo was based. Because changes can occur anywhere to anything, what made the most sense was to design with a focus on CouplingAndCohesion. Thus, on average, any change had the least impact, rather than trying to optimise for specific changes as implied by SimulationOfTheFuture.

Re: "the switch statements change next [2nd in frequency]" is a bit open-ended. How they change can matter a lot.

They can change in all the ways that switch statements can change: More cases or fewer, the case literals can change, and the code in them can change.

Let's say we classify the patterns of all possible changes into 20 different patterns. If say change-pattern 7 is 10 times more likely than pattern 15, that could very well affect the final decision of which code design technique is ranked as the most change-friendly. I don't have those specific probability values here to analyze and process.

Such specific probabilistic "change patterns" don't exist in Canadian payroll. Numeric literals are roughly twice as likely to change as switch statements, switch statements are slightly more likely to change than formulae, and provinces/territories will be added about once every 100 years unless they repartition the country for payroll purposes into income tax regions.

They do exist, per actual historic change events. You just don't know them because you forgot or nobody bothered to track them. But anyhow, how does your coding suggestion improve the changing of "Numeric literals", the most common change pattern, according to you?

Why do you assume I "forgot" or "nobody bothered to track them"? I developed Canadian payroll software for a decade, so I know how the payroll specification changes, and it's not something for which you can (a) identify a collection of "change-patterns" beyond those that I've given; or (b) assign numbers like "10 times more likely" other than the numbers I've given such as "roughly twice as likely".

Coding with a focus on CouplingAndCohesion has no impact on changing numeric literals. It has a big effect on changing the switch/case statements (they're cohesive), the formulae within a province/territory (province/territory classes are cohesive but not coupled to each other), changing provincial/territorial factors independent of federal factors (each is cohesive, coupling is via inheritance through a minimal set of functions), etc.


[1] In my book, it would only be OO if it facilitates defining and putting behavior (functions or references to functions) in the 'struct' nodes (per C example). And OO-ness can perhaps be considered on a continuum rather than discrete (is or isn't OO). If C makes it possible but difficult to put and use behavior in struct nodes, then it may be considered "weak OO". "Orientation" generally means "leaning toward". Thus, "object oriented" generally means "leaning toward objects". This can be read as "facilitates object-ish things and usage" or "makes it easier to do object-ish stuff". -t

So you define OO by "has dotted dispatch?"

I don't recognize the phrase "dotted dispatch".

E.g.:

    instance.method()

Note the dot between the instance and the method name. That's dotted dispatch.

Note also that 'class c {void x()};' is merely a syntactic shorthand for 'class c {}; void x(c p);' In terms of TypeSafety, etc., they are equivalent, but the latter requires that 'class c' have publicly-accessible members whilst in the former, method 'x' can access private members of 'class c'. In short, the reason for "putting behavior ... in the 'struct' nodes" instead of outside the 'struct' nodes is specifically to AvoidDirectAccessOfMembers.
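The same contrast can be sketched in Java. All three class names below are hypothetical, and Java has no free functions, so a static method stands in for the outside function:

```java
// Hypothetical sketch; PointWithMethod, OpenPoint, and PointOps are illustrative names.
class PointWithMethod {
    private int x = 3; // private: only code inside the class can touch it

    int doubled() {
        return x * 2; // a method may read the private member
    }
}

class OpenPoint {
    public int x = 3; // must be accessible, or the outside function below won't compile
}

class PointOps {
    // Behaviour kept outside the class: same result, but it needs direct access to p.x.
    static int doubled(OpenPoint p) {
        return p.x * 2;
    }
}
```

Both forms compute the same value; only the first lets the member stay hidden.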


CategoryJava, CategoryInterface, CategoryObjectOrientation


NovemberFourteen


Last edited December 16, 2014