"CopyAndPaste is InformationLoss about similarity".
Comments
It's clear that:
- when you say "secondary information", you really mean "information that is easy to regenerate from other sources", and that
- when you say "primary information", you really mean "information that is difficult or impossible to regenerate from other sources, and hence information that is precious and valuable that we don't want to lose".
An example of what you call "primary information" that is not
impossible to regenerate would be, say, the factorization into primes of 10^73-1. That factorization has been completed, and if it were lost, it could be done again, but only with non-trivial difficulty (I believe it would take days or weeks or more of computation to factor on a single cpu these days, IIRC), so it might be pretty important to hang on to the information of its prime factors.
Information carrier:
- Language or structure or anything else that can keep information without loss is information carrier (Metal key is information carrier about what is necessary to open the lock. Compiled program is information carrier about the program. Source code (I mean text files) is information carrier about the program.)
- Duplication is transfer of the information from one carrier to another (different bocks of the same text file are different carriers).
Deterministic destructors for C#
C# and Java could be improved with reference counted destructors that force programmers to clean resources explicitly (force programmers to do duplication).
...
- I agree that it opens up the possibility of careless mistakes, but there is very little duplication. Everywhere you use an IDisposable resource, you have to add a total of 9 characters of duplicated code: "using", "(", ")", "{", and "}". If you want to attack their approach here, there is a better argument in the careless mistake category (the programmer forgets to use "using").
In my experience these days, when you have to manage limited hardware resources at the application level (as opposed to the OS level), there is almost always a second requirement of feeding the hardware some isochronous or otherwise realtime data. If that is the case, then .NET is absolutely the wrong tool for the job. If there is no realtime requirement, then you are stuck with IDisposable or a homebrew pooling system.
- But this is my point, IDisposable cause code duplication because programmer have to remember about disposal. -- Aleksey
- Maybe this is a difference in our terminology. To me, those are two separate concerns. Duplication is one concern, and programmer forgetfulness is another.
- Forgetfulness is what can cause problems and bugs. If programming language require to remember something and programmer is forgetful, whose fault is it ? My point is that it's language fault, not programmer fault. The goal of good program design is to allow programmer to be forgetful. Duplication can cause forgetful programmer to forget something and in this particular case he may forget to use using{}. Requirement of language to use using{} construct is bad and cause duplication (in my definition of duplication) because using{} construct duplicates information about when object should be disposed. This information is already available inside compiler when variable goes out of scope and there is no need to require programmer to duplicate it by wriging using{} construction. More than that, if programmer needs to use 10 objects, he is required to use using{using{... 10 nested levels which is bad. -- Aleksey
- Agreed. -- MichaelSparks
...
- The problem with using{} or lock{} constructs is that when I want to allocate resource or lock something outside of for{} construct and deallocate inside or the opposite, it doesn't work and it forces me to do deallocation manually. Another thing is exceptions, they don't work well with using{} or lock{} constructs. What I suggest is to be able to use something like {lock(x);..}, then I can put lock(x) at the beginning of a function and forget about where it ends or if it ends with exception. -- Aleksey
- Even with regular destructors and RAII, you can't intermingle object construction and destruction at different scopes. using{} and lock{} are no worse in that regard. Also, using{} and lock{} work perfectly with exceptions - that is one of the main reasons for their existence! Your example of {lock(x);..} is equivalent in .NET to lock(x){..}. The equivalence holds even with exceptions.
The third thing I used RAII for was cleaning up program state after an exception. In .NET, the suggested approach is to use the finally construct. I think RAII is much more elegant and maintenance-proof, but for whatever reason, they chose to go this route with .NET. I suspect it has something to do with luring in "lesser" programmers. There is a whole army of VB programmers out there who would have simultaneously gone cross-eyed if Microsoft tried to force RAII onto them. Don't forget that .NET was heavily influenced by marketing considerations.
...
I think Microsoft made a good move by using GC instead of reference counting. There are technical reasons and practical reasons. For the technical end, GC is often faster than reference counting for most applications. If tuned properly, it also leads to better responsiveness. Those are minor issues. The real win with GC is that it eliminates entire classes of programmer errors. Circular reference problems are the least of them.
- Yes, it eliminates circular references problem at expense of bringing code duplication problem. It's still good decision from Microsoft, but I just wanted to have something more than that. I say that both problems can be solved and non-deterministic memory finalizers can co-exist together with deterministic resource destructors in one language. -- Aleksey
- If they can co-exist, I don't know how. To have deterministic destructors, you have to have clearly defined object ownership. That goes against the whole basis for GC. If the runtime knows exactly when an object is no longer needed, what would be the need for GC? It could just reclaim the memory immediately. The other benefits of GC like heap compaction are probably not worth the performance hit on their own.
- GC is necessary because memory deallocation is inefficient if done immediately and more efficient if delayed and done by blocks. Also GC is helpful to resolve cyclic references. The problem with C# is that it mixed 2 things in one destructor - destruction of memory and destrution of resources. They have to be separate, one in non-deterministic destructor and one in deterministic. And runtime can track and appropriately invoke both of them, one from GC, another when variable goes out of scope. How to make it co-exist, they can keep reference counter (I think they already have one to mark object for GC) and invoke deterministic destructor in addition to marking it for GC when counter hit 0. GC will not collect it immediately, but deterministic destructor will be invoked immediately unless there is cycle. If there is cycle, then deterministic destructor will be invoked together with finalizer (it's not desirable situation, but it's up to programmer to avoid cycles and use deterministic destructors). In majority of cases like lock{} or using{} there are no cycles involved, so deterministic destructor will be invoked immediately without problems.
- The .NET runtime doesn't use reference counting in its implementation. When the GC happens, the runtime starts from known root objects, like thread objects and the stack, and traverses all pointers until it has touched every object your program has a pointer to. It compares that against the contents of the heap and eliminates the unreachable objects. That takes a lot of CPU time. The other solution, reference counting, also takes a lot of CPU time because each time a pointer is copied, the reference count has to be adjusted in a thread-safe way. The best GC implementations are about as fast as the best reference counting implementations. Doing both would mean spending twice as much time on memory management. I like deterministic destructors, but I wouldn't want to pay for them with double CPU time spent on memory management.
- I agree that my way is less efficient at runtime and it becomes matter of what price to pay runtime or development. I think there are many cases when it's better to quickly deliver reliable twice slower working program rather than fail dead lines and produce unreliable buggy program. It's rational to buy twice more CPU power rather than hire more programmers.
So, GC eliminates several classes of programmer errors, but the absence of RAII makes it more likely that errors of another class will occur. Does it all come out as a wash then? I don't think so. Think about the average programmer who uses these Microsoft products. The average programmer probably runs into issue after issue with explicit memory management in C++. Also, the average programmer using these products has probably never heard of RAII or written any code to that pattern. The net effect is a big decrease in the number of bugs.
- I agree, my suggestion was not to remove non-deterministic finalizers, my suggestion was to add new deterministic destructors in addition to that. If average programmers don't use them, fine, but it would be nice to have those for all other programmers. And as you said today programmers don't worry about memory and they don't use finalizers at all, but they would be happy to use deterministic destructors. -- Aleksey
I was busy and away but still noted you had quite a bit of discussion related to PrimaryInformation and other pages. I am wondering whether the term PrimaryInformation in your case would have been better named differently on hindsight, perhaps FirstOrderInformation or something similar? -- DavidLiu
I agree, I think the best name would be GivenInformation?, I don't see any problem with it (unlike duplication vs derivation and DuplicationMachine vs DerivationMachine?). We could use DerivedInformation instead of SecondaryInformation, but in that case it will be logical to use derivation instead of duplication ... -- Aleksey
CategoryHomePage