Open Problems In Computer Science

Oh look! Another ThreadMess!

With extra marshmallow troll-berries. I would recommend moving the "types" debate to another topic.


What are the major problems in programming or computer science that we haven't been able to solve?

It seems like day-to-day work in programming is filled with figuring out countless details and cobbling together algorithms to put a system together. In that sort of environment, it's easy to forget the big picture. So what are the biggest problems in computer science right now (academic or in practice)?

Why "outside of performance issues?" If you're going to measure whether one paradigm is better than another, I think the best way is to examine whether the easiest and most straightforward way of expressing concepts are also the best way to express them, where "best" refers, among other things, to performance, security, and correctness. As far as proofs go: I think most people who engage in holy wars have married themselves to an idea and are uninterested in properly comprehending alternatives; their decisions have been made, and their minds are closed. YOU, top, are that way about types; even after all your years on this wiki, you still display an appalling degree of ignorance regarding TypeTheory and the application of TypeTheory to language design (and the only reason you're 'displaying' it is because yours is the "vociferous ignorance" sort). I'm certain you've encountered people that way about relational. This falls under 'MostHolyWarsTiedToPsychology', but is more of the 'cult and religion' and selective ignorance on one or both sides than of anything more flattering - it certainly isn't about humans with different value systems giving different precedence to different viewpoints after fairly judging them all.

As to whether 'types' are better? I am strongly convinced they have proven themselves beyond any reasonable doubt better (for performance, correctness, AND security) than the 'no types' option: treating all of computer memory and hard disk as one large BLOB that is twiddled and tweaked. In that sense, even databases and files and processes or services and code vs. data constitute 'types' for parts of the blob. Beyond that, the question only becomes a matter of degree. You seem to be of the opinion that there is some cutoff (an apparently arbitrary one) at which more or stronger typing becomes a bad thing - perhaps because you've experienced a lack of foresight creating artificial limits and accidental complexities, and you'd rather just have a nice big string into which you can dump data rather than be tied to somebody's inadequate model. I happen to be of the opinion that strong typing is always a good thing so long as one chooses the correct types, but even I'll agree that foresight ain't perfect. However, I also come more from the open source community, and I have the freedom to rebuild my code at my workplace, so I rarely feel tied to the model: if the model is inadequate, I tweak the model (or rewrite a chunk of it) rather than rail against it. Had I been educated in a culture of proprietary software and unforgiving, inflexible frameworks, perhaps my opinion would be different.
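
To make the correctness claim above concrete, here is a minimal sketch (in Python, chosen only for illustration; the function and figures are hypothetical, not from anyone's actual payroll code) of how even optional type declarations catch a class of errors before the program runs:

 # Minimal sketch: Python's optional type hints as a stand-in for
 # "more typing". The function and numbers are hypothetical.

 def net_pay(gross: float, tax_rate: float) -> float:
     """Compute pay after tax; the annotations document intent."""
     return gross * (1.0 - tax_rate)

 # A static checker such as mypy rejects this call before the program
 # ever runs, because a string is not a float:
 #
 #     net_pay("42000", 0.25)   # mypy: incompatible type "str"
 #
 # Without the annotations, the same mistake surfaces only at runtime
 # (in Python as a TypeError; a weakly-typed language might instead
 # silently produce a wrong answer).
 print(net_pay(42000.0, 0.25))  # 31500.0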

There are already topics where others besides me defend minimal types. I don't think this is the place to delve into those details. But one practical limitation of types that is readily apparent is that they make it difficult to mix and match tools, because sharing types across languages and tools remains an unsolved problem. For similar reasons, decent language-neutral GUI engines have failed to appear: their designers keep tying the engines to a specific OOP language. If one puts a high premium on mixing and matching tools, then types are a problem. If you want to discuss this claim further, visit CrossAppLanguageOopIsRough. --top

[I suspect much of this perennial HolyWar surrounding types is actually one of terminological usage. What Top describes as "minimal types" or "no types" appears to be strong dynamic typing with, I suspect, heavy reliance on implicit type casts. He appears to want DBMSes with a type system similar to Perl or PHP. It's an interesting idea, though as someone who has developed accounting and payroll software, I also find it scary.]

PHP has dynamic types while Perl has *no* types, because it has no side-flags, at least at the scalar level. They are two different animals. Also, I agree that strong typing is better for some apps than others. I've worked in many companies that want complex systems on the cheap, and this is where type-free would be best. Accounting and payroll cannot be done on the cheap (if you want the company to survive). In those, the bloated, anal nature of static/strong typing is almost a must (although some argue that unit tests, rather than the type system, should be used to check such things). --Top

[Perl has no types? I'm sure LarryWall would be interested to discover that. Once again, you're using "type-free" and "*no* types" to refer to forms of dynamic typing. Your application of terminology is not in accordance with accepted use. Programming in MachineCode has effectively no types (at least from the developer's point of view); most other languages, including many assembly languages, have at least some notion of types even if they're weakly-typed.]

Let me restate it to bypass a definition battle. The default "cells" in DynamicRelational do not have hidden flags or schemas to indicate the "type" of a cell (although schema typing can be incrementally added). PHP does. Perl doesn't for scalars (to the best of my knowledge). PHP acts like it has flags while Perl does not. (I've never seen the insides, so I am basing this on their behavior via tests.) --top

What, pray tell, is a side flag? Where is the definition of what a side flag is? Is it accepted terminology across programmers? Or does only one single programmer in the entire world know what a side flag is? In order for humans to communicate, we need to precisely define terminology. In math, we know precisely what subtraction and addition are. What is a side flag, and who has defined it where?

"Type tag" is probably the more common terminology. I avoided it originally to avoid confusion with markup language "tags".


For some of the work already done, see http://www.cs.brown.edu/people/mph/publications.html, especially: The Topological Structure of Asynchronous Computability (http://www.cs.brown.edu/people/mph/HerlihyS99/p858-herlihy.pdf) and A Classification of Wait-Free Loop Agreement Tasks (http://www.cs.brown.edu/people/mph/HerlihyR02/journal.pdf)

It isn't exactly categorized as logic, but I think the logic for distributed systems will eventually have to be based on something like this.


Is this really a computer science problem? Seems like a people issue.

I don't think it's in the same class as the "P = NP" problem, but some problems do very much involve people issues. My concern is that the problems immediately above this are largely academic problems. What are the important industry problems (if such a distinction is warranted)?

There are two parts to this problem. As an analogy, think of the origin of the relational model. This is clearly an example of the right "computer science" idea in the right place at the right time. The object/relational holy war is only a computer science problem to the extent that a new, theoretically-grounded paradigm may be necessary to integrate them. Whether such a new paradigm succeeds depends on whether it is practical, and whether it is actually adopted. This is a "people issue".


These are not problems in computer science:

They ain't solutions. Perhaps you really mean ComputerMathProblems??

No. As mentioned above by someone else, the first of these problems is a "people issue". There is nothing in it specific to computer science, even if you make it specific to a particular "paradigm difference" in computing. The second "problem" also has nothing to do with computing. It sounds like an economic problem, if you think it is a problem at all - which is a matter of opinion. The fact that "P ?= NP" has no known solution is not a matter of opinion.


This is a "problem" in electrical engineering, and has nothing to do with computer science.

You don't think that Computer Science includes the architecture of computers? You think it's only about software? Few would agree with you. Computer Science and Electrical Engineering overlap.


This is probably more of a freak science question, but has it ever occurred to anybody that, what with however many googol of computations modern computers do each millisecond these days, there might be some relativistic effects involved?

Definitely not in the sense you seem to mean. Special relativity does arise in some regards in the design of circuits, since it limits the speed of transmission of signals to roughly 1 foot per nanosecond, and it is an integral part of the quantum physics that governs semiconductor behavior, but it has nothing to do with the clock rate of a computer nor its MIPS rating.
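
A quick back-of-the-envelope check on that figure (the 3 GHz clock below is an arbitrary example, not a claim about any particular chip):

 # Back-of-the-envelope numbers for the paragraph above. The 3 GHz
 # clock is an arbitrary example, not a claim about any real chip.
 C = 299_792_458.0            # speed of light in vacuum, metres/second

 per_ns = C * 1e-9            # distance light covers in one nanosecond
 print(f"{per_ns:.3f} m/ns")  # ~0.300 m/ns -- roughly one foot

 clock_hz = 3e9               # a 3 GHz clock, for illustration
 per_cycle = C / clock_hz     # upper bound on signal travel per tick
 print(f"{per_cycle * 100:.1f} cm/cycle")  # ~10.0 cm -- why chip layout matters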

Is information processing scalable with respect to speed?

Since you seem to be asking this in the context of Special Relativity, the answer is a clear "yes". There are fundamental limits that physics places on the maximum rate of computation, but that is a different subject more closely related to thermodynamics.


It is very much a computer science problem. It will take work in compilers (instruction reordering, optimization, etc.), architecture (speculative threading, relaxed memory models, etc.), and programming languages to solve it.
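
As a small illustration of why this is hard (a hedged sketch in Python; real instruction-reordering effects depend on the compiler and hardware, and CPython's interpreter lock hides most of them, but the lost-update race below is the same basic hazard):

 # Minimal sketch of the shared-memory hazard that compilers,
 # architectures, and languages must cooperate to tame. Note that
 # counter += 1 is a load, an add, and a store -- three steps that
 # another thread can interleave with, so updates can be lost.
 import threading

 counter = 0

 def bump(n: int) -> None:
     global counter
     for _ in range(n):
         counter += 1

 threads = [threading.Thread(target=bump, args=(100_000,)) for _ in range(4)]
 for t in threads:
     t.start()
 for t in threads:
     t.join()

 # Expected 400000; without a lock the printed total may be smaller,
 # depending on how the threads happen to interleave.
 print(counter)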


The field of computer science is largely an awkward marriage between engineering and mathematics, and many arguments in computer science often assume one background or the other. The very term "open problem" implies this page is mainly for the mathematics stream of computer science. So I would suggest that the major cultural problem in computer science is how to better blend its mathematical and engineering heritage.

Is this a real problem? Certainly it gets played out a lot by some, but is it really any different than the tensions between management and developers or customers and suppliers? It seems like historically the greatest contributors to the field have mastered both the practical and theoretical sides.

[We might also need to differentiate between computer science and programming science. Computer science could be about the circuit boards in your computer. Programming science should be about the science of programming a computer. In order to do programming, one does not need to know about the capacitors, resistors, and solder inside the machine. "Computer science" is unfortunately the term used in universities, when in fact it should be "programming science". Computer science should be about building computers and electronics, not about programming. Some prefer to call it "computing science" instead of "computer science", but the two are spelled so similarly that they are often confused. Why isn't there a computer science course that teaches how the internals of a computer work, and a separate programming science course that teaches programming? Computers are not programs, so isn't program science separate from computer science? One does not even need a computer to write a program: languages like TutorialDee were originally designed on paper, with no computer, and only implemented on actual machines later.]

Tradition. There are a lot of terms that don't make sense, but get stuck in our collective vocabulary through use. It's a kind of QwertySyndrome.


How to keep information work from migrating to nations that didn't invest in creating it?

Discussion moved to SoftwareEngineerFolkEconomics.


Various open problems proposed above have been dismissed as "people issues", but to my mind one of the largest problems in Computer "Science" at the moment is the disregarding of people issues. Sure, there is a hard core of math encased in a tough engineering wrapping, but outside that it gets all soft and mushy, as a large part of software work, especially programming, is all about people. As has been stated ad tedium, the computer doesn't care whether you wrote your program in assembler or Smalltalk; it just sees the machine code. So everything from programming languages up, including paradigms and interface issues, is obviously a people issue. Whether we broaden the definition of computer science to include these soft and squishy ideas, or fork it into a new discipline, I think this issue needs to be addressed, as more and more the important problems seem to be not "How can we reduce this algorithm from O(n²) to O(n)?", but "How can we reduce the development time of this kind of simple app from 3 months to 2 weeks?". Once we have named this field we will have legitimised its teaching and studying, and true advances can be made.

We have a name for it, and it is "SoftwareEngineering."

Well said. To me, making a faster algorithm is often puzzle solving rather than problem solving. Problems involve people, and reducing development time and cost is helpful to people.

{Note that I did not write the above. --top}


MarchZeroEight

CategoryMath CategoryScience ComputerScience

