Assume you have a list (bag) of text tokens as input. Produce sorted output that has the number of occurrences of each element. Example:
Input: foo bar bar foo glag foo Output: bar 2 foo 3 glag 1You don't have to worry about the output alignment, case sensitivity, or collating sequence of non-alpha characters. The input list can start out in a string, array, or whatever native structure you want. This is to avoiding turning it into a parsing contest.
awk '{ c[$1]++ } END { for (x in c) print x, c[x]; }'You didn't say how it was to be sorted. You can put " | sort" on the end if you want it sorted by key, " | sort -n -k 2" if you want it sorted by number (increasing) or " | sort -rn -k 3" if you want it sorted in decreasing number.
#include <map> #include <iostream> #include <string> int main() { std::string const s[] = { "foo", "bar", "bar", "foo", "glag", "foo" }; std::map<std::string, int> m; for (int i = 0; i < 6; i++){ m[s[i]]++; } for (std::map<std::string, int>::const_iterator it = m.begin(); it != m.end(); it++) { std::cout << (*it).first << '\t' << (*it).second << '\n'; } }Or
#include <set> #include <iostream> #include <string> int main() { std::string const i[] = { "foo", "bar", "bar", "foo", "glag", "foo" }; std::multiset<std::string> s (i, i + 6); std::set<std::string> keys (s.begin(), s.end()); for (std::set<std::string>::const_iterator it = keys.begin(); it != keys.end(); it++) { std::cout << *it << '\t' << s.count(*it) << '\n'; } }Or how about CeePlusPlus with BoostLibraries:
#include <iostream> #include <map> #include <string> #include <boost/lambda/lambda.hpp> #include <boost/lambda/bind.hpp> int main() { using namespace std; using namespace boost; using namespace boost::lambda; string i[] = { "foo", "bar", "bar", "foo", "glag", "foo" }; map<string,int> m; for_each(i, i + 6, bind(&map<string,int>::operator[], var(m), _1)++); for_each(m.begin(), m.end(), cout << bind(&pair<string const,int>::first, _1) << '\t' << bind(&pair<string const,int>::second, _1) << '\n'); }Or C++11:
#include <set> #include <iostream> #include <string> int main() { auto s = std::multiset<std::string>{"foo", "bar", "bar", "foo", "glag", "foo"}; auto keys = std::set<std::string>(s.begin(), s.end()); for(auto key: keys) std::cout << key << '\t' << s.count(key) << '\n'; }
;; Loop (woohoo!) solution... (loop with words = '(foo bar bar foo glag foo) for w in (sort (remove-duplicates words) #'string<) do (format t "~&~A ~A" w (count w words))) ;; FunctionalProgramming solution... (let ((words '(foo bar bar foo glag foo))) (mapc (lambda (w) (format t "~&~A ~A" w (count w words))) (sort (remove-duplicates words) #'string<)))
string[] tokens = { "foo", "bar", "bar", "foo", "glag", "foo" }; var query = from s in tokens group s by s; foreach (var item in query) Console.WriteLine("{0} {1}", item.Key, item.Group.Count());
Dodo http://dodo.sourceforge.net
I think using a predefined sort function is just too easy. This version uses only basic functions to get the job done! Written in dodo0 sublanguage (almost like assembler o_O)
# Import a few utility functions clojure('compare', 2) -> comp clojure('vector', 0) -> newList clojure('nth', 2) -> itemAt clojure('dec', 1) -> dec clojure('inc', 1) -> inc clojure('count', 1) -> count clojure('conj', 2) -> append # Return the biggest item in a list fun max -> list, return ( fun loop -> i, break ( '='(i, 0) -> zero if (zero) -> itemAt(list, i) -> item break(item);# nothing to compare with, select item | dec(i) -> k loop(k) -> ref# get max of other items itemAt(list, i) -> item comp(item, ref) -> order '>'(order, 0) -> bigger if (bigger) -> break(item)# current item is bigger | break(ref)# max of others is bigger ) | loop count(list) -> n dec(n) -> n loop(n, return) ) | max # Return a list of items and their count fun countList -> list, return ( 'empty?'(list) -> empty if (empty) -> return(list)# empty list empty result | max(list) -> biggest # Count occurrences and list non-occurrences fun scan -> i, end, done ( '='(i, end) -> stop if (stop) -> newList() -> l done(0, l);# finished scanning the list | inc(i) -> k scan(k, end) -> n, rest# scan rest of list for occurrences itemAt(list, i) -> item '='(item, biggest) -> match if (match) -> inc(n) -> n# occurrence of biggest item: increase count done(n, rest); | append(rest, item) -> nonmatching# append item to list of non-occurrences done(n, nonmatching) ) | scan count(list) -> n scan(0, n) -> cnt, rest# scan list for number of occurrences countList(rest) -> otherCounts# get counts for rest of the list append(otherCounts, biggest) -> myCounts append(myCounts, cnt, return)# return list with biggest item and its count added ) | countList 'vector'("foo", "bar", "bar", "foo", "glag", "foo") -> input countList(input) -> result println("Count of words in", input, ":", result) -> exit()Result:
Count of words in [foo bar bar foo glag foo] : [bar 2 foo 3 glag 1]
;; In the CommonLisp style: (require 'cl) (loop with words = '(foo bar bar foo glag foo) for w in (sort (remove-duplicates words) #'string<) collect (list w (count w words)))
import Data.Map as Map main = getLine >>= mapM_ printItem . bagSum . words where printItem (word, count) = putStrLn $ word ++ " " ++ (show count) bagSum = Map.toAscList . (foldr updateFunc Map.empty) where updateFunc key map = Map.insertWith (+) key 1 map-- DavidWahler?
A little more perusal of the standard libraries turns up a much more elegant (and possibly more efficient) implementation of the bagSum function:
bagSum = map itemFunc . group . sort where itemFunc l = (head l, length l)The first version is a straightforward translation of the Perl/Awk implementations, which use a hashtable with a counter for each word. The second version is isomorphic to the Unix shell version.
See BagSumInJava.
Using the GoogleCollections? library,
Multiset<String> strings = HashMultiset.create(); strings.addAll(Arrays.asList("foo", "bar", "bar", "foo", "glag", "foo")); for (Multiset.Entry<String> e : strings.entrySet()) { System.out.println(e.getElement() + "\t" + e.getCount()); }
var bag = ('foo bar bar foo glag foo').split(/\s+/).sort(); var bagsum = {}; for (var i=0; i<bag.length; i++) { var item = bag[i]; bagsum[item] = bagsum[item] ? bagsum[item] + 1 : 1; } for (var item in bagsum) { document.writeln(item + ' ' + bagsum[item] + '<br>'); }-- ElizabethWiethoff
There are two approaches to this:
Given a list of the form:
a=. ;: 'foo bar bar foo glag foo'First, the classic one-liner:
c,.;/+/b=/c=.(1,-.(}:=}.)b)#b=./:~aOr, more maintainable code:
NB. Sort tokens b=. /:~a NB. Unique tokens c=. (1 , -. (}: = }.) b) # b NB. Count'em and show'em c ,. ;/ +/ b =/ c-- MarcThibault
To get the unique tokens you could use "nub". And with tacit programming:
cnt=.[: +/ =/ sk=. [: /:~ ~. bagsum=.(] ,. [: <"0 cnt) skor as a one-liner,
bagsum=.(],.[:<"0[:+/=/)([:/:~~.)Run the code with,
bagsum a,where a is given above.
--JuneKim
let str_list = Str.split (Str.regexp "[ \t]+") "foo bar bar foo glag foo" ;; let hash_tbl = Hashtbl.create 10 ;; List.iter (fun x -> if Hashtbl.mem hash_tbl x then Hashtbl.replace hash_tbl x ((Hashtbl.find hash_tbl x) + 1) else Hashtbl.add hash_tbl x 1 ) str_list ;; Hashtbl.iter (Printf.printf "%s : %d\n") hash_tbl ;;-- ErikDeCastroLopo?
PurelyFunctional version
module StrMap = Map.Make(String) let bag_sum list = let find elem map = try StrMap.find elem map with Not_found -> 0 in let add elem map = StrMap.add elem (find elem map + 1) map in let map = List.fold_left (fun map elem -> add elem map) StrMap.empty list in List.rev (StrMap.fold (fun str count list -> ((str, count) :: list)) map [])
#!/usr/bin/perl use strict ; use warnings ; my %bag ; $bag{$_}++ for qw( foo bar bar foo glag foo ) ; print "$_ $bag{$_}\n" for sort keys %bag ;-- ChristofferHammarstrom
PHP's array_count_values() is perfect for this task:
<?php $bag = array('foo', 'bar', 'bar', 'foo', 'glag', 'foo'); $bagsum = array_count_values($bag); print_r($bagsum); ?>If you wanted to do it manually:
<?php $bag = array('foo', 'bar', 'bar', 'foo', 'glag', 'foo'); sort($bag); foreach ($bag as $item) { $bagsum[$item]++; } print_r($bagsum); ?>PHP's much-maligned assocative arrays, which function like hashes but preserve ordering, are plenty useful here for recording the output.
# "Classy" solution... class Bag(dict): def __init__(self, alist): for elem in alist: self.add(elem) def add(self, elem): self[elem] = self.get(elem, 0) + 1 def __str__(self): out = ['%-8s %3d' % (key, val) for (key, val) in sorted(self.items())] return '\n'.join(out) print Bag('foo bar bar foo glag foo'.split()) # Pythonic ListComprehension and loop solution... bag = 'foo bar bar foo glag foo'.split() bagsum = dict([(elem, bag.count(elem)) for elem in bag]) for key, val in sorted(bagsum.items()): print '%-8s %3d' % (key, val) # Sort of FunctionalProgramming solution... def count(elem, bagsum={}): bagsum[elem] = bagsum.get(elem, 0) + 1 return bagsum def sortKeys(adict): result = adict.items() result.sort() return result def output(pair): print '%-10s %3d' % pair def bagsum(abag): map(output, sortKeys(map(count, abag)[0])) bagsum('foo bar bar foo glag foo'.split())The Bag class derived from dict and the (sort of) FunctionalProgramming solution each use a dictionary to do the counting, and should complete the count in O(N) time. The ListComprehension solution, however, is different. It uses a dictionary only to eleminate duplicate elements in the counted list. (Python 2.4 will have a "set" type containing no duplicates.) You don't really want to use the list.count method for each element in the bag list, for that would take O(N**2) time.
<s>Python 2.5 should have collections.bag, which will allow this section to be cleaned up further...</s>
Using sorted and groupby (from Python 2.6+ and 3.x) to do the hard lifting (like the Haskell example):
Bag = 'foo bar bar foo glag foo'.split() for Key, Copies in groupby(sorted(Bag)): print Key, len(list(Copies))A simple sort and tally algorithm, no intermediate set, bag or dictionary required:
words = 'foo bar bar foo glag foo'.split() words.sort() prev, count = words[0], 1 for word in words[1:]: if word == prev: count += 1 else: print prev, count prev, count = word, 1 print prev, countYou can use the "defaultdict" class in Python 2.5 and later:
from collections import defaultdict words = 'foo bar bar foo glag foo'.split() sums = defaultdict(int) # when key is not found, it binds it to int(), which is 0 for word in words: sums[word] += 1 for key, count in sums.iteritems(): print "%s\t%s" % (key, count)Using the "Counter" class (which is like the "Bag" above) in Python 3.1 and later:
from collections import Counter words = 'foo bar bar foo glag foo'.split() sums = Counter(words) for key, count in sums.items(): print("%s\t%s" % (key, count))
bag <- c("foo", "bar", "bar", "foo", "glag", "foo") print(table(bag))Alternately,
print(tapply(bag, bag, length))
sums = Hash.new(0) %w{ foo bar bar foo glag foo }.each { |w| sums[w] = sums[w] + 1 } sums.keys.sort.each { |w| puts "#{w}\t#{sums[w]}" }-- JasonArhart
Using "group_by" from Ruby 1.9:
words = %w{ foo bar bar foo glag foo } sums = words.group_by {|x| x} .each_pair {|k, g| puts"#{k}\t#{g.length}" }
def bagSum(input: String): Iterable[(String, Int)] = { val counts = collection.mutable.Map[String, Int]() for (t <- input split ' ') counts(t) = counts.getOrElse(t, 0) + 1 util.Sorting.stableSort(counts toSeq, ((e: (String, Int)) => e._1)) } for ((token, sum) <- bagSum("foo bar bar foo glag foo")) println(token + "\t" + sum)Using "groupBy" from Scala 2.8:
def bagSum(input: String): Iterable[(String, Int)] = util.Sorting.stableSort( (input.split(' ') groupBy identity) map {case (t, g) => t -> g.size} toSeq, ((e: (String, Int)) => e._1)) for ((token, sum) <- bagSum("foo bar bar foo glag foo")) println(token + "\t" + sum)-- JasonArhart
(define (bagsum bag) (define (bs-adder thing alist) (cond ((null? alist) (list (cons thing 1))) ((eq? thing (caar alist)) (cons (cons thing (+ 1 (cdar alist))) (cdr alist))) (else (cons (car alist) (bs-adder thing (cdr alist)))))) (define (bs-helper bag alist) (cond ((null? bag) alist) (else (bs-helper (cdr bag) (bs-adder (car bag) alist))))) (bs-helper bag '())) (bagsum '(a b c d a a a b b c)) => ((a . 4) (b . 3) (c . 2) (d . 1))Here is my attempt at a (non-functional) SchemeLanguage version:
(define (bagsum bag) (let ((blist '())) (for-each (lambda (x) (if (assoc x blist) (set-cdr! (assoc x blist) (cons (+ 1 (cadr (assoc x blist))) '())) (set! blist (append blist (list (list x 1)))))) bag) blist)) (bagsum '(foo bar bar foo glag foo)) -> ((foo 3) (bar 2) (glag 1))I am sure a functional (and better) version could be written.
Nothing wrong with the imperative version. It should be quite efficient except for the fact that you are computing (assoc x blist) twice. I would suggest instead something equivalent to
... (cond ((assoc x blist) => (lambda (pair) (set-cdr! pair ....))) (else ....) ...Also, the specification does not require you to use append - you might as well cons the new piece before the rest.
Here is a functional version (uses SRFI 1):
(define (partitions bag seed) (if (null? bag) seed (let-values (((p rest) (partition (lambda (elem) (eqv? elem (car bag))) bag))) (partitions rest (cons p seed))))) (define (bagsum bag) (map (lambda (p) (list (first p) (length p))) (partitions bag '()))) (bagsum '(foo bar bar foo glag foo)) ;==> ((glag 1) (bar 2) (foo 3))
| sums | sums := Dictionary new. ('foo bar bar foo glag foo' findTokens: ' ') do: [:each | sums at: each put: (sums at: each ifAbsent: 0) + 1 ]. sums keys asSortedCollection do: [:each | Transcript show: each; space; show: (sums at: each); cr ].I'm sure there's an easier way to do this. Maybe some SmugSmalltalkWeenie will come along and show me how.
-- JasonArhart
It has been a few years, but Smalltalk has a Bag class, so the Weenie code would look something like the following (which is untested and probably wrong). If addAll: does not exist, it must be replaced by a loop. I sort and print associations, which are key value pairs. -- StanSilver
| bag | bag := Bag new addAll: #(foo bar bar foo glag foo) bag associations asSortedCollection do: [ :each | Transcript show: each; cr ]In the Squeak version, it's as simple as (#(foo bar bar foo glag foo) as: Bag) orderedCounts *alt-P*
structure StrMap = BinaryMapFn (struct type ord_key = string val compare = String.compare end) fun bag_sum list = let fun find (elem, map) = getOpt (StrMap.find (map, elem), 0) fun add (elem, map) = StrMap.insert (map, elem, find (elem, map) + 1) val map = foldl add StrMap.empty list in StrMap.listItemsi map end
Minimal:
SELECT item, count(*) FROM bag GROUP BY itemCleaner:
SELECT item, count(*) as Occurs FROM bag GROUP BY item ORDER BY item
I didn't originally notice the bit about sorting. Here is the corrected version.
foreach word [list foo bar bar foo glag foo] { if {![info exists wc($word)]} { set wc($word) 1 } else { incr wc($word) } } foreach word [lsort [array names wc]] { puts "$word $wc($word)" }-- KristofferLawson
Dim tokens() = {"foo", "bar", "bar", "foo", "glag", "foo"} Dim query = From token In tokens Group By token Into Count() For Each item In query Console.WriteLine("{0} {1}", item.token, item.Count) NextThat's almost like embedded SQL. VB is trying to be an SQL-ified ExBase now, eh?
sed 's/ */ /g' | sort | uniq -c | sed 's/^ *\([^ ]*\)\([^ ]*\)/\2 \1/'(tested with bash on Linux)
Whichever WikiGnome "helpfully" removed a newline from this pipeline completely broke it. Don't change code without testing it! Understanding it would be a good idea, too. -- dm
Another:
tr ' ' \\012 | sort | uniq -cIf you want to take into account too many spaces (as the first solution above does), use this:
tr ' ' \\012 | grep -vx '' | sort | uniq -cI'd rather disregard the output column order, it's insignificant and hard to do as specified above (that the numbers are in the same column). If it is a must, add the following to the pipeline:
sed -e 's#^ *\([0-9]*\) *\(.*\)$#<tr><td>\2</td><td>\1</td></tr>#' -e '1s#^#<table>#' -e '$s#$#</table>#' | w3m -dump -T text/htmlThe column ordering requirement is the only reason the first solution was at all coomplicated, otherwise we were doing pretty much the same thing -- except your tr is WikiGnome-proof. :-S
Your final solution is really throwing caution to the winds, blech. :-)
How about instead:
tr ' ' \\012 | grep -vx '' | sort | uniq -c | awk '{ print $2 " " $1; }'
Windows cmd Script
for %a in (foo bar bar foo glag foo) do @set /a Ans_%a+=1then
set Ans_to see the answers
Objective-C's NSCountedSet class was made for this task:
NSArray *strings = [NSArray arrayWithObjects:@"foo", @"bar", @"bar", @"foo", @"glag", @"foo", nil]; NSCountedSet *set = [NSCountedSet setWithArray:strings]; for (id o in set) { NSLog(@"%@\t%u", o, [set countForObject:o]); }
TutorialDee (using the RelProject implementation)
SUMMARIZE RELATION { TUPLE {i 1, s 'foo'}, TUPLE {i 2, s 'bar'}, TUPLE {i 3, s 'bar'}, TUPLE {i 4, s 'foo'}, TUPLE {i 5, s 'glag'}, TUPLE {i 6, s 'foo'} } BY {s} ADD (COUNT() AS n) ORDER (ASC s)
I created a language for expressing mathematical ideas in ASCII -- cllaed PINAPL (PINAPL Is Not A Programing Language) or MATHS.And in that, if I understand my definitions, then for list l:#T,
+bag(l)should do it.
-- Richard Botting
FUNCTION SortedWords?(cWordlist)
LOCAL ARRAY aWords[1], aWordCnts[1] LOCAL ctr, tot, output ctr = 0 tot = 0 output = "" CLEAR
tot = ALINES(aWords,ALLTRIM(cWordlist),12," ")
CREATE CURSOR words (oneword C(100)) FOR ctr = 1 TO tot INSERT INTO words (oneword) VALUES (aWords(ctr)) ENDFOR SELECT DISTINCT PADR(oneword,100," ") FROM words INTO ARRAY aWordCnts ORDER BY 1 FOR ctr = 1 TO ALEN(aWordCnts) COUNT FOR oneword == aWordCnts(ctr) TO subtot output = RTRIM(aWordCnts(ctr)) + " " + TRANSFORM(subtot) ? output ENDFOR RETURN ENDFUNC * Execution: DO SortedWords? WITH "foo bar bar foo glag foo"I'm pretty sure there is a more concise way to get such in ExBase. For one, the intro lets us assume the original list is in a table ("native format"), so we don't have to worry about parsing. I'll have to dig out the ol' manual.
Yes, I realize the exercise allows you to start with the data in any format. However, you can't pass a cursor to a function, procedure, or method in VisualFoxPro. I was thinking that if you're actually going to do such a thing, you need an impetus for getting it started. Assuming that we are operating in a single data session, any cursor is always in scope anywhere. I suppose I could adopt the conceit of passing a table name to the function, opening the table, and going from there. In any case, that's why I took the liberty of starting with the data as it was presented in the original string.
If the data's already in the table the easiest way would still be to use SELECT DISTINCT to get a second table of sorted unique values, then SCAN it and get the counts from the first table. You could also use a SELECT expression to get the counts rather than the ExBase COUNT FOR...TO...
In that case though, then the "language" being used is really more SQL and not so much VisualFoxPro.
On the other hand, doing it entirely in an array structure would be painful.
Still, I wanted to show some of the versatility of the hybrid nature of the language.
Maybe I'm thinking of another dialect. Where did I put those old books?
GNU Core Utils
sed 's/\s+/\n/g' <input.txt | sort |uniq -c|awk '{print $2 "\t" $1}'The awk bit is just to put the count after the item and the sed part is just to split a single line into separate lines, so the real work is just the "sort|uniq -c" part.
MathematicaLanguage almost has a function for that built in (missing only the sort). For a list of strings
With[{list={"foo","bar","bar","foo","glag","foo"}}, TableForm?@Sort@Tally[list]]Or, a little lower-level:
tallyFn[list_]:=Map[{#, Count[list, #]} &, Union[list]] tallyFn[{"foo","bar","bar","foo","glag","foo"}] // TableForm?
$array = ["foo", "bar", "bar", "foo", "glag", "foo"]; $tally = array_count_values($array); ksort($tally); // print_r($tally); // Just dumps the output. TSV layout follows foreach($tally as $key => $count) echo $key, "\t", $count, "\n";Again, because it feels like excessive laziness to use a native "bag sum" function,
$array = ["foo", "bar", "bar", "foo", "glag", "foo"]; $tally = array_fill_keys($array, 0); ksort($tally); foreach($array as $word) { $tally[$word]++; } print_r($tally);