Bag Sum In Many Programming Languages

Assume you have a list (bag) of text tokens as input. Produce sorted output that has the number of occurrences of each element. Example:

Input: foo bar bar foo glag foo

Output: bar 2 foo 3 glag 1

You don't have to worry about the output alignment, case sensitivity, or collating sequence of non-alpha characters. The input list can start out in a string, array, or whatever native structure you want. This is to avoiding turning it into a parsing contest.

AwkLanguage

 awk '{ c[$1]++ } END { for (x in c) print x, c[x]; }'

You didn't say how it was to be sorted. You can put " | sort" on the end if you want it sorted by key, " | sort -n -k 2" if you want it sorted by number (increasing) or " | sort -rn -k 3" if you want it sorted in decreasing number.

CeePlusPlus

 #include <map>
 #include <iostream>
 #include <string>

 int main() {
  std::string const s[] = { "foo", "bar", "bar", "foo", "glag", "foo" };
  std::map<std::string, int> m;
  for (int i = 0; i < 6; i++){
m[s[i]]++;
  }
  for (std::map<std::string, int>::const_iterator it = m.begin(); it != m.end(); it++) {
std::cout << (*it).first << '\t' << (*it).second << '\n';
  }
 }

 #include <set>
 #include <iostream>
 #include <string>
 int main() {
  std::string const i[] = { "foo", "bar", "bar", "foo", "glag", "foo" };
  std::multiset<std::string> s (i, i + 6);
  std::set<std::string> keys (s.begin(), s.end());
  for (std::set<std::string>::const_iterator it = keys.begin(); it != keys.end(); it++) {
std::cout << *it << '\t' << s.count(*it) << '\n';
  }
 }

Or how about CeePlusPlus with BoostLibraries:

 #include <iostream>
 #include <map>
 #include <string>
 #include <boost/lambda/lambda.hpp>
 #include <boost/lambda/bind.hpp>

 int main() {
using namespace std; using namespace boost; using namespace boost::lambda;
string i[] = { "foo", "bar", "bar", "foo", "glag", "foo" };
map<string,int> m;
for_each(i, i + 6, bind(&map<string,int>::operator[], var(m), _1)++);
for_each(m.begin(), m.end(),
cout << bind(&pair<string const,int>::first, _1) << '\t'
 << bind(&pair<string const,int>::second, _1) << '\n');
 }

Or C++11:

 #include <set>
 #include <iostream>
 #include <string>
 int main() {
     auto s = std::multiset<std::string>{"foo", "bar", "bar", "foo", "glag", "foo"};
     auto keys = std::set<std::string>(s.begin(), s.end());
     for(auto key: keys)
         std::cout << key << '\t' << s.count(key) << '\n';
 }

CommonLisp

 ;; Loop (woohoo!) solution...
 (loop with words = '(foo bar bar foo glag foo)
for w in (sort (remove-duplicates words) #'string<)
do (format t "~&~A ~A" w (count w words)))

 ;; FunctionalProgramming solution...
 (let ((words '(foo bar bar foo glag foo)))
(mapc (lambda (w) (format t "~&~A ~A" w (count w words)))
 (sort (remove-duplicates words) #'string<)))

CeeSharp:

  string[] tokens = { "foo", "bar", "bar", "foo", "glag", "foo" };
  var query = from s in tokens group s by s;
  foreach (var item in query)
Console.WriteLine("{0} {1}", item.Key, item.Group.Count());

Dodo http://dodo.sourceforge.net

I think using a predefined sort function is just too easy. This version uses only basic functions to get the job done! Written in dodo0 sublanguage (almost like assembler o_O)

  # Import a few utility functions
  clojure('compare', 2) -> comp
  clojure('vector', 0) -> newList
  clojure('nth', 2) -> itemAt
  clojure('dec', 1) -> dec
  clojure('inc', 1) -> inc
  clojure('count', 1) -> count
  clojure('conj', 2) -> append

  # Return the biggest item in a list
  fun max -> list, return
  (
fun loop -> i, break
(
'='(i, 0) -> zero
if (zero) ->
itemAt(list, i) -> item
break(item);# nothing to compare with, select item
|
dec(i) -> k
loop(k) -> ref# get max of other items
itemAt(list, i) -> item
comp(item, ref) -> order
'>'(order, 0) -> bigger
if (bigger) ->
  break(item)# current item is bigger
|
  break(ref)# max of others is bigger
)
| loop

count(list) -> n
dec(n) -> n
loop(n, return)
  )
  | max

  # Return a list of items and their count
  fun countList -> list, return
  (
'empty?'(list) -> empty
if (empty) ->
return(list)# empty list empty result
|
max(list) -> biggest
# Count occurrences and list non-occurrences
fun scan -> i, end, done
(
'='(i, end) -> stop
if (stop) ->
  newList() -> l
  done(0, l);# finished scanning the list
|
  inc(i) -> k
  scan(k, end) -> n, rest# scan rest of list for occurrences
  itemAt(list, i) -> item
  '='(item, biggest) -> match
  if (match) ->
inc(n) -> n# occurrence of biggest item: increase count
done(n, rest);
  |
append(rest, item) -> nonmatching# append item to list of non-occurrences
done(n, nonmatching)
)
| scan

count(list) -> n
scan(0, n) -> cnt, rest# scan list for number of occurrences
countList(rest) -> otherCounts# get counts for rest of the list
append(otherCounts, biggest) -> myCounts
append(myCounts, cnt, return)# return list with biggest item and its count added
  )
  | countList

  'vector'("foo", "bar", "bar", "foo", "glag", "foo") -> input
  countList(input) -> result
  println("Count of words in", input, ":", result) ->
  exit()

Result:

  Count of words in [foo bar bar foo glag foo] : [bar 2 foo 3 glag 1]

EmacsLisp:

 ;; In the CommonLisp style:
 (require 'cl)
 (loop with words = '(foo bar bar foo glag foo)
for w in (sort (remove-duplicates words) #'string<)
collect (list w (count w words)))

HaskellLanguage

 import Data.Map as Map

 main = getLine >>= mapM_ printItem . bagSum . words
where printItem (word, count) = putStrLn $ word ++ " " ++ (show count)

 bagSum = Map.toAscList . (foldr updateFunc Map.empty) 
where updateFunc key map =
Map.insertWith (+) key 1 map

-- DavidWahler?

A little more perusal of the standard libraries turns up a much more elegant (and possibly more efficient) implementation of the bagSum function:

 bagSum = map itemFunc . group . sort
where itemFunc l = (head l, length l)

The first version is a straightforward translation of the Perl/Awk implementations, which use a hashtable with a counter for each word. The second version is isomorphic to the Unix shell version.

JavaLanguage

See BagSumInJava.

Using the GoogleCollections? library,

  Multiset<String> strings = HashMultiset.create();
  strings.addAll(Arrays.asList("foo", "bar", "bar", "foo", "glag", "foo"));

  for (Multiset.Entry<String> e : strings.entrySet()) {
System.out.println(e.getElement() + "\t" + e.getCount());
  }

JavaScript

 var bag = ('foo bar bar foo glag foo').split(/\s+/).sort();
 var bagsum = {};
 for (var i=0; i<bag.length; i++) {
var item = bag[i];
bagsum[item] = bagsum[item] ? bagsum[item] + 1 : 1;
 }
 for (var item in bagsum) {
document.writeln(item + ' ' + bagsum[item] + '<br>');
 }

-- ElizabethWiethoff

JayLanguage

There are two approaches to this:

Given a list of the form:

a=. ;: 'foo bar bar foo glag foo'

First, the classic one-liner:

c,.;/+/b=/c=.(1,-.(}:=}.)b)#b=./:~a

Or, more maintainable code:

NB. Sort tokens
b=. /:~a
NB. Unique tokens
c=. (1 , -. (}: = }.) b) # b
NB. Count'em and show'em
c ,. ;/ +/ b =/ c

-- MarcThibault

To get the unique tokens you could use "nub". And with tacit programming:

cnt=.[: +/ =/
sk=. [: /:~ ~.
bagsum=.(] ,. [: <"0 cnt) sk

or as a one-liner,

bagsum=.(],.[:<"0[:+/=/)([:/:~~.)

Run the code with,

bagsum a

,where a is given above.

--JuneKim

OcamlLanguage

 let str_list = Str.split (Str.regexp "[ \t]+") "foo bar bar  foo glag foo" ;;

 let hash_tbl = Hashtbl.create 10 ;;

 List.iter (fun x ->
if Hashtbl.mem hash_tbl x then 
Hashtbl.replace hash_tbl x ((Hashtbl.find hash_tbl x) + 1)
else
Hashtbl.add hash_tbl x 1
)
str_list ;;

 Hashtbl.iter (Printf.printf "%s : %d\n") hash_tbl ;;

-- ErikDeCastroLopo?

PurelyFunctional version

module StrMap = Map.Make(String)
let bag_sum list =
let find elem map =
try StrMap.find elem map
with Not_found -> 0
in let add elem map =
StrMap.add elem (find elem map + 1) map
in
let map = List.fold_left (fun map elem -> add elem map)
  StrMap.empty list in
List.rev
(StrMap.fold
(fun str count list ->
((str, count) :: list))
map [])

PerlLanguage

 #!/usr/bin/perl

 use strict ;
 use warnings ;

 my %bag ;
 $bag{$_}++ for qw( foo bar bar foo glag foo ) ;
 print "$_ $bag{$_}\n" for sort keys %bag ;

-- ChristofferHammarstrom

PhpLanguage

PHP's array_count_values() is perfect for this task:

 <?php
 $bag = array('foo', 'bar', 'bar', 'foo', 'glag', 'foo');
 $bagsum = array_count_values($bag);
 print_r($bagsum);
 ?>

If you wanted to do it manually:

 <?php
 $bag = array('foo', 'bar', 'bar', 'foo', 'glag', 'foo');
 sort($bag);
 foreach ($bag as $item) {
  $bagsum[$item]++;
 }
 print_r($bagsum);
 ?>

PHP's much-maligned assocative arrays, which function like hashes but preserve ordering, are plenty useful here for recording the output.

PythonLanguage

 # "Classy" solution...
 class Bag(dict):
def __init__(self, alist):
for elem in alist:
self.add(elem)

def add(self, elem):
self[elem] = self.get(elem, 0) + 1

def __str__(self):
out = ['%-8s %3d' % (key, val)
for (key, val) in sorted(self.items())]
return '\n'.join(out)

 print Bag('foo bar bar foo glag foo'.split())

 # Pythonic ListComprehension and loop solution...
 bag = 'foo bar bar foo glag foo'.split()
 bagsum = dict([(elem, bag.count(elem)) for elem in bag])
 for key, val in sorted(bagsum.items()):
print '%-8s %3d' % (key, val)

 # Sort of FunctionalProgramming solution...
 def count(elem, bagsum={}):
bagsum[elem] = bagsum.get(elem, 0) + 1
return bagsum
 def sortKeys(adict):
result = adict.items()
result.sort()
return result
 def output(pair): print '%-10s %3d' % pair
 def bagsum(abag):
map(output, sortKeys(map(count, abag)[0]))
 bagsum('foo bar bar foo glag foo'.split())

The Bag class derived from dict and the (sort of) FunctionalProgramming solution each use a dictionary to do the counting, and should complete the count in O(N) time. The ListComprehension solution, however, is different. It uses a dictionary only to eleminate duplicate elements in the counted list. (Python 2.4 will have a "set" type containing no duplicates.) You don't really want to use the list.count method for each element in the bag list, for that would take O(N**2) time.

-- ElizabethWiethoff

<s>Python 2.5 should have collections.bag, which will allow this section to be cleaned up further...</s>

Using sorted and groupby (from Python 2.6+ and 3.x) to do the hard lifting (like the Haskell example):

 Bag = 'foo bar bar foo glag foo'.split()
 for Key, Copies in groupby(sorted(Bag)):
print Key, len(list(Copies))

A simple sort and tally algorithm, no intermediate set, bag or dictionary required:

  words = 'foo bar bar foo glag foo'.split()
  words.sort()
  prev, count = words[0], 1
  for word in words[1:]:
if word == prev:
count += 1
else:
print prev, count
prev, count = word, 1
  print prev, count

You can use the "defaultdict" class in Python 2.5 and later:

 from collections import defaultdict
 words = 'foo bar bar foo glag foo'.split()
 sums = defaultdict(int) # when key is not found, it binds it to int(), which is 0
 for word in words:
sums[word] += 1
 for key, count in sums.iteritems():
print "%s\t%s" % (key, count)

Using the "Counter" class (which is like the "Bag" above) in Python 3.1 and later:

 from collections import Counter
 words = 'foo bar bar foo glag foo'.split()
 sums = Counter(words)
 for key, count in sums.items():
print("%s\t%s" % (key, count))

ArrLanguage

bag <- c("foo", "bar", "bar", "foo", "glag", "foo")
print(table(bag))

Alternately,

print(tapply(bag, bag, length))

RubyLanguage

 sums = Hash.new(0)
 %w{ foo bar bar foo glag foo }.each { |w| sums[w] = sums[w] + 1 }
 sums.keys.sort.each { |w| puts "#{w}\t#{sums[w]}" }

-- JasonArhart

Using "group_by" from Ruby 1.9:

 words = %w{ foo bar bar foo glag foo }
 sums = words.group_by {|x| x} .each_pair {|k, g| puts"#{k}\t#{g.length}" }

ScalaLanguage

 def bagSum(input: String): Iterable[(String, Int)] = {
val counts = collection.mutable.Map[String, Int]()
for (t <- input split ' ') counts(t) = counts.getOrElse(t, 0) + 1
util.Sorting.stableSort(counts toSeq, ((e: (String, Int)) => e._1))
 }

 for ((token, sum) <- bagSum("foo bar bar foo glag foo"))
println(token + "\t" + sum)

Using "groupBy" from Scala 2.8:

 def bagSum(input: String): Iterable[(String, Int)] =
util.Sorting.stableSort(
(input.split(' ') groupBy identity) map {case (t, g) => t -> g.size} toSeq,
((e: (String, Int)) => e._1))

 for ((token, sum) <- bagSum("foo bar bar foo glag foo"))
println(token + "\t" + sum)

-- JasonArhart

SchemeLanguage

 (define (bagsum bag)
(define (bs-adder thing alist)
(cond ((null? alist)
(list (cons thing 1)))
((eq? thing (caar alist))
(cons (cons thing (+ 1 (cdar alist))) (cdr alist)))
(else
(cons (car alist) (bs-adder thing (cdr alist))))))
(define (bs-helper bag alist)
(cond ((null? bag) alist)
(else (bs-helper (cdr bag) (bs-adder (car bag) alist)))))
(bs-helper bag '()))

 (bagsum '(a b c d a a a b b c)) => ((a . 4) (b . 3) (c . 2) (d . 1))

Here is my attempt at a (non-functional) SchemeLanguage version:

  (define (bagsum bag)
(let ((blist '()))
 (for-each 
(lambda (x)
(if (assoc x blist)
 (set-cdr! (assoc x blist)
(cons (+ 1 (cadr (assoc x blist))) '()))
 (set! blist (append blist (list (list x 1))))))
bag)
 blist))

  (bagsum '(foo bar bar foo glag foo)) -> ((foo 3) (bar 2) (glag 1))

I am sure a functional (and better) version could be written.

-- JonathanArkell

Nothing wrong with the imperative version. It should be quite efficient except for the fact that you are computing (assoc x blist) twice. I would suggest instead something equivalent to

...
(cond ((assoc x blist) => (lambda (pair) (set-cdr! pair ....)))
 (else ....)
...

Also, the specification does not require you to use append - you might as well cons the new piece before the rest.

Here is a functional version (uses SRFI 1):

  (define (partitions bag seed)
(if (null? bag)
seed
(let-values 
(((p rest) 
(partition (lambda (elem) 
(eqv? elem (car bag))) 
 bag)))
  (partitions rest (cons p seed))))) 

  (define (bagsum bag)
(map (lambda (p) (list (first p) (length p)))
 (partitions bag '())))

  (bagsum '(foo bar bar foo glag foo))  ;==> ((glag 1) (bar 2) (foo 3))

SmalltalkLanguage

 | sums |
 sums := Dictionary new.
 ('foo bar bar foo glag foo' findTokens: ' ')
do: [:each | sums at: each put: (sums at: each ifAbsent: 0) + 1 ].
 sums keys asSortedCollection
do: [:each | Transcript show: each; space; show: (sums at: each); cr ].

I'm sure there's an easier way to do this. Maybe some SmugSmalltalkWeenie will come along and show me how.

-- JasonArhart

It has been a few years, but Smalltalk has a Bag class, so the Weenie code would look something like the following (which is untested and probably wrong). If addAll: does not exist, it must be replaced by a loop. I sort and print associations, which are key value pairs. -- StanSilver

 | bag |
 bag := Bag new addAll: #(foo bar bar foo glag foo)
 bag associations asSortedCollection do: [ :each | Transcript show: each; cr ]

In the Squeak version, it's as simple as (#(foo bar bar foo glag foo) as: Bag) orderedCounts *alt-P*

SmlNjLanguage

structure StrMap = BinaryMapFn (struct
type ord_key = string
val compare = String.compare
  end)
fun bag_sum list =
let
fun find (elem, map) =
getOpt (StrMap.find (map, elem), 0)
fun add (elem, map) =
StrMap.insert (map, elem, find (elem, map) + 1)
val map = foldl add StrMap.empty list
in
StrMap.listItemsi map
end

SqlLanguage

Minimal:

  SELECT item, count(*)
  FROM bag
  GROUP BY item

Cleaner:

  SELECT item, count(*) as Occurs
  FROM bag
  GROUP BY item
  ORDER BY item

ToolCommandLanguage

I didn't originally notice the bit about sorting. Here is the corrected version.

 foreach word [list foo bar bar foo glag foo] {
if {![info exists wc($word)]} {
set wc($word) 1
} else {
incr wc($word)
}
 }

 foreach word [lsort [array names wc]] {
puts "$word $wc($word)"
 }

-- KristofferLawson

VisualBasicNine

  Dim tokens() = {"foo", "bar", "bar", "foo", "glag", "foo"}
  Dim query = From token In tokens Group By token Into Count()

  For Each item In query
Console.WriteLine("{0} {1}", item.token, item.Count)
  Next

That's almost like embedded SQL. VB is trying to be an SQL-ified ExBase now, eh?

UnixShell

  sed 's/  */  /g' | sort | uniq -c | sed 's/^ *\([^ ]*\)\([^ ]*\)/\2 \1/'

(tested with bash on Linux)

Whichever WikiGnome "helpfully" removed a newline from this pipeline completely broke it. Don't change code without testing it! Understanding it would be a good idea, too. -- dm

Another:

 tr ' ' \\012 | sort | uniq -c

If you want to take into account too many spaces (as the first solution above does), use this:

 tr ' ' \\012 | grep -vx '' | sort | uniq -c

I'd rather disregard the output column order, it's insignificant and hard to do as specified above (that the numbers are in the same column). If it is a must, add the following to the pipeline:

 sed -e 's#^ *\([0-9]*\) *\(.*\)$#<tr><td>\2</td><td>\1</td></tr>#'  -e '1s#^#<table>#' -e '$s#$#</table>#' | w3m -dump -T text/html

The column ordering requirement is the only reason the first solution was at all coomplicated, otherwise we were doing pretty much the same thing -- except your tr is WikiGnome-proof. :-S

Your final solution is really throwing caution to the winds, blech. :-)

How about instead:

  tr ' ' \\012 | grep -vx '' | sort | uniq -c | awk '{ print $2 " " $1; }'

Windows cmd Script

  for %a in (foo bar bar foo glag foo) do @set /a Ans_%a+=1

then

  set Ans_

to see the answers

ObjectiveCee

Objective-C's NSCountedSet class was made for this task:

  NSArray *strings = [NSArray arrayWithObjects:@"foo", @"bar", @"bar", @"foo", @"glag", @"foo", nil];
  NSCountedSet *set = [NSCountedSet setWithArray:strings];

  for (id o in set) {
NSLog(@"%@\t%u", o, [set countForObject:o]);
  }

TutorialDee (using the RelProject implementation)

 SUMMARIZE 
RELATION {
TUPLE {i 1, s 'foo'},
TUPLE {i 2, s 'bar'},
TUPLE {i 3, s 'bar'},
TUPLE {i 4, s 'foo'},
TUPLE {i 5, s 'glag'},
TUPLE {i 6, s 'foo'}
} 
  BY {s} ADD (COUNT() AS n) ORDER (ASC s)

I created a language for expressing mathematical ideas in ASCII -- cllaed PINAPL (PINAPL Is Not A Programing Language) or MATHS.And in that, if I understand my definitions, then for list l:#T,

+bag(l)

should do it.

-- Richard Botting

VisualFoxPro

  FUNCTION SortedWords?(cWordlist)

Declarations and initializations are not strictly necessary for the
code to work but it's good practice.

LOCAL ARRAY aWords[1], aWordCnts[1]

LOCAL ctr, tot, output

ctr = 0
tot = 0
output = ""

CLEAR

Assume the separator is always a space.

tot = ALINES(aWords,ALLTRIM(cWordlist),12," ")

Not many words longer than 100 characters...

CREATE CURSOR words (oneword C(100))

FOR ctr = 1 TO tot
  INSERT INTO words (oneword) VALUES (aWords(ctr))
ENDFOR

SELECT DISTINCT PADR(oneword,100," ") FROM words INTO ARRAY aWordCnts ORDER BY 1

FOR ctr = 1 TO ALEN(aWordCnts)
  COUNT FOR oneword == aWordCnts(ctr) TO subtot

  output = RTRIM(aWordCnts(ctr)) + " " + TRANSFORM(subtot)

  ? output
ENDFOR

RETURN
  ENDFUNC

  * Execution:
  DO SortedWords? WITH "foo bar bar foo glag foo"

I'm pretty sure there is a more concise way to get such in ExBase. For one, the intro lets us assume the original list is in a table ("native format"), so we don't have to worry about parsing. I'll have to dig out the ol' manual.

Yes, I realize the exercise allows you to start with the data in any format. However, you can't pass a cursor to a function, procedure, or method in VisualFoxPro. I was thinking that if you're actually going to do such a thing, you need an impetus for getting it started. Assuming that we are operating in a single data session, any cursor is always in scope anywhere. I suppose I could adopt the conceit of passing a table name to the function, opening the table, and going from there. In any case, that's why I took the liberty of starting with the data as it was presented in the original string.

If the data's already in the table the easiest way would still be to use SELECT DISTINCT to get a second table of sorted unique values, then SCAN it and get the counts from the first table. You could also use a SELECT expression to get the counts rather than the ExBase COUNT FOR...TO...

In that case though, then the "language" being used is really more SQL and not so much VisualFoxPro.

On the other hand, doing it entirely in an array structure would be painful.

Still, I wanted to show some of the versatility of the hybrid nature of the language.

Maybe I'm thinking of another dialect. Where did I put those old books?

GNU Core Utils

  sed 's/\s+/\n/g' <input.txt | sort |uniq -c|awk '{print $2 "\t" $1}'

The awk bit is just to put the count after the item and the sed part is just to split a single line into separate lines, so the real work is just the "sort|uniq -c" part.

MathematicaLanguage almost has a function for that built in (missing only the sort). For a list of strings

  With[{list={"foo","bar","bar","foo","glag","foo"}}, TableForm?@Sort@Tally[list]]

Or, a little lower-level:

  tallyFn[list_]:=Map[{#, Count[list, #]} &, Union[list]]
  tallyFn[{"foo","bar","bar","foo","glag","foo"}] // TableForm?

PhpLanguage

  $array = ["foo", "bar", "bar", "foo", "glag", "foo"];
  $tally = array_count_values($array);
  ksort($tally);

  //  print_r($tally); // Just dumps the output. TSV layout follows
  foreach($tally as $key => $count)
     echo $key, "\t", $count, "\n";

Again, because it feels like excessive laziness to use a native "bag sum" function,

  $array = ["foo", "bar", "bar", "foo", "glag", "foo"];

  $tally = array_fill_keys($array, 0);
  ksort($tally);

  foreach($array as $word)
  {
    $tally[$word]++;
  }
  print_r($tally);

JulyZeroFive

CategoryInManyProgrammingLanguages