Putting the 'role' back in role-playing games since 2002.
Parrot Count :M

a cut of domestic sheep prime

Guest
I want to count the number of parrots in a given thread for research porpoises.

Is there some web app I can use for this? Thank you for your time and answering this very important question. :M
 

a cut of domestic sheep prime

Guest
It can be easily done with Python.

fgGj2nG.png


K4UoLbc.jpg
:M:M:M
 

JRIz

Augur
Joined
Aug 17, 2015
Messages
502
Easy for unix master race:
Code:
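#!/usr/bin/env bash
# Usage: ./parrots.sh THREAD_URL

# Strip any trailing slashes from the URL passed as $1.
remove-trailing-slashes() {
  local ret="${1%[!/]*}"
  echo "${ret}${1:${#ret}:1}"
}
threadUrl="$(remove-trailing-slashes "$1")"
parrotRegex='img src="/forums/smiles/('\
'bird\.gif|'\
'wally_the_prestigious_monocled_bird\.gif'\
')"'

totalCount=0
# Page bodies go to a temp file (a bare ">3" writes to a file named "3",
# not to file descriptor 3).
pageFile="$(mktemp)"
trap 'rm -f "$pageFile"' EXIT

for i in $(seq 100); do
  pageUrl="$threadUrl"/
  if [ "$i" -gt 1 ]; then
    pageUrl="$threadUrl"/page-$i
  fi

  # Save the body and capture the HTTP status code separately.
  status="$(curl -s -o "$pageFile" -w '%{http_code}' "$pageUrl")"
  # A 307 redirect means we are past the last page.
  if [ "$status" = 307 ]; then
    break
  fi
  parrotCount="$(grep -E -o "$parrotRegex" "$pageFile" | wc -l)"
  totalCount=$((totalCount + parrotCount))
done

echo "$totalCount"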

With parallelism!
Code:
import urllib.request, sys, re, collections, multiprocessing as mp

ParrotType = collections.namedtuple("ParrotType", ["groupName", "description"])
Result = collections.namedtuple("Result", ["pageNumber", "counts"])
ProcessPoolSize = 5

ParrotRegex = re.compile(r'img src="/forums/smiles/('
r'(?P<normal>bird\.gif)|'
r'(?P<hat>wally_the_prestigious_monocled_bird\.gif)'
r')"')
Types = [ParrotType("hat", "Parrot with hat"),
    ParrotType("normal", "Normal parrot")]


def countInString(ParrotRegex, Types, counts, string):
  for match in ParrotRegex.finditer(string):
    i, parrotType = next((i, parrotType) for i, parrotType in
        enumerate(Types) if match.group(parrotType.groupName) is not None)
    counts[i] += 1

def printResults(counts):
  for i, parrotType in enumerate(Types):
    print("{0}: {1}".format(parrotType.description, counts[i]))
  print("Total: " + str(sum(counts)))

def addCounts(counts1, counts2):
  return [x1 + x2 for x1, x2 in zip(counts1, counts2)]

def countOnWebPage(ParrotRegex, Types, pageUrl):
  with urllib.request.urlopen(pageUrl) as request:
    if request.url != pageUrl:
      return None
    ret = [0 for _ in Types]
    countInString(ParrotRegex, Types, ret, request.read().decode("utf-8"))
    return ret

def countIntoQueue(ParrotRegex, Types, pageNumbersQueue, resultsQueue, 
    threadUrl):
  processingRequests = True
  while True:
    requestedPageNo = pageNumbersQueue.get()
    if requestedPageNo is None:
      pageNumbersQueue.task_done()
      return

    if processingRequests:
      pageUrl = threadUrl + "/" if requestedPageNo == 1 else \
          "{0}/page-{1}".format(threadUrl, requestedPageNo)
      counts = countOnWebPage(ParrotRegex, Types, pageUrl)
      if counts is None:
        processingRequests = False
      resultsQueue.put(Result(requestedPageNo, counts))
    pageNumbersQueue.task_done()

def poisonQueue(queue):
  for i in range(ProcessPoolSize):
    queue.put(None)

counts = [0 for _ in Types]
threadUrl = sys.argv[1].strip("/")

resultsQueue = mp.Queue()
pageNumbersQueue = mp.JoinableQueue()
highestRequestedPageNumber = 0

def updateJobs(highestPageNoWithResponse):
  global highestRequestedPageNumber
  highestPageNumberToTry = highestPageNoWithResponse + ProcessPoolSize
  for i in range(highestRequestedPageNumber + 1, highestPageNumberToTry + 1):
    pageNumbersQueue.put(i)
  highestRequestedPageNumber = highestPageNumberToTry

for _ in range(ProcessPoolSize):
  counterProcess = mp.Process(target=countIntoQueue, args=(ParrotRegex, Types, 
    pageNumbersQueue, resultsQueue, threadUrl))
  counterProcess.start()
updateJobs(0)

while True:
  result = resultsQueue.get()
  if result.counts is None:
    poisonQueue(pageNumbersQueue)
    pageNumbersQueue.join()
    break
  else:
    updateJobs(result.pageNumber)
    counts = addCounts(counts, result.counts)
while not resultsQueue.empty():
  result = resultsQueue.get()
  if result.counts is not None:
    counts = addCounts(counts, result.counts)

printResults(counts)

Call either one with the thread URL:
Code:
$ python script.py 'http://www.rpgcodex.net/forums/index.php?threads/parrot-count-m.102131'
Parrot with hat: 25
Normal parrot: 34
Total: 59

You probably shouldn't call them on megathreads, especially the second, unless your research is for the greater good of humanity.
 

a cut of domestic sheep prime

Guest
While I totally appreciate it, I have next to no idea how to actually run it. I've downloaded Python 3 and saved the scripts as .py files, but when I try to run either of them I get an error: an "invalid syntax" error on the first one, and something about "freezing" on the second one.

Basically, I need a step-by-step of how to run this on Windows... Assume you are talking to someone who knows nothing about Python - because I know nothing about Python. :M
 

Nevill

Arcane
Joined
Jun 6, 2009
Messages
11,211
Shadorwun: Hong Kong
Thank you, kind sirs, for the scripts. I will have to try them out later as I, too, share the OP's interest in this particular kind of SCIENCE.
 

JRIz

Augur
Joined
Aug 17, 2015
Messages
502
The first script is bash, which you won't be able to run easily on Windows. As for the Python script, I hadn't tested it on Windows at first, but I can now reproduce the error you described. To make it work on Windows you have to insert one line, the if __name__ == "__main__": check, and indent everything below it, so the lower part of the file becomes:

Code:
def poisonQueue(queue):
  for i in range(ProcessPoolSize):
    queue.put(None)

if __name__ == "__main__":
  counts = [0 for _ in Types]
  threadUrl = sys.argv[1].strip("/")

  resultsQueue = mp.Queue()
  pageNumbersQueue = mp.JoinableQueue()
  highestRequestedPageNumber = 0

  def updateJobs(highestPageNoWithResponse):
    global highestRequestedPageNumber
    highestPageNumberToTry = highestPageNoWithResponse + ProcessPoolSize
    for i in range(highestRequestedPageNumber + 1, highestPageNumberToTry + 1):
      pageNumbersQueue.put(i)
    highestRequestedPageNumber = highestPageNumberToTry

  for _ in range(ProcessPoolSize):
    counterProcess = mp.Process(target=countIntoQueue, args=(ParrotRegex, Types,
      pageNumbersQueue, resultsQueue, threadUrl))
    counterProcess.start()
  updateJobs(0)

  while True:
    result = resultsQueue.get()
    if result.counts is None:
      poisonQueue(pageNumbersQueue)
      pageNumbersQueue.join()
      break
    else:
      updateJobs(result.pageNumber)
      counts = addCounts(counts, result.counts)
  while not resultsQueue.empty():
    result = resultsQueue.get()
    if result.counts is not None:
      counts = addCounts(counts, result.counts)

  printResults(counts)

The complete script with the fix applied:
Code:
import urllib.request, sys, re, collections, multiprocessing as mp

ParrotType = collections.namedtuple("ParrotType", ["groupName", "description"])
Result = collections.namedtuple("Result", ["pageNumber", "counts"])
ProcessPoolSize = 5

ParrotRegex = re.compile(r'img src="/forums/smiles/('
r'(?P<normal>bird\.gif)|'
r'(?P<hat>wally_the_prestigious_monocled_bird\.gif)'
r')"')
Types = [ParrotType("hat", "Parrot with hat"),
    ParrotType("normal", "Normal parrot")]


def countInString(ParrotRegex, Types, counts, string):
  for match in ParrotRegex.finditer(string):
    i, parrotType = next((i, parrotType) for i, parrotType in
        enumerate(Types) if match.group(parrotType.groupName) is not None)
    counts[i] += 1

def printResults(counts):
  for i, parrotType in enumerate(Types):
    print("{0}: {1}".format(parrotType.description, counts[i]))
  print("Total: " + str(sum(counts)))

def addCounts(counts1, counts2):
  return [x1 + x2 for x1, x2 in zip(counts1, counts2)]

def countOnWebPage(ParrotRegex, Types, pageUrl):
  with urllib.request.urlopen(pageUrl) as request:
    if request.url != pageUrl:
      return None
    ret = [0 for _ in Types]
    countInString(ParrotRegex, Types, ret, request.read().decode("utf-8"))
    return ret

def countIntoQueue(ParrotRegex, Types, pageNumbersQueue, resultsQueue,
    threadUrl):
  processingRequests = True
  while True:
    requestedPageNo = pageNumbersQueue.get()
    if requestedPageNo is None:
      pageNumbersQueue.task_done()
      return

    if processingRequests:
      pageUrl = threadUrl + "/" if requestedPageNo == 1 else \
          "{0}/page-{1}".format(threadUrl, requestedPageNo)
      counts = countOnWebPage(ParrotRegex, Types, pageUrl)
      if counts is None:
        processingRequests = False
      resultsQueue.put(Result(requestedPageNo, counts))
    pageNumbersQueue.task_done()

def poisonQueue(queue):
  for i in range(ProcessPoolSize):
    queue.put(None)

if __name__ == "__main__":
  counts = [0 for _ in Types]
  threadUrl = sys.argv[1].strip("/")

  resultsQueue = mp.Queue()
  pageNumbersQueue = mp.JoinableQueue()
  highestRequestedPageNumber = 0

  def updateJobs(highestPageNoWithResponse):
    global highestRequestedPageNumber
    highestPageNumberToTry = highestPageNoWithResponse + ProcessPoolSize
    for i in range(highestRequestedPageNumber + 1, highestPageNumberToTry + 1):
      pageNumbersQueue.put(i)
    highestRequestedPageNumber = highestPageNumberToTry

  for _ in range(ProcessPoolSize):
    counterProcess = mp.Process(target=countIntoQueue, args=(ParrotRegex, Types,
      pageNumbersQueue, resultsQueue, threadUrl))
    counterProcess.start()
  updateJobs(0)

  while True:
    result = resultsQueue.get()
    if result.counts is None:
      poisonQueue(pageNumbersQueue)
      pageNumbersQueue.join()
      break
    else:
      updateJobs(result.pageNumber)
      counts = addCounts(counts, result.counts)
  while not resultsQueue.empty():
    result = resultsQueue.get()
    if result.counts is not None:
      counts = addCounts(counts, result.counts)

  printResults(counts)

You obviously managed to see output from the script. It suffices to run
Code:
C:\Python34\python.exe script.py "http://www.rpgcodex.net/forums/index.php?threads/parrot-count-m.102131"
in cmd.exe assuming this default location of python.exe.

Come to think of it, parrots that aren't technically Codex smileys won't be counted, but that is easy to change by adjusting the regex. Do you want to count parrots in quotes, too? The script currently does.
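
Adjusting the regex could look roughly like the sketch below. The first two alternatives are the ones from the script. The third is purely hypothetical: some_other_parrot.png is a made-up file name standing in for whatever non-smiley parrot image you want to count, and matching it via img src="..." is an assumption about how such images show up in the page HTML. countByGroup is a made-up helper, not part of the script.

```python
import re

# "normal" and "hat" come from the script; "other" is a hypothetical
# example of an externally hosted parrot image.
ParrotRegex = re.compile(
    r'img src="('
    r'(?P<normal>/forums/smiles/bird\.gif)|'
    r'(?P<hat>/forums/smiles/wally_the_prestigious_monocled_bird\.gif)|'
    r'(?P<other>[^"]*some_other_parrot\.png)'
    r')"'
)


def countByGroup(html):
    """Count matches per named group in one page of HTML."""
    counts = {"normal": 0, "hat": 0, "other": 0}
    for match in ParrotRegex.finditer(html):
        # Exactly one of the named groups is set for each match.
        for name, text in match.groupdict().items():
            if text is not None:
                counts[name] += 1
    return counts
```

In the script itself you would also have to add a matching entry such as ParrotType("other", "Other parrot") to Types, otherwise countInString finds no type for the new group.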
 

Nevill

Arcane
Joined
Jun 6, 2009
Messages
11,211
Shadorwun: Hong Kong
you know what to do. :M
Install linux. :negative:

The closest I got is this thread:
http://stackoverflow.com/questions/18204782/runtimeerror-on-windows-trying-python-multiprocessing

And this post:
multiprocessing will fork on systems which support it (i.e. not Windows) and re-execute the file otherwise (on Windows).

Of course, re-executing the file re-starts a process, since you didn't put that in the if __name__ == '__main__': section (which exists for precisely this reason).
Apparently, one is supposed to use the if __name__ == '__main__': check to avoid starting an infinite number of subprocesses, but as I've only learned about python today I am unable to understand where it should go. Perhaps JRIz could help? :stupid:
 

a cut of domestic sheep prime

Guest
*sigh* i've been meaning to install virtual box on my new pc anyway... Of course, that will take forever so my vital research will be delayed until i have the time
 

a cut of domestic sheep prime

Guest
your insightful commentary has persuaded me. just finished installing freya - don't give me bs about <insert distro here> distribution being better. it looked pretty.

edit: installing python. linux's tagline should be "by autists for autists". they spend all this time on interface and I still have to pull up a terminal to get anything meaningful done.

ffs, just make it so I can double click to run/install things. how hard is that?
 

imweasel

Guest
If you know how to write Python scripts then you don't need to install Linux, because Python works great with Windows too. Just FYI.
 

a cut of domestic sheep prime

Guest
I know. I have it on windows, but nevill was saying that the multiprocessing wouldn't work in windows


edit: if you can get this to run, please let us know how. already I probably could have gone through and just counted the parrots myself... :M
 

imweasel

Guest
Multiprocessing (multithreading) in Python works for both Linux and Windows operating systems... but that shouldn't really interest you anyway, since you don't need more than a single thread for your simple task.
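
For what it's worth, a single-process version is short. The sketch below is not the script posted above, just a minimal sequential take on the same idea. It reuses that script's smiley regex and its "a redirect means we are past the last page" check; count_parrots, fetch_page and count_thread are made-up names, and the 100-page cap mirrors the bash version.

```python
import re
import urllib.request

# Same smiley paths as in the script earlier in the thread.
PARROT_RE = re.compile(
    r'img src="/forums/smiles/('
    r'(?P<normal>bird\.gif)|'
    r'(?P<hat>wally_the_prestigious_monocled_bird\.gif)'
    r')"'
)


def count_parrots(html):
    """Return {'normal': n, 'hat': m} for one page of HTML."""
    counts = {"normal": 0, "hat": 0}
    for match in PARROT_RE.finditer(html):
        for name, text in match.groupdict().items():
            if text is not None:
                counts[name] += 1
    return counts


def fetch_page(url):
    """Return the page HTML, or None if the server redirected us."""
    with urllib.request.urlopen(url) as response:
        if response.url != url:  # redirected: past the last page
            return None
        return response.read().decode("utf-8")


def count_thread(thread_url, fetch=fetch_page, max_pages=100):
    """Fetch page 1, 2, 3, ... one at a time and add up the counts."""
    total = {"normal": 0, "hat": 0}
    base = thread_url.rstrip("/")
    for page in range(1, max_pages + 1):
        url = base + "/" if page == 1 else "{0}/page-{1}".format(base, page)
        html = fetch(url)
        if html is None:
            break
        for name, n in count_parrots(html).items():
            total[name] += n
    return total
```

Calling count_thread with the thread URL returns the totals as a dict. The fetch parameter is only there so the counting can be tried out without hitting the forum.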
 

a cut of domestic sheep prime

Guest
see, i have no idea of what any of it means anyway.

do you know how to make this script work on windows? because I have failed to get it to work on linux
 

Luka-boy

Arcane
Joined
Sep 24, 2014
Messages
1,642
Location
Asspain
So I take it this is the official lovebird thread?


Lovebirds are awesome.

I am totally not a lovebird typing this.

:M

:M:M:M:M:M:M:M:M:M:M:M:M:M:M:M:M:M:M:M:M:M
:M:M+M+M:M+M:M+M:M+M+M:M:M+M+M+M:M+M+M+M:M
:M+M:M:M:M+M:M+M:M+M:M+M:M:M+M:M:M+M+M:M:M
:M+M:M:M:M+M:M+M:M+M:M+M:M:M+M:M:M:M:M+M:M
:M:M+M+M:M:M+M:M:M+M:M+M:M:M+M:M:M+M+M+M:M
:M:M:M:M:M:M:M:M:M:M:M:M:M:M:M:M:M:M:M:M:M
 

JRIz

Augur
Joined
Aug 17, 2015
Messages
502
Seriously, this is typical Windows user criticism :obviously:.
your insightful commentary has persuaded me. just finished installing freya - don't give me bs about <insert distro here> distribution being better. it looked pretty.
It does indeed look pretty. A pity, though, that it “diminishes the need to access the terminal” as simple python scripts are designed for the terminal. That's why it has one nonetheless. Terminal is your friend.

edit: installing python. linux's tagline should be "by autists for autists". they spend all this time on interface and I still have to pull up a terminal to get anything meaningful done.

ffs, just make it so I can double click to run/install things. how hard is that?
On Windows, you double-click to install one thing. On Linux, you type one line to install everything you will ever need, with updates for all eternity.

I know. I have it on windows, but nevill was saying that the multiprocessing wouldn't work in windows


edit: if you can get this to run, please let us know how. already I probably could have gone through and just counted the parrots myself... :M
Multiprocessing works all right on Windows. The script will too with the fix. I have no moderator approval, and I must scream, it seems.

Apparently, one is supposed to use the if __name__ == '__main__': check to avoid starting an infinite number of subprocesses, but as I've only learned about python today I am unable to understand where it should go. Perhaps JRIz could help? :stupid:
You figured it out :).
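
For anyone else wondering where it goes, here is a stripped-down sketch of the pattern (not the parrot script itself; square and main are placeholder names). Imports and function definitions stay at the top level, and everything that starts processes moves under the check:

```python
import multiprocessing as mp


def square(n):
    # Work done in the child processes. It lives at module level so that
    # a freshly started child can import it.
    return n * n


def main():
    # Anything that starts processes goes here, not at module level.
    with mp.Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3]))  # prints [1, 4, 9]


if __name__ == "__main__":
    # True only when the file is run directly. When the file is
    # re-imported in a child process (as happens on Windows), this block
    # is skipped, so the child does not start children of its own.
    main()
```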

Multiprocessing (multithreading) in Python works for both Linux and Windows operating systems... but that shouldn't really interest you anyway, since you don't need more than a single thread for your simple task.
Saying that in 2015. Moar threads -> faster -> better. Multiprocessing uses processes, not threads, by the way.
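
The difference is easy to see by asking each worker for its process ID. A small sketch using the standard concurrent.futures module (worker_pid and pids_seen are made-up names): thread workers all report the parent's PID, while process workers report PIDs of their own.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def worker_pid(_):
    # Runs inside a worker; reports which OS process it is running in.
    return os.getpid()


def pids_seen(executor_cls, jobs=4):
    """Run a few jobs on the given executor and collect the PIDs seen."""
    with executor_cls(max_workers=2) as executor:
        return set(executor.map(worker_pid, range(jobs)))


if __name__ == "__main__":
    print("main pid:    ", os.getpid())
    # Threads share the parent process, so only the main PID shows up.
    print("thread pids: ", pids_seen(ThreadPoolExecutor))
    # Worker processes are separate OS processes with their own PIDs.
    print("process pids:", pids_seen(ProcessPoolExecutor))
```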

I get a pleasant feeling from knowing that I have forced people to install Linux in this thread.
 
