(en) MasterBot

From Wikipast
Revision as of 25 April 2019, 15:52 by Assi (talk | contributions)
Language Français English

Description

The MasterBot manages and orchestrates the Wikipast bots. On the one hand, it lets you launch each bot individually by entering its parameters from a dashboard. On the other hand, it lets you define custom launch sequences by specifying the parameters of each bot (e.g. frequency and launch order).
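As an illustration only, a launch sequence could be represented as a list of entries, each naming a bot, its parameters, its position in the order, and how often it should run. The sketch below is an assumption about one possible representation: `BotStep`, `run_sequence`, the `order` and `every_n_runs` fields, and the stand-in bots (including the `LinkBot` name) are all hypothetical and are not part of MasterBot.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class BotStep:
    """One entry of a launch sequence (hypothetical structure)."""
    name: str
    run: Callable[..., str]                      # the bot's entry point
    params: Dict[str, str] = field(default_factory=dict)
    order: int = 0                               # lower values are launched first
    every_n_runs: int = 1                        # frequency: launch on every n-th run


def run_sequence(steps: List[BotStep], run_number: int) -> List[str]:
    """Launch the bots that are due, in ascending order, and collect their results."""
    results = []
    for step in sorted(steps, key=lambda s: s.order):
        if run_number % step.every_n_runs == 0:
            results.append(step.run(**step.params))
    return results


# Stand-in bots so the sketch runs without network access.
def fake_translator(page: str) -> str:
    return f"translated {page}"


def fake_linker() -> str:
    return "links updated"


sequence = [
    BotStep("LinkBot", fake_linker, order=2, every_n_runs=2),
    BotStep("TranslatorBot", fake_translator, {"page": "Lausanne"}, order=1),
]
```

With this sequence, `run_sequence(sequence, 1)` launches only the translator stand-in, while `run_sequence(sequence, 2)` launches both, translator first.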

Start the bots individually

As a first step, the idea is to keep the code of each bot on a dedicated page (or a section of a page). The bots can then be launched from the code available on each bot's page (or even from GitHub). All that is needed after that is one script per bot that launches it with the necessary parameters, e.g. the page to translate for the TranslatorBot. It should be possible to run all of this from a "dashboard" page on Wikipast or from a simple web application.
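To make the idea of launching a bot from the code stored on its page more concrete, here is a minimal sketch. It assumes the page text has already been downloaded and that the bot code sits between nowiki tags; `extract_bot_code`, `launch_from_page`, and the example page are hypothetical names introduced for illustration. Executing code taken from a wiki page is only reasonable when the page is trusted.

```python
import re
import textwrap
from typing import Any, Optional

# Matches the first nowiki section of a page, whatever the tag's letter case.
NOWIKI_RE = re.compile(r"<nowiki>(.*?)</nowiki>", re.DOTALL | re.IGNORECASE)


def extract_bot_code(wikitext: str) -> Optional[str]:
    """Return the code found in the first nowiki section, or None if there is none."""
    match = NOWIKI_RE.search(wikitext)
    if match is None:
        return None
    return textwrap.dedent(match.group(1)).strip("\n")


def launch_from_page(wikitext: str, entry_point: str, **params: Any) -> Any:
    """Execute the code stored on a page and call one of its functions."""
    code = extract_bot_code(wikitext)
    if code is None:
        raise ValueError("no nowiki code section found on the page")
    namespace: dict = {}
    exec(code, namespace)  # runs code from the page: trusted pages only
    return namespace[entry_point](**params)


# A tiny stand-in for a bot page, so the sketch runs offline.
example_page = """Code

<nowiki>
def greet(name):
    return "hello " + name
</nowiki>
"""
```

Calling `launch_from_page(example_page, "greet", name="Lausanne")` runs the function defined on the stand-in page.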

The scripts

Each bot has its own script that takes care of the following tasks:

  1. Parse the command-line parameters
  2. Get the bot code (from Wikipast or GitHub) (TODO)
  3. Launch the bot with the given parameters
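The steps above can be sketched as a small launcher script. This is a sketch under assumptions, not the script actually used: the `translate` function here is a stand-in that only returns the target page titles, and `build_parser` and `main` are hypothetical names. `argparse` is used because, as far as I know, Wooey builds its forms from `argparse` parsers.

```python
import argparse
from typing import List, Optional


def translate(*names: str) -> List[str]:
    """Stand-in for the bot's entry point (the real TranslatorBot edits Wikipast)."""
    return [f"(en)_{name}" for name in names]


def build_parser() -> argparse.ArgumentParser:
    """Step 1: describe the command-line parameters of the bot."""
    parser = argparse.ArgumentParser(
        description="Launch TranslatorBot on the given pages."
    )
    parser.add_argument("pages", nargs="+", help="titles of the pages to translate")
    return parser


def main(argv: Optional[List[str]] = None) -> List[str]:
    """Steps 1 and 3: parse the parameters, then launch the bot with them."""
    args = build_parser().parse_args(argv)
    return translate(*args.pages)
```

A real script would call `main()` under the usual `if __name__ == "__main__":` guard, so that it can be started from the command line, for example with the arguments `Lausanne "David Bowie"`.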

The graphical interface

I used the Wooey [1] GUI, which can launch Python scripts. I tested the whole setup with the TranslatorBot on the Lausanne and David Bowie pages.

[Figure: Wooey home.png]
[Figure: Wooey done.png]

Code

 <Nowiki>
import requests
import re
from bs4 import BeautifulSoup
from googletrans import Translator

# the function takes as arguments the names of the pages to translate (strings)
def translate(*names):
    user = 'testbot'
    passw = 'dhbot2017'
    baseurl = 'http://wikipast.epfl.ch/wikipast/'
    summary = 'Wikipastbot update'
    translator = Translator()

    # this parameter is the target language in which we want to translate
    target_lang = 'en'
    target_language = 'English'

    # login request
    payload = {'action': 'query', 'format': 'json', 'utf8': '', 'meta': 'tokens', 'type': 'login'}
    r1 = requests.post(baseurl + 'api.php', data=payload)

    # login confirm
    login_token = r1.json()['query']['tokens']['logintoken']
    payload = {'action': 'login', 'format': 'json', 'utf8': '', 'lgname': user, 'lgpassword': passw, 'lgtoken': login_token}
    r2 = requests.post(baseurl + 'api.php', data=payload, cookies=r1.cookies)

    # get edit token
    params3 = '?format=json&action=query&meta=tokens&continue='
    r3 = requests.get(baseurl + 'api.php' + params3, cookies=r2.cookies)
    edit_token = r3.json()['query']['tokens']['csrftoken']

    # the edit request needs the cookies of both the login and the token request
    edit_cookie = r2.cookies.copy()
    edit_cookie.update(r3.cookies)

    # we fetch the text we want to translate
    for name in names:
        result = requests.post(baseurl + 'api.php?action=query&titles=' + name + '&export&exportnowrap')
        soup = BeautifulSoup(result.text, "lxml")
        code = ''
        for primitive in soup.find_all("text"):
            code += primitive.string

        # create names with english prefix
        en_name = "(" + target_lang + ")_" + translator.translate(name, src='fr', dest=target_lang).text

        # add a table if it still does not exist
        if code != '' and code[0] != '{' and code[0] != '|':
            code2 = '{| class="wikitable"\n| Language\n| ' + "' English '\n| " + target_language + "\n|}\n" + code
            payload2 = {'action': 'edit', 'assert': 'user', 'format': 'json', 'utf8': '', 'text': code2, 'summary': summary, 'title': name, 'token': edit_token}
            r5 = requests.post(baseurl + 'api.php', data=payload2, cookies=edit_cookie)

        # save the links of sources that we will not translate
        sources = []
        i = 0
        while i < len(code):
            # a single '[' (not part of '[[') opens a source link
            if code[i] == '[' and (i == 0 or code[i-1] != '[') and code[i+1:i+2] != '[':
                j = i + 2
                while j < len(code) - 1 and code[j] != ']':
                    j += 1
                sources.append(code[i:j+1])
                # leave an empty pair of brackets in place of the link
                code = code[:i] + "[]" + code[j+1:]
                i += 2
            else:
                i += 1

        # translate the whole text by chunk of approx. 5000 characters.
        length = len(code)
        chain = ''
        punto = '.'
        k = 0
        last = k + 5000
        while last < length:
            # end the chunk just after the last period in the window,
            # or cut at 5000 characters if the window contains no period
            cut = code.rfind(punto, k, last) + 1
            if cut <= k:
                cut = last
            chain += translator.translate(code[k:cut], src='fr', dest=target_lang).text
            k = cut
            last = k + 5000
        chain += translator.translate(code[k:length], src='fr', dest=target_lang).text

            if (translated_text[i+2] == '[' and translated_text[i+2:j].isalpha()):
                j = i
                while (translated_text[j] != ']'):
                    j += 1
                m = translated_text[i]