Yesterday I realized that I want to help others start their life on Mastodon. I tried to find the best way to get all the messages from a given instance, but the Local Timeline was a mess because of all the bots (I don't really want to mute them, because some are useful, like @CNN, @NASA or @WIRED). So I came up with an idea… what if I had a tool for that?
That's it, now I have a tool for that. There is a Python package named toot, and I have already contributed to this project, so I thought: if I can't use it for calling the instance timeline, then whatever, I can make a Pull Request and the job is done.
In the end, I could make the API call I wanted, so I'm happy. What the script should do:
- Pull the last N messages from the Local Timeline
- Filter out known bots
- Filter out all non-English posts, at least those marked as non-English
- Provide an easy way to extend the bot list
- Provide an easy way to open a toot in the browser
Of course, I can make improvements later, like saving the last message ID and fetching only new messages (see the sketch below), but I did not want to spend more than 10 minutes on it. Just a quick tool. If I made a bigger plan, there would be a high risk that I would never finish it, or even start it.
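For what it's worth, here is a minimal sketch of what that incremental fetch could look like. The since_id parameter is part of the official Mastodon timelines API; the state file path and the helper name are just assumptions for illustration, and I use plain requests here only to keep the sketch self-contained:

#!/usr/bin/env python3
# Hypothetical sketch: fetch only statuses newer than the last one seen.
# STATE_FILE and load_last_id() are illustrative, not part of the real script.
import json
import os

import requests

BASE_URL = 'https://quey.org'
STATE_FILE = os.path.expanduser('~/.quey-public.json')

def load_last_id():
    # Return the newest status ID from the previous run, or None on first run
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return json.load(f).get('last_id')
    return None

params = {'local': 'true', 'limit': 40}
last_id = load_last_id()
if last_id is not None:
    params['since_id'] = last_id  # only return statuses newer than this ID

messages = requests.get(BASE_URL + '/api/v1/timelines/public', params=params).json()
if messages:
    # Timelines are returned newest first, so the first item is the newest
    with open(STATE_FILE, 'w') as f:
        json.dump({'last_id': messages[0]['id']}, f)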
Prerequisites
[I] ❯ pip install --user bs4 toot
The script
#!/usr/bin/env python3
import html
from bs4 import BeautifulSoup
from toot import api, http

# https://github.com/tootsuite/documentation/blob/master/Using-the-API/API.md#timelines
LIMIT_MAX = 40
PAGES = 3
BASE_URL = 'https://quey.org'


def fetch_local_timeline():
    url = '/api/v1/timelines/public'
    params = {'local': True, 'limit': LIMIT_MAX}
    for _ in range(PAGES):
        response = http.anon_get(
            "{:s}{:s}".format(BASE_URL, url),
            params
        )
        yield from response.json()
        # Follow the Link header to the next page of results
        url = api._get_next_path(response.headers)
        if url is None:
            break


# Account IDs of known bots on the instance
ignore_list = [
    953,    # time
    955,    # NU
    6000,   # landscape
    13560,  # CNN
    16776,  # check
    16913,  # Stock
    29161,  # HispaBot
    59827,  # YouTube
    60670,  # iDownloadBlog
    82049,  # Wired
]

for msg in fetch_local_timeline():
    if int(msg['account']['id']) in ignore_list:
        continue
    if msg['language'] != 'en':
        continue
    # Strip the HTML markup and keep only the text content
    content = BeautifulSoup(msg['content'], 'html.parser').get_text()
    if len(content) < 1:
        continue
    content = content.replace('&apos;', "'")
    print()
    print(">>> [{:6d}] [{:s}] {:s}".format(int(msg['account']['id']),
                                           msg['language'],
                                           msg['account']['username']))
    # print(html.unescape(content))
    print(content)
    print(">>> {:s}/web/statuses/{:s}".format(BASE_URL, msg['id']))
Output
[I] ❯ quey-public
>>> [ 97845] [en] nullcollision
Looking for an OCR API for Japanese. Only thing that has worked so far
is Google Vision API but it's damn expensive. Tesseract works but it
has a huge error rate (I'm trying to make subtitles for a game in real
time).
>>> https://quey.org/web/statuses/101986330994407735
>>> [ 67884] [en] auramun
I hate that joint meme, I fucking hate it, who photoshops a human hand
instead of a puppy paw, it's terrible, I hate it, puppies shouldn't
smoke nor speak Spanish, 0/10 I hate it
>>> https://quey.org/web/statuses/101985765401920740
>>> [ 97845] [en] nullcollision
guess I'll work on my side projects
>>> https://quey.org/web/statuses/101985594245381351
>>> [ 52] [en] snder
Good morning people of planet earth. It’s Thursday so don’t get hit by
lightning!
>>> https://quey.org/web/statuses/101985589140925567
That way, it's easy to copy a given user's ID and add it to the known-bots list, and I can simply click on the link at the end of each toot to open it in the browser. So that's it, nothing much, but it works. And I know I could do it with simple HTTP requests, without the toot package, but meh, toot is already installed, so whatever.
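For the record, here is a rough sketch of what a toot-free version could look like with plain requests. The endpoint and parameters come from the API documentation linked in the script; the rest is just an assumption of how I would wire it up:

# Hypothetical toot-free version of fetch_local_timeline() using plain requests
import requests

BASE_URL = 'https://quey.org'

def fetch_local_timeline():
    url = BASE_URL + '/api/v1/timelines/public'
    params = {'local': 'true', 'limit': 40}
    for _ in range(3):
        response = requests.get(url, params=params)
        yield from response.json()
        # requests parses the Link header into response.links;
        # the 'next' URL already carries max_id and the other parameters
        next_link = response.links.get('next')
        if next_link is None:
            break
        url = next_link['url']
        params = None

The Link header carries the pagination cursor, so the loop just follows the next URL instead of computing max_id by hand.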
Extra Credits
All the users in the sample output: @nullcollision, @auramun and @snder, all on quey.org.
The instance I use: Quey.org