Yesterday I realized that I want to help others start their life on Mastodon. I tried to find the best way to get all the messages from a given instance, but the Local Timeline was a mess because of all the bots (I don't really want to mute them, because some are useful, like @CNN, @NASA or @WIRED). So I came up with an idea… what if I had a tool for that?
That's it, now I have a tool for that. There is a Python package named toot, and I have already contributed to this project, so I thought: if I can't use it for calling the instance timeline, then whatever, I can make a Pull Request and the job is done.
In the end, I could make the API call I wanted, so I'm happy. What the script should do:
- Pull the last N messages from the Local Timeline
- Filter out known bots
- Filter out all non-English posts, at least those marked as non-English
- Provide an easy way to extend the bot list
- Provide an easy way to open a toot in the browser
Of course, I can make improvements later, like saving the last message ID and fetching only new messages (see the sketch below), but I did not want to spend more than 10 minutes on it. Just a quick tool. If I made a bigger plan, there would be a high risk that I would never finish it, or even start it.
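For what it's worth, here is a minimal sketch of what that incremental fetch could look like. The since_id parameter is part of the official Mastodon timelines API; the state file path and the helper name are just assumptions for illustration, and I use plain requests here only to keep the sketch self-contained:

#!/usr/bin/env python3
# Hypothetical sketch: fetch only statuses newer than the last one seen.
# STATE_FILE and load_last_id() are illustrative, not part of the real script.
import json
import os

import requests

BASE_URL = 'https://quey.org'
STATE_FILE = os.path.expanduser('~/.quey-public.json')

def load_last_id():
    # Return the newest status ID from the previous run, or None on first run
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return json.load(f).get('last_id')
    return None

params = {'local': 'true', 'limit': 40}
last_id = load_last_id()
if last_id is not None:
    params['since_id'] = last_id  # only return statuses newer than this ID

messages = requests.get(BASE_URL + '/api/v1/timelines/public', params=params).json()
if messages:
    # Timelines are returned newest first, so the first item is the newest
    with open(STATE_FILE, 'w') as f:
        json.dump({'last_id': messages[0]['id']}, f)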
Prerequisites
[I] ❯ pip install --user bs4 toot
The script
#!/usr/bin/env python3
import html
from bs4 import BeautifulSoup
from toot import api, http

# https://github.com/tootsuite/documentation/blob/master/Using-the-API/API.md#timelines
LIMIT_MAX = 40
PAGES = 3
BASE_URL = 'https://quey.org'


def fetch_local_timeline():
    url = '/api/v1/timelines/public'
    params = {'local': True, 'limit': LIMIT_MAX}
    for _ in range(PAGES):
        response = http.anon_get(
            "{:s}{:s}".format(BASE_URL, url),
            params
        )
        yield from response.json()
        # Follow the Link header to the next page of results
        url = api._get_next_path(response.headers)
        if url is None:
            break


# Account IDs of known bots on the instance
ignore_list = [
    953,    # time
    955,    # NU
    6000,   # landscape
    13560,  # CNN
    16776,  # check
    16913,  # Stock
    29161,  # HispaBot
    59827,  # YouTube
    60670,  # iDownloadBlog
    82049,  # Wired
]

for msg in fetch_local_timeline():
    if int(msg['account']['id']) in ignore_list:
        continue
    if msg['language'] != 'en':
        continue
    # Strip the HTML markup and keep only the text content
    content = BeautifulSoup(msg['content'], 'html.parser').get_text()
    if len(content) < 1:
        continue
    content = content.replace('&apos;', "'")
    print()
    print(">>> [{:6d}] [{:s}] {:s}".format(int(msg['account']['id']),
                                           msg['language'],
                                           msg['account']['username']))
    # print(html.unescape(content))
    print(content)
    print(">>> {:s}/web/statuses/{:s}".format(BASE_URL, msg['id']))
Output
[I] ❯ quey-public
>>> [ 97845] [en] nullcollision
Looking for an OCR API for Japanese. Only thing that has worked so far
is Google Vision API but it's damn expensive. Tesseract works but it
has a huge error rate (I'm trying to make subtitles for a game in real
time).
>>> https://quey.org/web/statuses/101986330994407735
>>> [ 67884] [en] auramun
I hate that joint meme, I fucking hate it, who photoshops a human hand
instead of a puppy paw, it's terrible, I hate it, puppies shouldn't
smoke nor speak Spanish, 0/10 I hate it
>>> https://quey.org/web/statuses/101985765401920740
>>> [ 97845] [en] nullcollision
guess I'll work on my side projects
>>> https://quey.org/web/statuses/101985594245381351
>>> [ 52] [en] snder
Good morning people of planet earth. It’s Thursday so don’t get hit by
lightning!
>>> https://quey.org/web/statuses/101985589140925567
That way, it's easy to copy a given user's ID and add it to the known-bots list, and I can simply click on the link at the end of each toot to open it in the browser. So that's it, nothing much, but it works. And I know I could do it with simple HTTP requests, without the toot package, but meh, toot is already installed, so whatever.
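For the record, here is a rough sketch of what a toot-free version could look like with plain requests. The endpoint and parameters come from the API documentation linked in the script; the rest is just an assumption of how I would wire it up:

# Hypothetical toot-free version of fetch_local_timeline() using plain requests
import requests

BASE_URL = 'https://quey.org'

def fetch_local_timeline():
    url = BASE_URL + '/api/v1/timelines/public'
    params = {'local': 'true', 'limit': 40}
    for _ in range(3):
        response = requests.get(url, params=params)
        yield from response.json()
        # requests parses the Link header into response.links;
        # the 'next' URL already carries max_id and the other parameters
        next_link = response.links.get('next')
        if next_link is None:
            break
        url = next_link['url']
        params = None

The Link header carries the pagination cursor, so the loop just follows the next URL instead of computing max_id by hand.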
Extra Credits
All the users in the sample output: @nullcollision, @auramun and @snder, all on quey.org.
The instance I use: Quey.org