High memory usage with GitHub page as sample #41

@entrptaher

image

import requests
from mlscraper.html import Page
from mlscraper.samples import Sample, TrainingSet
from mlscraper.training import train_scraper

# fetch the page to train on
issue_url = 'https://github.com/lorey/mlscraper/issues/38'
resp = requests.get(issue_url)
assert resp.status_code == 200

# create a sample from the issue page
# please add at least two samples in practice to get meaningful rules!
training_set = TrainingSet()
page = Page(resp.content)
sample = Sample(page, {'title': 'Scraper not found error'})
training_set.add_sample(sample)

# train the scraper with the created training set
scraper = train_scraper(training_set)

# scrape another page
resp = requests.get('https://github.com/lorey/mlscraper/issues/27')
result = scraper.get(Page(resp.content))
print(result)
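
To put a number on the memory usage, the training call can be wrapped in a small measuring helper. This is only a sketch: `measure_peak_memory` is a hypothetical helper, not part of mlscraper, and `tracemalloc` only sees allocations made through Python's allocator, so memory held by native extensions may not show up.

```python
import tracemalloc


def measure_peak_memory(func, *args, **kwargs):
    """Run func and return (result, peak_bytes) as tracked by tracemalloc.

    Hypothetical helper for illustration; not part of mlscraper.
    """
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


if __name__ == "__main__":
    # Stand-in workload; in the script above this would be
    # measure_peak_memory(train_scraper, training_set)
    result, peak = measure_peak_memory(lambda: [0] * 1_000_000)
    print(f"peak: {peak / 1024 / 1024:.1f} MiB")
```

Comparing the peak for this page against a smaller page would show whether memory grows with page size.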

Labels: invalid (This doesn't seem right)
