Is there any alternative for 'snapshot'? #151

Open
powerfulTrouser opened this issue Jan 13, 2018 · 1 comment

Comments

@powerfulTrouser

I'm a student, and I'm trying to follow this tutorial to analyze the dark web with machine learning:

http://www.automatingosint.com/blog/2016/09/dark-web-osint-part-four-using-scikit-learn-to-find-hidden-service-clones/

However, I found that 'snapshot' is no longer available.
I then found an issue saying that this function had been moved to dat_0.
My dat_0 file is about 10 GB.
I tried to parse it with Python and Kaitai Struct, but failed.
onions.py.txt
parsedat.py.txt
Is there any way to at least reproduce the analysis from that tutorial?
(For example, by using an older version of onionscan, or by following a tutorial on how to achieve the same goal with the current onionscan.)

Thanks!

@powerfulTrouser

In the end I used Python to split dat_0 into a large number of JSON files:

```python
# coding: utf-8
import json

KNIFE = '{"Page":{"Status":'  # marker that starts each record
OUT_DIR = '/Home/parse dat-1/'


def extract_json(text):
    """Return text (trimmed if necessary) when it parses as JSON, else None."""
    try:
        json.loads(text)
        return text
    except ValueError:
        pass
    # Retry after cutting everything from the second-to-last '}' onward.
    trimmed = text.rsplit('}', 2)[0] + '}'
    try:
        json.loads(trimmed)
    except ValueError as e:
        print(e)
        print(text)
        return None
    print(trimmed)
    return trimmed


failed = 0  # counter used to build fallback file names
# dat_0 also contains non-text bytes, so undecodable bytes are replaced.
with open('/Home/dat_0.json', encoding='utf-8', errors='replace') as src:
    for line in src:
        for frag in line.split(KNIFE):
            if not frag:
                continue  # empty piece in front of the first marker
            # Put the marker back and drop anything after the last '}'.
            frag = KNIFE + frag.rsplit('}', 1)[0] + '}'
            valid = extract_json(frag)
            if valid is None:
                continue
            record = json.loads(valid)
            if record['Page']['Status'] in (403, 404):
                continue  # skip forbidden and not-found pages
            print("next")
            # Strip "http://" and the last character; make the URL file-safe.
            name = record['URL'][7:-1].replace('/', 'slash')
            try:
                out = open(OUT_DIR + name + '.json', 'w', encoding='utf-8')
            except OSError as e:
                print(e)
                out = open(OUT_DIR + 'problem' + str(failed) + '.json',
                           'w', encoding='utf-8')
                failed += 1
            with out:
                out.write(valid)  # write only text that parsed as JSON
```
It does not write a JSON file for records whose status is 403 or 404.
I split the file on '{"Page":{"Status":' and wonder whether there is a better string to cut on.
It's not an elegant solution, but it works.
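
On the question of a better cut string: one possible alternative, offered only as a sketch, is to keep the same marker but let `json.JSONDecoder.raw_decode` from the standard library find where each object ends, instead of trimming at the last '}'. This assumes, as the script above does, that the records sit in dat_0 as plain JSON text and that each record is fully contained in the string being scanned. The names `iter_records`, `MARKER` and `sample` are made up for illustration.

```python
import json

MARKER = '{"Page":{"Status":'  # the marker string used in the post


def iter_records(text, marker=MARKER):
    """Yield every JSON object in `text` that begins at `marker`."""
    decoder = json.JSONDecoder()
    pos = text.find(marker)
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at `pos` and
            # returns it together with the index just past its end.
            record, end = decoder.raw_decode(text, pos)
        except ValueError:
            # No complete object here; resume just past this marker.
            end = pos + len(marker)
        else:
            yield record
        pos = text.find(marker, end)


# Hypothetical input: two records surrounded by junk, then a cut-off record.
sample = ('junk'
          + '{"Page":{"Status":200},"URL":"http://a.onion/"}'
          + '\x00\x01'
          + '{"Page":{"Status":404},"URL":"http://b.onion/"}tail'
          + '{"Page":{"Status":')

for rec in iter_records(sample):
    print(rec['Page']['Status'], rec['URL'])
```

Because the decoder stops at the closing brace of each object, trailing bytes after a record do not need to be cut off by hand, and the 403/404 filter can be applied to each yielded `rec` as before.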
