columndeeply
2022-08-28 15:16:58 +02:00
commit ae3f103fc4
8 changed files with 3701736 additions and 0 deletions

2
.gitignore vendored Normal file

@ -0,0 +1,2 @@
merged
*_clean

54
README.md Normal file

@ -0,0 +1,54 @@
This is the result of merging and cleaning up a bunch of porn-blocking lists I've found scattered across the web. It currently has 3,701,606 unique domains and redirects to the "Safe Browsing" versions of Google, DuckDuckGo, Bing and YouTube.
The list is split into 90MB chunks to avoid GitHub's [file size limit](https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github). Here are the raw links to all parts of the list:
> https://raw.githubusercontent.com/columndeeply/hosts/main/hosts00
> https://raw.githubusercontent.com/columndeeply/hosts/main/hosts01
If you need a single-file list, it can be found on the [releases page](https://github.com/columndeeply/hosts/releases). It'll be updated once a month.
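If you'd rather build the single file yourself, the parts can be downloaded and concatenated in order. This is a minimal sketch: `join_parts` is a hypothetical helper that is not part of this repository, and the `curl` calls are only one way to fetch the raw links above.

```sh
# Hypothetical helper: join already-downloaded chunks into one file.
# Usage: join_parts OUTPUT PART...
join_parts() {
    out="$1"
    shift
    # Stop if any chunk is missing, so the result is never a partial list.
    for part in "$@"; do
        [ -f "$part" ] || { echo "$part does not exist" >&2; return 1; }
    done
    cat "$@" > "$out"
}

# Example (needs network access, so it is left commented out):
# curl -fsSLO https://raw.githubusercontent.com/columndeeply/hosts/main/hosts00
# curl -fsSLO https://raw.githubusercontent.com/columndeeply/hosts/main/hosts01
# join_parts hosts hosts00 hosts01
```

Since the chunks are split on line boundaries, joining them in order should give back the full list.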
# Contributing
Since this list is just a fused version of other lists and the domains aren't manually checked, it'll probably have false positives. If a site is listed here and you think it shouldn't be, please let me know by opening a new issue or making a pull request. The same goes if you think a site should be added to the list. In both cases please use the appropriate tag in the issue title: ``[Addition Request]`` or ``[Removal Request]``.
### By making an issue
If you're submitting more than a couple dozen domains, please make a pull request or use a pastebin site. If it's just a few domains, paste them in the issue. If you took the domains from another list, please say so and leave a link to their page so it can be added to the "Sources" table below.
### By making a pull request
When making a pull request you should add a file in the "contributions" directory with your domains. I'll then merge your pull request and merge the file into the main list. **Please do not make changes directly to the main list.** If you're not sure how to list the domains, take a look at the [example](https://github.com/columndeeply/hosts/blob/main/contributions/example.txt).
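Before opening a pull request, a quick local check can catch full URLs that slipped into a contribution file. This is only a sketch: `check_contribution` is a hypothetical helper, not something the repository ships, and it assumes the format shown in the example file (one bare domain per line, `#` for comments).

```sh
# Hypothetical helper: fail if a contribution file has lines that are not bare domains.
# Usage: check_contribution FILE
check_contribution() {
    # Skip comments and empty lines, then look for slashes (URLs, paths) or whitespace.
    # Offending lines are printed to stderr.
    if grep -v -e '^#' -e '^$' "$1" | grep -e '/' -e '[[:space:]]' >&2; then
        echo "Found lines that are not bare domains" >&2
        return 1
    fi
    return 0
}

# Example:
# check_contribution contributions/my-domains.txt && echo "Looks fine"
```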
### Invalid/Inactive domains
I'm not removing these from the list. Please do not submit domains for removal just because they are no longer active. Removing them would mean having to check every X months to see if they are back. I'll only whitelist domains that are active and point to a non-pornography related site.
# Scripts
I've added the two scripts I use to maintain this list to the repository. Feel free to use them any way you want. Also, if you know a way to make them cleaner (not having to use .tmp files would be nice) or have any general improvements, please let me know.
### cleanup.sh
Give it a list of files with domains and it'll try to clean them a bit. It removes all empty lines, comments, multiple whitespaces, tabs, trailing whitespaces and some more stuff. Once that's done it'll replace any IP at the beginning of each line with ``127.0.0.1`` (or add it if there isn't one). Then it removes any domain that exists in the whitelist, removes any duplicates and sorts the resulting list.
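The steps above can be sketched as a single stdin-to-stdout filter. This is an illustration and not the repository's ``cleanup.sh``: `clean_stream` is a hypothetical name, it assumes a `sed` that supports `-E`, and it leaves out the whitelist step.

```sh
# Hypothetical filter: normalize hosts-style lines read from stdin.
clean_stream() {
    # 1. Strip comments, trim and squeeze whitespace, drop empty lines.
    # 2. Drop localhost-style entries and IPv6 lines.
    # 3. Remove any leading IPv4 address, then prefix every line with 127.0.0.1.
    # 4. Sort and remove duplicates.
    sed -E \
        -e 's/#.*$//' \
        -e 's/[[:space:]]+$//' \
        -e 's/^[[:blank:]]+//' \
        -e 's/[[:blank:]]+/ /g' \
        -e '/^$/d' \
        -e '/(localhost|local|localdomain|broadcasthost|0\.0\.0\.0)$/d' \
        -e '/::/d' \
        -e 's/^([0-9]{1,3}\.){3}[0-9]{1,3} //' \
        -e 's/^/127.0.0.1 /' |
        sort -u
}

# Example:
# clean_stream < some_list > some_list_clean
```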
### merger.sh
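Used to merge clean lists with the main one. Should be run like this ``sh merger.sh *_clean`` to make sure it only adds the lists created by the ``cleanup.sh`` script. Leave the pattern unquoted so the shell expands it to the matching files; a quoted ``"*_clean"`` reaches the script as a literal file name and fails the existence check. It merges all given files with the main one, removes duplicates, sorts the merged list and once that's done it splits it into 90MB chunks.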
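The merge, dedupe and split steps can be tried on small files with something like the sketch below. `merge_and_split` is a hypothetical helper rather than the repository's ``merger.sh``, and it assumes GNU `split` for the `-C` (split on line boundaries) and `-d` (numeric suffixes) options.

```sh
# Hypothetical helper: merge files, remove duplicates, split into chunks.
# Usage: merge_and_split CHUNK_SIZE OUTPUT_PREFIX FILE...
merge_and_split() {
    size="$1"
    prefix="$2"
    shift 2
    merged=$(mktemp) || return 1
    # sort -u sorts and removes duplicate lines in one pass.
    sort -u "$@" > "$merged"
    # -C keeps whole lines together; -d names the chunks PREFIX00, PREFIX01, ...
    split -C "$size" -d "$merged" "$prefix"
    rm -f "$merged"
}

# Example:
# merge_and_split 90MB hosts list_a_clean list_b_clean
```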
# Sources
## Repos
| Link | Last update | Comments |
|---|---|---|
| [11201010's list](https://github.com/11201010/anti-porn-hosts-file/blob/master/HOSTS.txt) | 2020/08/03 | Forked from [4skinSkywalker's list](https://github.com/4skinSkywalker/Anti-Porn-HOSTS-File), looks abandoned. |
| [1boii's list](https://github.com/1boii/hosts/blob/main/hosts) | 2022/03/11 | |
| [4skinSkywalker's list](https://github.com/4skinSkywalker/Anti-Porn-HOSTS-File/blob/master/HOSTS.txt) | 2022/05/29 | |
| [Bon-Appetit's list](https://github.com/Bon-Appetit/porn-domains/blob/master/block.txt) | 2021/12/24 | |
| [EnergizedProtection's lists](https://github.com/EnergizedProtection/block) | | Using the "HOSTS RAW" list for "Porn" and "Porn Lite Extension". 255,621 and 51,065 entries. |
| [My Privacy DNS's list](https://mypdns.org/my-privacy-dns/porn-records/-/blob/master/active_domains/output/merged_results/domains/ACTIVE/list) | 2021/12/03 | |
| [Sinfonietta's list](https://github.com/Sinfonietta/hostfiles/blob/master/pornography-hosts) | 2022/08/24 | Seems to pull from [StevenBlack's list](https://github.com/StevenBlack/hosts/blob/master/alternates/porn/hosts). |
| [StevenBlack's list](https://github.com/StevenBlack/hosts/blob/master/alternates/porn/hosts) | 2022/08/26 | |
| [blocklistproject's list](https://github.com/blocklistproject/Lists/blob/master/porn.txt) | 2022/06/21 | |
| [cbuijs's list](https://github.com/cbuijs/ut1/blob/master/adult/domains.24733) | 2022/08/28 | |
| [chadmayfield's list](https://github.com/chadmayfield/my-pihole-blocklists/blob/master/lists/pi_blocklist_porn_all.list) | 2020/09/11 | Archived. |
| [clefspeare13's list](https://mypdns.org/clefspeare13/pornhosts/-/tree/master/download_here/0.0.0.0) | 2022/04/06 | |
| [mrvivacious's list](https://github.com/mrvivacious/PorNo-_Porn_Blocker/tree/master/lists/Urls) | 2022/04/04 | Split by first letter of domain. ``cat * > merged`` to fuse. |
| [thisisu's list](https://github.com/thisisu/hosts_adultxxx/blob/master/hosts) | 2022/08/28 | |
| [tiuxo's list](https://github.com/tiuxo/hosts/blob/master/porn) | 2020/12/06 | Looks abandoned. |
## Random lists
- https://gist.githubusercontent.com/sibaspage/5248d7600a24284f580219b29d178c49/raw/b35fdaf7a8685b536da0022102e125df70c50eb1/pornsite-list.txt
- https://booru.org/top (Filtered by NSFW and sorted by number of images. First 10 pages only since after that they had less than 200 images each, not worth the effort to parse them.)
- http://controlc.com/99125ac6 (Posted by [/u/lojack_or_nojack](https://teddit.net/r/NoFap/comments/924t6w/an_updated_list_of_porn_sites_to_block_in_your/).)

7
contributions/example.txt Normal file

@ -0,0 +1,7 @@
# Example list for contributors.
# List one domain per line, and only the domain/subdomain. Please don't add full URLs and don't add lines with "http"/"https".
example.org
www.example.com
example.net
example.org
www.example.edu

2668962
hosts00 Normal file

File diff suppressed because it is too large

1032644
hosts01 Normal file

File diff suppressed because it is too large

32
scripts/cleanup.sh Executable file

@ -0,0 +1,32 @@
#!/usr/bin/env sh
# Removes a bunch of stuff and adds the IPs if missing.
# All empty lines, anything pointing to localhost, comments, trailing whitespaces, multiple spaces, tabs, etc. get removed.
# All domains should point to 127.0.0.1
# Any domain that exists in the whitelist is removed.
# Once everything is clean the file gets sorted and any duplicate gets removed.
#
for file in "$@"; do
    [ ! -f "$file" ] && echo "$file does not exist" && continue

    # Strip comments, trim and squeeze whitespace, then drop empty lines and anything pointing to localhost.
    # Trimming comes first so entries followed by a comment or trailing spaces still match the "$" anchors.
    # The dots in 0.0.0.0 are escaped so they only match literal dots.
    sed -i 's/#.*$//;s/[[:space:]]*$//;s/^[[:blank:]]*//;s/[[:blank:]][[:blank:]]*/ /g;/^$/d;/localhost$/d;/::/d;/local$/d;/localdomain$/d;/broadcasthost$/d;/0\.0\.0\.0$/d' "$file"

    # All domains should point to 127.0.0.1
    ## Add it to all lines not starting with an IP address followed by a space
    ## (the space keeps domains like 1.2.3.4.example.com from being mistaken for an IP)
    sed -i -r '/^([0-9]{1,3}\.){3}[0-9]{1,3} /! s/^/127.0.0.1 /' "$file"
    ## Replace all IPs with 127.0.0.1
    sed -i -r 's/^([0-9]{1,3}\.){3}[0-9]{1,3} /127.0.0.1 /' "$file"

    # Remove any domain that exists in the whitelist
    grep -v -x -f ../whitelist "$file" > "$file.tmp"

    # Sort and remove duplicates
    sort -u "$file.tmp" > "${file}_clean"

    # Remove the old files
    rm "$file.tmp" "$file"
done

35
scripts/merger.sh Executable file
View File

@ -0,0 +1,35 @@
#!/usr/bin/env sh
# Merges all files given as parameters with the main one. Once merged it removes all duplicates and sorts the list.
# Should be run after cleaning the lists with cleanup.sh. To merge only clean files run it like this:
## sh merger.sh *_clean
# (leave the pattern unquoted so the shell expands it to the matching files)
#
# Remove temp file from previous runs if needed
rm -f merged.tmp
# Check that all parameters are existing files
for file in "$@"; do
    [ ! -f "$file" ] && echo "$file does not exist. Exiting..." && exit 1
done
# Merge all files into one
cat "$@" ../hosts* > merged.tmp
# Remove the old lists
rm -f ../hosts*
# Remove duplicates
sort -u merged.tmp > merged
# Split the merged file into 90MB chunks to avoid GitHub's limit
split -C 90MB -d merged hosts
# Move the new hosts files and the merged list to the parent directory
mv hosts* ..
mv merged ..
# Show how many changes in total
git diff --stat ../hosts*
# Remove tmp files
rm -- merged.tmp *_clean

0
whitelist Normal file