Joseph Milazzo 150e67031a
v0.5.6 - Performance Part 2 (Is that a new scan loop?) (#1500)
* New Scan Loop (#1447)

* Staging the code for the new scan loop.

* Implemented a basic idea of changes on drives triggering the scan loop. Issues: 1. Scan by folder does not work, 2. Queuing system is very hacky and needs a separate thread, 3. Performance degradation could be very real.

* Started writing unit test for new loop code

* Implemented a basic method to scan a folder path with ignore support (not implemented, code in place)

* Added some code to the parser to build out the idea of processing series in batches based on some top level folder.

* Scan Series now uses the new code (folder based parsing) and now handles the LocalizedSeries issue.

* Got library scan working with the new folder-based scan loop. Updated code to set FolderPath (for improved scan times and partial scan support).

* Wrote some notes on update library scan loop.

* Removed migration for merge

* Reapplied the SeriesFolder migration after merge

* Refactored a check that used multiple db calls into one.

* Made lots of progress on ignore support, but some confusion on underlying library. Ticket created. On hold till then.

* Updated Scan Library and Scan Series to exit early if no changes are on the underlying folders that need to be scanned.

* Implemented the ability to have .kavitaignore files within your directories and Kavita will parse them and ignore files and directories based on rules within them.

* Fixed an issue where ignore files nested wouldn't stack with higher level ignores

* Wrote out some basic code that showcases how we can scan series or library based on file events on the underlying system. Very buggy, needs lots of edge case testing, logging, and duplication checking.

* Things are working kinda. I'm getting lost in my own code and complexity. I'm not sure it's worth it.

* Refactored ScanFiles out to Directory Service.

* Refactored more code out to keep the code clean.

* More unit tests

* Refactored the signature of ParsedSeries to use IList. Started writing unit tests and reworked the UpdateLibrary to work how it used to with new scan loop code (note: using async update library/series does not work).

* Fixed the bug where processSeriesInfos was being invoked twice per series and made the code work very similarly to the old code (except that loose-leaf files don't work), but with folder-based scanning.

* Prep for unit tests (updating broken ones with new implementations)

* Just some notes. Not sure I want to finish this work.

* Refactored the LibraryWatcher with some comments and state variables.

* Undid the migrations in case I don't move forward with this branch

* Started to clean the code and prepare for finishing this work.

* Fixed a bad merge

* Updated signatures to cleanup the code and commit to the new strategy for scanning.

* Swapped out the code with async processing of series on a small library

* The new scan loop is working in both Sync and Async methods. The code is slow and not optimized. This represents a good point to start polling and applying optimizations.

* Refactored UpdateSeries out of Scanner and into a dedicated file.

* Refactored how ProcessTasks are awaited to allow more async

* Fixed an issue where side nav item wouldn't show correct highlight and migrated to OnPush

* Moved where we start the stopwatch to encapsulate the full scan

* Cleaned up SignalR events to report correctly (still needs a redesign)

* Remove the "remove" code until I figure it out

* Put in extremely expensive series deletion code for library scan.

* Have Genre and Tag update the DB immediately to avoid dup issues

* Taking a break

* Moving to a lock with People was successful. Need to apply to others.

* Refactored code for series level and tag and genre with new locking strategy.

* New scan loop works. Next up optimization

* Swapped out the Kavita logo with an SVG for faster load

* Refactored metadata updates to occur when the series are being updated.

* Code cleanup

* Added a new type of generic message (Info) to inform the user.

* Code cleanup

* Implemented an optimization which prevents any I/O (other than an attribute lookup) for Library/Series Scan. This can bring a recently updated library on network storage (650 series) to fully process in 2 seconds.

Fixed a bug where File Analysis was running every time for each non-epub file.

* Fixed ARM x64 builds not being able to view PDF cover images due to a bad update in DocNet.

* Some code cleanup

* Added experimental SignalR update code to have a more natural refresh of the library-detail page

* Hooked in ability to send new series events to UI

* Moved all scan (file scan only) tasks into Scan Queue. Made it so scheduled ScanLibraries will now check if any existing task is being run and, if so, reschedule for 3 hours later (10 mins later for scan series).

* Implemented the info event in the events widget and added a clear all button to dismiss all infos and errors. Added --event-widget-info-bg-color

* Remove --drawer-background-color since it's not used

* When new series are added, inject them directly into the view.

* Some debug code cleanup

* Fixed up the unit tests

* Ensure all config directories exist on startup

* Disabled Library Watching (that will go in next build)

* Ensure update for series is admin only

* Lots of code changes, scan series kinda works, specials are splitting, optimizations are failing. Demotivated on this work again.

* Removed SeriesFolder migration

* Added the SeriesFolder migration

* Added a new pipe for dates so we can provide some nicer defaults. Added folder path to the series detail.

* The scan optimizations now work for NTFS systems.

* Removed a TODO

* Migrated all the times to use DateTime.Now and not Utc.

* Refactored some repo calls to use the includes flag pattern

* Implemented a check for the library scan optimization to validate whether the library was updated (type change, library rename, folder change, or series deleted), in which case the optimization is bypassed.

* Added another optimization which will use just the folder's last write time attribute if the drive is not NTFS.

* Fixed a unit test

* Some code cleanup

* Bump versions by dotnet-bump-version.

* Misc UI Fixes (#1450)

* Fixed collection cover images not rendering

* Added a try/catch on sending email, so we fail silently if it doesn't send.

* Fixed Go Back not returning to the last scroll position because a layout mode change reset it, despite nothing changing.

* Fixed a bug where, when turning between pages in default mode, the height calculations could get skewed.

* Fixed a missing case for card item where it wouldn't show tooltip title for series.

* Bump versions by dotnet-bump-version.

* New Scan Loop Fixes (#1452)

* Refactored ScanSeries to avoid a lot of extra work and fixed a bug where Scan Series would invoke the processing twice.

Refactored the series selection code during process such that we use Localized Name as well, for cases where the original name was changed.

Undid an optimization around Last Write time, since Linux file systems match how NTFS works.

* Fixed part of the query

* Added a NormalizedLocalizedName for quick searching in which a series needs grouping. Reworked scan loop code a bit to ensure we don't do extra work.

Tweaked the widget logic to help display better and not show "Nothing going on here".

* Fixed a bug where archives with ._ files would be counted as valid files, while they are actually just macOS metadata files.

* Fixed a broken unit test

* Bump versions by dotnet-bump-version.

* Simplify parent lookup with Directory.GetParent (#1455)

* Simplify parent lookup with Directory.GetParent

* Address comments

* Bump versions by dotnet-bump-version.

* Scan Loop Fixes (#1459)

* Added Last Folder Scanned time to series info modal.

Tweaked the info event detail modal to have a primary button and thus be auto-dismissible

* Added an error event when multiple series are found in processing a series.

* Fixed a bug where a series could get stuck with other series due to a bad select query.

Started adding the force flag hook for the UI and designing the confirm.

Confirm service now also has ability to hide the close button.

Updated error events and logging in the loop, to be more informative

* Fixed a bug where confirm service wasn't showing the proper body content.

* Hooked up force scan series

* refresh metadata now has force update

* Fixed up the messaging with the prompt on scan, hooked it up properly in the scan library to avoid the check if the whole library needs to even be scanned. Fixed a bug where NormalizedLocalizedName wasn't being calculated on new entities.

Started adding unit tests for this problematic repo method.

* Fixed a bug where we updated NormalizedLocalizedName before we set it.

* Send an info to the UI when series are spread between multiple library level folders.

* Added some logger output when there are no files found in a folder. Return early if there are no files found, so we can avoid some small loops of code.

* Fixed an issue where multiple series in a folder with localized series would cause unintended grouping. This is not supported and hence we will warn them and allow the bad grouping.

* Added a case where scan series fails due to the folder being removed. We will now log an error

* Normalize paths when finding the highest directory till root.

* Fixed an issue with Scan Series where changing a series' folder to a different path, while the original series folder still existed with another series in it, would cause the series to not be deleted.

* Fixed some bugs around specials causing a series merge issue on scan series.

* Removed a bug marker

* Cleaned up some of the scan loop and removed a test I don't need.

* Remove any prompts for force flow, it doesn't work well. Leave the API as is though.

* Fixed up a check for duplicate ScanLibrary calls

* Bump versions by dotnet-bump-version.

* Scroll Resume (#1460)

* When we navigate from a page then back, resume back on the last scroll key (if clicked)

* Resume jump key position when navigating back to a page. Removed some extra blank space on collection detail when a collection doesn't have a summary or cover image.

* Ignore progress events on series cards

* Added a url to swagger for /, which could be reverse proxy url

* Bump versions by dotnet-bump-version.

* Misc UI fixes (#1461)

* Misc fixes

- Fixed modal being stretched when not needed.
- Fixed Logo vertical align
- Fixed drawer content scroll, and fixed it being squished due to being overridden by Bootstrap.

* series detail cover image stretch fix

- Fixed: Series detail cover image being stretched on larger resolutions

* fixing empty lists scrollbar

* Fixing want to read error

* fixing unnecessary scrollbar

* Fixing recently updated tooltip

* Bump versions by dotnet-bump-version.

* Folder Watching (#1467)

* Hooked in a server setting to enable/disable folder watching

* Validated the file rename change event

* Validated delete file works

* Tweaked some logic to determine if a change occurs on a folder or a file.

* Added a note for an upcoming branch

* Some minor changes in the loop that just shift where code runs.

* Implemented ScanFolder api

* Ensure we restart watchers when we modify a library folder.

* Fixed a unit test

* Bump versions by dotnet-bump-version.

* More Scan Loop Bugfixes (#1471)

* Updated scan time for watcher to 30 seconds for non-dev. Moved ScanFolder off the Scan queue as it doesn't need to be there. Updated loggers

* Fixed jumpbar missing

* Tweaked the messaging for CoverGen

* When we return early due to nothing being done on library and series scan, make sure we kick off other tasks that need to occur.

* Fixed a foreign key constraint issue on Volumes when we were adding to a new series.

* Fixed a case where when picking normalized series, capitalization differences wouldn't stack when they should.

* Reduced the logging output on dev and prod settings.

* Fixed a bug in the code that finds the highest directory from a file, where we were not checking against a normalized path.

* Cleaned up some code

* Fixed broken unit tests

* Bump versions by dotnet-bump-version.

* More Scan Loop Fixes (#1473)

* Added a ToList() to avoid a bug where a person could be removed from a list while iterating over the list.

* When deleting a series, want to read page will now automatically remove that series from the view.

* Fixed a series lookup which was ignoring format

* Ignore XML comment warnings

* Removed a note since it was already working that way

* Fixed unit test

* Bump versions by dotnet-bump-version.

* Misc UI Fixes (#1477)

* Tweaked a Migration to log correctly only if something is going to be done.

* Refactored Reading List Controller code into a dedicated service and cleaned up some methods that aren't needed anymore.

* Fixed a bug where adding a new item to a reading list wasn't adding it at the end.

* Fixed an issue where collection page would re-render the same covers on multiple items.

* Fixed a missing margin-top which made the page extras drawer render incorrectly and hence be unclosable on small screens.

* Added a timeout on the manage users screen to give data time to flush.

Added a dedicated token log for account flows, in case url encoding plays a part (but from testing it doesn't).

* Reverted to building for ES6 instead of ES2020 to support old Safari 12.5.5 browsers (10MB difference in build size).

* Cleaned up the logic in removing series not found during scan loop.

* Tweaked the timings for Library Watcher to 1 min and reprocess queue every 30 seconds.

* Bump versions by dotnet-bump-version.

* Added fixes for libvips (#1479)

* Bump versions by dotnet-bump-version.

* Tachiyomi + Fixes (#1481)

* Fixed a bootstrap bug

* Fixed repeating images on collection detail

* Fixed up some logic in library watcher which wasn't processing all of the queue.

* When parsing non-epubs in Book library, use Manga parsing for Volume support to better support Light Novels

* Fixed some bugs with the Tachiyomi plugin APIs for progress tracking

* Bump versions by dotnet-bump-version.

* Adding Health controller (#1480)

* Adding Health controller

- Added: Added API endpoint for a health check to streamline docker healthy status.

* review comment fixes

* Bump versions by dotnet-bump-version.

* Simplify Folder Watcher (#1484)

* Refactored Library Watcher to use Hangfire under the hood.

* Support .kavitaignore at root level.

* Refactored a lot of the library watching code to process faster and handle when FileSystemWatcher runs out of internal buffer space. It's still not perfect, but good enough for basic use.

* Marked folder watching as experimental and set it to off by default.

* Revert #1479

* Tweaked the messaging for OPDS to remove a note about download role.

Moved some code closer to where it's used.

* Cleaned up how the events widget reports

* Fixed a null issue when deleting series in the UI

* Cleaned up some debug code

* Added more information for when we skip a scan

* Cleaned up some logging messages in CoverGen tasks

* More log message tweaks

* Added some debug to help identify a rare issue

* Fixed a bug where save bookmarks as webp could get reset to false when saving other server settings

* Updated some documentation on library watcher.

* Make LibraryWatcher fire every 5 mins

* Bump versions by dotnet-bump-version.

* Sort series by chapter number only when some chapters have no volume (#1487)

* Sort series by chapter number only when some chapters have no volume information

* Implement a Default static instance of ChapterSortComparer

* Further use Default static Comparers

* Add missing ToList() as per comments

* SQLite Hangfire (#1488)

* Update to use SQLite for Hangfire to retain information on tasks

* Updated all external links to have noopener noreferrer

* When watching folders, ensure the folders exist before creating watchers.

* Tweaked the messaging for Email Service and added link to the project.

* Bump versions by dotnet-bump-version.

* Bump versions by dotnet-bump-version.

* Fixed typeahead not working correctly (#1490)

* Bump versions by dotnet-bump-version.

* Release Testing Day 1 (#1491)

* Fixed a bug where typeahead wouldn't automatically show results on relationship screen without an additional click.

* Tweaked the code which checks if a modification occurred to check on seconds rather than minutes

* Clear cache will now clear temp/ directory as well.

* Fixed an issue where Chrome was caching API responses when it shouldn't have.

* Added code to clean up the temp directory

* Ensure genres get removed during series scan when removed from metadata.

* Fixed a bug where all epubs with a volume would show as Volume 0 in reading list

* When a scan is in progress, don't let the user delete the library.

* Bump versions by dotnet-bump-version.

* Scan Loop Last Write Time Change (#1492)

* Refactored invite user flow to separate error handling on create user flow and email flow. This should help users that have unique situations.

* Switch to using files to check LastWriteTime. Debug code in for Robbie to test on rclone

* Updated Parser namespace. Changed the LastWriteTime to check all files and folders.

* Bump versions by dotnet-bump-version.

* Release Testing Day 2 (#1493)

* Added a no data section to collection detail.

* Remove an optimization for skipping the whole library scan as it wasn't reliable

* When resetting password, ensure the input is colored correctly

* Fixed setting new password after resetting, throwing an error despite it actually being successful.

Fixed incorrect messaging for Password Reset page.

* Fixed a bug where reset password would show the side nav button and skew the page.

Updated a lot of references to use the typed version of FormControls.

* Removed a migration from 0.5.0, 6 releases ago.

* Added a null check so we don't throw an exception when connecting with signalR on unauthenticated users.

* Bump versions by dotnet-bump-version.

* Fixed a bug where a series with a relationship couldn't be deleted. (#1495)

* Bump versions by dotnet-bump-version.

* Release Testing Day 3 (#1496)

* Tweaked log messaging for library scan when no files were scanned.

* When a theme that is set gets removed due to a scan, inform the user to refresh.

* Fixed a typo and made Darkness -> Brightness

* Allowed theme file downloads to be invoked by non-authenticated users, so new users can get the default theme.

* Hide all series side nav item if there are no libraries exposed to the user

* Fixed an API for Tachiyomi when syncing progress

* Fixed dashboard not responding to Series Removed and Added events.

Ensure we send SeriesRemoved events when they are deleted.

* Reverted Hangfire SQLite due to aborted jobs being resumed when they shouldn't be. Fixed some scan loop issues where cover gen wasn't always invoked on new libraries.

* Bump versions by dotnet-bump-version.

* Updating series detail cover style (#1498)

# Fixed
- Fixed: Fixed an issue with series detail cover when scaled down.

* Bump versions by dotnet-bump-version.

* Version bump

* v0.5.6 Release (#1499)

Co-authored-by: tjarls <tjarls@gmail.com>
Co-authored-by: Robbie Davis <robbie@therobbiedavis.com>
Co-authored-by: Chris Plaatjes <kizaing@gmail.com>
2022-09-02 05:52:51 -07:00
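The LastWriteTime skip check described in #1492 (checking all files and folders) and #1491 (comparing on seconds rather than minutes) can be sketched roughly as follows. This is a hypothetical helper written for illustration, not Kavita's actual scanner code; the names `FolderChangeSketch`, `HasChangedSince`, and `TruncateToSeconds` are assumptions, as is reading local times to match the move to `DateTime.Now` noted above.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper for illustration only; not Kavita's actual implementation.
public static class FolderChangeSketch
{
    // Drop sub-second precision so two timestamps that differ only by
    // milliseconds or ticks are treated as equal.
    public static DateTime TruncateToSeconds(DateTime value) =>
        new DateTime(value.Ticks - value.Ticks % TimeSpan.TicksPerSecond, value.Kind);

    // True if the folder itself, or any file or sub-folder beneath it, has a
    // local LastWriteTime (truncated to seconds) later than lastScanned.
    public static bool HasChangedSince(string folderPath, DateTime lastScanned)
    {
        var threshold = TruncateToSeconds(lastScanned);
        var root = new DirectoryInfo(folderPath);
        if (TruncateToSeconds(root.LastWriteTime) > threshold) return true;

        // Walk every entry: a file changed deep in the tree does not
        // necessarily update the top-level folder's own timestamp.
        return root
            .EnumerateFileSystemInfos("*", SearchOption.AllDirectories)
            .Any(info => TruncateToSeconds(info.LastWriteTime) > threshold);
    }
}
```

Walking every entry costs more I/O than reading a single folder attribute, which is the trade-off against the earlier top-level-only optimization; truncating to seconds avoids treating sub-second timestamp drift as a change.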

1090 lines
45 KiB
C#

using System;
using System.Collections.Immutable;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using API.Entities.Enums;

namespace API.Services.Tasks.Scanner.Parser
{
    public static class Parser
    {
        public const string DefaultChapter = "0";
        public const string DefaultVolume = "0";
        private static readonly TimeSpan RegexTimeout = TimeSpan.FromMilliseconds(500);

        public const string ImageFileExtensions = @"^(\.png|\.jpeg|\.jpg|\.webp|\.gif)";
        public const string ArchiveFileExtensions = @"\.cbz|\.zip|\.rar|\.cbr|\.tar.gz|\.7zip|\.7z|\.cb7|\.cbt";
        private const string BookFileExtensions = @"\.epub|\.pdf";
        public const string MacOsMetadataFileStartsWith = @"._";

        public const string SupportedExtensions =
            ArchiveFileExtensions + "|" + ImageFileExtensions + "|" + BookFileExtensions;

        private const RegexOptions MatchOptions =
            RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.CultureInvariant;

        /// <summary>
        /// Matches against font-family css syntax. Does not match if url import has data: starting, as that is binary data
        /// </summary>
        /// <remarks>See here for some examples https://developer.mozilla.org/en-US/docs/Web/CSS/@font-face</remarks>
        public static readonly Regex FontSrcUrlRegex = new Regex(@"(?<Start>(?:src:\s?)?(?:url|local)\((?!data:)" + "(?:[\"']?)" + @"(?!data:))"
                                                                 + "(?<Filename>(?!data:)[^\"']+?)" + "(?<End>[\"']?" + @"\);?)",
            MatchOptions, RegexTimeout);

        /// <summary>
        /// https://developer.mozilla.org/en-US/docs/Web/CSS/@import
        /// </summary>
        public static readonly Regex CssImportUrlRegex = new Regex("(@import\\s([\"|']|url\\([\"|']))(?<Filename>[^'\"]+)([\"|']\\)?);",
            MatchOptions | RegexOptions.Multiline, RegexTimeout);

        /// <summary>
        /// Misc css image references, like background-image: url(), border-image, or list-style-image
        /// </summary>
        /// Original prepend: (background|border|list-style)-image:\s?)?
        public static readonly Regex CssImageUrlRegex = new Regex(@"(url\((?!data:).(?!data:))" + "(?<Filename>(?!data:)[^\"']*)" + @"(.\))",
            MatchOptions, RegexTimeout);

        private const string XmlRegexExtensions = @"\.xml";
        private static readonly Regex ImageRegex = new Regex(ImageFileExtensions,
            MatchOptions, RegexTimeout);
        private static readonly Regex ArchiveFileRegex = new Regex(ArchiveFileExtensions,
            MatchOptions, RegexTimeout);
        private static readonly Regex ComicInfoArchiveRegex = new Regex(@"\.cbz|\.cbr|\.cb7|\.cbt",
            MatchOptions, RegexTimeout);
        private static readonly Regex XmlRegex = new Regex(XmlRegexExtensions,
            MatchOptions, RegexTimeout);
        private static readonly Regex BookFileRegex = new Regex(BookFileExtensions,
            MatchOptions, RegexTimeout);
        private static readonly Regex CoverImageRegex = new Regex(@"(?<![[a-z]\d])(?:!?)(?<!back)(?<!back_)(?<!back-)(cover|folder)(?![\w\d])",
            MatchOptions, RegexTimeout);
        private static readonly Regex NormalizeRegex = new Regex(@"[^\p{L}0-9\+]",
            MatchOptions, RegexTimeout);

        /// <summary>
        /// Recognizes the Special token only
        /// </summary>
        private static readonly Regex SpecialTokenRegex = new Regex(@"SP\d+",
            MatchOptions, RegexTimeout);

        private static readonly Regex[] MangaVolumeRegex = new[]
        {
            // Dance in the Vampire Bund v16-17
            new Regex(
                @"(?<Series>.*)(\b|_)v(?<Volume>\d+-?\d+)( |_)",
                MatchOptions, RegexTimeout),
            // NEEDLESS_Vol.4_-Simeon_6_v2[SugoiSugoi].rar
            new Regex(
                @"(?<Series>.*)(\b|_)(?!\[)(vol\.?)(?<Volume>\d+(-\d+)?)(?!\])",
                MatchOptions, RegexTimeout),
            // Historys Strongest Disciple Kenichi_v11_c90-98.zip or Dance in the Vampire Bund v16-17
            new Regex(
                @"(?<Series>.*)(\b|_)(?!\[)v(?<Volume>\d+(-\d+)?)(?!\])",
                MatchOptions, RegexTimeout),
            // Kodomo no Jikan vol. 10, [dmntsf.net] One Piece - Digital Colored Comics Vol. 20.5-21.5 Ch. 177
            new Regex(
                @"(?<Series>.*)(\b|_)(vol\.? ?)(?<Volume>\d+(\.\d)?(-\d+)?(\.\d)?)",
                MatchOptions, RegexTimeout),
            // Killing Bites Vol. 0001 Ch. 0001 - Galactica Scanlations (gb)
            new Regex(
                @"(vol\.? ?)(?<Volume>\d+(\.\d)?)",
                MatchOptions, RegexTimeout),
            // Tonikaku Cawaii [Volume 11].cbz
            new Regex(
                @"(volume )(?<Volume>\d+(\.\d)?)",
                MatchOptions, RegexTimeout),
            // Tower Of God S01 014 (CBT) (digital).cbz
            new Regex(
                @"(?<Series>.*)(\b|_|)(S(?<Volume>\d+))",
                MatchOptions, RegexTimeout),
            // vol_001-1.cbz for MangaPy default naming convention
            new Regex(
                @"(vol_)(?<Volume>\d+(\.\d)?)",
                MatchOptions, RegexTimeout),
            // Chinese Volume: 第n卷 -> Volume n, 第n册 -> Volume n, 幽游白书完全版 第03卷 天下 or 阿衰online 第1册
            new Regex(
                @"第(?<Volume>\d+)(卷|册)",
                MatchOptions, RegexTimeout),
            // Chinese Volume: 卷n -> Volume n, 册n -> Volume n
            new Regex(
                @"(卷|册)(?<Volume>\d+)",
                MatchOptions, RegexTimeout),
            // Korean Volume: 제n권 -> Volume n, n권 -> Volume n, 63권#200.zip -> Volume 63 (no chapter, #200 is just files inside)
            new Regex(
                @"제?(?<Volume>\d+)권",
                MatchOptions, RegexTimeout),
            // Korean Season: 시즌n -> Season n,
            new Regex(
                @"시즌(?<Volume>\d+\-?\d+)",
                MatchOptions, RegexTimeout),
            // Korean Season: 시즌n -> Season n, n시즌 -> season n
            new Regex(
                @"(?<Volume>\d+(\-|~)?\d+?)시즌",
                MatchOptions, RegexTimeout),
            // Korean Season: 시즌n -> Season n, n시즌 -> season n
            new Regex(
                @"시즌(?<Volume>\d+(\-|~)?\d+?)",
                MatchOptions, RegexTimeout),
            // Japanese Volume: n巻 -> Volume n
            new Regex(
                @"(?<Volume>\d+(?:(\-)\d+)?)巻",
                MatchOptions, RegexTimeout),
        };

        private static readonly Regex[] MangaSeriesRegex = new[]
        {
            // Grand Blue Dreaming - SP02
            new Regex(
                @"(?<Series>.*)(\b|_|-|\s)(?:sp)\d",
                MatchOptions, RegexTimeout),
            // [SugoiSugoi]_NEEDLESS_Vol.2_-_Disk_The_Informant_5_[ENG].rar, Yuusha Ga Shinda! - Vol.tbd Chapter 27.001 V2 Infection ①.cbz
            new Regex(
                @"^(?<Series>.*)( |_)Vol\.?(\d+|tbd)",
                MatchOptions, RegexTimeout),
            // Mad Chimera World - Volume 005 - Chapter 026.cbz (couldn't figure out how to get Volume negative lookaround working on below regex),
            // The Duke of Death and His Black Maid - Vol. 04 Ch. 054.5 - V4 Omake
            new Regex(
                @"(?<Series>.+?)(\s|_|-)+(?:Vol(ume|\.)?(\s|_|-)+\d+)(\s|_|-)+(?:(Ch|Chapter|Ch)\.?)(\s|_|-)+(?<Chapter>\d+)",
                MatchOptions,
                RegexTimeout),
            // Ichiban_Ushiro_no_Daimaou_v04_ch34_[VISCANS].zip, VanDread-v01-c01.zip
            new Regex(
                @"(?<Series>.*)(\b|_)v(?<Volume>\d+-?\d*)(\s|_|-)",
                MatchOptions,
                RegexTimeout),
            // Gokukoku no Brynhildr - c001-008 (v01) [TrinityBAKumA], Black Bullet - v4 c17 [batoto]
            new Regex(
                @"(?<Series>.*)( - )(?:v|vo|c|chapters)\d",
                MatchOptions, RegexTimeout),
            // Kedouin Makoto - Corpse Party Musume, Chapter 19 [Dametrans].zip
            new Regex(
                @"(?<Series>.*)(?:, Chapter )(?<Chapter>\d+)",
                MatchOptions, RegexTimeout),
            // Please Go Home, Akutsu-San! - Chapter 038.5 - Volume Announcement.cbz, My Charms Are Wasted on Kuroiwa Medaka - Ch. 37.5 - Volume Extras
            new Regex(
                @"(?<Series>.+?)(\s|_|-)(?!Vol)(\s|_|-)((?:Chapter)|(?:Ch\.))(\s|_|-)(?<Chapter>\d+)",
                MatchOptions, RegexTimeout),
            // [dmntsf.net] One Piece - Digital Colored Comics Vol. 20 Ch. 177 - 30 Million vs 81 Million.cbz
            new Regex(
                @"(?<Series>.*) (\b|_|-)(vol)\.?(\s|-|_)?\d+",
                MatchOptions, RegexTimeout),
            // [xPearse] Kyochuu Rettou Volume 1 [English] [Manga] [Volume Scans]
            new Regex(
                @"(?<Series>.*) (\b|_|-)(vol)(ume)",
                MatchOptions,
                RegexTimeout),
            //Knights of Sidonia c000 (S2 LE BD Omake - BLAME!) [Habanero Scans]
            new Regex(
                @"(?<Series>.*)(\bc\d+\b)",
                MatchOptions, RegexTimeout),
            //Tonikaku Cawaii [Volume 11], Darling in the FranXX - Volume 01.cbz
            new Regex(
                @"(?<Series>.*)(?: _|-|\[|\()\s?vol(ume)?",
                MatchOptions, RegexTimeout),
            // Momo The Blood Taker - Chapter 027 Violent Emotion.cbz, Grand Blue Dreaming - SP02 Extra (2019) (Digital) (danke-Empire).cbz
            new Regex(
                @"^(?<Series>(?!Vol).+?)(?:(ch(apter|\.)(\b|_|-|\s))|sp)\d",
                MatchOptions, RegexTimeout),
            // Historys Strongest Disciple Kenichi_v11_c90-98.zip, Killing Bites Vol. 0001 Ch. 0001 - Galactica Scanlations (gb)
            new Regex(
                @"(?<Series>.*) (\b|_|-)(v|ch\.?|c|s)\d+",
                MatchOptions, RegexTimeout),
            // Hinowa ga CRUSH! 018 (2019) (Digital) (LuCaZ).cbz
            new Regex(
                @"(?<Series>.*)\s+(?<Chapter>\d+)\s+(?:\(\d{4}\))\s",
                MatchOptions, RegexTimeout),
            // Goblin Slayer - Brand New Day 006.5 (2019) (Digital) (danke-Empire)
            new Regex(
                @"(?<Series>.*) (-)?(?<Chapter>\d+(?:.\d+|-\d+)?) \(\d{4}\)",
                MatchOptions, RegexTimeout),
            // Noblesse - Episode 429 (74 Pages).7z
            new Regex(
                @"(?<Series>.*)(\s|_)(?:Episode|Ep\.?)(\s|_)(?<Chapter>\d+(?:.\d+|-\d+)?)",
                MatchOptions, RegexTimeout),
            // Akame ga KILL! ZERO (2016-2019) (Digital) (LuCaZ)
            new Regex(
                @"(?<Series>.*)\(\d",
                MatchOptions, RegexTimeout),
            // Tonikaku Kawaii (Ch 59-67) (Ongoing)
            new Regex(
                @"(?<Series>.*)(\s|_)\((c\s|ch\s|chapter\s)",
                MatchOptions, RegexTimeout),
            // Fullmetal Alchemist chapters 101-108
            new Regex(
                @"(?<Series>.+?)(\s|_|\-)+?chapters(\s|_|\-)+?\d+(\s|_|\-)+?",
                MatchOptions, RegexTimeout),
            // It's Witching Time! 001 (Digital) (Anonymous1234)
            new Regex(
                @"(?<Series>.+?)(\s|_|\-)+?\d+(\s|_|\-)\(",
                MatchOptions, RegexTimeout),
            //Ichinensei_ni_Nacchattara_v01_ch01_[Taruby]_v1.1.zip must be before [Suihei Kiki]_Kasumi_Otoko_no_Ko_[Taruby]_v1.1.zip
            // due to duplicate version identifiers in file.
            new Regex(
                @"(?<Series>.*)(v|s)\d+(-\d+)?(_|\s)",
                MatchOptions, RegexTimeout),
            //[Suihei Kiki]_Kasumi_Otoko_no_Ko_[Taruby]_v1.1.zip
            new Regex(
                @"(?<Series>.*)(v|s)\d+(-\d+)?",
                MatchOptions, RegexTimeout),
            // Black Bullet (This is very loose, keep towards bottom)
            new Regex(
                @"(?<Series>.*)(_)(v|vo|c|volume)( |_)\d+",
                MatchOptions, RegexTimeout),
            // [Hidoi]_Amaenaideyo_MS_vol01_chp02.rar
            new Regex(
                @"(?<Series>.*)( |_)(vol\d+)?( |_)(?:Chp\.? ?\d+)",
                MatchOptions, RegexTimeout),
            // Mahoutsukai to Deshi no Futekisetsu na Kankei Chp. 1
            new Regex(
                @"(?<Series>.*)( |_)(?:Chp.? ?\d+)",
                MatchOptions, RegexTimeout),
            // Corpse Party -The Anthology- Sachikos game of love Hysteric Birthday 2U Chapter 01
            new Regex(
                @"^(?!Vol)(?<Series>.*)( |_)Chapter( |_)(\d+)",
                MatchOptions, RegexTimeout),
            // Fullmetal Alchemist chapters 101-108.cbz
            new Regex(
                @"^(?!vol)(?<Series>.*)( |_)(chapters( |_)?)\d+-?\d*",
                MatchOptions, RegexTimeout),
            // Umineko no Naku Koro ni - Episode 1 - Legend of the Golden Witch #1
            new Regex(
                @"^(?!Vol\.?)(?<Series>.*)( |_|-)(?<!-)(episode|chapter|(ch\.?) ?)\d+-?\d*",
                MatchOptions, RegexTimeout),
            // Baketeriya ch01-05.zip
            new Regex(
                @"^(?!Vol)(?<Series>.*)ch\d+-?\d?",
                MatchOptions, RegexTimeout),
            // Magi - Ch.252-005.cbz
            new Regex(
                @"(?<Series>.*)( ?- ?)Ch\.\d+-?\d*",
                MatchOptions, RegexTimeout),
            // [BAA]_Darker_than_Black_Omake-1.zip
            new Regex(
                @"^(?!Vol)(?<Series>.*)(-)\d+-?\d*", // This catches a lot of stuff ^(?!Vol)(?<Series>.*)( |_)(\d+)
                MatchOptions, RegexTimeout),
            // Kodoja #001 (March 2016)
            new Regex(
                @"(?<Series>.*)(\s|_|-)#",
                MatchOptions, RegexTimeout),
            // Baketeriya ch01-05.zip, Akiiro Bousou Biyori - 01.jpg, Beelzebub_172_RHS.zip, Cynthia the Mission 29.rar, A Compendium of Ghosts - 031 - The Third Story_ Part 12 (Digital) (Cobalt001)
            new Regex(
                @"^(?!Vol\.?)(?!Chapter)(?<Series>.+?)(\s|_|-)(?<!-)(ch|chapter)?\.?\d+-?\d*",
                MatchOptions, RegexTimeout),
            // [BAA]_Darker_than_Black_c1 (This is very greedy, make sure it's close to last)
            new Regex(
                @"^(?!Vol)(?<Series>.*)( |_|-)(ch?)\d+",
                MatchOptions, RegexTimeout),
            // Japanese Volume: n巻 -> Volume n
            new Regex(
                @"(?<Series>.+?)第(?<Volume>\d+(?:(\-)\d+)?)巻",
                MatchOptions, RegexTimeout),
        };

        private static readonly Regex[] ComicSeriesRegex = new[]
        {
            // Tintin - T22 Vol 714 pour Sydney
            new Regex(
                @"(?<Series>.+?)\s?(\b|_|-)\s?((vol|tome|t)\.?)(?<Volume>\d+(-\d+)?)",
                MatchOptions, RegexTimeout),
            // Invincible Vol 01 Family matters (2005) (Digital)
            new Regex(
                @"(?<Series>.+?)(\b|_)((vol|tome|t)\.?)(\s|_)(?<Volume>\d+(-\d+)?)",
                MatchOptions, RegexTimeout),
            // Batman Beyond 2.0 001 (2013)
            new Regex(
                @"^(?<Series>.+?\S\.\d) (?<Chapter>\d+)",
                MatchOptions, RegexTimeout),
            // 04 - Asterix the Gladiator (1964) (Digital-Empire) (WebP by Doc MaKS)
            new Regex(
                @"^(?<Volume>\d+)\s(-\s|_)(?<Series>.*(\d{4})?)( |_)(\(|\d+)",
                MatchOptions, RegexTimeout),
            // 01 Spider-Man & Wolverine 01.cbr
            new Regex(
                @"^(?<Volume>\d+)\s(?:-\s)(?<Series>.*) (\d+)?",
                MatchOptions, RegexTimeout),
            // Batman & Wildcat (1 of 3)
            new Regex(
                @"(?<Series>.*(\d{4})?)( |_)(?:\((?<Volume>\d+) of \d+)",
                MatchOptions, RegexTimeout),
            // Teen Titans v1 001 (1966-02) (digital) (OkC.O.M.P.U.T.O.-Novus), Aldebaran-Antares-t6
            new Regex(
                @"^(?<Series>.+?)(?: |_|-)(v|t)\d+",
                MatchOptions, RegexTimeout),
            // Amazing Man Comics chapter 25
            new Regex(
                @"^(?<Series>.+?)(?: |_)c(hapter) \d+",
                MatchOptions, RegexTimeout),
            // Amazing Man Comics issue #25
            new Regex(
                @"^(?<Series>.+?)(?: |_)i(ssue) #\d+",
                MatchOptions, RegexTimeout),
            // Batman Wayne Family Adventures - Ep. 001 - Moving In
            new Regex(
                @"^(?<Series>.+?)(\s|_|-)(?:Ep\.?)(\s|_|-)+\d+",
                MatchOptions, RegexTimeout),
            // Batgirl Vol.2000 #57 (December, 2004)
            new Regex(
                @"^(?<Series>.+?)Vol\.?\s?#?(?:\d+)",
                MatchOptions, RegexTimeout),
            // Batman & Robin the Teen Wonder #0
            new Regex(
                @"^(?<Series>.*)(?: |_)#\d+",
                MatchOptions, RegexTimeout),
            // Batman & Catwoman - Trail of the Gun 01, Batman & Grendel (1996) 01 - Devil's Bones, Teen Titans v1 001 (1966-02) (digital) (OkC.O.M.P.U.T.O.-Novus)
            new Regex(
                @"^(?<Series>.+?)(?: \d+)",
                MatchOptions, RegexTimeout),
            // Scott Pilgrim 02 - Scott Pilgrim vs. The World (2005)
            new Regex(
                @"^(?<Series>.+?)(?: |_)(?<Chapter>\d+)",
                MatchOptions, RegexTimeout),
            // The First Asterix Frieze (WebP by Doc MaKS)
            new Regex(
                @"^(?<Series>.*)(?: |_)(?!\(\d{4}|\d{4}-\d{2}\))\(",
                MatchOptions, RegexTimeout),
            // spawn-123, spawn-chapter-123 (from https://github.com/Girbons/comics-downloader)
            new Regex(
                @"^(?<Series>.+?)-(chapter-)?(?<Chapter>\d+)",
                MatchOptions, RegexTimeout),
            // MUST BE LAST: Batman & Daredevil - King of New York
            new Regex(
                @"^(?<Series>.*)",
                MatchOptions, RegexTimeout),
        };
private static readonly Regex[] ComicVolumeRegex = new[]
{
// Teen Titans v1 001 (1966-02) (digital) (OkC.O.M.P.U.T.O.-Novus)
new Regex(
@"^(?<Series>.*)(?: |_)(t|v)(?<Volume>\d+)",
MatchOptions, RegexTimeout),
// Batgirl Vol.2000 #57 (December, 2004)
new Regex(
@"^(?<Series>.+?)(?:\s|_)(v|vol|tome|t)\.?(\s|_)?(?<Volume>\d+)",
MatchOptions, RegexTimeout),
// Chinese Volume: 第n卷 -> Volume n, 第n册 -> Volume n, 幽游白书完全版 第03卷 天下 or 阿衰online 第1册
new Regex(
@"第(?<Volume>\d+)(卷|册)",
MatchOptions, RegexTimeout),
// Chinese Volume: 卷n -> Volume n, 册n -> Volume n
new Regex(
@"(卷|册)(?<Volume>\d+)",
MatchOptions, RegexTimeout),
// Korean Volume: 제n권 -> Volume n, n권 -> Volume n, 63권#200.zip
new Regex(
@"제?(?<Volume>\d+)권",
MatchOptions, RegexTimeout),
// Japanese Volume: n巻 -> Volume n
new Regex(
@"(?<Volume>\d+(?:(\-)\d+)?)巻",
MatchOptions, RegexTimeout),
};
private static readonly Regex[] ComicChapterRegex = new[]
{
// Batman & Wildcat (1 of 3)
new Regex(
@"(?<Series>.*(\d{4})?)( |_)(?:\((?<Chapter>\d+) of \d+)",
MatchOptions, RegexTimeout),
// Batman Beyond 04 (of 6) (1999)
new Regex(
@"(?<Series>.+?)(?<Chapter>\d+)(\s|_|-)?\(of",
MatchOptions, RegexTimeout),
// Batman Beyond 2.0 001 (2013)
new Regex(
@"^(?<Series>.+?\S\.\d) (?<Chapter>\d+)",
MatchOptions, RegexTimeout),
// Teen Titans v1 001 (1966-02) (digital) (OkC.O.M.P.U.T.O.-Novus)
new Regex(
@"^(?<Series>.+?)(?: |_)v(?<Volume>\d+)(?: |_)(c? ?)(?<Chapter>(\d+(\.\d)?)-?(\d+(\.\d)?)?)(c? ?)",
MatchOptions, RegexTimeout),
// Batman & Robin the Teen Wonder #0
new Regex(
@"^(?<Series>.+?)(?:\s|_)#(?<Chapter>\d+)",
MatchOptions, RegexTimeout),
// Batman 2016 - Chapter 01, Batman 2016 - Issue 01, Batman 2016 - Issue #01
new Regex(
@"^(?<Series>.+?)((c(hapter)?)|issue)(_|\s)#?(?<Chapter>(\d+(\.\d)?)-?(\d+(\.\d)?)?)",
MatchOptions, RegexTimeout),
// Invincible 070.5 - Invincible Returns 1 (2010) (digital) (Minutemen-InnerDemons).cbr
new Regex(
@"^(?<Series>.+?)(?:\s|_)(c? ?(chapter)?)(?<Chapter>(\d+(\.\d)?)-?(\d+(\.\d)?)?)(c? ?)-",
MatchOptions, RegexTimeout),
// Batgirl Vol.2000 #57 (December, 2004)
new Regex(
@"^(?<Series>.+?)(?:vol\.?\d+)\s#(?<Chapter>\d+)",
MatchOptions,
RegexTimeout),
// Batman & Catwoman - Trail of the Gun 01, Batman & Grendel (1996) 01 - Devil's Bones, Teen Titans v1 001 (1966-02) (digital) (OkC.O.M.P.U.T.O.-Novus)
new Regex(
@"^(?<Series>.+?)(?: (?<Chapter>\d+))",
MatchOptions, RegexTimeout),
// Saga 001 (2012) (Digital) (Empire-Zone)
new Regex(
@"(?<Series>.+?)(?: |_)(c? ?)(?<Chapter>(\d+(\.\d)?)-?(\d+(\.\d)?)?)\s\(\d{4}",
MatchOptions, RegexTimeout),
// Amazing Man Comics chapter 25
new Regex(
@"^(?!Vol)(?<Series>.+?)( |_)c(hapter)( |_)(?<Chapter>\d*)",
MatchOptions, RegexTimeout),
// Amazing Man Comics issue #25
new Regex(
@"^(?!Vol)(?<Series>.+?)( |_)i(ssue)( |_) #(?<Chapter>\d*)",
MatchOptions, RegexTimeout),
// spawn-123, spawn-chapter-123 (from https://github.com/Girbons/comics-downloader)
new Regex(
@"^(?<Series>.+?)-(chapter-)?(?<Chapter>\d+)",
MatchOptions, RegexTimeout),
};
private static readonly Regex[] ReleaseGroupRegex = new[]
{
// [TrinityBAKumA Finella&anon], [BAA]_, [SlowManga&OverloadScans], [batoto]
new Regex(@"(?:\[(?<subgroup>(?!\s).+?(?<!\s))\](?:_|-|\s|\.)?)",
MatchOptions, RegexTimeout),
// (Shadowcat-Empire),
// new Regex(@"(?:\[(?<subgroup>(?!\s).+?(?<!\s))\](?:_|-|\s|\.)?)",
// MatchOptions),
};
private static readonly Regex[] MangaChapterRegex = new[]
{
// Historys Strongest Disciple Kenichi_v11_c90-98.zip, ...c90.5-100.5
new Regex(
@"(\b|_)(c|ch)(\.?\s?)(?<Chapter>(\d+(\.\d)?)-?(\d+(\.\d)?)?)",
MatchOptions, RegexTimeout),
// [Suihei Kiki]_Kasumi_Otoko_no_Ko_[Taruby]_v1.1.zip
new Regex(
@"v\d+\.(?<Chapter>\d+(?:.\d+|-\d+)?)",
MatchOptions, RegexTimeout),
// Umineko no Naku Koro ni - Episode 3 - Banquet of the Golden Witch #02.cbz (rare case; remove if it causes issues)
new Regex(
@"^(?<Series>.*)(?: |_)#(?<Chapter>\d+)",
MatchOptions, RegexTimeout),
// Green Worldz - Chapter 027, Kimi no Koto ga Daidaidaidaidaisuki na 100-nin no Kanojo Chapter 11-10
new Regex(
@"^(?!Vol)(?<Series>.*)\s?(?<!vol\. )\sChapter\s(?<Chapter>\d+(?:\.?[\d-]+)?)",
MatchOptions, RegexTimeout),
// Hinowa ga CRUSH! 018 (2019) (Digital) (LuCaZ).cbz, Hinowa ga CRUSH! 018.5 (2019) (Digital) (LuCaZ).cbz
new Regex(
@"^(?!Vol)(?<Series>.+?)(?<!Vol)(?<!Vol.)\s(\d\s)?(?<Chapter>\d+(?:\.\d+|-\d+)?)(?:\s\(\d{4}\))?(\b|_|-)",
MatchOptions, RegexTimeout),
// Tower Of God S01 014 (CBT) (digital).cbz
new Regex(
@"(?<Series>.*)\sS(?<Volume>\d+)\s(?<Chapter>\d+(?:.\d+|-\d+)?)",
MatchOptions, RegexTimeout),
// Beelzebub_01_[Noodles].zip, Beelzebub_153b_RHS.zip
new Regex(
@"^((?!v|vo|vol|Volume).)*(\s|_)(?<Chapter>\.?\d+(?:.\d+|-\d+)?)(?<Part>b)?(\s|_|\[|\()",
MatchOptions, RegexTimeout),
// Yumekui-Merry_DKThias_Chapter21.zip
new Regex(
@"Chapter(?<Chapter>\d+(-\d+)?)", //(?:.\d+|-\d+)?
MatchOptions, RegexTimeout),
// [Hidoi]_Amaenaideyo_MS_vol01_chp02.rar
new Regex(
@"(?<Series>.*)(\s|_)(vol\d+)?(\s|_)Chp\.? ?(?<Chapter>\d+)",
MatchOptions, RegexTimeout),
// Vol 1 Chapter 2
new Regex(
@"(?<Volume>((vol|volume|v))?(\s|_)?\.?\d+)(\s|_)(Chp|Chapter)\.?(\s|_)?(?<Chapter>\d+)",
MatchOptions, RegexTimeout),
// Chinese Chapter: 第n话 -> Chapter n, 【TFO汉化&Petit汉化】迷你偶像漫画第25话
new Regex(
@"第(?<Chapter>\d+)话",
MatchOptions, RegexTimeout),
// Korean Chapter: 제n화 -> Chapter n, 가디언즈 오브 갤럭시 죽음의 보석.E0008.7화#44
new Regex(
@"제?(?<Chapter>\d+\.?\d+)(화|장)",
MatchOptions, RegexTimeout),
// Japanese Chapter: 第n話 -> Chapter n, [ハレム]ナナとカオル 高校生のSMごっこ 第1話
new Regex(
@"第?(?<Chapter>\d+(?:.\d+|-\d+)?)話",
MatchOptions, RegexTimeout),
};
private static readonly Regex[] MangaEditionRegex = {
// Tenjo Tenge {Full Contact Edition} v01 (2011) (Digital) (ASTC).cbz
new Regex(
@"(\b|_)(?<Edition>Omnibus(( |_)?Edition)?)(\b|_)?",
MatchOptions, RegexTimeout),
// To Love Ru v01 Uncensored (Ch.001-007)
new Regex(
@"(\b|_)(?<Edition>Uncensored)(\b|_)",
MatchOptions, RegexTimeout),
};
private static readonly Regex[] CleanupRegex =
{
// (), {}, []
new Regex(
@"(?<Cleanup>(\{\}|\[\]|\(\)))",
MatchOptions, RegexTimeout),
// (Complete)
new Regex(
@"(?<Cleanup>(\{Complete\}|\[Complete\]|\(Complete\)))",
MatchOptions, RegexTimeout),
// Anything in parentheses
new Regex(
@"\(.*\)",
MatchOptions, RegexTimeout),
};
private static readonly Regex[] MangaSpecialRegex =
{
// Matches keywords only; does not check whether the name also contains volume/chapter identification. Parser.Parse() handles that.
new Regex(
@"(?<Special>Specials?|OneShot|One\-Shot|Omake|Extra(?:(\sChapter)?[^\S])|Art Collection|Side( |_)Stories|Bonus)",
MatchOptions, RegexTimeout),
};
private static readonly Regex[] ComicSpecialRegex =
{
// Matches keywords only; does not check whether the name also contains volume/chapter identification. Parser.Parse() handles that.
new Regex(
@"(?<Special>Specials?|OneShot|One\-Shot|\d.+?(\W|_|-)Annual|Annual(\W|_|-)\d.+?|Extra(?:(\sChapter)?[^\S])|Book \d.+?|Compendium \d.+?|Omnibus \d.+?|[_\s\-]TPB[_\s\-]|FCBD \d.+?|Absolute \d.+?|Preview \d.+?|Art Collection|Side(\s|_)Stories|Bonus|Hors Série|(\W|_|-)HS(\W|_|-)|(\W|_|-)THS(\W|_|-))",
MatchOptions, RegexTimeout),
};
private static readonly Regex[] EuropeanComicRegex =
{
// Matches keywords only; does not check whether the name also contains volume/chapter identification. Parser.Parse() handles that.
new Regex(
@"(?<Special>Bd(\s|_|-)Fr)",
MatchOptions, RegexTimeout),
};
// If SP\d+ appears in the filename, the file is always treated as a special, even if a volume or chapter was also found.
private static readonly Regex SpecialMarkerRegex = new Regex(
@"(?<Special>SP\d+)",
MatchOptions, RegexTimeout
);
private static readonly Regex EmptySpaceRegex = new Regex(
@"(?!=.+)(\s{2,})(?!=.+)",
MatchOptions, RegexTimeout
);
private static readonly ImmutableArray<string> FormatTagSpecialKeywords = ImmutableArray.Create(
"Special", "Reference", "Director's Cut", "Box Set", "Box-Set", "Annual", "Anthology", "Epilogue",
"One Shot", "One-Shot", "Prologue", "TPB", "Trade Paper Back", "Omnibus", "Compendium", "Absolute", "Graphic Novel",
"GN", "FCBD");
private static readonly char[] LeadingZeroesTrimChars = new[] { '0' };
public static MangaFormat ParseFormat(string filePath)
{
if (IsArchive(filePath)) return MangaFormat.Archive;
if (IsImage(filePath)) return MangaFormat.Image;
if (IsEpub(filePath)) return MangaFormat.Epub;
if (IsPdf(filePath)) return MangaFormat.Pdf;
return MangaFormat.Unknown;
}
public static string ParseEdition(string filePath)
{
foreach (var regex in MangaEditionRegex)
{
var matches = regex.Matches(filePath);
foreach (var group in matches.Select(match => match.Groups["Edition"])
.Where(group => group.Success && group != Match.Empty))
{
return group.Value
.Replace("{", "").Replace("}", "")
.Replace("[", "").Replace("]", "")
.Replace("(", "").Replace(")", "");
}
}
return string.Empty;
}
/// <summary>
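/// Returns true if the file path contains an SP marker (SP followed by digits, e.g. SP01).
/// </summary>
/// <param name="filePath">File path or name to check</param>
/// <returns>True if an SP marker is present</returns>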
public static bool HasSpecialMarker(string filePath)
{
var matches = SpecialMarkerRegex.Matches(filePath);
return matches.Select(match => match.Groups["Special"])
.Any(group => group.Success && group != Match.Empty);
}
public static string ParseMangaSpecial(string filePath)
{
foreach (var regex in MangaSpecialRegex)
{
var matches = regex.Matches(filePath);
foreach (var group in matches.Select(match => match.Groups["Special"])
.Where(group => group.Success && group != Match.Empty))
{
return group.Value;
}
}
return string.Empty;
}
public static string ParseComicSpecial(string filePath)
{
foreach (var regex in ComicSpecialRegex)
{
var matches = regex.Matches(filePath);
foreach (var group in matches.Select(match => match.Groups["Special"])
.Where(group => group.Success && group != Match.Empty))
{
return group.Value;
}
}
return string.Empty;
}
public static string ParseSeries(string filename)
{
foreach (var regex in MangaSeriesRegex)
{
var matches = regex.Matches(filename);
foreach (var group in matches.Select(match => match.Groups["Series"])
.Where(group => group.Success && group != Match.Empty))
{
return CleanTitle(group.Value);
}
}
return string.Empty;
}
public static string ParseComicSeries(string filename)
{
foreach (var regex in ComicSeriesRegex)
{
var matches = regex.Matches(filename);
foreach (var group in matches.Select(match => match.Groups["Series"])
.Where(group => group.Success && group != Match.Empty))
{
return CleanTitle(group.Value, true);
}
}
return string.Empty;
}
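/// <summary>
/// Returns the first Volume group matched by <see cref="MangaVolumeRegex"/> (regexes are tried in order),
/// normalized via <see cref="FormatValue"/>, or <see cref="DefaultVolume"/> if nothing matches.
/// </summary>
public static string ParseVolume(string filename)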
{
foreach (var regex in MangaVolumeRegex)
{
var matches = regex.Matches(filename);
foreach (Match match in matches)
{
if (!match.Groups["Volume"].Success || match.Groups["Volume"] == Match.Empty) continue;
var value = match.Groups["Volume"].Value;
var hasPart = match.Groups["Part"].Success;
return FormatValue(value, hasPart);
}
}
return DefaultVolume;
}
public static string ParseComicVolume(string filename)
{
foreach (var regex in ComicVolumeRegex)
{
var matches = regex.Matches(filename);
foreach (var group in matches.Select(match => match.Groups))
{
if (!group["Volume"].Success || group["Volume"] == Match.Empty) continue;
var value = group["Volume"].Value;
var hasPart = group["Part"].Success;
return FormatValue(value, hasPart);
}
}
return DefaultVolume;
}
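/// <summary>
/// Strips leading zeroes from a volume or chapter value, e.g. "001-005" -> "1-5". When <paramref name="hasPart"/>
/// is true, ".5" is appended via <see cref="AddChapterPart"/> (to the upper bound for a range), e.g. "153" -> "153.5".
/// A value with more than one '-' returns only its lower bound.
/// </summary>
private static string FormatValue(string value, bool hasPart)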
{
if (!value.Contains('-'))
{
return RemoveLeadingZeroes(hasPart ? AddChapterPart(value) : value);
}
var tokens = value.Split("-");
var from = RemoveLeadingZeroes(tokens[0]);
if (tokens.Length != 2) return from;
var to = RemoveLeadingZeroes(hasPart ? AddChapterPart(tokens[1]) : tokens[1]);
return $"{from}-{to}";
}
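/// <summary>
/// Returns the first Chapter group matched by <see cref="MangaChapterRegex"/> (regexes are tried in order),
/// normalized via <see cref="FormatValue"/>, or <see cref="DefaultChapter"/> if nothing matches.
/// </summary>
public static string ParseChapter(string filename)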
{
foreach (var regex in MangaChapterRegex)
{
var matches = regex.Matches(filename);
foreach (Match match in matches)
{
if (!match.Groups["Chapter"].Success || match.Groups["Chapter"] == Match.Empty) continue;
var value = match.Groups["Chapter"].Value;
var hasPart = match.Groups["Part"].Success;
return FormatValue(value, hasPart);
}
}
return DefaultChapter;
}
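/// <summary>
/// Marks a value as a part by appending ".5" to a whole number (e.g. "153" -> "153.5").
/// Values that already contain '.' are returned unchanged.
/// </summary>
private static string AddChapterPart(string value)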
{
if (value.Contains('.'))
{
return value;
}
return $"{value}.5";
}
public static string ParseComicChapter(string filename)
{
foreach (var regex in ComicChapterRegex)
{
var matches = regex.Matches(filename);
foreach (Match match in matches)
{
if (match.Groups["Chapter"].Success && match.Groups["Chapter"] != Match.Empty)
{
var value = match.Groups["Chapter"].Value;
var hasPart = match.Groups["Part"].Success;
return FormatValue(value, hasPart);
}
}
}
return DefaultChapter;
}
private static string RemoveEditionTagHolders(string title)
{
foreach (var regex in CleanupRegex)
{
var matches = regex.Matches(title);
foreach (Match match in matches)
{
if (match.Success)
{
title = title.Replace(match.Value, string.Empty).Trim();
}
}
}
foreach (var regex in MangaEditionRegex)
{
var matches = regex.Matches(title);
foreach (Match match in matches)
{
if (match.Success)
{
title = title.Replace(match.Value, string.Empty).Trim();
}
}
}
return title;
}
private static string RemoveMangaSpecialTags(string title)
{
foreach (var regex in MangaSpecialRegex)
{
var matches = regex.Matches(title);
foreach (var match in matches.Where(m => m.Success))
{
title = title.Replace(match.Value, string.Empty).Trim();
}
}
return title;
}
private static string RemoveEuropeanTags(string title)
{
foreach (var regex in EuropeanComicRegex)
{
var matches = regex.Matches(title);
foreach (var match in matches.Where(m => m.Success))
{
title = title.Replace(match.Value, string.Empty).Trim();
}
}
return title;
}
private static string RemoveComicSpecialTags(string title)
{
foreach (var regex in ComicSpecialRegex)
{
var matches = regex.Matches(title);
foreach (var match in matches.Where(m => m.Success))
{
title = title.Replace(match.Value, string.Empty).Trim();
}
}
return title;
}
/// <summary>
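/// Removes release groups, edition tag holders and special tags (plus European tags for comics), translates _ to spaces,
/// strips a single leading or trailing '-' or ',', collapses runs of whitespace and trims the result.
/// <example>
/// Hippos_the_Great [Digital], -> Hippos the Great
/// </example>
/// </summary>
/// <param name="title">Raw series title extracted from a filename</param>
/// <param name="isComic">Apply comic special-tag rules (and remove European tags) instead of manga rules</param>
/// <returns>The cleaned title</returns>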
public static string CleanTitle(string title, bool isComic = false)
{
title = RemoveReleaseGroup(title);
title = RemoveEditionTagHolders(title);
if (isComic)
{
title = RemoveComicSpecialTags(title);
title = RemoveEuropeanTags(title);
}
else
{
title = RemoveMangaSpecialTags(title);
}
title = title.Replace("_", " ").Trim();
if (title.EndsWith("-") || title.EndsWith(","))
{
title = title.Substring(0, title.Length - 1);
}
if (title.StartsWith("-") || title.StartsWith(","))
{
title = title.Substring(1);
}
title = EmptySpaceRegex.Replace(title, " ");
return title.Trim();
}
private static string RemoveReleaseGroup(string title)
{
foreach (var regex in ReleaseGroupRegex)
{
var matches = regex.Matches(title);
foreach (var match in matches.Where(m => m.Success))
{
title = title.Replace(match.Value, string.Empty);
}
}
return title;
}
/// <summary>
/// Left-pads a whole-number string with zeros to three digits so string ordering is correct for numbers below 1000.
/// Handles ranges (e.g. 4-8 -> 004-008).
/// </summary>
/// <param name="number">A whole number or a two-part range such as "4-8"</param>
/// <returns>A zero-padded number</returns>
public static string PadZeros(string number)
{
if (!number.Contains('-')) return PerformPadding(number);
var tokens = number.Split("-");
return $"{PerformPadding(tokens[0])}-{PerformPadding(tokens[1])}";
}
private static string PerformPadding(string number)
{
var num = int.Parse(number);
return num switch
{
< 10 => "00" + num,
< 100 => "0" + num,
_ => number
};
}
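/// <summary>
/// Trims leading '0' characters, returning "0" if nothing would remain (e.g. "007" -> "7", "000" -> "0").
/// </summary>
public static string RemoveLeadingZeroes(string title)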
{
var ret = title.TrimStart(LeadingZeroesTrimChars);
return string.IsNullOrEmpty(ret) ? "0" : ret;
}
public static bool IsArchive(string filePath)
{
return ArchiveFileRegex.IsMatch(Path.GetExtension(filePath));
}
public static bool IsComicInfoExtension(string filePath)
{
return ComicInfoArchiveRegex.IsMatch(Path.GetExtension(filePath));
}
public static bool IsBook(string filePath)
{
return BookFileRegex.IsMatch(Path.GetExtension(filePath));
}
public static bool IsImage(string filePath)
{
return !filePath.StartsWith(".") && ImageRegex.IsMatch(Path.GetExtension(filePath));
}
public static bool IsXml(string filePath)
{
return XmlRegex.IsMatch(Path.GetExtension(filePath));
}
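/// <summary>
/// Returns the smallest number in a range such as "1-5" (returns 1).
/// </summary>
/// <remarks>Returns 0 if the input contains anything other than digits, '-' and '.', or if any token fails to parse.</remarks>
public static float MinNumberFromRange(string range)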
{
try
{
if (!Regex.IsMatch(range, @"^[\d-.]+$"))
{
return (float) 0.0;
}
var tokens = range.Replace("_", string.Empty).Split("-");
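// Parse with the invariant culture so "1.5" is not misread on servers whose locale uses ',' as the decimal separator
return tokens.Min(token => float.Parse(token, System.Globalization.CultureInfo.InvariantCulture));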
}
catch
{
return (float) 0.0;
}
}
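/// <summary>
/// Returns the largest number in a range such as "1-5" (returns 5).
/// </summary>
/// <remarks>Returns 0 if the input contains anything other than digits, '-' and '.', or if any token fails to parse.</remarks>
public static float MaxNumberFromRange(string range)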
{
try
{
if (!Regex.IsMatch(range, @"^[\d-.]+$"))
{
return (float) 0.0;
}
var tokens = range.Replace("_", string.Empty).Split("-");
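// Parse with the invariant culture so "1.5" is not misread on servers whose locale uses ',' as the decimal separator
return tokens.Max(token => float.Parse(token, System.Globalization.CultureInfo.InvariantCulture));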
}
catch
{
return (float) 0.0;
}
}
public static string Normalize(string name)
{
return NormalizeRegex.Replace(name, string.Empty).ToLower();
}
/// <summary>
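/// Prepares a special's title for display in the UI: replaces _ with ' ', strips SP\d+ markers and removes the
/// trailing extension (everything after the last '.'). Returns the original name if nothing would remain.
/// </summary>
/// <param name="name">Raw special filename</param>
/// <returns>The display title</returns>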
public static string CleanSpecialTitle(string name)
{
if (string.IsNullOrEmpty(name)) return name;
var cleaned = SpecialTokenRegex.Replace(name.Replace('_', ' '), string.Empty).Trim();
var lastIndex = cleaned.LastIndexOf('.');
if (lastIndex > 0)
{
cleaned = cleaned.Substring(0, lastIndex).Trim();
}
return string.IsNullOrEmpty(cleaned) ? name : cleaned;
}
/// <summary>
/// Tests whether the file is a cover image: it must be an image and its name must match the cover pattern
/// (e.g. contains "cover" or is named "folder").
/// </summary>
/// <remarks>If the path has "backcover" in it, it will be ignored</remarks>
/// <param name="filename">Filename with extension</param>
/// <returns>True if the file looks like a cover image</returns>
public static bool IsCoverImage(string filename)
{
return IsImage(filename) && CoverImageRegex.IsMatch(filename);
}
/// <summary>
/// Returns true if the path should be skipped: it contains __MACOSX or .qpkg, starts with @Recently-Snapshot,
/// @recycle or ._, or its filename starts with ._ (a macOS metadata file).
/// </summary>
/// <param name="path">File or folder path to check</param>
/// <returns>True if the path is blacklisted</returns>
public static bool HasBlacklistedFolderInPath(string path)
{
return path.Contains("__MACOSX") || path.StartsWith("@Recently-Snapshot") || path.StartsWith("@recycle") || path.StartsWith("._") || Path.GetFileName(path).StartsWith("._") || path.Contains(".qpkg");
}
public static bool IsEpub(string filePath)
{
return Path.GetExtension(filePath).Equals(".epub", StringComparison.InvariantCultureIgnoreCase);
}
public static bool IsPdf(string filePath)
{
return Path.GetExtension(filePath).Equals(".pdf", StringComparison.InvariantCultureIgnoreCase);
}
/// <summary>
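/// Trims surrounding whitespace from an author's name; returns an empty string for null or empty input.
/// </summary>
/// <remarks>A name in "Last, First" form is not reversed.</remarks>
/// <param name="author">Author name as read from metadata</param>
/// <returns>The trimmed name</returns>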
public static string CleanAuthor(string author)
{
return string.IsNullOrEmpty(author) ? string.Empty : author.Trim();
}
/// <summary>
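/// Normalizes the slashes in a path to be <see cref="Path.AltDirectorySeparatorChar"/> and replaces "//" with a single separator.
/// </summary>
/// <remarks>On Unix both separator chars are '/', so backslashes are left unchanged there.</remarks>
/// <example>/manga/1\1 -> /manga/1/1</example>
/// <param name="path">Path to normalize</param>
/// <returns>The normalized path</returns>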
public static string NormalizePath(string path)
{
return path.Replace(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar)
.Replace(@"//", Path.AltDirectorySeparatorChar + string.Empty);
}
/// <summary>
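/// Returns true if a ComicInfo Format value exactly matches (case-sensitive) one of the special keywords
/// (e.g. "Annual", "TPB", "One-Shot"), meaning the file should be treated as a special.
/// </summary>
/// <param name="comicInfoFormat">The Format field from ComicInfo</param>
/// <returns>True if the format marks a special</returns>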
public static bool HasComicInfoSpecial(string comicInfoFormat)
{
return FormatTagSpecialKeywords.Contains(comicInfoFormat);
}
}
}