109 Commits

Author SHA1 Message Date
Joseph Milazzo
d02d2d3cb5
Epub 3.2 Collection Tag support (#308)
* Hooked up logic for collections based on EPUB3.2 Spec and Fixed improper tags in EPUBs since it is XML and we are using HTML to parse it.

* Fixed a bug with src:url URL replacement by switching to a much cleaner regex
2021-06-15 09:51:37 -05:00
Joseph Milazzo
584348c6ad
Special Marker Changes (#306)
* SP# is now a way to force the file to be a special rather than pushing it into a Specials folder.

* Made it so if there is a Special (for any Parse call), volume and chapters will be ignored.

* Fixed a unit test missing Theory and fixed a regex case
2021-06-14 21:12:37 -05:00
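The special-marker behavior described in commit 584348c6ad above can be sketched roughly as follows. This is only an illustration in Python: the patterns `SPECIAL_MARKER`, `VOLUME`, and `CHAPTER` and the `parse` function are hypothetical stand-ins, since the project's actual regexes are not shown in this log.

```python
import re

# Hypothetical patterns; the project's real regexes are not shown in this log.
SPECIAL_MARKER = re.compile(r"(?<![A-Za-z0-9])SP\d+(?![A-Za-z0-9])", re.IGNORECASE)
VOLUME = re.compile(r"\bv(\d+)", re.IGNORECASE)
CHAPTER = re.compile(r"\bc(\d+)", re.IGNORECASE)


def parse(filename):
    """Return a dict with is_special, volume, and chapter for a filename."""
    is_special = SPECIAL_MARKER.search(filename) is not None
    volume = VOLUME.search(filename)
    chapter = CHAPTER.search(filename)
    if is_special:
        # A special marker wins: volume and chapter are ignored entirely.
        return {"is_special": True, "volume": None, "chapter": None}
    return {
        "is_special": False,
        "volume": volume.group(1) if volume else None,
        "chapter": chapter.group(1) if chapter else None,
    }
```

With these assumed patterns, a file named `Series SP01 v02.cbz` is treated as a special even though it also carries a volume token.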
Joseph Milazzo
46b60405b1
Special Markers (#305)
* Removed "Anthology" from being a special parsing keyword as series are being found where "Anthology" is in the series name.

* SP# is now a way to force the file to be a special rather than pushing it into a Specials folder.
2021-06-14 17:35:13 -05:00
Joseph Milazzo
f8aba21acd
Removed "Anthology" from being a special parsing keyword as series are being found where "Anthology" is in the series name. (#304) 2021-06-14 17:21:01 -05:00
Joseph Milazzo
6f124b6f8a
Add try catch on Parser MinimumNumberFromRange in case something weird gets put in here. (#283) 2021-06-07 16:12:07 -05:00
Joseph Milazzo
ce35c4f84a
CB7 Support (#241)
* Added CB7 file extension support
2021-06-01 08:37:46 -05:00
Joseph Milazzo
d7d7f9b529
Collection Support (#234)
* Readme refactored to be more clean and clear, taking inspiration from wiki.js's readme.

* Initial backend for Collections and basic metadata implemented.

* More build flavors for Raspberry Pi users and updated Install since we don't need users to set their own JWT Token Key. Update a typo in appsettings.json file for prod.

* Fixed #224. Sort before getting a First/Last() chapter

* The rough ability to add and get series metadata and tags.

* Fix a bug on getting metadata for when it doesn't exist.

* Fixed a bug where flattening directories with some unique filenames could cause reading order of images to be out of order.

* Added seed code to ensure all series have SeriesMetadata

* Ensure all instances of opening an epub are using "using" so we don't lock the file. When we have a malformed html file, log the issues and inform the user we can't open the file.

* Book reader now handles @import "" statements in CSS and inlines the imported CSS into the CSS file that references it. This allows the styles to be scoped. In addition, if the html or body tag had classes, we now send back a single div with those classes.

* Fixed GetSeriesDtoForCollectionAsync which was not properly returning series

* Implemented cover image for collection tag. Fixed an issue in metadata update call.

* Add check for user access when resolving series for a collection tag. When asking for all tags, if the user is not an admin, only give promoted tags back.

* Implemented updateTag api

* Implemented the ability to update series the tags have access to.

* Cleanup, sorting, and null check

* More sorting changes

* Ensure we can delete tags when editing a series tags

* Fix order of update to make sure a tag is properly deleted

* Code smells
2021-05-30 17:24:23 -05:00
Joseph Milazzo
77c52717ce
MinimumNumberFromRange exception (#222)
* More regex! Bonus is now a keyword for specials

* Regex enhancement, Sort chapters on next/prev chapter to ensure they are always in proper order, and don't set JWT on startup when in development mode.

* Fixes KAVITA-H. Check to ensure non numeric characters are not in range string before attempting to parse a float out.
2021-05-17 09:31:16 -05:00
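The KAVITA-H fix in commit 77c52717ce above (reject non-numeric characters before parsing a float out of a range string) might look something like this sketch. The function name mirrors `MinimumNumberFromRange` from the log, but the allowed-character pattern and the `0.0` fallback are assumptions made for illustration, not the project's confirmed behavior.

```python
import re

# Assumed rule: a range string may contain only digits, dots, and hyphens.
RANGE_CHARS = re.compile(r"^[0-9.\-]+$")


def minimum_number_from_range(range_text):
    """Return the smallest number in a range such as "1-3", or 0.0 if invalid."""
    if not RANGE_CHARS.match(range_text):
        # Non-numeric characters present: do not attempt to parse.
        return 0.0
    try:
        return min(float(token) for token in range_text.split("-") if token)
    except ValueError:
        # Covers malformed numbers ("1..2") and empty input ("-").
        return 0.0
```

The `try`/`except` plays the same defensive role as the try-catch that the later commit 6f124b6f8a mentions adding around this method.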
Joseph Milazzo
308e2b48a0
Bugfixes (#221)
* More regex! Bonus is now a keyword for specials

* Regex enhancement, Sort chapters on next/prev chapter to ensure they are always in proper order, and don't set JWT on startup when in development mode.
2021-05-16 16:45:39 -05:00
Joseph Milazzo
2f793af34c
More regex! Bonus is now a keyword for specials (#220) 2021-05-16 13:13:19 -05:00
Joseph Milazzo
03b49a5268
Implemented the ability to change the JWT key on runtime. (#217)
* Implemented the ability to change the JWT key on runtime.

* Added .7z file extension support

* Cleanup

* Added Feathub link

* Code cleanup

* Fixed up a build issue on CI
2021-05-14 08:07:03 -05:00
Joseph Milazzo
beca4a4de5
Bugfix/parser (#214)
* Fixed #211

* Fixed #213. Somehow a + 1 got removed
2021-05-11 15:57:11 -05:00
Joseph Milazzo
c8adaee3eb
Sentry Integration (#212)
* Fixed a parsing case

* Integrated Sentry into the solution with anonymous users. Fixed some parsing issues and added BuildInfo into a separate project.

* Fixed some bad parser regex

* Removed bad reference to NLog

* Cleanup of some files not needed
2021-05-11 14:45:18 -05:00
Joseph Milazzo
e37931b0da
Regex addition (#200) 2021-05-06 16:51:16 -05:00
Joseph Milazzo
9c43833989
Bugfixes/misc (#196)
* Removed an error log statement which wasn't valid. It was showing an error when a comicinfo.xml was not found in a directory.

* Fixed #191. Don't overwrite summary information if we already have something set from UI.

* Fixes #192

* Fixed #194 by moving the Take to after the query runs, so we take only distinct series.

* Added another case for Regex parsing for VanDread-v01-c01.zip
2021-05-02 19:46:34 -05:00
Joseph Milazzo
e2e755145c
Book Feedback (#190)
* Remove automatic retry for scanLibraries, since if something fails it won't pass magically. Catch exceptions when opening books for parsing and swallow them to ignore the file.

* Delete extra attempts

* Switched to using FirstOrDefault for finding existing series. This will help avoid pointless crashes.

* Updated message when duplicate series are found (not sure how this happens)

* Fixed a negation for deleting volumes where files still exist.

* Implemented the ability to automatically scale the manga reader based on screen size.

* Default to automatic scaling

* Fix an issue where malformed epubs wouldn't be readable due to incorrect keys in the OPF. We now check if a key is valid and, if not, try to correct it. This makes a page load take about a second on malformed books.

* Fixed #176. Refactored the recently added query to be restricted to user's access to libraries.

* Fixed a one off bug with In Progress series

* Implemented the ability to refresh metadata of just a single series directly

* Fixed a parser case where Series c000 (v01) would fail to parse the series

* Fixed #189. In Progress now returns data properly for library access and in multiple libraries.

* Fixed #188 by adding an extra message for bad login and updating UI

* Generate a fallback for table of contents by parsing the toc file (if we can find one)
2021-05-02 10:00:47 -05:00
Joseph Milazzo
a01613f80f
EPUB Support (#178)
* Added book filetype detection and reorganized tests due to size of file

* Added ability to get basic Parse Info from Book and Pages.

* We can now scan books and get them in a library with cover images.

* Take the first image in the epub if the cover isn't set.

* Implemented the ability to unzip the epub to cache. Implemented a test api to load html files.

* Just some test code to figure out how to approach this.

* Fixed some merge conflicts

* Removed some dead code from merge

* Snapshot: I can now load everything properly into the UI by rewriting the urls before I send them back. I don't notice any lag from this method. It can be optimized further.

* Implemented a way to load the content in the browser not via an iframe.

* Added a note

* Anchor mappings is complete. New anchors are updated so references now resolve to javascript:void() for UI to take care of internally loading and the appropriate page is mapped to it. Anchors that are external have target="_blank" added so they don't force you out of the app and styles are of course inlined.

* Oops i need this

* Table of contents api implemented (rough) and some small enhancements to codebase for books.

* GetBookPageResources now only loads files from within the book. Nested chapter list support and images now use html parsing instead of string parsing.

* Fonts now are remapped to load from endpoint.

* book-resources now uses a key, ensuring the file is in proper format for lookup. Changed chapter list based on structure with one HEADER and nested chapters.

* Properly handle svg resource requests and when there are part anchors that are clickable, make sure we handle them in the UI by adding a kavita-page handler.

* Add Chapter group page even if one isn't set by using first page (without part) from nestedChildren.

* Added extra debug code for issue #163.

* Added new user preferences for books and updated the css so we scope it to our reading section.

* Cleaned up style code

* Implemented ability to save book preferences and some cleanup on existing apis.

* Added an api for checking if a user has read something in a library type before.

* Forgot to make sure the has reading progress is against a user lol.

* Remove cacheservice code for books, since we use an in-memory method

* Handle svg images as well

* Enhanced cover image extraction to check for a "cover" image if the cover image wasn't set in OPF before falling back to the first image.

* Fixed an issue with special books not properly generating metadata due to not having filename set.

* Cleanup, removed warmup task code from startup/program and changed taskscheduler to schedule tasks on startup only (or if tasks are changed from UI).

* Code cleanup

* Code cleanup

* So much code. Lots of refactors to try to test scanner service. Moved a lot of the queries into Extensions to allow to easier test, even though it's hacky. Support @font-face src:url swaps with ' and ". Source summary information from epubs.

* Well...baseURL needs to come from BE and not from UI lol.

* Adjusted migrations so default values match Entity

* Removed comment

* I think I finally fixed #163! The issue was that when I checked if it had a parserInfo, I wasn't considering that the chapter range might have a - in it (0-6) and so when the code to check if range could parse out a number failed, it treated it like a special and checked range against info's filename.

* Some bugfixes

* Lots of testing, extracting code to make it easier to test. This code is buggy, but fixed a bug where 1) If we changed the normalization code, we would remove the whole db during a scan and 2) We weren't actually removing series properly.

Other than that, code is being extracted to remove duplication and centralize logic.

* More code cleanup and test cleanup to ensure scan loop is working as expected and matches expectations from tests.

* Cleaned up the code and made it so if I change normalization, which I do in this branch, it won't break existing DBs.

* Some comic parser changes for partial chapter support.

* Added some code for directory service and scanner service along with python code to generate test files (not used yet). Fixed up all the tests.

* Code smells
2021-04-28 16:16:22 -05:00
Joseph Milazzo
09a953be8c
Feature/bugfix and regex (#174)
* Fixed #172

* Fixes #164

* Added a parse test for [Hidoi]_Amaenaideyo_MS_vol01_chp02.rar

* Fix annoying warning about SplitQuery on GetLibraryDtosForUsernameAsync
2021-04-13 14:30:57 -05:00
Joseph Milazzo
d59d60d9ec
Feature/unit tests (#171)
* Removed a duplicate loop that was already done earlier in method.

* Normalize now replaces underscores

* Added more Parser cases, Added test case for SeriesExtension (Name in List), and added MergeNameTest and some TODOs for where tests should go

* Added a test for removal

* Fixed bad merge

Co-authored-by: Andrew Song <asong641@gmail.com>
2021-04-13 10:24:44 -05:00
Joseph Milazzo
6ba00477e7
Cover Image - First and tests (#170)
* Changed how natural sort works to cover more cases

* Changed the name of CoverImage regex for Parser and added more cases.

* Changed how we get result from Task.Run()

* Defer execution of a loop till we really need it and added another TODO for later this iteration.

* Big refactor to cover image code to unify between IOCompression and SharpCompress. Both use methods to find the correct file. This results in one extra loop through entries, but simplifies code significantly.

In addition, new unit tests for the methods that actually do the logic on choosing cover file and first file.

* Removed dead code

* Added missing doc
2021-04-11 18:15:12 -05:00
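For readers unfamiliar with the natural sort mentioned in commit 6ba00477e7 above, here is a minimal sketch of the general technique. It is a generic Python illustration, not the project's comparator; `natural_key` is a hypothetical helper name.

```python
import re


def natural_key(name):
    """Build a sort key that compares digit runs numerically, text case-insensitively."""
    key = []
    for chunk in re.split(r"([0-9]+)", name):
        if not chunk:
            continue
        if chunk.isascii() and chunk.isdigit():
            # Numeric chunk: compare by integer value, so 2 sorts before 10.
            key.append((0, int(chunk), ""))
        else:
            # Text chunk: compare case-insensitively.
            key.append((1, 0, chunk.lower()))
    return key


pages = sorted(["page10.jpg", "page2.jpg", "page1.jpg"], key=natural_key)
```

A plain string sort would place `page10.jpg` before `page2.jpg`; the numeric comparison of digit runs is what keeps pages in reading order.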
Joseph Milazzo
b3ec8e8756
Bugfixes! (#157)
* More cases for parsing regex

* Fixed a bug where chapter cover images weren't being updated due to a missed not.

* Removed a piece of code that was needed for upgrading, since all beta users agreed to wipe db.

* Fixed InProgress to properly respect order and show more recent activity first. Issue is with IEntityDate LastModified not updating in DataContext.

* Updated dependencies to latest stable.

* LastModified on Volumes wasn't updating, validated it does update when data is changed.

* Fixed #152 - Sorting issue when finding cover image.

* Fixed #151 - Sort files during scan.

* Fixed #161 - Remove files that don't exist from chapters during scan.

* Fixed #155 - Ignore images that start with !, expand cover detection by checking for the word cover as well as folder, and some code cleanup to make code more concise.

* Fixed #153 - Ensure that we persist series name changes and don't override on scanning.

* Fixed a broken unit test
2021-04-06 08:59:44 -05:00
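The cover-detection rules listed under #155 and #152 in commit b3ec8e8756 above (skip images starting with `!`, prefer a file named cover or folder, sort before picking) could be sketched as below. The extension list, the exact-stem match, and the `choose_cover` name are assumptions for illustration only.

```python
import posixpath

# Assumed set of image extensions; the project's actual list is not shown here.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp")


def choose_cover(entries):
    """Pick a cover from archive entry names, or None if no image qualifies."""
    candidates = []
    for entry in entries:
        name = posixpath.basename(entry)
        if not name.lower().endswith(IMAGE_EXTENSIONS):
            continue  # not an image
        if name.startswith("!"):
            continue  # images starting with "!" are ignored
        candidates.append(entry)
    candidates.sort()  # sort so the fallback choice is deterministic
    for entry in candidates:
        stem = posixpath.splitext(posixpath.basename(entry))[0].lower()
        if stem in ("cover", "folder"):
            return entry  # an explicitly named cover wins
    return candidates[0] if candidates else None
```

Sorting first means the fallback is always the first image in order rather than whatever the archive happens to list first.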
Joseph Milazzo
d3c14863d6
Performance, Scan Loop, Specials, and cleanup (#150)
* More cases for parsing regex

* Fixed a bug where chapter cover images weren't being updated due to a missed not.

* Removed a piece of code that was needed for upgrading, since all beta users agreed to wipe db.

* Fixed InProgress to properly respect order and show more recent activity first. Issue is with IEntityDate LastModified not updating in DataContext.

* Updated dependencies to latest stable.

* LastModified on Volumes wasn't updating, validated it does update when data is changed.

* Rewrote a check to avoid a small heap object warning.

* Ensure UpdateSeries checks all libraries for unique name.

* Took care of some todos, removed unused imports, on dev go ahead and schedule recurring jobs since LiteDB caused the locking issue.

* No Tracking when we aren't using entities.

* Added code to remove abandoned progress rows after a chapter gets deleted.

* RefreshMetadata uses one large query rather than many trips to DB for updating metadata. Significantly faster.

* Fixed a bug where UpdateSeries would always complain about a unique name even when we weren't updating name.

* Files that are linked to a series but can't parse out Vol/Chapter information are properly grouped like other Specials.

* Refresh metadata on UI should call the task directly

* Fixed a bug on updating series to make sure we don't complain if we aren't trying to update the name to an existing name.

* Fixed #142 - Library cards should be sorted.

* Refactored the name of some variables to be more agnostic to comics.

* Implemented ScanLibrary but abandoning it.

* Code Cleanup & removing ScanSeries code.

* Some more tests and new Comparators for natural sorting.

* Fixed #137 - When performing I/O on archives, ignore __MACOSX folders completely.

* All entities that will show under specials tab should be marked special, rather than just what has a special keyword.

* Don't let specials generate cover images

* SearchResults should send LocalizedName back since we are searching against it.

* Added some tests around macosx folders found from my actual server.

* Put extra notes about a case where duplicates come about, logger will now tell user about this issue.

* Missed a build issue somehow...

* Some code smells
2021-04-05 08:37:45 -05:00
Joseph Milazzo
a0deafe75b
Parser Enhancement: Fallback to Folder name (#129)
* More cases for parsing regex

* Implemented GetFoldersTillRoot for falling back on parsing when we can't get anything from the filename.

* Implemented a fallback strategy. Not tested on large libraries yet.

* Fallback tested and working great.

* Removed a test case that won't pass and added some trims
2021-03-29 17:37:35 -05:00
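The folder-name fallback from commit a0deafe75b above relies on walking from a file up to the library root. A rough sketch of that idea follows; the function name echoes `GetFoldersTillRoot` from the log, but the signature and the nearest-folder-first ordering are assumptions.

```python
from pathlib import PurePosixPath


def get_folders_till_root(root, file_path):
    """List folder names between file_path and root, nearest folder first."""
    # relative_to raises ValueError if file_path is not under root.
    relative = PurePosixPath(file_path).relative_to(root)
    # Drop the filename itself, then walk from the deepest folder upward.
    return list(reversed(relative.parts[:-1]))


folders = get_folders_till_root("/library", "/library/Series A/Volume 1/001.cbz")
```

A parser that cannot extract a series from `001.cbz` could then try each returned folder name in turn.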
Joseph Milazzo
d9246b7351
Parsing Enhancements (#126)
* More cases for parsing regex

* Implemented the ability to parse "Special" keywords.

* Commented out some unit tests

* More parsing cases

* Fixed unit tests

* Fixed typo in build script

* Fixed a bug where if there was a series with same name, but different capitalization, we wouldn't process its infos.

* Tons of regex updates to handle more cases.

* More regex tweaking to handle as many cases as possible.

* Bad merge caused the comic parser to break. Fixed with some better regex.
2021-03-29 15:15:49 -05:00
Joseph Milazzo
3e031ab458
Lots of Parsing Enhancements (#120)
* More cases for parsing regex

* Implemented the ability to parse "Special" keywords.

* Commented out some unit tests

* More parsing cases

* Fixed unit tests

* Fixed typo in build script
2021-03-28 18:00:05 -05:00
Joseph Milazzo
7e54d332f5
Comic Support (#119)
* Implemented some basic regex for comic support

* Implemented support for comics

* empty filenames, like .test.jpg shouldn't be counted as image types.

* Fixed some regex for manga with commas or version tags in parentheses.
2021-03-28 12:09:42 -05:00
Joseph Milazzo
44c2af88ea Some security issue found in scan. 2021-03-23 14:51:56 -05:00
Joseph Milazzo
d724a8f178 A lot of random changes to try and speed up SharpCompress. 2021-03-23 12:22:50 -05:00
Joseph Milazzo
585e965a85 Fixed some bad test cases that really messed up my codebase. 2021-03-23 12:21:09 -05:00
Joseph Milazzo
b66c6b5714 Fixed some parser unit tests around negative lookaheads 2021-03-23 12:20:30 -05:00
Joseph Milazzo
d543511131 Finished refactoring to SharpCompress. 2021-03-23 12:20:27 -05:00
Joseph Milazzo
a5069158fa Removed tests. For those cases, I was unable to find a good solution. Users will have to manually map or rename. 2021-03-17 17:47:06 -05:00
Joseph Milazzo
52b91a9b92 Public caching causes an issue with cache validation on browser causing images not to be cached correctly. Made private to ensure we get proper images each load. 2021-03-12 18:35:12 -06:00
Joseph Milazzo
265f7dcc8c Implemented ability to generate Series summary from ComicInfo.xml (if present) 2021-02-17 16:41:42 -06:00
Joseph Milazzo
a501e50c98 Clean up and fixed a parsing case. 2021-02-10 12:16:29 -06:00
Joseph Milazzo
40154c8d63 Temp stop point. Rewrote the Scanner service to be much cleaner and slightly more efficient. Code is structured so it can easily be multithreaded. 2021-02-09 15:03:02 -06:00
Joseph Milazzo
39fa750d96 Enhanced the parser to handle more cases and implement some negative lookups when being greedy. 2021-02-08 10:53:59 -06:00
Joseph Milazzo
57f74d3de3 Implemented partial chapter support. Fixed some edge case where if library scan was skipped due to no modification on disk, whole library would be removed. Removed above code for testing. 2021-02-07 13:07:07 -06:00
Joseph Milazzo
077e5f798a Lots of cleanup 2021-02-07 12:02:47 -06:00
Joseph Milazzo
e9dfc1bda0 Fixed a bug in IsImage and IsArchive where I was using a contains instead of matching the regex. 2021-02-04 17:39:24 -06:00
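The kind of bug described in commit e9dfc1bda0 above, a substring check used where a regex match was intended, can be illustrated with the following sketch. Both functions and the extension pattern are hypothetical; the log does not show the original `IsImage` code.

```python
import os
import re

# Assumed extension pattern, anchored so only whole extensions match.
IMAGE_EXTENSION = re.compile(r"^\.(jpe?g|png|gif|webp)$", re.IGNORECASE)


def is_image_by_contains(path):
    """Buggy variant: any fragment of the extension list is accepted."""
    extension = os.path.splitext(path)[1].lower()
    return extension in ".jpg.jpeg.png.gif.webp"


def is_image(path):
    """Fixed variant: the extension must match the pattern in full."""
    extension = os.path.splitext(path)[1]
    return IMAGE_EXTENSION.match(extension) is not None
```

The substring version accepts `page.jp` because `.jp` occurs inside the list string, while the anchored match rejects it.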
Joseph Milazzo
a42e54a078 Lots of work for chapters. This code will be refactored in a chapter rewrite. 2021-01-27 14:14:16 -06:00
Joseph Milazzo
f430595d11 Attempted to Test CacheService, but can't figure it out. 2021-01-26 14:35:50 -06:00
Joseph Milazzo
6b76c8b211 Refactored archive code into a service so that I can write tests for it. 2021-01-26 09:55:15 -06:00
Joseph Milazzo
51d4014e11 Forgot to fix some unit tests. 2021-01-25 16:04:52 -06:00
Joseph Milazzo
fe88467d8b More regex tweaking and use cases for real library. 2021-01-25 14:45:23 -06:00
Joseph Milazzo
7cd0b80ac2 More regex tweaking and use cases for real library. 2021-01-24 14:08:09 -06:00
Joseph Milazzo
8498d25aa7 Fixed some use cases where Edition tags weren't being cleaned up. 2021-01-24 10:57:09 -06:00
Joseph Milazzo
6097a2acf0 Some crazy regex for parsing chapters for poorly named files. 2021-01-24 10:37:02 -06:00
Joseph Milazzo
8683c81361 There is a theme...more regex changes. Moved the logic around parsing and falling back into Parser.Parse() and setup testing for it. 2021-01-24 10:05:53 -06:00
Joseph Milazzo
a315feb569 More Parser tests and more cases! Added ability to parse Editions for Manga (Omnibus, Color, etc). To be stripped from Series if present. Future can be stored on MangaFile. 2021-01-24 08:34:57 -06:00