Kavita/API/Controllers/BookController.cs
Joseph Milazzo a01613f80f
EPUB Support (#178)
* Added book filetype detection and reorganized tests due to size of file

* Added ability to get basic Parse Info from Book and Pages.

* We can now scan books and get them in a library with cover images.

* Take the first image in the epub if the cover isn't set.

* Implemented the ability to unzip the ebup to cache. Implemented a test api to load html files.

* Just some test code to figure out how to approach this.

* Fixed some merge conflicts

* Removed some dead code from merge

* Snapshot: I can now load everything properly into the UI by rewriting the urls before I send them back. I don't notice any lag from this method. It can be optimized further.

* Implemented a way to load the content in the browser not via an iframe.

* Added a note

* Anchor mappings is complete. New anchors are updated so references now resolve to javascript:void() for UI to take care of internally loading and the appropriate page is mapped to it. Anchors that are external have target="_blank" added so they don't force you out of the app and styles are of course inlined.

* Oops i need this

* Table of contents api implemented (rough) and some small enhancements to codebase for books.

* GetBookPageResources now only loads files from within the book. Nested chapter list support and images now use html parsing instead of string parsing.

* Fonts now are remapped to load from endpoint.

* book-resources now uses a key, ensuring the file is in proper format for lookup. Changed chapter list based on structure with one HEADER and nested chapters.

* Properly handle svg resource requests and when there are part anchors that are clickable, make sure we handle them in the UI by adding a kavita-page handler.

* Add Chapter group page even if one isn't set by using first page (without part) from nestedChildren.

* Added extra debug code for issue #163.

* Added new user preferences for books and updated the css so we scope it to our reading section.

* Cleaned up style code

* Implemented ability to save book preferences and some cleanup on existing apis.

* Added an api for checking if a user has read something in a library type before.

* Forgot to make sure the has reading progress is against a user lol.

* Remove cacheservice code for books, sine we use an in-memory method

* Handle svg images as well

* Enhanced cover image extraction to check for a "cover" image if the cover image wasn't set in OPF before falling back to the first image.

* Fixed an issue with special books not properly generating metadata due to not having filename set.

* Cleanup, removed warmup task code from statup/program and changed taskscheduler to schedule tasks on startup only (or if tasks are changed from UI).

* Code cleanup

* Code cleanup

* So much code. Lots of refactors to try to test scanner service. Moved a lot of the queries into Extensions to allow to easier test, even though it's hacky. Support @font-face src:url swaps with ' and ". Source summary information from epubs.

* Well...baseURL needs to come from BE and not from UI lol.

* Adjusted migrations so default values match Entity

* Removed comment

* I think I finally fixed #163! The issue was that when i checked if it had a parserInfo, i wasn't considering that the chapter range might have a - in it (0-6) and so when the code to check if range could parse out a number failed, it treated it like a special and checked range against info's filename.

* Some bugfixes

* Lots of testing, extracting code to make it easier to test. This code is buggy, but fixed a bug where 1) If we changed the normalization code, we would remove the whole db during a scan and 2) We weren't actually removing series properly.

Other than that, code is being extracted to remove duplication and centralize logic.

* More code cleanup and test cleanup to ensure scan loop is working as expected and matches expectaions from tests.

* Cleaned up the code and made it so if I change normalization, which I do in this branch, it wont break existing DBs.

* Some comic parser changes for partial chapter support.

* Added some code for directory service and scanner service along with python code to generate test files (not used yet). Fixed up all the tests.

* Code smells
2021-04-28 16:16:22 -05:00

220 lines
9.5 KiB
C#

using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using API.DTOs;
using API.Entities.Interfaces;
using API.Extensions;
using API.Interfaces;
using API.Services;
using HtmlAgilityPack;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.Logging;
using VersOne.Epub;
namespace API.Controllers
{
public class BookController : BaseApiController
{
private readonly ILogger<BookController> _logger;
private readonly IBookService _bookService;
private readonly IUnitOfWork _unitOfWork;
private static readonly string BookApiUrl = "book-resources?file=";
public BookController(ILogger<BookController> logger, IBookService bookService, IUnitOfWork unitOfWork)
{
_logger = logger;
_bookService = bookService;
_unitOfWork = unitOfWork;
}
[HttpGet("{chapterId}/book-info")]
public async Task<ActionResult<string>> GetBookInfo(int chapterId)
{
var chapter = await _unitOfWork.VolumeRepository.GetChapterAsync(chapterId);
var book = await EpubReader.OpenBookAsync(chapter.Files.ElementAt(0).FilePath);
return book.Title;
}
[HttpGet("{chapterId}/book-resources")]
public async Task<ActionResult> GetBookPageResources(int chapterId, [FromQuery] string file)
{
var chapter = await _unitOfWork.VolumeRepository.GetChapterAsync(chapterId);
var book = await EpubReader.OpenBookAsync(chapter.Files.ElementAt(0).FilePath);
var key = BookService.CleanContentKeys(file);
if (!book.Content.AllFiles.ContainsKey(key)) return BadRequest("File was not found in book");
var bookFile = book.Content.AllFiles[key];
var content = await bookFile.ReadContentAsBytesAsync();
Response.AddCacheHeader(content);
var contentType = BookService.GetContentType(bookFile.ContentType);
return File(content, contentType, $"{chapterId}-{file}");
}
[HttpGet("{chapterId}/chapters")]
public async Task<ActionResult<ICollection<BookChapterItem>>> GetBookChapters(int chapterId)
{
// This will return a list of mappings from ID -> pagenum. ID will be the xhtml key and pagenum will be the reading order
// this is used to rewrite anchors in the book text so that we always load properly in FE
var chapter = await _unitOfWork.VolumeRepository.GetChapterAsync(chapterId);
var book = await EpubReader.OpenBookAsync(chapter.Files.ElementAt(0).FilePath);
var mappings = await _bookService.CreateKeyToPageMappingAsync(book);
var navItems = await book.GetNavigationAsync();
var chaptersList = new List<BookChapterItem>();
foreach (var navigationItem in navItems)
{
if (navigationItem.NestedItems.Count > 0)
{
_logger.LogDebug("Header: {Header}", navigationItem.Title);
var nestedChapters = new List<BookChapterItem>();
foreach (var nestedChapter in navigationItem.NestedItems)
{
if (nestedChapter.Link == null) continue;
var key = BookService.CleanContentKeys(nestedChapter.Link.ContentFileName);
if (mappings.ContainsKey(key))
{
nestedChapters.Add(new BookChapterItem()
{
Title = nestedChapter.Title,
Page = mappings[key],
Part = nestedChapter.Link.Anchor ?? string.Empty,
Children = new List<BookChapterItem>()
});
}
}
if (navigationItem.Link == null)
{
var item = new BookChapterItem()
{
Title = navigationItem.Title,
Children = nestedChapters
};
if (nestedChapters.Count > 0)
{
item.Page = nestedChapters[0].Page;
}
chaptersList.Add(item);
}
else
{
var groupKey = BookService.CleanContentKeys(navigationItem.Link.ContentFileName);
if (mappings.ContainsKey(groupKey))
{
chaptersList.Add(new BookChapterItem()
{
Title = navigationItem.Title,
Page = mappings[groupKey],
Children = nestedChapters
});
}
}
}
}
return Ok(chaptersList);
}
[HttpGet("{chapterId}/book-page")]
public async Task<ActionResult<string>> GetBookPage(int chapterId, [FromQuery] int page)
{
var chapter = await _unitOfWork.VolumeRepository.GetChapterAsync(chapterId);
var book = await EpubReader.OpenBookAsync(chapter.Files.ElementAt(0).FilePath);
var mappings = await _bookService.CreateKeyToPageMappingAsync(book);
var counter = 0;
var doc = new HtmlDocument();
var baseUrl = Request.Scheme + "://" + Request.Host + Request.PathBase + "/api/";
var apiBase = baseUrl + "book/" + chapterId + "/" + BookApiUrl;
var bookPages = await book.GetReadingOrderAsync();
foreach (var contentFileRef in bookPages)
{
if (page == counter)
{
var content = await contentFileRef.ReadContentAsync();
if (contentFileRef.ContentType != EpubContentType.XHTML_1_1) return Ok(content);
doc.LoadHtml(content);
var body = doc.DocumentNode.SelectSingleNode("/html/body");
var inlineStyles = doc.DocumentNode.SelectNodes("//style");
if (inlineStyles != null)
{
foreach (var inlineStyle in inlineStyles)
{
var styleContent = await _bookService.ScopeStyles(inlineStyle.InnerHtml, apiBase);
body.PrependChild(HtmlNode.CreateNode($"<style>{styleContent}</style>"));
}
}
var styleNodes = doc.DocumentNode.SelectNodes("/html/head/link");
if (styleNodes != null)
{
foreach (var styleLinks in styleNodes)
{
var key = BookService.CleanContentKeys(styleLinks.Attributes["href"].Value);
var styleContent = await _bookService.ScopeStyles(await book.Content.Css[key].ReadContentAsync(), apiBase);
body.PrependChild(HtmlNode.CreateNode($"<style>{styleContent}</style>"));
}
}
var anchors = doc.DocumentNode.SelectNodes("//a");
if (anchors != null)
{
foreach (var anchor in anchors)
{
BookService.UpdateLinks(anchor, mappings, page);
}
}
var images = doc.DocumentNode.SelectNodes("//img");
if (images != null)
{
foreach (var image in images)
{
if (image.Name != "img") continue;
// Need to do for xlink:href
if (image.Attributes["src"] != null)
{
var imageFile = image.Attributes["src"].Value;
image.Attributes.Remove("src");
image.Attributes.Add("src", $"{apiBase}" + imageFile);
}
}
}
images = doc.DocumentNode.SelectNodes("//image");
if (images != null)
{
foreach (var image in images)
{
if (image.Name != "image") continue;
if (image.Attributes["xlink:href"] != null)
{
var imageFile = image.Attributes["xlink:href"].Value;
image.Attributes.Remove("xlink:href");
image.Attributes.Add("xlink:href", $"{apiBase}" + imageFile);
}
}
}
return Ok(body.InnerHtml);
}
counter++;
}
return BadRequest("Could not find the appropriate html for that page");
}
}
}