--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/gfx/cairo/libpixman/src/refactor	Wed Dec 31 06:09:35 2014 +0100

Roadmap

- Move all the fetchers etc. into pixman-image to make pixman-compose.c
  less intimidating.

  DONE

- Make combiners for unified alpha take a mask argument. That way
  we won't need two separate paths for unified vs. component in the
  general compositing code.

  DONE, except that the Altivec code needs to be updated. Luca is
  looking into that.

- Delete the separate 'unified alpha' path.

  DONE

- Split images into their own files.

  DONE

- Split the gradient walker code out into its own file.

  DONE

- Add scanline getters per image.

  DONE

- Generic 64 bit fetcher.

  DONE

- Split fast path tables into their respective architecture-dependent
  files.

See "Render Algorithm" below for the rationale.

Images will eventually have these virtual functions:

	get_scanline()
	get_scanline_wide()
	get_pixel()
	get_pixel_wide()
	get_untransformed_pixel()
	get_untransformed_pixel_wide()
	get_unfiltered_pixel()
	get_unfiltered_pixel_wide()

	store_scanline()
	store_scanline_wide()

1.

Initially we will just have get_scanline() and get_scanline_wide();
these will be based on the ones in pixman-compose. Hopefully this will
reduce the complexity of pixman_composite_rect_general().

Note that there are access considerations - the compose function is
compiled twice.


2.

Split image types into their own source files. Export a no-op virtual
reinit() call. Call this whenever a property of the image changes.


3.
Split the get_scanline() call into smaller functions that are
initialized by the reinit() call.

The Render Algorithm:
	(first repeat, then filter, then transform, then clip)

Starting from a destination pixel (x, y), do

	1 x = x - xDst + xSrc
	  y = y - yDst + ySrc

	2 reject pixel that is outside the clip

	This treats clipping as something that happens after
	transformation, which I think is correct for client clips. For
	hierarchy clips it is wrong, but who really cares? Without
	GraphicsExposes, hierarchy clips are basically irrelevant. Yes,
	you could imagine cases where the pixels of a subwindow of a
	redirected, transformed window should be treated as
	transparent. I don't really care.

	Basically, I think the render spec should say that pixels that
	are unavailable due to the hierarchy have undefined content,
	and that GraphicsExposes are not generated. I.e., basically
	that using non-redirected windows as sources is unsupported. This
	is at least consistent with the current implementation, and we
	can update the spec later if someone makes it work.

	The implication for render is that it should stop passing the
	hierarchy clip to pixman. In pixman, if a source image has a
	clip, it should be used in computing the composite region and
	nowhere else, regardless of what "has_client_clip" says. The
	default should be for there not to be any clip.

	I would really like to get rid of the client clip as well for
	source images, but unfortunately there is at least one
	application in the wild that uses them.
	3 Transform pixel: (x, y) = T(x, y)

	4 Call p = GetUntransformedPixel (x, y)

	5 If the image has an alpha map, then

		Call GetUntransformedPixel (x, y) on the alpha map

		add the resulting alpha channel to p

	  return p

	Where GetUntransformedPixel is:

	6 switch (filter)
	  {
	  case NEAREST:
	      return GetUnfilteredPixel (x, y);
	      break;

	  case BILINEAR:
	      return GetUnfilteredPixel (...)	// 4 times
	      break;

	  case CONVOLUTION:
	      return GetUnfilteredPixel (...)	// as many times as necessary.
	      break;
	  }

	Where GetUnfilteredPixel (x, y) is

	7 switch (repeat)
	  {
	  case REPEAT_NORMAL:
	  case REPEAT_PAD:
	  case REPEAT_REFLECT:
	      // adjust x, y as appropriate
	      break;

	  case REPEAT_NONE:
	      if (x, y) is outside image bounds
		  return 0;
	      break;
	  }

	  return GetRawPixel (x, y)

	Where GetRawPixel (x, y) is

	8 Compute the pixel in question, depending on image type.

For gradients, repeat has a totally different meaning, so
UnfilteredPixel() and RawPixel() must be the same function so that
gradients can do their own repeat algorithm.

So, GetRawPixel

	for bits must deal with repeats
	for gradients must deal with repeats (differently)
	for solids, should ignore repeats

	for polygons, when we add them, either ignore repeats or do
	something similar to bits (in which case, we may want an extra
	layer of indirection to modify the coordinates).

It is then possible to build things like "get scanline" or "get tile" on
top of this.
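The repeat adjustment in step 7 can be sketched in C. The enum and the
adjust_coord() helper below are illustrative assumptions for this note,
not pixman's actual implementation:

```c
#include <stdint.h>

/* Hypothetical repeat modes, mirroring the cases in step 7 above. */
typedef enum { REPEAT_NONE, REPEAT_NORMAL, REPEAT_PAD, REPEAT_REFLECT } repeat_t;

/* Map an arbitrary coordinate into [0, size) according to the repeat
 * mode.  Returns 0 on success, -1 when the coordinate falls outside a
 * REPEAT_NONE image (the caller would then return transparent black). */
static int
adjust_coord (int *x, int size, repeat_t repeat)
{
    switch (repeat)
    {
    case REPEAT_NORMAL:
	*x %= size;
	if (*x < 0)
	    *x += size;		/* C's % can yield a negative result */
	return 0;

    case REPEAT_PAD:
	if (*x < 0)
	    *x = 0;
	if (*x >= size)
	    *x = size - 1;
	return 0;

    case REPEAT_REFLECT:
	/* The reflected pattern has period 2 * size */
	*x %= 2 * size;
	if (*x < 0)
	    *x += 2 * size;
	if (*x >= size)
	    *x = 2 * size - 1 - *x;
	return 0;

    case REPEAT_NONE:
	return (*x < 0 || *x >= size) ? -1 : 0;
    }
    return -1;
}
```

The same helper would be applied to both x and y before calling
GetRawPixel().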
In the simplest case, just repeatedly calling GetPixel()
would work, but specialized get_scanline()s or get_tile()s could be
plugged in for common cases.

By not plugging anything in for images with access functions, we only
have to compile the pixel functions twice, not the scanline functions.

And we can get rid of fetchers for the bizarre formats that no one
uses. Such as b2g3r3 etc. r1g2b1? Seriously? It is also worth
considering a generic format-based pixel fetcher for these edge cases.

Since the actual routines depend on the image attributes, the images
must be notified when those change and update their function pointers
appropriately. So there should probably be a virtual function called
(* reinit) or something like that.

There will also be wide fetchers for both pixels and lines. The line
fetcher will just call the wide pixel fetcher. The wide pixel fetcher
will just call expand, except for 10 bit formats.

Rendering pipeline:

Drawable:
	0. if (picture has alpha map)
		0.1. Position the alpha map according to alpha_x/alpha_y
		0.2. Replace the alpha channel of the source with the
		     one from the alpha map. Replacement only takes
		     place in the intersection of the two drawables'
		     geometries.
	1. Repeat the drawable according to the repeat attribute
	2. Reconstruct a continuous image according to the filter
	3. Transform according to the transform attribute
	4. Position image such that src_x, src_y is over dst_x, dst_y
	5. Sample once per destination pixel
	6. Clip. If a pixel is not within the source clip, then no
	   compositing takes place at that pixel. (I.e., it's *not*
	   treated as 0).
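The fallback mentioned above - building get_scanline() by repeatedly
calling the per-image get_pixel() - can be sketched as follows. The
struct layout and names are illustrative assumptions, not pixman's
actual image type:

```c
#include <stdint.h>

/* Hypothetical image with per-image virtuals, which a reinit() call
 * would re-set whenever an image attribute changes. */
typedef struct image image_t;

struct image
{
    uint32_t (* get_pixel) (image_t *image, int x, int y);
    void     (* get_scanline) (image_t *image, int x, int y,
			       int width, uint32_t *buffer);
};

/* Generic fallback: fetch one pixel at a time.  Specialized
 * get_scanline()s can be plugged in for common cases instead. */
static void
generic_get_scanline (image_t *image, int x, int y,
		      int width, uint32_t *buffer)
{
    int i;

    for (i = 0; i < width; ++i)
	buffer[i] = image->get_pixel (image, x + i, y);
}

/* Example pixel fetcher, purely for illustration. */
static uint32_t
example_get_pixel (image_t *image, int x, int y)
{
    (void) image;
    return (uint32_t) (y * 10 + x);
}
```

An image whose attributes allow a fast scanline routine would point
get_scanline at that routine; everything else falls back to
generic_get_scanline, so only the pixel functions need to be compiled
twice for the accessor case.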
	Sampling a drawable:

	- If the drawable does not have an alpha channel, the pixels in
	  it are treated as opaque.

	Note on reconstruction:

	- The top left pixel has coordinates (0.5, 0.5) and pixels are
	  spaced 1 apart.

Gradient:
	1. Unless the gradient type is conical, repeat the underlying
	   (0, 1) gradient according to the repeat attribute
	2. Integrate the gradient across the plane according to type
	3. Transform according to the transform attribute
	4. Position the gradient
	5. Sample once per destination pixel
	6. Clip

Solid Fill:
	1. Repeat has no effect
	2. Image is already continuous and defined for the entire plane
	3. Transform has no effect
	4. Positioning has no effect
	5. Sample once per destination pixel
	6. Clip

Polygon:
	1. Repeat has no effect
	2. Image is already continuous and defined on the whole plane
	3. Transform according to the transform attribute
	4. Position the image
	5. Supersample 15x17 per destination pixel
	6. Clip

Possibly interesting additions:
	- More general transformations, such as warping, or general
	  shading.

	- Shader image where a function is called to generate the
	  pixel (i.e., uploading assembly code).

	- Resampling kernels

	  In principle the polygon image uses a 15x17 box filter for
	  resampling. If we allow general resampling filters, then we
	  get all the various antialiasing types for free.

	  Bilinear downsampling looks terrible and could be much
	  improved by a resampling filter. NEAREST reconstruction
	  combined with a box resampling filter is what GdkPixbuf
	  does, I believe.

	  Useful for high frequency gradients as well.
	  (Note that the difference between a reconstruction and a
	  resampling filter is mainly where in the pipeline they
	  occur. High quality resampling should use a correctly
	  oriented kernel, so it should happen after transformation.

	  An implementation can transform the resampling kernel and
	  convolve it with the reconstruction if it so desires, but it
	  will need to deal with the fact that the resampling kernel
	  will not necessarily be pixel aligned.)

	  "Output kernels"

	  One could imagine doing the resampling after compositing,
	  i.e., for each destination pixel sample each source image 16
	  times, then composite those subpixels individually, then
	  finally apply a kernel.

	  However, this is effectively the same as full screen
	  antialiasing, which is a simpler way to think about it. So
	  resampling kernels may make sense for individual images, but
	  not as a post-compositing step.

	  Fullscreen AA is inefficient without chained compositing
	  though. Consider an (image scaled up to oversample size IN
	  some polygon) scaled down to screen size. With the current
	  implementation, there will be a huge temporary. With chained
	  compositing, the whole thing ends up being equivalent to the
	  output kernel from above.

	- Color space conversion

	  The complete model here is that each surface has a color
	  space associated with it and that the compositing operation
	  also has one associated with it. Note also that gradients
	  should have associated colorspaces.

	- Dithering

	  If people dither something that is already dithered, it will
	  look terrible, but don't do that, then. (Dithering happens
	  after resampling, if at all - what is the relationship
	  with color spaces?
Presumably dithering should happen in linear
	  intensity space.)

	- Floating point surfaces, 16, 32 and possibly 64 bits per
	  channel.

	Maybe crack:

	- Glyph polygons

	  If glyphs could be given as polygons, they could be
	  positioned and rasterized more accurately. The glyph
	  structure would need subpixel positioning though.

	- Luminance vs. coverage for the alpha channel

	  Whether the alpha channel should be interpreted as luminance
	  modulation or as coverage (intensity modulation). This is a
	  bit of a departure from the rendering model though. It could
	  also be considered whether it should be possible to have
	  both channels in the same drawable.

	- Alternative for component alpha:

	  - Set component-alpha on the output image.

	    - This means each of the components is sampled
	      independently and composited in the corresponding
	      channel only.

	  - Have a 3x oversampled mask.

	  - Scale it down by 3 horizontally, with a [ 1/3, 1/3, 1/3 ]
	    resampling filter.

	    Is this equivalent to just using a component alpha mask?

	Incompatible changes:

	- Gradients could be specified with premultiplied colors. (You
	  can use a mask to get things like gradients from solid red to
	  transparent red.)

Refactoring pixman

The pixman code is not particularly nice, to put it mildly. Among the
issues are

- Inconsistent naming style (fb vs. Fb, camelCase vs.
  underscore_naming). Sometimes there is even inconsistency *within*
  one name.

	fetchProc32 ACCESS(pixman_fetchProcForPicture32)

  may be one of the ugliest names ever created.
- Coding style:
  Use the one from cairo, except that pixman uses this brace style:

	while (blah)
	{
	}

  Format do/while like this:

	do
	{

	}
	while (...);

- PIXMAN_COMPOSITE_RECT_GENERAL() is horribly complex.

- switch-case logic in pixman-access.c

  Instead it would be better to just store function pointers in the
  image objects themselves:

	get_pixel()
	get_scanline()

- Much of the scanline fetching code is for formats that no one
  ever uses. a2r2g2b2 anyone?

  It would probably be worthwhile having a generic fetcher for any
  pixman format whatsoever.

- Code related to particular image types should be split into
  individual files:

	pixman-bits-image.c
	pixman-linear-gradient-image.c
	pixman-radial-gradient-image.c
	pixman-solid-image.c

- Fast path code should be split into files based on architecture:

	pixman-mmx-fastpath.c
	pixman-sse2-fastpath.c
	pixman-c-fastpath.c

	etc.

  Each of these files should then export a fast path table, which
  would be declared in pixman-private.h. This should allow us to get
  rid of the pixman-mmx.h files.

  The fast path table should describe each fast path. I.e., there
  should be bitfields indicating what things the fast path can handle,
  rather than, as now, only allowing one format per src/mask/dest.
  I.e.,

	{
	    FAST_a8r8g8b8 | FAST_x8r8g8b8,
	    FAST_null,
	    FAST_x8r8g8b8,
	    FAST_repeat_normal | FAST_repeat_none,
	    the_fast_path
	}

There should then be *one* file that implements pixman_image_composite().
This should do roughly:

	optimize_operator();

	convert 1x1 repeat to solid (actually this should be done at
	image creation time).

	is there a useful fast path?

There should be a file called pixman-cpu.c that contains all the
architecture-specific stuff to detect what CPU features we have.

Issues that must be kept in mind:

	- We need accessor code to be preserved.

	- Maybe there should be a "store_scanline" too?

	  Is this sufficient?

	  We should preserve the optimization where the
	  compositing happens directly in the destination
	  whenever possible.

	- It should be possible to create GPU samplers from the
	  images.

The "horizontal" classification should be a bit in the image; the
"vertical" classification should just happen inside the gradient
file. Note though that

	(a) these will change if the transformation/repeat changes.

	(b) at the moment the optimization for linear gradients
	    takes the source rectangle into account. Presumably
	    this is to also optimize the case where the gradient
	    is close enough to horizontal?

Who is responsible for repeats? In principle it should be the scanline
fetch. Right now NORMAL repeats are handled by walk_composite_region()
while other repeats are handled by the scanline code.


(Random note on filtering: do you filter before or after
transformation? Hardware is going to filter after transformation;
this is also what pixman does currently.) It's not completely clear
what filtering *after* transformation means. One thing that might look
good would be to do *supersampling*, i.e., compute multiple subpixels
per destination pixel, then average them together.